Linux and Name Resolution

8th October 2018 By Jonny

Nothing Kubernetes related here, for a change. This is more for my own benefit and something to refer back to in the future. This post will detail how Linux performs name resolution, and is inspired by a recent customer case which prompted a fair amount of background reading to help understand what the customer was experiencing.

My DNS server(s) have gone offline …

The initial case description wasn’t exactly clear – it seemed that all the configured DNS servers weren’t responding, which would (understandably) have a knock-on effect on client applications. It later transpired that only the first listed nameserver was offline, although this was still having a negative effect on client applications.

Apologies for stating the blindingly obvious, but when a client application (and I’ll probably just use ‘ping’ as my example client application) makes a request for a particular server name, Linux uses the contents of the /etc/resolv.conf file to work out which nameservers to ask. It is the glibc resolver library that is responsible for performing this name resolution. As an example, I’ll issue the command ‘ping www.google.com’ to hopefully demonstrate how this process should work.

[jonny@olympus ~]$ ping www.google.com
PING www.google.com (216.58.193.68) 56(84) bytes of data.
64 bytes from sea15s07-in-f68.1e100.net (216.58.193.68): icmp_seq=1 ttl=50 time=142 ms

A simple ping provides a lot of useful information straight away. For a start, the server name www.google.com has been resolved to a server with an IP address of 216.58.193.68. The ping command has sent a 56 byte payload (64 bytes including the ICMP header) to this address to verify that it is responding, and we can see that the actual server name for this address is sea15s07-in-f68.1e100.net (Google have global points of presence and make use of CDNs). We can then see that the first ping has been responded to with a ‘time to live’ of 50 and a round trip time of 142ms.

Taking this one step further, I’ll change the command to measure how long the command takes to complete (and only send one ping packet) as follows:

[jonny@olympus ~]$ time ping -c1 www.google.com
PING www.google.com (216.58.216.132) 56(84) bytes of data.
64 bytes from sea15s01-in-f132.1e100.net (216.58.216.132): icmp_seq=1 ttl=50 time=141 ms
--- www.google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 141.219/141.219/141.219/0.000 ms
real 0m0.587s
user 0m0.001s
sys 0m0.003s

The good news here is that there was 0% packet loss (Google will be pleased), and that the round trip time was 141.219ms (with only one packet being sent, the extended statistics are not very informative). The ‘time’ command also shows that the command took 0.587s of real (wall clock) time to complete, of which 0.001s was spent in user mode and 0.003s in system (kernel) mode. The actual timings here aren’t of particular interest to me at the moment though.

For completeness, it’s possible to run the ping command through strace to see the exact system calls that are made when the command runs. That level of detail is left to the reader though. Taking a look at my /etc/resolv.conf file, it has the following contents:

[jonny@olympus ~]$ cat /etc/resolv.conf 
# Generated by NetworkManager
search ipa.champion
nameserver 192.168.11.121
nameserver 192.168.11.254

First off, this file has been generated by NetworkManager, so any change I make directly to it will not be persistent, as it is a managed file. The other lines in the file list the search domain to be used and the two DNS servers that I want my operating system to use.
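
As an aside, if the nameserver list does need changing persistently, the change has to go through NetworkManager rather than the file itself. A rough sketch of how that might look with nmcli – the connection name ‘eth0’ is just a placeholder for whatever ‘nmcli con show’ reports on your system:

[jonny@olympus ~]$ nmcli con show
[jonny@olympus ~]$ nmcli con mod eth0 ipv4.dns "192.168.11.121 192.168.11.254"
[jonny@olympus ~]$ nmcli con mod eth0 ipv4.ignore-auto-dns yes
[jonny@olympus ~]$ nmcli con up eth0

The ipv4.ignore-auto-dns setting stops any DHCP-supplied servers being appended, and bringing the connection up again causes NetworkManager to regenerate /etc/resolv.conf with the new values.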

I am not a number …

Unless you’re a networked computer, in which case a number is exactly what you are. Again, stating the blindingly obvious, but every networked computer has an IP address identifying it. It’s totally impractical for people to remember these numbers, which is why we have the Domain Name System to manage the name to IP address mappings for us. In practice, for Linux, this means that the resolver will query the servers listed in /etc/resolv.conf to determine IP addresses for the server names we want to connect to. In my example, the resolver library will query the server on 192.168.11.121 to provide an IP address for www.google.com. Without wanting to go into too much detail about root servers, hierarchies, and recursive caching servers etc., we can see that my local DNS server on 192.168.11.121 returns the information to ping that www.google.com should be listening on 216.58.216.132 (which does indeed successfully ping).
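
Incidentally, you can ask a specific nameserver the same question directly with dig (part of the bind-utils package on my system), which bypasses the resolver library entirely and is a handy way of checking a single server in isolation:

[jonny@olympus ~]$ dig +short @192.168.11.121 www.google.com A

This should print one or more A records; the exact address will vary, as Google rotates the records it hands out.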

DNS server offline

All of the examples above demonstrate what happens when everything is working as expected. The DNS server is online and responsive (and can recursively query where necessary), the Internet routes are available, and the target server is configured to respond to pings. The original customer issue dealt with an offline DNS server. What happens here? In order to demonstrate this, I’ll insert an unreachable DNS server address at the top of the nameserver list, so that my /etc/resolv.conf file now looks like this:

[jonny@olympus ~]$ cat /etc/resolv.conf 
# Generated by NetworkManager
search ipa.champion
nameserver 192.168.11.199
nameserver 192.168.11.121
nameserver 192.168.11.254

Running the same timed ping command as previously returns the following output:

[jonny@olympus ~]$ time ping -c1 www.google.com
PING www.google.com (216.58.193.68) 56(84) bytes of data.
64 bytes from sea15s07-in-f68.1e100.net (216.58.193.68): icmp_seq=1 ttl=50 time=141 ms
--- www.google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 141.721/141.721/141.721/0.000 ms
real 0m10.363s
user 0m0.001s
sys 0m0.005s

The output is pretty much the same as before, except that we’ve now clocked up 10.363 seconds of real time. The other values remain approximately the same as they were previously. Where have the extra 10 seconds come from?

In this particular example, there is no device listening on 192.168.11.199. Taking a LAN packet capture shows that the client performs an ARP request to determine which MAC address is assigned to the 192.168.11.199 IP address. As no device currently holds this IP address, the ARP request goes unanswered and the DNS query times out once the timeout value is reached. By default, the timeout value is 5 seconds. The eagle-eyed amongst you will have spotted that 5 seconds is not equal to 10 seconds. It is equal to 2 x 5 seconds though, so my immediate reaction is that there must be 2 lookups happening here.
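
If you want to reproduce the capture yourself, something along these lines should show both the unanswered ARP requests and the DNS traffic – the interface name enp0s3 is just a placeholder for whatever ‘ip link’ lists on your machine:

[jonny@olympus ~]$ sudo tcpdump -n -i enp0s3 'arp or port 53'

The -n flag stops tcpdump performing reverse lookups of its own, which would otherwise generate extra DNS traffic and muddy the water.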

Indeed, looking at the LAN packet trace there are 2 DNS lookups performed, which accounts for the 10 second delay here (5 seconds per lookup): a forward lookup to resolve www.google.com, and a reverse lookup of the responding IP address so that ping can display its hostname. It may be of interest to note that if the first DNS server IP address is replaced with the address of a responsive server that is not running a DNS service, then name resolution returns to something approaching normal. In that instance the server is reachable and immediately refuses the DNS query, so the client can move on to the next server in the list without needing to wait for a timeout.
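
The reverse lookup is ping’s own doing, so if the hostname in the output isn’t needed, the -n flag skips it – in this broken-first-nameserver scenario that should roughly halve the delay:

[jonny@olympus ~]$ time ping -n -c1 www.google.com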

Where we’re going, we don’t need cache

Going back to the original issue reported by the customer – some of the questions raised were: how long would the Linux server cache DNS responses for, how could the cache be emptied, and why, in the above example, did the cache not seem to work?

The belief was that the DNS resolver library would cache successful name requests – typically for the ‘time to live’ value returned in the server’s response.

SPOILER ALERT: The Linux resolver library does not cache DNS replies (successful or unsuccessful).

The very simple reason for 10 seconds being added to the total time is that the Linux resolver performs no caching whatsoever. Every time a name to IP address (or vice versa) translation is required, a query is made to the DNS servers. The resolver library isn’t even ‘smart’ enough to mark a server as offline. As such, the library will keep using the first listed entry again and again. And again.
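
This is easy to demonstrate. getent goes through the same glibc resolver path, so with the dead nameserver still first in the list, repeating the lookup a few times should cost roughly 5 seconds per iteration every single time, rather than getting faster after the first attempt:

[jonny@olympus ~]$ for i in 1 2 3; do time getent hosts www.google.com; done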

In some ways this is very desirable. The resolver library’s function is not to provide high availability for DNS, nor is it to mask DNS server issues. Sticking with the UNIX philosophy of performing one job, but performing it well, the resolver library is responsible for just that … resolving names to IP addresses. It provides a consistent and well understood method of performing this task. Using my resolv.conf file as an example, when a name resolution request is made, the following steps are performed:

  1. What is the MAC address of the computer with IP 192.168.11.199?
  2. (No response is provided to this query)
  3. 5 seconds later, what is the MAC address of the computer with IP 192.168.11.121?
  4. This is answered (if not already maintained in the local ARP cache)
  5. IP address request sent for www.google.com to 192.168.11.121
  6. Server 192.168.11.121 responds that www.google.com has IP 216.58.216.132
  7. ping packet(s) sent and received from 216.58.216.132
  8. What is the MAC address of the computer with IP 192.168.11.199?
  9. (Still no response provided)
  10. 5 seconds later, what is the MAC address of the computer with IP 192.168.11.121? (if not already in the local ARP cache)
  11. Reverse lookup of 216.58.216.132 sent to 192.168.11.121
  12. Answer provided.

There is a bit more going on – particularly on the DNS server, which isn’t authoritative for the .com (or google.com) domain. However, this does provide a decent breakdown of what has happened, and why it takes as long as it does. The resolver library will query each nameserver listed in the /etc/resolv.conf file in turn. By default, a timeout of 5 seconds is allowed before moving on to the next listed nameserver. The default is also to try the nameservers in the same sequence every time and to allow 2 retries. What this means in practice is that a server with 3 configured nameservers, all of which happen to be offline, could experience the following:

  1. nameserver 1 queried
  2. 5 seconds later, nameserver 2 queried
  3. 5 seconds later, nameserver 3 queried
  4. 5 seconds later, nameserver 1 queried
  5. 5 seconds later, nameserver 2 queried
  6. 5 seconds later, nameserver 3 queried
  7. 5 seconds later, nameserver 1 queried
  8. 5 seconds later, nameserver 2 queried
  9. 5 seconds later, nameserver 3 queried
  10. Finally all attempts are exhausted and the effort is marked as failed.

It can take 40 seconds for the entire process to fail – for every single name resolution attempt. Having no cache means that the library has to go back to the nameservers for every attempt, which on the face of it seems like a bad design choice, especially given that DNS returns TTL values expressly to support caching.
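
Those TTL values are easy to see with dig: the second column of the answer section is the remaining time to live in seconds, counting down on a recursive caching server and fixed when asking an authoritative one.

[jonny@olympus ~]$ dig +noall +answer www.google.com @192.168.11.121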

The default setting is also to always read the nameservers sequentially. It is possible to round-robin them for every name request, which would (probably) reduce the impact of an offline DNS server, but would still mean that one in three requests was delayed if one of the three DNS servers was offline.
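
Both of these behaviours can be tuned through an ‘options’ line in /etc/resolv.conf (see the resolv.conf(5) man page); the values below are illustrative rather than a recommendation:

search ipa.champion
nameserver 192.168.11.121
nameserver 192.168.11.254
options timeout:2 attempts:1 rotate

With something like this in place, an unresponsive nameserver costs at most 2 seconds per query rather than 5, only a single pass is made through the list, and the starting server is rotated so that one dead entry doesn’t penalise every lookup. Bear in mind that on my system NetworkManager owns this file, so the options would need to be applied through it to survive a restart.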

However, whilst caching can prove helpful in some scenarios, it is not universally helpful either. Caching slows down DNS propagation, which can have a serious impact on services (especially in container environments … successfully shoe-horned a container reference in after all!). Caching also makes troubleshooting harder than it needs to be, as it can be difficult to discern whether the problem lies locally or on a remote server.

In some scenarios it may be beneficial to deploy a caching solution. Traditionally, this might have been done through the name service caching daemon (nscd). However, nscd has a history of bugs, and it is fair to say that most effort is now spent on local DNS caches using either dnsmasq or unbound.
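
As a rough illustration, a dnsmasq-based cache typically boils down to something like the following – treat this as a sketch rather than a production configuration:

# /etc/dnsmasq.conf (sketch)
# ignore /etc/resolv.conf and forward cache misses to these upstream servers
no-resolv
server=192.168.11.121
server=192.168.11.254
# keep up to 1000 entries in the cache and only answer local clients
cache-size=1000
listen-address=127.0.0.1

With dnsmasq running, /etc/resolv.conf would then point at ‘nameserver 127.0.0.1’ so that applications hit the local cache first, and only cache misses travel out to the real DNS servers.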

Next time …

In the next exciting installment of name resolution, I’ll describe deploying unbound as a recursive caching DNS server for the local server. I should be able to compare the performance of name requests with and without unbound present, and demonstrate some of its advantages.