Because of the open, distributed design of the Domain Name System, and its use of the User Datagram Protocol (UDP), DNS is vulnerable to various forms of attack. Public or "open" recursive DNS resolvers are especially at risk, since they do not restrict incoming packets to a set of allowable source IP addresses. We are mostly concerned with two common types of attacks:
Each class of attack is discussed further below.
There are several variants of DNS spoofing attacks that can result in cache poisoning, but the general scenario is as follows:
For an excellent introduction to Kaminsky attacks, see An Illustrated Guide to the Kaminsky DNS Vulnerability.
DNS resolvers are subject to the usual DoS threats that plague any networked system. However, amplification attacks are of particular concern because DNS resolvers are attractive targets to attackers who exploit the resolvers' large response-to-request size ratio to gain additional free bandwidth. Resolvers that support EDNS0 (Extension Mechanisms for DNS) are especially vulnerable because of the substantially larger packet size that they can return.
In an amplification scenario, the attack proceeds as follows:
[1] See the paper DNS Amplification Attacks for examples, and a good discussion of the problem in general.
Until a standard system-wide solution to DNS vulnerabilities, such as the DNSSEC protocol [2], is universally implemented, open DNS resolvers need to take independent measures to mitigate known threats. Many techniques have been proposed; see IETF RFC 5452: Measures for making DNS more resilient against forged answers for an overview of most of them. In Google Public DNS, we have implemented, and we recommend, the following approaches:
[2] Google Public DNS supports EDNS0, which means that we accept and forward DNSSEC-formatted messages; however, we do not yet validate responses.
Some DNS cache corruption can be due to unintentional, and not necessarily malicious, mismatches between requests and responses (e.g. perhaps because of a misconfigured nameserver, a bug in the DNS software, and so on). At a minimum, DNS resolvers should put in checks to verify the credibility and relevance of nameservers' responses. We recommend (and implement) all of the following defenses:
Google Public DNS rejects all of the following:
Once a resolver does enforce basic sanity checks, an attacker has to flood the victim resolver with responses in an effort to match the query ID, UDP port (of the request), IP address (of the response), and query name of the original request before the legitimate nameserver does.
Unfortunately, this is not difficult to achieve, as the one uniquely identifying field, the query ID, is only 16 bits long (that is, a single guess has a 1 in 65,536 chance of being right). The other fields are also limited in range, so the total number of unique combinations is relatively low. See IETF RFC 5452, Section 7 for a calculation of the combinatorics involved.
Therefore, the challenge is to add as much entropy to the request packet as possible, within the standard format of the DNS message, to make it more difficult for attackers to successfully match a valid combination of fields within the window of opportunity. We recommend, and have implemented, all the techniques discussed in the following sections.
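To make the notion of "entropy" concrete, the following Python sketch estimates how many bits an off-path attacker has to guess for one request. It is an illustration only: the function name and parameters are hypothetical, and it assumes each field is chosen independently and uniformly at random, with one bit per letter when the case of the query name is randomized.

```python
import math

QUERY_ID_BITS = 16  # the DNS query ID is a 16-bit field


def request_entropy_bits(source_ports: int = 1,
                         nameservers: int = 1,
                         randomized_letters: int = 0) -> float:
    """Estimate the bits an attacker must guess to forge a matching response.

    Assumes (for illustration) that every field is independent and uniform.
    """
    if source_ports < 1 or nameservers < 1 or randomized_letters < 0:
        raise ValueError("counts must be positive")
    return (QUERY_ID_BITS
            + math.log2(source_ports)    # random source port
            + math.log2(nameservers)     # random destination nameserver
            + randomized_letters)        # one bit per case-randomized letter


# A fixed port, a single nameserver and no case randomization leave only the query ID.
baseline = request_entropy_bits()
hardened = request_entropy_bits(source_ports=2**15, nameservers=4, randomized_letters=10)
```

Every additional bit doubles the number of forged packets an attacker needs, on average, to hit a valid combination within the same window.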
As a basic step, never allow outgoing request packets to use the default UDP port 53, or to use a predictable algorithm for assigning multiple ports (e.g. simple incrementing). Use as wide a range of ports from 1024 to 65535 as allowable in your system, and use a reliable random number generator to assign ports. For example, Google Public DNS uses ~15 bits, to allow for approximately 32,000 different port numbers.
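As a minimal sketch of this step, the Python function below draws a source port uniformly from a configurable range using the operating system's cryptographically secure random source. The function name and defaults are illustrative, not taken from any particular resolver.

```python
import secrets

PORT_MIN = 1024
PORT_MAX = 65535


def random_source_port(low: int = PORT_MIN, high: int = PORT_MAX) -> int:
    """Return a port chosen uniformly at random from [low, high].

    Uses the `secrets` module (a CSPRNG) rather than `random`, so that
    an observer cannot predict future ports from past ones.
    """
    if not (0 < low <= high <= 65535):
        raise ValueError("invalid port range")
    return low + secrets.randbelow(high - low + 1)
```

A resolver would bind a fresh UDP socket to the returned port for each outgoing request, and draw again if the port is already in use.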
Note that if your servers are deployed behind firewalls, load-balancers, or other devices that perform network address translation (NAT), those devices may de-randomize ports on outgoing packets. Make sure you configure NAT devices to disable port de-randomization.
Some resolvers, when sending out requests to root, TLD, or other nameservers, select the nameserver's IP address based on the shortest distance (latency). We recommend that you randomize destination IP addresses to add entropy to the outgoing requests. In Google Public DNS, we simply pick a nameserver randomly among configured nameservers for each zone, somewhat favoring fast and reliable nameservers.
If you are concerned about latency, you can use round-trip time (RTT) banding, which consists of randomizing within a range of addresses that are below a certain latency threshold (e.g. 30 ms, 300 ms, etc.).
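RTT banding can be sketched as follows. The function picks uniformly among the nameservers whose measured RTT falls under a threshold. The fallback to the full set when no server qualifies is an assumption made for this example, as are the names and the sample addresses.

```python
import secrets
from typing import Mapping


def pick_nameserver(rtts_ms: Mapping[str, float], band_ms: float = 30.0) -> str:
    """Pick a nameserver IP at random from those with RTT <= band_ms.

    `rtts_ms` maps nameserver IP address to its measured round-trip time.
    If no server is inside the band, fall back to all servers so the
    choice stays unpredictable (an assumption of this sketch).
    """
    if not rtts_ms:
        raise ValueError("no nameservers configured")
    in_band = [ip for ip, rtt in rtts_ms.items() if rtt <= band_ms]
    candidates = in_band or list(rtts_ms)
    return secrets.choice(candidates)


# Example with documentation-range addresses and made-up latencies.
servers = {"192.0.2.1": 12.0, "192.0.2.2": 25.0, "192.0.2.3": 180.0}
chosen = pick_nameserver(servers, band_ms=30.0)
```

A wider band adds more entropy at the cost of occasionally choosing a slower server.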
The DNS standards require that nameservers treat names case-insensitively. That is, the names example.com and EXAMPLE.COM should resolve to the same IP address [3]. However, in the response, most nameservers echo back the name as it appeared in the request, preserving the original case.
Therefore, another way to add entropy to requests is to randomly vary the case of letters in domain names queried. This technique, also known as "0x20" because bit 0x20 is used to set the case of US-ASCII letters, was first proposed in the IETF internet draft Use of Bit 0x20 in DNS Labels to Improve Transaction Identity. With this technique, the nameserver response must match not only the query name but the case of every letter in the name string; for example, wWw.eXaMpLe.CoM or WwW.ExamPLe.COm. This may add little or no entropy to queries for the top-level and root domains, but it's effective for most hostnames.
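The following Python sketch illustrates the idea: each ASCII letter has bit 0x20 flipped with probability one half, and the name echoed in the response is then compared byte for byte with the name that was sent. Function names are illustrative.

```python
import secrets


def randomize_case(name: str) -> str:
    """Flip bit 0x20 of each ASCII letter with probability 1/2."""
    out = []
    for ch in name:
        if ch.isascii() and ch.isalpha() and secrets.randbits(1):
            out.append(chr(ord(ch) ^ 0x20))  # toggles upper/lower case
        else:
            out.append(ch)  # digits, dots and hyphens are left untouched
    return "".join(out)


def echoed_name_matches(sent: str, echoed: str) -> bool:
    """Case-sensitive comparison: any case difference means rejection."""
    return sent == echoed


query_name = randomize_case("www.example.com")
```

A name with n letters contributes up to n bits, which is why short names such as TLDs gain little from the technique.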
One significant challenge we discovered when implementing this technique is that some nameservers do not follow the expected response behavior:
For both of these types of nameservers, altering the case of the query name would produce undesirable results: for the first group, the response would be indistinguishable from a forged response; for the second group, the response could be totally invalid.
Our current solution to this problem is to create a whitelist of nameservers which we know apply the standards correctly, and to only apply the case randomization technique in requests to those servers. We also list the appropriate exception subdomains for each of them, based on analyzing our logs. If a response that appears to come from those servers does not contain the correct case, we reject the response. The whitelisted nameservers comprise more than 70% of our traffic.
[3] RFC 1034, Section 3.5 says:
Note that while upper and lower case letters are allowed in domain names, no significance is attached to the case. That is, two names with the same spelling but different case are to be treated as if identical.
If a resolver cannot directly resolve a name from the cache, or cannot directly query an authoritative nameserver, then it must follow referrals from a root or TLD nameserver. In most cases, requests to the root or TLD nameservers will result in a referral to another nameserver, rather than an attempt to resolve the name to an IP address. For such requests, it should therefore be safe to attach a random label to a query name to increase the entropy of the request, while not risking a failure to resolve a non-existent name. That is, sending a request to a referring nameserver for a name prefixed with a nonce label, such as entriih-f10r3.www.google.com, should return the same result as a request for www.google.com.
Although in practice such requests make up less than 3% of outgoing requests, assuming normal traffic (since most queries can be answered directly from the cache or by a single query), these are precisely the types of requests that an attacker tries to force a resolver to issue. Therefore, this technique can be very effective at preventing Kaminsky-style exploits.
Implementing this technique requires that nonce labels only be used for requests that are guaranteed to result in referrals; that is, responses that do not contain records in the answers section. However, we encountered several challenges when attempting to define the set of such requests:
To address these challenges, we created a "blacklist" file containing exceptions for which we cannot prepend nonce labels. The file is populated with hostnames for which TLD nameservers return non-referring responses, according to our server logs. We continually review the exceptions list to ensure that it stays valid over time.
DNS resolvers are vulnerable to "birthday attacks", so called because they exploit the mathematical "birthday paradox", in which a match becomes likely with far fewer inputs than intuition suggests. Birthday attacks involve flooding the victim server not only with forged responses but also with initial queries, counting on the resolver to issue multiple requests for a single name resolution. The greater the number of issued outgoing requests, the greater the probability that the attacker will match one of those requests with a forged response: an attacker only needs on the order of 300 in-flight requests for a 50% success chance at matching a forged response, and 700 requests for close to 100% success.
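These figures are consistent with the classic birthday approximation, shown in the Python sketch below. It assumes that only the 16-bit query ID has to be matched and that n is both the number of in-flight requests and the number of forged responses; this model is an assumption about where the numbers come from, not something stated above.

```python
def birthday_match_probability(n: int, id_space: int = 2**16) -> float:
    """Approximate probability of at least one match among n requests.

    Uses 1 - (1 - 1/t)^(n(n-1)/2), where t is the number of possible
    query IDs. Assumes IDs are independent and uniformly distributed.
    """
    if n < 0 or id_space < 1:
        raise ValueError("n must be >= 0 and id_space >= 1")
    pairs = n * (n - 1) // 2
    return 1.0 - (1.0 - 1.0 / id_space) ** pairs


p_300 = birthday_match_probability(300)  # roughly one half
p_700 = birthday_match_probability(700)  # close to certainty
```

Increasing the ID space with the entropy techniques described earlier pushes the required number of in-flight requests up sharply, because the exponent is divided by a much larger t.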
To guard against this attack strategy, you should be sure to discard all duplicate queries from the outbound queue. For example, Google Public DNS never allows more than a single outstanding request for the same query name, query type, and destination IP address.
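One way to enforce this, sketched in Python below, is to key outstanding requests on the (query name, query type, destination IP) tuple and attach later clients to the request already in flight instead of sending a new one. The class and method names are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (query name, query type, destination IP)


class OutstandingQueries:
    """Tracks in-flight upstream requests and coalesces duplicates."""

    def __init__(self) -> None:
        self._waiters: Dict[Key, List[str]] = {}

    def register(self, qname: str, qtype: str, server_ip: str, client: str) -> bool:
        """Return True if a new upstream request should be sent.

        Returns False when an identical request is already outstanding;
        the client is then queued to receive that request's answer.
        """
        key = (qname.lower(), qtype, server_ip)
        if key in self._waiters:
            self._waiters[key].append(client)
            return False
        self._waiters[key] = [client]
        return True

    def complete(self, qname: str, qtype: str, server_ip: str) -> List[str]:
        """Remove the entry and return the clients waiting on it."""
        return self._waiters.pop((qname.lower(), qtype, server_ip), [])


pending = OutstandingQueries()
first = pending.register("www.example.com", "A", "192.0.2.1", "client-1")
second = pending.register("WWW.example.com", "A", "192.0.2.1", "client-2")
```

With this structure, flooding the resolver with queries for one name yields a single outstanding request, so the attacker gains no extra chances to match a forged response.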
Preventing denial-of-service attacks poses several particular challenges for open recursive DNS resolvers:
The best approach for combating DoS attacks is to impose a rate-limiting or "throttling" mechanism. Google Public DNS implements two kinds of rate control:
If queries from a specific source IP address exceed the maximum QPS, or exceed the average bandwidth or amplification limit consistently (the occasional large response will pass), we return (small) error responses or no response at all.
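As an illustration of the per-source QPS limit only, the Python sketch below uses a token bucket per source IP address. It is a generic technique, not a description of how Google Public DNS implements its limits; the bandwidth and amplification limits are not modeled, and all names and parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class QpsLimiter:
    """Token-bucket limiter keyed by source IP address."""

    def __init__(self, max_qps: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_qps = max_qps   # sustained queries per second
        self._burst = burst       # bucket capacity (allowed short burst)
        self._clock = clock       # injectable for testing
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, source_ip: str) -> bool:
        """Return True if a query from source_ip may be answered now."""
        now = self._clock()
        tokens, last = self._buckets.get(source_ip, (self._burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self._burst, tokens + (now - last) * self._max_qps)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[source_ip] = (tokens, now)
        return allowed
```

The burst capacity is what lets an occasional spike through while a sustained excess is refused; a caller that gets False would send a small error response or drop the query.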