Lies, damn lies and DNS performance statistics

To paraphrase Mark Twain (and Benjamin Disraeli, if internet search results can be trusted), there are three kinds of lies: lies, damn lies and DNS performance statistics.

Most networking professionals know to have a healthy skepticism about information put out by the marketing departments of networking vendors. And so they should. It is the job of every marketer to put their company and products in the best possible light, and sometimes this means they have to stretch the truth a bit.

It is good to keep this in mind when reviewing the claims now being made by various commercial DNS vendors about the performance of their caching resolvers. Performance has become the new battleground for these vendors, primarily because it is an increasingly common problem for service providers, especially mobile operators, who are seeing huge growth in 4G adoption.

So what do you need to watch out for when reviewing DNS performance claims? Here are a few things to consider.

Beware of 100% cache hit numbers

On every commercial DNS datasheet you will find a performance number measured in queries per second. But what does it mean? Is it the maximum performance of the server in a real point of presence (POP) with real traffic? Is it the maximum performance in the lab? What is the mix of cache hits (which are easy to resolve) and cache misses (which are considerably harder)?

It turns out it is really hard to quote real-world performance numbers in a standard way so that performance numbers can be compared across vendors and products. DNS performance is dependent on too many variables – the latency of the Internet connection, the proximity of the resolver to the client, the proximity and availability of the nameservers for the domains being queried, and the load on the network, just to name a few. So what do vendors do? They publish the only numbers that can be easily replicated by others – 100% cache hits on a quiet network.

But 100% cache hit numbers are largely irrelevant in the real world. Service providers need to provision their DNS to handle average and peak traffic while keeping enough headroom for future growth and for unexpected spikes, such as the extra load that shifts to the remaining servers when a server or network link fails. Most service provider networks run at between 70% and 90% cache hits. Depending on the architecture of the DNS software, a server's capacity under these real-world conditions can be very different from its capacity at 100% cache hits.
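A back-of-the-envelope model shows why the hit ratio matters so much. The sketch below assumes, purely for illustration, that a software resolver spends a fixed amount of CPU time on every cache hit and a larger fixed amount on every cache miss. The per-query costs are hypothetical, not measurements of any product. Under that assumption, capacity is simply the reciprocal of the average cost per query.

```python
def capacity_qps(hit_ratio: float, hit_cost_us: float, miss_cost_us: float) -> float:
    """Queries per second one CPU can sustain under a simple cost model.

    Assumes (hypothetically) that every cache hit costs hit_cost_us microseconds
    of CPU time and every cache miss costs miss_cost_us microseconds, with no
    other bottleneck in the system.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    avg_cost_us = hit_ratio * hit_cost_us + (1.0 - hit_ratio) * miss_cost_us
    return 1_000_000 / avg_cost_us


# Hypothetical costs: a miss is assumed to be 10x as expensive as a hit.
HIT_COST_US = 10.0
MISS_COST_US = 100.0

for ratio in (1.0, 0.9, 0.8, 0.7):
    qps = capacity_qps(ratio, HIT_COST_US, MISS_COST_US)
    print(f"{ratio:.0%} cache hits: {qps:,.0f} qps")
```

With these assumed costs, the 100% cache hit figure (100,000 qps) is nearly four times the capacity at 70% cache hits (about 27,000 qps). The real ratio depends entirely on the software, which is exactly why the datasheet number says so little.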

Beware of hardware inflated numbers

It turns out that scaling DNS performance in software is a hard problem. As a result, most DNS implementations simply don’t scale by adding more CPU cores to the server.

This has led some vendors to offer hardware-accelerated resolvers. The idea is to put a piece of networking hardware in the server so that the hardware will see the incoming queries and outgoing responses. The hardware can cache the response so that the next query for the same domain name can be answered out of the hardware cache – very fast.

This is a marketer’s dream! The 100% cache hit number goes way up because the hardware can respond so much faster than the software resolver behind it. And that makes the product look great.

Unfortunately, it only looks great on paper. Hardware acceleration does not speed up cache misses, when the much slower resolver software must issue queries over the internet to obtain the answer. As a result, in real-world traffic, the overall performance of the server is determined by the speed of the software when resolving cache misses.
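A simple cost model makes this concrete. The sketch below compares a software-only resolver with a hypothetical accelerated box. It assumes that the hardware cache answers every hit without using any software CPU, that the software still pays a fixed cost for every miss, and that the box saturates when either side runs out of capacity. All figures are made up for illustration.

```python
def software_capacity_qps(hit_ratio, hit_cost_us, miss_cost_us):
    """Software-only resolver: hits and misses both consume CPU time."""
    avg_cost_us = hit_ratio * hit_cost_us + (1.0 - hit_ratio) * miss_cost_us
    return 1_000_000 / avg_cost_us


def accelerated_capacity_qps(hit_ratio, hw_hit_qps, miss_cost_us):
    """Hypothetical accelerated box: a hardware cache answers every hit,
    the software resolver still pays miss_cost_us for every miss, and total
    throughput is capped by whichever side saturates first."""
    limits = []
    if hit_ratio > 0.0:
        limits.append(hw_hit_qps / hit_ratio)
    if hit_ratio < 1.0:
        software_miss_qps = 1_000_000 / miss_cost_us
        limits.append(software_miss_qps / (1.0 - hit_ratio))
    return min(limits)


# Hypothetical figures, not measurements of any product.
HIT_COST_US = 10.0      # software CPU time per cache hit
MISS_COST_US = 100.0    # software CPU time per cache miss
HW_HIT_QPS = 1_000_000  # hardware cache answer rate

for ratio in (1.0, 0.9, 0.8, 0.7):
    sw = software_capacity_qps(ratio, HIT_COST_US, MISS_COST_US)
    hw = accelerated_capacity_qps(ratio, HW_HIT_QPS, MISS_COST_US)
    print(f"{ratio:.0%} hits: software {sw:,.0f} qps, accelerated {hw:,.0f} qps ({hw / sw:.1f}x)")
```

With these assumed numbers, the accelerated box looks ten times faster at 100% cache hits but only about 1.4 times faster at 80% cache hits. Below 100%, the software's miss rate is what caps the total.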

Performance is not DDoS protection

Some DNS vendors market their hardware-accelerated resolvers as being particularly effective in defending against DDoS attacks. Here again, this argument only looks good on paper.

DDoS attacks, even those targeting just the DNS, come in many flavors. Some try to bombard the DNS server with queries, some try to bombard the server with responses, and some try to bombard the server with packets that are computationally expensive to process, all in an effort to take the server or network offline. Hardware-accelerated servers can only help if the server is being bombarded with queries for the same domains, which can be answered out of the hardware cache. For other types of attacks the hardware offers no protection.
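To see why, consider a toy simulation that treats the hardware cache as a plain least-recently-used (LRU) cache of a hypothetical size and ignores TTLs. It compares a flood of queries for a handful of names with a flood of queries for random subdomains that never repeat. The names and cache size are made up for illustration and do not describe any vendor's hardware.

```python
import random
from collections import OrderedDict


def cache_hit_ratio(query_names, capacity):
    """Fraction of queries answered from a simple LRU cache of the given size."""
    if not query_names:
        return 0.0
    cache = OrderedDict()
    hits = 0
    for name in query_names:
        if name in cache:
            hits += 1
            cache.move_to_end(name)  # mark as most recently used
        else:
            cache[name] = True  # stands in for a cached response
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(query_names)


rng = random.Random(42)  # fixed seed so the run is repeatable
N = 100_000
CACHE_SIZE = 10_000  # hypothetical hardware cache size

# Flood of queries for a handful of (hypothetical) names.
same_names = [f"www{rng.randrange(10)}.example.com" for _ in range(N)]

# Flood of queries for random 64-bit subdomain labels that no cache has seen.
random_subdomains = [f"{rng.getrandbits(64):016x}.example.com" for _ in range(N)]

print(f"repeated names:    {cache_hit_ratio(same_names, CACHE_SIZE):.2%} hits")
print(f"random subdomains: {cache_hit_ratio(random_subdomains, CACHE_SIZE):.2%} hits")
```

The repeated-name flood is answered almost entirely from cache. The random-subdomain flood gets essentially no hits, so every attack query falls through to the software resolver behind the hardware.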

Secure64 has taken a different approach to performance scalability. Our Capacity Expansion Module doubles the performance of our DNS Cache product in software, not hardware. Because the gain comes from software, it holds at any cache hit ratio, including the real-world range of 70-90% cache hits, making DNS Cache the highest-performing caching resolver available under real-world traffic conditions.

What does this all mean? Here are some tips if you are looking for solutions to improve the performance and scalability of your DNS:

  1. Develop your own benchmarks and ask the DNS vendors you are working with to show how their products perform in these tests. Benchmark the servers under current and projected real-world traffic loads as well as under abnormal conditions. Also test each server under startup conditions, when the cache is empty and every query must be resolved over the internet.
  2. If you are concerned about performance under attack (and you should be), test the server under a variety of attack conditions, such as flood attacks, reflected flood attacks, amplified floods and SYN floods.
  3. Compare the total cost of ownership of each solution to the level of performance and attack resistance that it provides.
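One way to build your own benchmark is to generate a query file with a controlled cache hit ratio. The sketch below is a hypothetical helper, not part of any product. It mixes names assumed to be already cached with random, never-before-seen names, and emits one "name type" pair per line, the text format used by query replay tools such as dnsperf. All names and zones shown are placeholders.

```python
import random


def build_query_mix(n, hit_ratio, warm_names, miss_zone, seed=0):
    """Return n (name, qtype) pairs with the requested share of cache hits.

    Hits are drawn from warm_names, which are assumed to be in the resolver's
    cache already (for example, because they were replayed once before the
    measured run). Misses are random 64-bit labels under miss_zone, which in
    practice never repeat, so the resolver has to look each one up.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    n_miss = round(n * (1.0 - hit_ratio))
    if n_miss < n and not warm_names:
        raise ValueError("warm_names is needed to generate cache hits")
    rng = random.Random(seed)  # fixed seed so a test run can be repeated
    queries = [(rng.choice(warm_names), "A") for _ in range(n - n_miss)]
    queries += [(f"{rng.getrandbits(64):016x}.{miss_zone}", "A") for _ in range(n_miss)]
    rng.shuffle(queries)  # interleave hits and misses
    return queries


# Hypothetical names: replace with a warm set and a zone you control.
WARM_NAMES = ["www.example.com", "mail.example.com", "www.example.org"]
mix = build_query_mix(10_000, 0.8, WARM_NAMES, "bench.example.net")

# One "name type" pair per line.
print("\n".join(f"{name} {qtype}" for name, qtype in mix[:5]))
```

Replay the warm names once before the measured run so they really are cached. Every miss sends a real query to the authoritative servers for `miss_zone`, so point it at a zone you operate rather than someone else's. Running the same generator at several hit ratios, including 0% for the startup case, covers the conditions in the first tip.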

The bottom line? Don’t trust the number on any vendor’s datasheet. Verify real-world performance in your own environment.