DNS query distribution and its effect on caching
Many studies suggest that internet resources (including DNS resource records) follow a Zipf-like distribution; see, for example, “DNS Performance and the Effectiveness of Caching”.
Zipf's law originally states that, given a large sample of words, the frequency of any word is inversely proportional to its rank in the frequency table.
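This rank–frequency relationship is easy to reproduce with Go's built-in Zipf sampler. A small sketch (the parameters `s = 1.5`, `imax = 1000`, and the sample size are illustrative, not taken from our data):

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// rankFrequencies draws n samples from a Zipf distribution and returns
// the per-item counts sorted in descending order (rank 1 first).
func rankFrequencies(n int) []int {
	r := rand.New(rand.NewSource(1))
	// NewZipf(r, s, v, imax): P(k) is proportional to (v+k)^(-s).
	z := rand.NewZipf(r, 1.5, 1, 1000)
	counts := make(map[uint64]int)
	for i := 0; i < n; i++ {
		counts[z.Uint64()]++
	}
	freqs := make([]int, 0, len(counts))
	for _, c := range counts {
		freqs = append(freqs, c)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(freqs)))
	return freqs
}

func main() {
	// Frequency falls off roughly as rank^(-s): a heavy head, a long tail.
	for rank, f := range rankFrequencies(1_000_000)[:5] {
		fmt.Printf("rank %d: %d\n", rank+1, f)
	}
}
```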
We can check these claims using data from our own authoritative DNS server.
We use two different aggregations on queries, qname-qtype and qname, and we fetch data from two time frames, a 5-minute window and a 1-day window. Here are the sorted query frequencies for these four scenarios:
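As a concrete sketch of the aggregation step, counting by qname-qtype versus qname alone might look like this in Go (the `Query` struct and sample data are illustrative, not our actual log format):

```go
package main

import (
	"fmt"
	"sort"
)

// Query is a simplified DNS query record; real logs carry more fields.
type Query struct {
	QName string
	QType string
}

// sortedFreqs aggregates queries by the given key function and returns
// the counts in descending order — the rank–frequency curve we plot.
func sortedFreqs(qs []Query, key func(Query) string) []int {
	counts := make(map[string]int)
	for _, q := range qs {
		counts[key(q)]++
	}
	freqs := make([]int, 0, len(counts))
	for _, c := range counts {
		freqs = append(freqs, c)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(freqs)))
	return freqs
}

func main() {
	qs := []Query{
		{"example.com.", "A"}, {"example.com.", "AAAA"},
		{"example.com.", "A"}, {"example.org.", "A"},
	}
	// qname-qtype aggregation: example.com./A counts separately from AAAA.
	fmt.Println(sortedFreqs(qs, func(q Query) string { return q.QName + "/" + q.QType })) // → [2 1 1]
	// qname aggregation: all example.com. queries collapse into one bucket.
	fmt.Println(sortedFreqs(qs, func(q Query) string { return q.QName })) // → [3 1]
}
```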
We can see the familiar long-tail pattern in all our data sets (although it looks more Laplacian than power-law). But not all long-tail patterns are power laws. Here are the results of curve fitting with MATLAB:
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.039 (0.03837, 0.03962)
Goodness of fit:
SSE: 0.001408
R-square: 0.6219
Adjusted R-square: 0.6219
RMSE: 0.0003753
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.02773 (0.02719, 0.02828)
Goodness of fit:
SSE: 0.001122
R-square: 0.5046
Adjusted R-square: 0.5046
RMSE: 0.000335
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.03655 (0.03585, 0.03725)
Goodness of fit:
SSE: 0.001314
R-square: 0.6033
Adjusted R-square: 0.6033
RMSE: 0.0004222
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.02935 (0.02889, 0.02982)
Goodness of fit:
SSE: 0.0008184
R-square: 0.6101
Adjusted R-square: 0.6101
RMSE: 0.0002861
Is there a better fit?
Probably. For example, consider the Burr distribution, defined by the following formula:
Although it is mostly used to describe the frequency of an event over time, the fit (ignoring the initial rise typical of log-logistic distributions) is uncanny. This may or may not mean something; but first we should take a step back and ask ourselves a question:
Should we really care?
Probably not.
All we need to know is that we have a consistent 80–20 situation. There are three areas of caching strategy where this has an effect:
- Size-limited eviction:
In a size-limited cache we want to keep the top 20% of resources and discard (or never admit) the tail items.
A big enough cache can eliminate this problem, but that is not always possible.
- Time-limited eviction:
We prefer to keep our top 20% of resources in cache at all times. In the case of an external expiration (e.g. a DNS TTL value) we can still keep a resource around, by pre-fetching it or serving the stale value.
Time-limited eviction may not be in our best interest.
- Sharding:
Sharding is the process of distributing cache items across multiple storages in order to optimize parallel access.
In this case we want the top 20% of resources spread across most of the shards.
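To make the first and third points concrete, here is a minimal sketch of a size-limited LRU cache split into shards by key hash. This is illustrative only (locks are omitted for brevity); the libraries benchmarked below do all of this with far more care:

```go
package main

import (
	"container/list"
	"fmt"
	"hash/fnv"
)

type entry struct{ key, val string }

// lru is a minimal size-limited cache: the least-recently-used entry
// is evicted first, which tends to keep the hot ~20% resident.
type lru struct {
	cap   int
	order *list.List // front = most recently used
	items map[string]*list.Element
}

func newLRU(capacity int) *lru {
	return &lru{cap: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

func (c *lru) Get(k string) (string, bool) {
	if el, ok := c.items[k]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).val, true
	}
	return "", false
}

func (c *lru) Set(k, v string) {
	if el, ok := c.items[k]; ok {
		el.Value.(*entry).val = v
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.cap {
		// Evict the tail: the least recently used item.
		last := c.order.Back()
		c.order.Remove(last)
		delete(c.items, last.Value.(*entry).key)
	}
	c.items[k] = c.order.PushFront(&entry{k, v})
}

// sharded spreads keys across independent LRUs so that hot keys
// do not all contend on a single structure.
type sharded struct{ shards []*lru }

func newSharded(n, capPerShard int) *sharded {
	s := &sharded{shards: make([]*lru, n)}
	for i := range s.shards {
		s.shards[i] = newLRU(capPerShard)
	}
	return s
}

func (s *sharded) shard(k string) *lru {
	h := fnv.New32a()
	h.Write([]byte(k))
	return s.shards[h.Sum32()%uint32(len(s.shards))]
}

func (s *sharded) Get(k string) (string, bool) { return s.shard(k).Get(k) }
func (s *sharded) Set(k, v string)             { s.shard(k).Set(k, v) }

func main() {
	c := newSharded(4, 2)
	c.Set("example.com./A", "93.184.216.34")
	v, ok := c.Get("example.com./A")
	fmt.Println(v, ok) // → 93.184.216.34 true
}
```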
The state of caching in Go
It's now time to benchmark a few popular cache libraries in our programming language of choice, Go, and see how they perform under these conditions. The code for these tests can be found here.
We tested the following libraries:
Hit rate
For hit rate, there's also a theoretical optimal case, which uses future information. Note that the Zipf generator we used has a heavier tail than our sample data.
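That theoretical optimum is Bélády's algorithm: with full knowledge of future requests, always evict the item whose next use lies farthest in the future. A sketch of computing this upper-bound hit rate offline (my own illustration, not the benchmark's code):

```go
package main

import "fmt"

// optimalHitRate replays a request trace with Bélády's policy: on a
// miss with a full cache, evict the key whose next use is farthest
// away (or which never occurs again).
func optimalHitRate(trace []string, capacity int) float64 {
	// Precompute, for each position, the next occurrence of that key.
	next := make([]int, len(trace))
	last := make(map[string]int)
	for i := len(trace) - 1; i >= 0; i-- {
		if j, ok := last[trace[i]]; ok {
			next[i] = j
		} else {
			next[i] = len(trace) // never requested again
		}
		last[trace[i]] = i
	}

	cache := make(map[string]int) // key -> position of its next use
	hits := 0
	for i, k := range trace {
		if _, ok := cache[k]; ok {
			hits++
		} else if len(cache) >= capacity {
			// Evict the key with the farthest next use.
			victim, far := "", -1
			for key, n := range cache {
				if n > far {
					victim, far = key, n
				}
			}
			delete(cache, victim)
		}
		cache[k] = next[i]
	}
	return float64(hits) / float64(len(trace))
}

func main() {
	trace := []string{"a", "b", "c", "a", "b", "c", "a", "b"}
	fmt.Println(optimalHitRate(trace, 2)) // → 0.375
}
```

No real cache can beat this number, which is what makes it a useful baseline for the hit-rate comparison.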
Throughput
The throughput benchmarks were run on an Intel Core i7-8750H with 16 GB of RAM.
Conclusion
goburrow has the best hit rate, ristretto performs best in read-heavy scenarios, and fastcache in write-heavy scenarios.