DNS query distribution and its effect on caching

Arash Cordi
Jan 13, 2020

Many studies suggest that requests for internet resources (including DNS resource records) follow a Zipf-like distribution; see "DNS Performance and the Effectiveness of Caching".

Zipf's law originally states that, given a large sample of words, the frequency of any word is inversely proportional to its rank in the frequency table.

We can check these claims using data from our own authoritative DNS server.

We use two different aggregations on queries: (qname, qtype) and qname alone. We also fetch data from two time frames, a 5-minute window and a 1-day window. Here are the sorted query frequencies for these four scenarios:

qname — 1day query frequency
(qname, qtype) — 1day query frequency
qname — 5min query frequency
(qname, qtype) — 5min query frequency
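The two aggregations above amount to counting with different keys. A minimal sketch (the `Query` type and the sample records are made up for illustration; real input would come from the server's query log):

```go
package main

import "fmt"

// Query is a hypothetical record from a query log.
type Query struct {
	QName string // e.g. "example.com."
	QType string // e.g. "A", "AAAA"
}

func main() {
	log := []Query{
		{"example.com.", "A"}, {"example.com.", "AAAA"},
		{"example.com.", "A"}, {"cdn.example.com.", "A"},
	}

	byName := make(map[string]int)        // qname aggregation
	byNameType := make(map[[2]string]int) // (qname, qtype) aggregation
	for _, q := range log {
		byName[q.QName]++
		byNameType[[2]string{q.QName, q.QType}]++
	}

	fmt.Println(byName["example.com."])                     // 3
	fmt.Println(byNameType[[2]string{"example.com.", "A"}]) // 2
}
```

Note that the qname aggregation merges all record types for a name, so its distribution is slightly more head-heavy than the (qname, qtype) one.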

We can see the familiar long-tail pattern in all four data sets (although it looks more Laplacian than power-law). But not every long-tail pattern is a power law. Here are the results of curve fitting in MATLAB:

qname — 1day log-log diagram
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.039 (0.03837, 0.03962)
Goodness of fit:
SSE: 0.001408
R-square: 0.6219
Adjusted R-square: 0.6219
RMSE: 0.0003753
(qname, qtype) — 1day log-log diagram
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.02773 (0.02719, 0.02828)
Goodness of fit:
SSE: 0.001122
R-square: 0.5046
Adjusted R-square: 0.5046
RMSE: 0.000335
qname — 5min log-log diagram
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.03655 (0.03585, 0.03725)
Goodness of fit:
SSE: 0.001314
R-square: 0.6033
Adjusted R-square: 0.6033
RMSE: 0.0004222
(qname, qtype) — 5min log-log diagram
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.02935 (0.02889, 0.02982)
Goodness of fit:
SSE: 0.0008184
R-square: 0.6101
Adjusted R-square: 0.6101
RMSE: 0.0002861

Is there a better fit?

Probably. For example, consider the Burr (Type XII) distribution, whose CDF is F(x; c, k) = 1 − (1 + x^c)^(−k):

(qname, qtype) — 1day log-log diagram

Although it is mostly used to describe the frequency of an event over time, the fit (setting aside the initial rise typical of log-logistic distributions) is uncanny. This may or may not mean something, but first we should take a step back and ask ourselves:

Should we really care?

Probably not.

All we need to know is that we have a consistent 80–20 situation. There are three areas of caching strategy where this has an effect:

  • Size-limit eviction

In a size-limited cache we want to keep the top 20% of resources and discard (or never admit) the tail items.

A big enough cache can eliminate this problem, but that is not always possible.
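A simple recency-based eviction policy already approximates this: under a Zipf-like workload, the hot head keys keep getting refreshed while tail keys churn through the bottom. A minimal size-limited LRU sketch (illustrative, not any particular library's implementation; a capacity of at least 1 is assumed):

```go
package main

import (
	"container/list"
	"fmt"
)

// LRU is a minimal size-limited cache: front of the list is the most
// recently used key, the back is the eviction candidate.
type LRU struct {
	cap   int
	order *list.List               // element.Value holds the key
	items map[string]*list.Element // key -> its element in order
}

func NewLRU(cap int) *LRU {
	return &LRU{cap: cap, order: list.New(), items: map[string]*list.Element{}}
}

// Get reports whether key is cached and marks it recently used.
func (c *LRU) Get(key string) bool {
	el, ok := c.items[key]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

// Put admits key, evicting the least recently used entry if full.
func (c *LRU) Put(key string) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.cap {
		back := c.order.Back()
		c.order.Remove(back)
		delete(c.items, back.Value.(string))
	}
	c.items[key] = c.order.PushFront(key)
}

func main() {
	c := NewLRU(2)
	c.Put("a")
	c.Put("b")
	c.Get("a") // "a" is now most recent
	c.Put("c") // evicts "b"
	fmt.Println(c.Get("a"), c.Get("b")) // true false
}
```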

  • Time-limit eviction

We prefer to keep our top 20% of resources in cache at all times. In case of an external expiration (e.g. a DNS TTL) we can still keep the resource around, by pre-fetching it or serving the stale value.

So strict time-limit eviction may not be in our best interest.

  • Sharding

Sharding is the process of distributing cache items across multiple stores in order to optimize parallel access.

In this case we want the top 20% of resources spread across most of the shards, so that hot keys do not all contend on a single store.
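Hashing the key to pick a shard does this spreading automatically. A minimal sharded-map sketch (the shard count, FNV hash choice, and string value type are illustrative):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 16

// shard is one independently locked store.
type shard struct {
	mu sync.RWMutex
	m  map[string]string
}

type Sharded [shardCount]*shard

func NewSharded() *Sharded {
	var s Sharded
	for i := range s {
		s[i] = &shard{m: map[string]string{}}
	}
	return &s
}

// pick hashes the key so hot names scatter across shards.
func (s *Sharded) pick(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return s[h.Sum32()%shardCount]
}

func (s *Sharded) Put(key, val string) {
	sh := s.pick(key)
	sh.mu.Lock()
	sh.m[key] = val
	sh.mu.Unlock()
}

func (s *Sharded) Get(key string) (string, bool) {
	sh := s.pick(key)
	sh.mu.RLock()
	defer sh.mu.RUnlock()
	v, ok := sh.m[key]
	return v, ok
}

func main() {
	s := NewSharded()
	s.Put("example.com.", "A")
	v, ok := s.Get("example.com.")
	fmt.Println(v, ok) // A true
}
```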

Cache libraries in Go

It's now time to benchmark a few popular cache libraries in our programming language of choice, Go, and see how they perform under these conditions. The code for these tests can be found here.

We tested the following libraries:

Hit rate

For hit rate, there is also a theoretical optimum, which uses future information. Note that the Zipf generator we used has a heavier tail than our sample data.
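Why the generator's tail matters: the share of traffic landing on the top-k keys is a ceiling on what any k-slot cache can achieve. A small sketch (exponents and sizes are illustrative; `rand.Zipf` emits values in rank order, so value < k means "one of the k hottest keys"):

```go
package main

import (
	"fmt"
	"math/rand"
)

// topKShare estimates what fraction of n requests hit the k hottest keys
// under a Zipf workload with exponent s.
func topKShare(s float64, k uint64, n int) float64 {
	z := rand.NewZipf(rand.New(rand.NewSource(1)), s, 1, 99999)
	hits := 0
	for i := 0; i < n; i++ {
		if z.Uint64() < k {
			hits++
		}
	}
	return float64(hits) / float64(n)
}

func main() {
	// A smaller exponent means a heavier tail, so the same 100-slot cache
	// can serve a smaller share of requests.
	fmt.Printf("s=1.50 top-100 share: %.2f\n", topKShare(1.5, 100, 500000))
	fmt.Printf("s=1.01 top-100 share: %.2f\n", topKShare(1.01, 100, 500000))
}
```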

Throughput

The throughput benchmarks ran on an Intel Core i7-8750H with 16 GB of RAM.

Conclusion

goburrow has the best hit rate; ristretto performs best in read-heavy scenarios, and fastcache in write-heavy scenarios.
