DNS query distribution and its effect on caching
Many studies suggest that internet resources (including DNS resource records) follow a Zipf-like distribution; see, for example, “DNS Performance and the Effectiveness of Caching”.
Zipf's law originally states that, given a large sample of words, the frequency of any word is inversely proportional to its rank in the frequency table.
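This rank–frequency relationship is easy to reproduce with Go's built-in Zipf sampler. A small sketch (the parameters `s = 1.5`, `imax = 1000`, and the sample size are illustrative, not taken from our data):

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// rankFrequencies draws n samples from a Zipf distribution and returns
// the per-item counts sorted in descending order (rank 1 first).
func rankFrequencies(n int) []int {
	r := rand.New(rand.NewSource(1))
	// NewZipf(r, s, v, imax): P(k) is proportional to (v+k)^(-s).
	z := rand.NewZipf(r, 1.5, 1, 1000)
	counts := make(map[uint64]int)
	for i := 0; i < n; i++ {
		counts[z.Uint64()]++
	}
	freqs := make([]int, 0, len(counts))
	for _, c := range counts {
		freqs = append(freqs, c)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(freqs)))
	return freqs
}

func main() {
	// Frequency falls off roughly as rank^(-s): a heavy head, a long tail.
	for rank, f := range rankFrequencies(1_000_000)[:5] {
		fmt.Printf("rank %d: %d\n", rank+1, f)
	}
}
```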
We can check these claims using data from our own authoritative DNS server.
We use two different aggregations on queries, qname-qtype and qname, and we fetch data from two time frames, a 5-minute window and a 1-day window. Here are the sorted query frequencies for these four scenarios:
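As a concrete sketch of the aggregation step, counting by qname-qtype versus qname alone might look like this in Go (the `Query` struct and sample data are illustrative, not our actual log format):

```go
package main

import (
	"fmt"
	"sort"
)

// Query is a simplified DNS query record; real logs carry more fields.
type Query struct {
	QName string
	QType string
}

// sortedFreqs aggregates queries by the given key function and returns
// the counts in descending order — the rank–frequency curve we plot.
func sortedFreqs(qs []Query, key func(Query) string) []int {
	counts := make(map[string]int)
	for _, q := range qs {
		counts[key(q)]++
	}
	freqs := make([]int, 0, len(counts))
	for _, c := range counts {
		freqs = append(freqs, c)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(freqs)))
	return freqs
}

func main() {
	qs := []Query{
		{"example.com.", "A"}, {"example.com.", "AAAA"},
		{"example.com.", "A"}, {"example.org.", "A"},
	}
	// qname-qtype aggregation: example.com./A counts separately from AAAA.
	fmt.Println(sortedFreqs(qs, func(q Query) string { return q.QName + "/" + q.QType })) // → [2 1 1]
	// qname aggregation: all example.com. queries collapse into one bucket.
	fmt.Println(sortedFreqs(qs, func(q Query) string { return q.QName })) // → [3 1]
}
```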
We can see the familiar long-tail pattern in all our data sets (although it looks more Laplacian than power-law). But not all long-tail patterns are power laws. Here are the results of curve fitting with MATLAB:
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.039 (0.03837, 0.03962)
Goodness of fit:
SSE: 0.001408
R-square: 0.6219
Adjusted R-square: 0.6219
RMSE: 0.0003753
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.02773 (0.02719, 0.02828)
Goodness of fit:
SSE: 0.001122
R-square: 0.5046
Adjusted R-square: 0.5046
RMSE: 0.000335
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.03655 (0.03585, 0.03725)
Goodness of fit:
SSE: 0.001314
R-square: 0.6033
Adjusted R-square: 0.6033
RMSE: 0.0004222
General model:
f(x) = (x^(-(a+1)))/zeta(a+1)
Coefficients (with 95% confidence bounds):
a = 0.02935 (0.02889, 0.02982)
Goodness of fit:
SSE: 0.0008184
R-square: 0.6101
Adjusted R-square: 0.6101
RMSE: 0.0002861
Is there a better fit?
Probably. For example, consider the Burr distribution, defined by the following formula:
Although it is mostly used to describe the frequency of an event over time, the fit (ignoring the initial rise typical of log-logistic distributions) is uncanny. This may or may not mean something; but first we should take a step back and ask ourselves a question:
Should we really care?
Probably not.
All we need to know is that we have a consistent 80–20 situation. There are three areas of caching strategy where this has an effect:
- Size-limited eviction:
In a size-limited cache we want to keep the top 20% of resources and discard (or never admit) the tail items.
A big enough cache can eliminate this problem, but that is not always possible.
- Time-limited eviction:
We prefer to keep our top 20% of resources in cache at all times. In the case of an external expiration (e.g. a DNS TTL value) we can still keep a resource around, by pre-fetching it or serving the stale value.
Time-limited eviction may not be in our best interest.
- Sharding:
Sharding is the process of distributing cache items across multiple storages in order to optimize parallel access.
In this case we want the top 20% of resources spread across most of the shards.
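To make the first and third points concrete, here is a minimal sketch of a size-limited LRU cache split into shards by key hash. This is illustrative only (locks are omitted for brevity); the libraries benchmarked below do all of this with far more care:

```go
package main

import (
	"container/list"
	"fmt"
	"hash/fnv"
)

type entry struct{ key, val string }

// lru is a minimal size-limited cache: the least-recently-used entry
// is evicted first, which tends to keep the hot ~20% resident.
type lru struct {
	cap   int
	order *list.List // front = most recently used
	items map[string]*list.Element
}

func newLRU(capacity int) *lru {
	return &lru{cap: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

func (c *lru) Get(k string) (string, bool) {
	if el, ok := c.items[k]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).val, true
	}
	return "", false
}

func (c *lru) Set(k, v string) {
	if el, ok := c.items[k]; ok {
		el.Value.(*entry).val = v
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.cap {
		// Evict the tail: the least recently used item.
		last := c.order.Back()
		c.order.Remove(last)
		delete(c.items, last.Value.(*entry).key)
	}
	c.items[k] = c.order.PushFront(&entry{k, v})
}

// sharded spreads keys across independent LRUs so that hot keys
// do not all contend on a single structure.
type sharded struct{ shards []*lru }

func newSharded(n, capPerShard int) *sharded {
	s := &sharded{shards: make([]*lru, n)}
	for i := range s.shards {
		s.shards[i] = newLRU(capPerShard)
	}
	return s
}

func (s *sharded) shard(k string) *lru {
	h := fnv.New32a()
	h.Write([]byte(k))
	return s.shards[h.Sum32()%uint32(len(s.shards))]
}

func (s *sharded) Get(k string) (string, bool) { return s.shard(k).Get(k) }
func (s *sharded) Set(k, v string)             { s.shard(k).Set(k, v) }

func main() {
	c := newSharded(4, 2)
	c.Set("example.com./A", "93.184.216.34")
	v, ok := c.Get("example.com./A")
	fmt.Println(v, ok) // → 93.184.216.34 true
}
```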
The state of caching in Go
It's now time to benchmark a few popular cache libraries in our programming language of choice, Go, and see how they perform under these conditions. The code for these tests can be found here.
We tested the following libraries:
Hit rate
For hit rate, there's also a theoretical optimal case, which uses future information. Note that the Zipf generator we used has a heavier tail than our sample data.
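That theoretical optimum is Bélády's algorithm: with full knowledge of future requests, always evict the item whose next use lies farthest in the future. A sketch of computing this upper-bound hit rate offline (my own illustration, not the benchmark's code):

```go
package main

import "fmt"

// optimalHitRate replays a request trace with Bélády's policy: on a
// miss with a full cache, evict the key whose next use is farthest
// away (or which never occurs again).
func optimalHitRate(trace []string, capacity int) float64 {
	// Precompute, for each position, the next occurrence of that key.
	next := make([]int, len(trace))
	last := make(map[string]int)
	for i := len(trace) - 1; i >= 0; i-- {
		if j, ok := last[trace[i]]; ok {
			next[i] = j
		} else {
			next[i] = len(trace) // never requested again
		}
		last[trace[i]] = i
	}

	cache := make(map[string]int) // key -> position of its next use
	hits := 0
	for i, k := range trace {
		if _, ok := cache[k]; ok {
			hits++
		} else if len(cache) >= capacity {
			// Evict the key with the farthest next use.
			victim, far := "", -1
			for key, n := range cache {
				if n > far {
					victim, far = key, n
				}
			}
			delete(cache, victim)
		}
		cache[k] = next[i]
	}
	return float64(hits) / float64(len(trace))
}

func main() {
	trace := []string{"a", "b", "c", "a", "b", "c", "a", "b"}
	fmt.Println(optimalHitRate(trace, 2)) // → 0.375
}
```

No real cache can beat this number, which is what makes it a useful baseline for the hit-rate comparison.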
Throughput
The throughput benchmarks were run on an Intel Core i7-8750H with 16 GB of RAM.
Conclusion
goburrow has the best hit rate, ristretto performs best in read-heavy scenarios, and fastcache in write-heavy scenarios.