Days since it was DNS
dayssince.itwasdns.net

When will this tired meme be retired?
There are almost 300 RFCs related to DNS. It can do many things, and some of them are complex. But human error is almost always the root cause.
Your inability to configure DNS properly says more about you than about the service itself.
I'm also a huge fan of the blog posts that suggest throwing away fundamental technologies and replacing them with half-baked alternatives, without understanding the problem domain or the problems that have been encountered and solved before.
Ok, since I’m obviously not nerdy and/or cynical enough to get the joke, what exactly is wrong with DNS? None of the links seem to indicate what the problem with it actually is.
It's not DNS. There's no way it's DNS. It was DNS.
https://medium.com/adevinta-tech-blog/its-not-always-dns-unl...
Discussed here:
It's not always DNS, unless it is - https://news.ycombinator.com/item?id=38719126 - Dec 2023 (73 comments)
I think it’s implied that it’s been zero days since the underlying cause of some such problem turned out to be DNS related. The number zero being hard coded implies that the root of the problem is always DNS. But, let’s give MTU its due.
Lately, MTU has moved up my list of things to check when stuff goes down.
It seems carriers can't get their MTUs straight these days, especially on MPLS links...
I really thought we had this figured out 20 years ago...
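If it helps, a quick way to sanity-check path MTU from a Linux box is to ping with the don't-fragment bit set and a payload sized for a standard 1500-byte frame (1472 bytes of ICMP payload plus 28 bytes of headers); the hostname below is just a placeholder:

    # Fails with "Frag needed" (or silently drops) if a hop in the path has a smaller MTU
    ping -M do -s 1472 example.com

    # tracepath will also report the discovered path MTU hop by hop
    tracepath example.com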
Two days ago one (!) user reported that \\contso.com\dfsroot\profiles\user was inaccessible (with errors loading the desktop etc.).
For me the path was accessible, logging on to the same server proved the path was accessible, and 3 hours of proper troubleshooting confirmed everything should work. And yet.
To cut the circumstances short: one of the (6 total) DCs decided that, what, running a DNS server isn't worth it. And for this one user, DFS (last changed at least two years ago) decided to fall back to a file server from 2018, which, of course, pointed the DFS target to a no longer existent share.
Of course it wasn't DNS in this case. Oh wait, it was the DNS in this one.
DFS-N != DNS
a) lack of monitoring for running services
b) cruft/old configurations
It was an unavailable DNS server that triggered the fallback to an old DFS-N server. People on the same RDS server were working fine, and had been for literally years.
As someone who administers DNS servers, I'm going to guess this is due to DNS being the first thing that gets blamed when something goes wrong; and it is almost never DNS.
That's the point, but in many network issues name resolution is the root cause of the problem. Not necessarily DNS itself; sometimes /etc/hosts is more than enough to cause headaches!
I've certainly added a hostname to an /etc/hosts file for testing and forgotten.
Nothing makes sense, where is this address coming from? Oh. It was me. I put it there.
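One quick check for that kind of forgotten /etc/hosts entry is to compare what the system resolver returns (which honors /etc/hosts) against a direct DNS lookup; the hostname here is just a placeholder:

    # System resolution path (typically consults /etc/hosts before DNS, per nsswitch.conf)
    getent hosts app.example.com

    # Straight to DNS, bypassing /etc/hosts
    dig +short app.example.com A

If the two disagree, it's usually the hosts file.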
A DNS misconfiguration is often the root cause of an issue. Hence the saying “it’s always DNS”.
"It’s always DNS" is basically tongue-in-cheek expression, because DNS issues are so frequently the cause of weird outages.
Almost anything you do on the internet (or local network) depends on DNS functioning correctly. DNS can get complex quickly - multiple servers (caching/authoritative/recursive) and protocols = lots of opportunities for something to be misconfigured.

Cached entries in particular can be a nightmare if something gets outdated - it takes time for an update to a DNS record to propagate to all the other DNS servers on the Internet. All kinds of other random services etc. depend on DNS records being correct and DNS working.

When there’s an issue it’s not always immediately apparent that a DNS problem is the root cause, leading to lots of time chasing your tail/tearing your hair out trying to figure out what the heck broke.
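When stale caching is the suspect, a common check is to compare the authoritative answer with whatever a recursive resolver has cached; the server and record names below are placeholders:

    # Ask one of the zone's authoritative servers directly (no recursion)
    dig @ns1.example.net www.example.com A +norecurse +noall +answer

    # Ask a public recursive resolver; the TTL column shows how long the cached answer will live
    dig @1.1.1.1 www.example.com A +noall +answer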
I was setting up my own mail server the other day and re-realized how long global DNS propagation really takes.
I’m in the US, so it was almost instantaneous between updating the DNS records at the domain registrar and being able to verify the changes with my own rDNS server.
But a UK or NL DNS server didn’t immediately pick up those recent changes.
Had to wait an additional 48 hrs for global DNS propagation.
A change to a record in your zone should propagate to all your authoritative servers within a few seconds, using the DNS NOTIFY feature. If it doesn’t, that’s a bug in your provider’s setup.
Caches rely on the TTL of records in your zone, or the SOA negative TTL field for negative answers. You control these TTLs, so don’t set them to 48 hours. In most cases there’s little benefit to having TTLs longer than 1 hour. (I use 24 hours for TTLs on NS records and nameserver addresses, because they tend to be more stable, and it’s good for tail latency to keep them in caches longer.)
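A rough way to verify that NOTIFY/zone transfer actually worked is to compare the SOA serial on each of your authoritative servers; the zone and nameserver names below are placeholders:

    # List the zone's delegated nameservers
    dig example.com NS +short

    # Serials should match on every one; a lagging secondary will show an older serial
    for ns in ns1.example.net ns2.example.net; do
        dig @"$ns" example.com SOA +noall +answer
    done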
> Caches rely on the TTL of records in your zone, or the SOA negative TTL field for negative answers.
Sadly the word "should" ought to have appeared in your sentence.
A lot of resolvers ignore the TTL, either because of the number of misconfigured TTL entries (too short), because they resolve a LOT of names and figure they can't afford to keep looking up certain names, or out of sheer orneriness.
I don't update frequently, so when I do plan to make updates I adjust my TTL to a short period, wait a few days, make the updates, then after a week turn the TTL way up again. I've noticed that this is pointless with some big sites.
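For what it's worth, a sketch of how to confirm the lowered TTL has actually reached caches before you flip the record (resolver and names are placeholders) - the second column of the answer is the TTL the cache will honor:

    # Authoritative view: should already show the new, short TTL (e.g. 300)
    dig @ns1.example.net www.example.com A +noall +answer

    # Recursive view: this TTL counts down; once it expires, the old answer is gone
    dig @8.8.8.8 www.example.com A +noall +answer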
The reason I set my TTLs to 1 hour is to avoid the faff with preparing for a change by fiddling with TTLs. It’s much easier to have a moderate TTL that’s OK for normal use, and not too long to make changes painful.
By “zone”, is this regional? It was for a .dev tld. So that makes sense as to why I (someone in US) was able to see changes immediately.
No, zone in DNS parlance basically means your domain name (and its records).
This is surprising. What was the TLD for your name? And can you share your SOA config for your zone? (don't need the names, I'm curious about the TTLs in all the SOA fields)
Most TLDs serve glue records with 1-3 day TTLs. It's not surprising to me that some servers had the old glue cached (well, I'm assuming they've got traffic... I would be surprised if my domains' glue were cached anywhere of note)
If you can configure your old nameservers to serve the new NS records, sometimes that's helpful.
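For what it's worth, you can see exactly what the parent/TLD servers are handing out for your delegation (and the TTL on it) with something like the following; the domain is a placeholder, and ns-tld1.charlestonroadregistry.com is one of the .dev registry's nameservers:

    # Walk the delegation from the root down, showing the NS records returned at each level
    dig +trace example.dev NS

    # Or query a .dev TLD server directly for the delegation it serves
    dig @ns-tld1.charlestonroadregistry.com example.dev NS +norecurse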
tld is .dev
    localhost:~# dig dev soa
    …
    ; <<>> DiG 9.16.39 <<>> dev soa
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65000
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 512
    ;; QUESTION SECTION:
    ;dev.                   IN      SOA

    ;; ANSWER SECTION:
    dev.            299     IN      SOA     ns-tld1.charlestonroadregistry.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
This is the SOA config?
> SOA ns-tld1.charlestonroadregistry.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
I was thinking that maybe you had a large TTL for either the SOA or the minimum TTL field, but both of those are pretty reasonable at about 5 minutes.
See this RFC, https://www.rfc-editor.org/rfc/rfc2308#section-4
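For anyone following along, the numeric fields in that SOA, in standard order (same values as the answer above), are:

    dev. 299 IN SOA ns-tld1.charlestonroadregistry.com. cloud-dns-hostmaster.google.com. (
            1        ; serial
            21600    ; refresh  (6 hours)
            3600     ; retry    (1 hour)
            259200   ; expire   (3 days)
            300 )    ; minimum / negative-caching TTL (5 minutes, per RFC 2308)

The 299 at the front is just the remaining cache TTL of the SOA record itself.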
I hope it's also reported as a DNS record
NTP, BGP, MTU and a lot of other acronyms should also be on the list of not-trivial-to-trace root causes for things that range from big outages to mysterious malfunctions, especially if there are many parties or servers involved. And the security protocols layered on top of those, and more (DNSSEC, SSL, etc.).
I'd been meaning to/thinking about setting up a page for BGP, so here we go: https://itlookedlike.itwasdns.net/but-it-was-bgp/
Shit can and must be troubleshot with domain-specific tools.
DNS: I do have an ongoing problem specific to Unbound, where it refuses to serve some entries in a transparent zone of DHCP-registered addresses that have been written to the config files properly; it insists on refusing to resolve certain hosts nondeterministically. But that's the only problem I've had in a long while, because mostly infrastructure works, out of necessity and scale.
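For context, a minimal sketch of the kind of Unbound setup being described - a transparent local zone whose host entries are generated from DHCP leases (zone and host names here are made up):

    server:
        # "transparent": answer from local-data when the name exists there,
        # otherwise resolve the name normally
        local-zone: "lan.home.arpa." transparent
        local-data: "printer.lan.home.arpa. 3600 IN A 192.168.1.40"
        local-data: "nas.lan.home.arpa.     3600 IN A 192.168.1.50"

Querying the Unbound instance directly (dig @127.0.0.1 printer.lan.home.arpa) for one of the flaky names is usually the quickest way to see whether the entry is actually being served or falling through.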
Love the hard coded 0
It goes up whenever you can't reach it
That would be the opposite
Yup - nothing to compute!
Now that’s what I call optimization.
This would pair up well with my favorite t-shirt ;)