DNSSEC NTAs: No Good Compromises
Summary
Quad9 believes that DNSSEC is required for a secure DNS and will only implement Negative Trust Anchors (NTAs) for the shortest time possible when it believes the greater harm would come from taking no action.
This post describes our NTA policy moving forward, including background on what NTAs are and when Quad9 will use them.
DNSSEC & NTAs: General Background
Quad9 tries to be as secure and trustworthy as possible when resolving domains for end users, and one of the many methods we use to ensure this security is to validate domains that are using DNSSEC.
DNSSEC is a complex method of ensuring that domain names are cryptographically secured against tampering between end users, recursive resolvers, and authoritative servers. In short, it attempts to guarantee that the response received by the recursive resolver (Quad9) was sent by the owner and operator of the domain across the set of potentially untrusted networks which comprise the internet. Extending this security guarantee all the way to the end user requires encryption between the user and the recursor, which is beyond the scope of this document but is something Quad9 offers with several available encryption protocols. DNSSEC is useful to protect the answers across what is typically the “longest” section of the transaction - between the recursive resolver and the authoritative server. It can be used all the way to the client, but very few stub resolvers, such as those on laptops or mobile devices, implement DNSSEC validation; most rely on the recursive resolver for that task. DNSSEC does not secure the transaction against observation during the lookup process; it only validates that the sender’s data is genuine.
DNSSEC has become increasingly prevalent among domain name operators in recent years as a way to prevent a variety of security issues surrounding possible illicit DNS tampering. It also provides a number of additional benefits against some types of denial-of-service attacks and has other secondary benefits upon which new stability and security tools are being built. For a full explanation of DNSSEC, please take a look at this quick primer.
Signing a zone with DNSSEC is an implicit demand by the authoritative domain name system operator that Quad9, the recursive resolver operator, must not resolve the domain if the validation process returns with a fault. A DNSSEC fault would indicate that someone is attempting to subvert the domain name data of the authoritative server by some mechanism, and the DNSSEC failure is then the intended result. A domain hijack, cache poisoning, or other malicious action would cause such a failure, and Quad9’s refusal to return a corrupted name in those circumstances is the desired outcome.
In some circumstances, DNSSEC may be configured for a zone but may not work properly during a complete query chain process as a result of something other than malicious actions. This is most often due to misconfiguration, or in some circumstances it is due to some intermediary device or network element either corrupting or refusing to pass DNSSEC messages appropriately. Sometimes the failure is the result of differing interpretations of IETF RFC standards between authoritative servers and recursive resolvers, or between two differing versions or packages of either authoritative or recursive software. Any of these types of faults will end with the same result as though there had been an attempt to subvert the domain maliciously: the recursive resolver (Quad9) will mark the lookup as failing the DNSSEC tests, the lookup itself will fail, and the client will receive an error.
What happens when a fault is not a fault?
It is possible for a recursive resolver to selectively ignore DNSSEC validation for a zone which shows faults during the DNSSEC process. An exception rule called a “Negative Trust Anchor” (NTA) can be inserted in a recursive resolver; it specifically excludes a domain or zone from DNSSEC validation processes, even if that zone is marked as being DNSSEC-signed by the owner. This is typically done when a domain/zone failure would cause significant operational problems for the users of the recursive resolver, and typically only when a domain/zone is known to be faulty due to a configuration or other issue unrelated to any known security incident.
NTA exception rules manually inserted into recursive resolvers to bypass DNSSEC for a host or domain remove the integrity validation guarantees for that domain tree. NTAs are intended to be temporary workarounds for configuration mishaps, but they are sometimes used when a network path has difficulty passing DNSSEC records or when bugs are encountered in operational settings that cause failures for a small set of domains.
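To make the scope of an NTA concrete, the following is a minimal Python sketch of how a resolver might decide whether a queried name falls under an NTA and would therefore skip validation. It is an illustration only, not Quad9's implementation or any resolver's actual API: the function names and the example zone are hypothetical. Matching is done label by label so that an NTA for one zone covers the whole tree beneath it without accidentally covering unrelated names that merely end in the same characters.

```python
def normalize(name: str) -> tuple[str, ...]:
    """Lower-case a domain name and split it into labels, ignoring a trailing dot."""
    return tuple(label for label in name.lower().rstrip(".").split(".") if label)


def covered_by_nta(qname: str, nta_zones: set[str]) -> bool:
    """Return True if qname equals, or sits beneath, any zone on the NTA list."""
    labels = normalize(qname)
    anchors = {normalize(zone) for zone in nta_zones}
    # Test every label-aligned suffix of the query name:
    # "a.b.example" -> ("a","b","example"), ("b","example"), ("example",)
    return any(labels[i:] in anchors for i in range(len(labels)))


# Hypothetical NTA list, for illustration only.
ntas = {"broken.example."}
print(covered_by_nta("www.broken.example", ntas))  # True: inside the NTA's domain tree
print(covered_by_nta("notbroken.example", ntas))   # False: different zone, still validated
```

Every name for which this returns True loses its integrity guarantee for as long as the NTA is in place, which is why the scope and lifetime of each entry matter.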
We believe NTAs are dangerous and often lead to more long-term problems than they solve in the short term. Quad9’s intention is to let any records in a zone that is “broken” from a DNSSEC perspective remain unresolvable in order to encourage authoritative operators to repair the problem with the non-functional zone. If we or other recursive resolver operators continue to insert NTA exceptions in our respective systems, this will cause the operator of the faulty authoritative zone to receive contradictory information about the status of their DNSSEC configuration from various recursive resolvers which may or may not have identical NTA settings. Having a zone which resolves in an inconsistent way between recursive resolver operators creates conditions where authoritative operators are both uncertain of faults and may be unmotivated to correct issues that occur. In our experience, the state of some resolver systems working correctly (i.e., failing to resolve) and some not working correctly (i.e., resolving the name) with DNSSEC leads to an assumption of fault by the most distant participant, which slows or halts the repair process. In other words: “This is the fault of some other person, very far away from me: Ticket Closed.” With some resolvers working and some not, this may lead to authoritative operators blaming the recursive operator and vice versa. An even worse outcome may occur when the support organizations involved are not well-versed in DNS security, which may lead to further delays when inconsistent results are demonstrated by end users.
This implicit disregard of DNSSEC requirements by insertion of NTAs and the subsequent failure to repair both lead to a reduction in secure behaviors and are contradictory to the goals of DNSSEC adoption by authoritative operators and recursive operators alike. More importantly, they reduce security overall for end users.
Prisoner’s Dilemma
We hope that other providers join us in eliminating or significantly reducing their NTA lists and making those lists public as we have done on our website here: https://quad9.net/service/negative-trust-anchors/. Inconsistent or unclear NTA inclusions by various providers lead to a “prisoner’s dilemma” where the recursive operators who have the worst security (meaning a higher number of NTAs or no validating DNSSEC at all) are most likely to receive migratory users in the case of faults that would cause a strict recursive operator to appear to be non-functional. Recursive operators who “defect” or who were never cooperating with DNSSEC security standards to start with will seem to be functional for resolution of what should be faulty DNSSEC lookups. In these cases, users may end up incorrectly considering those resolvers more reliable when in fact they are less secure.
Quad9 does operate a service with no DNSSEC (and no malware blocking) which can be used for testing DNSSEC versus non-DNSSEC validated answers. Our DNS service operating on 9.9.9.10/149.112.112.10/2620:fe::10 is non-validating, but we would strongly suggest it is not used for more than testing purposes as there are no protections provided by that resolver service address, as opposed to our other service addresses which have DNSSEC strict validation and malware blocking.
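One way to use the non-validating address for testing is to send the same query to a validating resolver and to the non-validating one, then compare the outcomes. The Python sketch below only classifies a pair of response codes that have already been collected; how the queries are sent (for example with a DNS library or a command-line lookup tool) is left out. It assumes that a validation failure reaches the client as a SERVFAIL response code, which is the usual behavior of validating resolvers but is not stated in this post, and the function name and result strings are hypothetical.

```python
def classify(validating_rcode: str, non_validating_rcode: str) -> str:
    """Compare response codes for one query sent to a validating and a non-validating resolver."""
    v = validating_rcode.upper()
    n = non_validating_rcode.upper()
    if v == "SERVFAIL" and n != "SERVFAIL":
        # Only the validating path fails: consistent with a DNSSEC validation fault.
        return "possible DNSSEC validation failure"
    if v == n:
        # Both resolvers agree, so validation is probably not what differs.
        return "same result from both resolvers"
    return "inconclusive"


print(classify("SERVFAIL", "NOERROR"))   # possible DNSSEC validation failure
print(classify("NOERROR", "NOERROR"))    # same result from both resolvers
print(classify("SERVFAIL", "SERVFAIL"))  # same result from both resolvers
```

A "possible DNSSEC validation failure" result says nothing about whether the zone is misconfigured or under attack; it only shows that the difference between the two resolvers lies in validation.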
Our intention is to avoid adding NTA exceptions in the future and bring the NTA list down to zero entries as an eventual goal. If it is possible for other DNSSEC-validating, large recursive operators to also commit to the reduction or elimination of NTA records, then the criteria we use for adding NTA exceptions below will significantly change or be removed entirely. We may not be in that condition today, but it is our hope that the continued improvement of DNSSEC and expansion of strict DNSSEC validation will eventually allow the elimination of the concept of NTAs and subsequent prisoner’s dilemma conflict between recursive operators.
NTA Addition Criteria Discussion
We recognize in certain extraordinary circumstances that NTAs may still be operationally necessary (see below). Adding an NTA is implicitly against the desire of the zone owner/operator; it is a declaration to ignore the security settings demanded by the authoritative server. But often there are conflicting goals in operational environments which require temporary solutions to prevent outcomes which are ultimately worse than others. End users expect that domains will resolve “correctly” by their own definition and are typically unaware of DNSSEC or the distinction between correct and incorrect resolution with security settings as an additional layer of complexity.
Users who believe that the recursive resolver is at fault (instead of the domain operator) may be at risk for changing their settings to a resolver that answers in the way they expect, even if that result is less secure. Once that change is performed it is unlikely they will change back to the DNSSEC validating resolver, even though that migration implies they have moved to a resolver that either already has an NTA or is not doing DNSSEC strict validation. Such a migration implicitly means lower security for that end user. We believe this type of migratory behavior (abandonment of security-oriented DNS for nonsecurity-oriented DNS) is a net negative overall. It is worse than the addition of an NTA if we can validate that adding the NTA does not produce results that we know to be malicious for the period of time necessary.
Quad9 will apply the following criteria when deciding to add an NTA:
- If we receive significant customer complaints about a faulty DNSSEC-signed zone, AND
- If we believe this will lead to a significant number of customers leaving the platform if it does not resolve, AND
- If we believe the zone in question is simply faulty instead of compromised,
then we MAY consider adding an NTA for that zone or parent zone. Meeting these criteria does not always mean an NTA will be added, but these are our minimal requirements.
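The way the three criteria combine can be sketched as a simple predicate in Python. This is an illustration of the logic only, not a tool Quad9 uses: the class, field, and function names are hypothetical, and each field stands for a human judgment rather than a measured value. All three conditions must hold, and a True result means only that an NTA may be considered, not that one will be added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneAssessment:
    """Operator judgments about a failing DNSSEC-signed zone (hypothetical structure)."""
    significant_complaints: bool           # significant customer complaints received
    significant_departure_risk: bool       # many customers expected to leave if it stays broken
    believed_faulty_not_compromised: bool  # zone appears misconfigured rather than attacked


def may_consider_nta(assessment: ZoneAssessment) -> bool:
    """Return True only when all three minimal criteria are met (necessary, not sufficient)."""
    return (
        assessment.significant_complaints
        and assessment.significant_departure_risk
        and assessment.believed_faulty_not_compromised
    )


# A zone with many complaints but signs of compromise never qualifies.
print(may_consider_nta(ZoneAssessment(True, True, False)))  # False
print(may_consider_nta(ZoneAssessment(True, True, True)))   # True
```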
There are additional criteria we would evaluate further which may prevent the addition of the NTA, and an NTA exception is to be considered as the last option to prevent user migration in large numbers.
As part of an NTA addition, we will try to reach the operators of the zone in question to notify them of the fault and ask them about the repair status. We will frequently re-evaluate the list of NTA additions to remove zones which no longer meet the criteria or which have been repaired. Zone owners/operators may contact us to remove NTAs, regardless of the zone’s functional condition.
NTA History
Quad9 was one of the first large-scale recursive resolvers to validate DNSSEC. “Validating” means if a DNSSEC failure is observed, instead of just logging a fault but continuing to answer the query, we will not answer the question. In 2017 when Quad9 was publicly launched, DNSSEC was still undergoing significant deployment churn, especially underneath several top level domains. The “.gov” and “.mil” domains are operated by the US Government, which had mandated DNSSEC deployment in a relatively short period of time across all domains within those two zones. We think this is a well-considered policy and improves the security for both users and operators. However, several circumstances led to an unexpected result:
- There are a large number of zones and independent operators of zones within those two top-level domains, with varying levels of DNSSEC expertise;
- The level of DNSSEC robustness in available software packages at the time of the deployment was not universally stable; and
- The timeframes for deployment for certain organizations may have been shorter than required for error-free implementation.
These three reasons (and quite certainly others) led to a significant portion of the .gov and .mil TLDs having long-term DNSSEC failure rates which were higher than in other TLDs and which caused critical errors for end users. (Note: there was a similar circumstance for Canadian Provincial government zone errors, but the circumstances around that deployment are not known to us.)
As an additional complication, in North America, Quad9 is extensively used by state/province, local, and small federal agencies as an inexpensive additional security layer. This means any DNSSEC failures on .gov domains were especially difficult from a customer service perspective, as that user base would present “fix-or-leave” support tickets which could only be solved by addition of NTAs. The .gov/.mil domains were at a volume that was sufficient for us to exclude both TLDs from DNSSEC validation in order to prevent end users from moving to non-validating resolvers, which was a sad and counter-intuitive result of what was a useful security initiative.
In the years since the addition of those NTAs, the improvement of DNSSEC software and expertise has gradually created an environment where these blanket NTAs are no longer needed. The number of faulty zones is now down to the point where these top-level domain NTAs can be removed. The remaining misconfigured zones will either stop resolving as intended or will be identified and selectively accepted with a temporary NTA, and the operators of those zones will be given notice that we will remove the NTAs in a short period of time.
Other recursive resolvers have installed NTAs, most without rigorous documentation. There are almost certainly many .gov and .mil zones that are excluded from DNSSEC validation across other validating DNSSEC recursive resolvers, along with other zones in .com, .net, .org, and almost every other active gTLD (generic top-level domain) or ccTLD (country code top-level domain) zone. Our hope is that by publishing our NTA list on our website, as well as our criteria, we can take the first steps to an industry-wide model of transparency, notification, and action to prevent users from experiencing confusing or insecure DNS interactions.
Quad9 has added no NTAs in the last five years that have lasted more than 24 hours. Notably, the .bd TLD outage of April 2024 was an event where we assisted the Bangladesh user community by putting in a temporary NTA. We do not expect this policy statement to change the frequency of these events.
Summary
We will not eliminate NTAs today with this policy, but our goal is to reduce them to a very small number with very limited scope. Our intention is to remove them entirely at some point in the future when DNSSEC becomes as ubiquitous as HTTPS on websites, or when other authorization/authentication mechanisms (such as ADoT and similar protocols, and new signaling methods like DELEG) are implemented to assure the integrity of the resolution chain between recursive and authoritative DNS operators. By clarifying our policy and publishing our NTA list, we hope to create a demonstration of behavior for making security policy and treatment more transparent across the DNS resolver community. We invite others to join us.