Is TTL broken in Java?
Today at my client’s site, I encountered an issue where CoreDNS was being overwhelmed with DNS queries. Logs were flooded with UnknownHostException across a fleet of services. It turned out to be related to DNS response caching. Looking at the git history, I discovered that 5 years ago, someone had set the property sun.net.inetaddr.ttl to zero for all JVM-based Docker images. The client’s infrastructure team recently migrated from Nomad to Kubernetes, which, unfortunately, exposed this lack of DNS response caching.
What is TTL and Why Does It Matter? #
TTL stands for Time To Live and, for DNS, is expressed as a duration in seconds. It's a crucial networking concept that defines the lifespan of data in a computer or network. Once the interval has passed, the data is considered stale and is either refreshed or discarded. Large TTL values can lead to outdated information being returned, while very low or zero values can cause excessive load on servers because the same data is requested over and over. A TTL of 0 seconds means that nothing is cached at all. For DNS, this means clients resolve the hostname on every single request, putting unnecessary strain on the DNS servers.
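To make the trade-off concrete, here is a minimal sketch of a TTL-based cache in Java. It is not how the JVM implements its DNS cache. The class name `TtlCacheSketch`, the injected clock, and the fake loader that always returns `192.0.2.1` are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

public class TtlCacheSketch {
    /** A cached value together with the time (in ms) at which it becomes stale. */
    private static final class Entry {
        final String value;
        final long expiresAtMs;

        Entry(String value, long expiresAtMs) {
            this.value = value;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMs;
    private final LongSupplier clockMs;            // injectable, so the demo needs no sleeping
    private final Function<String, String> loader; // stands in for a real DNS lookup
    private int loads = 0;

    public TtlCacheSketch(long ttlSeconds, LongSupplier clockMs, Function<String, String> loader) {
        this.ttlMs = ttlSeconds * 1000;
        this.clockMs = clockMs;
        this.loader = loader;
    }

    public String get(String key) {
        long now = clockMs.getAsLong();
        Entry entry = entries.get(key);
        if (entry != null && now < entry.expiresAtMs) {
            return entry.value; // still fresh: served from cache
        }
        loads++; // stale or missing: ask the upstream source again
        String value = loader.apply(key);
        entries.put(key, new Entry(value, now + ttlMs));
        return value;
    }

    public int loads() {
        return loads;
    }

    /** Simulates one lookup per second and returns how many reached the loader. */
    public static int simulate(long ttlSeconds, int lookups) {
        long[] now = {0};
        TtlCacheSketch cache =
                new TtlCacheSketch(ttlSeconds, () -> now[0], host -> "192.0.2.1");
        for (int i = 0; i < lookups; i++) {
            cache.get("example.test");
            now[0] += 1000; // advance the fake clock by one second
        }
        return cache.loads();
    }

    public static void main(String[] args) {
        System.out.println("TTL 0: " + simulate(0, 10) + " upstream lookups");
        System.out.println("TTL 4: " + simulate(4, 10) + " upstream lookups");
    }
}
```

With one lookup per simulated second, a TTL of 0 sends all 10 lookups upstream, while a TTL of 4 needs only 3 (at seconds 0, 4, and 8).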
Testing DNS Caching Behavior #
Working with legacy systems can be challenging because the original reasoning behind decisions is likely lost over time, unless documented properly. There might have been valid reasons for setting the TTL to 0, but it’s essential to revisit and reassess these decisions periodically. Those old arguments might no longer hold in today’s context.
One argument sparked my curiosity: “We set TTL to 0 because it’s broken in Java and therefore, responses are cached forever.” This statement prompted me to investigate further. Is DNS caching really broken in Java? To validate this claim, I decided to conduct a simple experiment by creating a DNS Cache Tester tool. It helped me verify whether DNS responses are being cached correctly based on a specified TTL value:

    import java.net.InetAddress;
    import java.util.ArrayList;
    import java.util.List;

    public class DnsCacheTester {
        private static final int DEFAULT_ITERATIONS = 10;
        private static final int SLEEP_MS = 1000;

        public static void main(String[] args) throws Exception {
            if (args.length < 1) {
                System.out.println("Usage: java DnsCacheTester <hostname> [iterations]");
                System.exit(1);
            }
            String hostname = args[0];
            int iterations = DEFAULT_ITERATIONS;
            if (args.length >= 2) {
                try {
                    iterations = Integer.parseInt(args[1]);
                } catch (NumberFormatException e) {
                    System.out.printf("Invalid iterations argument, using default: %d%n", DEFAULT_ITERATIONS);
                }
            }

            // Resolution time per iteration in ms; -1 marks a failed lookup.
            List<Long> times = new ArrayList<>();
            for (int i = 0; i < iterations; i++) {
                try {
                    long start = System.nanoTime();
                    InetAddress addr = InetAddress.getByName(hostname);
                    long end = System.nanoTime();
                    long durationMs = (end - start) / 1_000_000;
                    times.add(durationMs);
                    System.out.printf("Iteration %d: %d ms, IP: %s%n", (i + 1), durationMs, addr.getHostAddress());
                } catch (Exception e) {
                    System.out.printf("Iteration %d: DNS resolution failed (%s)%n", (i + 1), e.getMessage());
                    times.add(-1L);
                }
                // Wait one second between lookups so cache expiry becomes visible.
                try {
                    Thread.sleep(SLEEP_MS);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    System.out.println("Interrupted, exiting.");
                    break;
                }
            }

            // Summarize successful lookups only (Stream.toList() requires Java 16+).
            List<Long> successfulTimes = times.stream().filter(t -> t >= 0).toList();
            long min = successfulTimes.stream().min(Long::compare).orElse(0L);
            long max = successfulTimes.stream().max(Long::compare).orElse(0L);
            double avg = successfulTimes.stream().mapToLong(Long::longValue).average().orElse(0.0);
            System.out.printf("%nDNS Resolution timings for %s:%n", hostname);
            System.out.printf("Min: %d ms%n", min);
            System.out.printf("Max: %d ms%n", max);
            System.out.printf("Avg: %.2f ms%n", avg);
            long failedCount = times.stream().filter(t -> t == -1L).count();
            if (failedCount > 0) {
                System.out.printf("Unsuccessful attempts: %d%n", failedCount);
            }
        }
    }

Now the only thing left to do is run this tool inside a JVM and experiment with different TTL settings:

    docker run --rm -v "$PWD":/app -w /app amazoncorretto:21-alpine \
      sh -c 'javac DnsCacheTester.java && java \
        -Dsun.net.spi.nameservice.provider.1=dns,sun \
        -Dsun.net.spi.nameservice.nameservers=host.docker.internal \
        -Dsun.net.inetaddr.ttl=<number of seconds> \
        DnsCacheTester <host> 10'

Running the Tests #
Let’s compare a TTL of 0 seconds to a TTL of 4 seconds over a 10-second period. My hypothesis is as follows:
If DNS caching is broken in Java, we should see no difference in timings between different TTL values
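However, as shown in the results below, there's a clear difference. With a TTL of 0, every iteration takes a few milliseconds because each lookup goes out to the DNS server. With a TTL of 4 and one lookup per second, only iterations 1, 5, and 9 take measurable time, which matches a cache entry expiring every four seconds. Response times of 0 ms indicate the DNS response was served from the cache: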
TTL of 0 #

    Iteration 1: 56 ms, IP: 142.250.153.139
    Iteration 2: 3 ms, IP: 142.250.153.139
    Iteration 3: 3 ms, IP: 142.250.153.138
    Iteration 4: 2 ms, IP: 142.250.153.139
    Iteration 5: 2 ms, IP: 142.250.153.139
    Iteration 6: 2 ms, IP: 142.250.153.139
    Iteration 7: 3 ms, IP: 142.250.153.139
    Iteration 8: 3 ms, IP: 142.250.153.139
    Iteration 9: 3 ms, IP: 142.250.153.139
    Iteration 10: 5 ms, IP: 142.250.153.139

    DNS Resolution timings for google.com:
    Min: 2 ms
    Max: 56 ms
    Avg: 8.20 ms

TTL of 4 #

    Iteration 1: 133 ms, IP: 142.250.153.139
    Iteration 2: 0 ms, IP: 142.250.153.139
    Iteration 3: 0 ms, IP: 142.250.153.139
    Iteration 4: 0 ms, IP: 142.250.153.139
    Iteration 5: 3 ms, IP: 142.250.153.138
    Iteration 6: 0 ms, IP: 142.250.153.138
    Iteration 7: 0 ms, IP: 142.250.153.138
    Iteration 8: 0 ms, IP: 142.250.153.138
    Iteration 9: 2 ms, IP: 142.250.153.139
    Iteration 10: 0 ms, IP: 142.250.153.139

    DNS Resolution timings for google.com:
    Min: 0 ms
    Max: 133 ms
    Avg: 13.80 ms

What About Older Versions of Java? #
Java 21 wasn’t available 5 years ago, so I also tested DNS response caching in Java 8, 11, and 17. No problems were detected in any of these versions. To test Java 8 and 11, I had to modify the code to remove streams and lambdas. Hypothetically, it’s possible that problems existed in other distributions, but I decided not to dive into that rabbit hole.
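For readers who want to repeat the test on an older JVM, the summary step can be written with a plain loop. The following is only a sketch of one way to do it, not the exact code I ran. The class name `TimingSummary` and its fields are hypothetical, and the sample values are the timings from the TTL of 0 run above.

```java
import java.util.Arrays;
import java.util.List;

public class TimingSummary {
    public final long min;
    public final long max;
    public final double avg;
    public final int failed;

    private TimingSummary(long min, long max, double avg, int failed) {
        this.min = min;
        this.max = max;
        this.avg = avg;
        this.failed = failed;
    }

    /** Computes min, max, and average over successful timings; negative values count as failures. */
    public static TimingSummary of(List<Long> times) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long sum = 0;
        int ok = 0;
        int failed = 0;
        for (Long t : times) {
            if (t < 0) {
                failed++;
                continue;
            }
            min = Math.min(min, t);
            max = Math.max(max, t);
            sum += t;
            ok++;
        }
        if (ok == 0) {
            // No successful lookup: report zeros, as the stream-based version does.
            return new TimingSummary(0, 0, 0.0, failed);
        }
        return new TimingSummary(min, max, (double) sum / ok, failed);
    }

    public static void main(String[] args) {
        // Timings (ms) taken from the TTL of 0 run in this post.
        List<Long> times = Arrays.asList(56L, 3L, 3L, 2L, 2L, 2L, 3L, 3L, 3L, 5L);
        TimingSummary s = TimingSummary.of(times);
        System.out.printf("Min: %d ms%n", s.min);
        System.out.printf("Max: %d ms%n", s.max);
        System.out.printf("Avg: %.2f ms%n", s.avg);
        if (s.failed > 0) {
            System.out.printf("Unsuccessful attempts: %d%n", s.failed);
        }
    }
}
```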
Aftermath #
Applying a TTL of 1 second to all JVM-based Docker images owned by my team has significantly reduced the load on the CoreDNS servers. The UnknownHostExceptions in our logs have disappeared, and we’ve observed increased overall stability of the services. It’s interesting to see how just a 1-second TTL reduces the load by 50% while still ensuring that DNS records remain relatively fresh. I do think 1 second is still too low, but this is something I’ll discuss with the infrastructure team in the coming days.
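The images in this post set the TTL through the `sun.net.inetaddr.ttl` system property. As an alternative, the JDK documents a security property, `networkaddress.cache.ttl`, that can be set from code. The snippet below is a minimal sketch of that approach. The class name `DnsTtlConfig` and the method `applyDnsCacheTtl` are hypothetical, and as far as I know the value has to be set before the JVM performs its first lookup, because the cache policy is read only once.

```java
import java.security.Security;

public class DnsTtlConfig {
    /**
     * Sets the JVM-wide TTL (in seconds) for successful DNS lookups.
     * Assumption: this runs early in startup, before any hostname is resolved.
     */
    public static void applyDnsCacheTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) {
        applyDnsCacheTtl(1);
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

Setting the value in code keeps it next to the application, whereas a `-D` flag in the image applies to every service built on it. Which of the two fits better depends on who owns the setting.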
Conclusion #
Working with legacy systems is hard. Finding the original reasoning behind decisions can be challenging, especially when they were made many years ago by people who may no longer be available. However, it’s essential to periodically review and reassess these decisions to ensure they still align with current needs and technologies. It’s valuable to challenge assumptions using the scientific method: formulate hypotheses, conduct experiments, and draw conclusions. In this case, DNS caching wasn’t broken in Java, and revisiting the TTL settings not only resolved a significant issue but also improved the overall performance and reliability of my team’s services.
Posted on Dec 18, 2025 (updated on Dec 23, 2025)