Unit test failures caused by test doubles: examples from real projects
Let’s start where most developers actually feel pain: when the unit test suite is all green, but production is red. These are real-world style examples of test doubles that passed locally yet failed to reflect reality.
1. The overconfident mock: HTTP client that never times out
A backend team built an HTTP client wrapper for calling a third‑party payment API. In unit tests, they used a mock for the HTTP client. The mock was configured to always:
- Return a 200 OK response
- Respond instantly
- Never raise timeouts or transient network errors
The production system, of course, behaved differently. Under load, the real HTTP client started timing out. The retry logic had a bug: it retried indefinitely on certain exceptions. CPU usage spiked, threads piled up, and the service degraded.
The test doubles were the problem. They created an alternate universe where the network was perfect. These tests were a textbook example of how mocks can make you overconfident:
- The retry loop was “covered” by tests
- But the tests never simulated a realistic failure mode
- The infinite retry bug only appeared in production
This is a classic failure mode: test doubles that only simulate happy paths. To avoid it, modern teams increasingly combine unit tests with chaos-style tests and structured fault injection, especially around network calls.
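To make this concrete, here is a minimal sketch of the missing test, assuming xUnit; IPaymentHttpClient, AlwaysTimingOutClient, and RetryingPaymentClient are hypothetical stand-ins for the team’s real wrapper:

using System;
using System.Threading.Tasks;
using Xunit;

// Hypothetical stand-ins for the team's real HTTP wrapper.
public interface IPaymentHttpClient
{
    Task<string> PostAsync(string path, string body);
}

// A stub that behaves like the real network under load: every call times out.
public class AlwaysTimingOutClient : IPaymentHttpClient
{
    public int Calls { get; private set; }

    public Task<string> PostAsync(string path, string body)
    {
        Calls++;
        throw new TimeoutException("simulated network timeout");
    }
}

public class RetryingPaymentClient
{
    private readonly IPaymentHttpClient _inner;
    private readonly int _maxAttempts;

    public RetryingPaymentClient(IPaymentHttpClient inner, int maxAttempts)
    {
        _inner = inner;
        _maxAttempts = maxAttempts;
    }

    public async Task<string> ChargeAsync(string body)
    {
        for (var attempt = 1; ; attempt++)
        {
            try { return await _inner.PostAsync("/charge", body); }
            catch (TimeoutException) when (attempt < _maxAttempts)
            {
                // Swallow and retry, but only up to the cap.
            }
        }
    }
}

public class RetryPolicyTests
{
    [Fact]
    public async Task GivesUpAfterMaxAttempts()
    {
        var stub = new AlwaysTimingOutClient();
        var client = new RetryingPaymentClient(stub, maxAttempts: 3);

        await Assert.ThrowsAsync<TimeoutException>(() => client.ChargeAsync("{}"));
        Assert.Equal(3, stub.Calls); // without a cap, this would spin forever
    }
}

The point is not the specific cap of three attempts; it’s that at least one test drives the retry loop through the failure mode it exists to handle.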
2. Fake repository with no concurrency: race conditions in production
A team used a simple in‑memory fake repository to test domain logic:
public class InMemoryOrderRepository : IOrderRepository
{
    private readonly Dictionary<Guid, Order> _orders = new();

    public Task<Order?> GetAsync(Guid id) =>
        Task.FromResult(_orders.TryGetValue(id, out var order) ? order : null);

    // Last writer wins: no locking, no transactions, no conflict detection.
    public Task SaveAsync(Order order)
    {
        _orders[order.Id] = order;
        return Task.CompletedTask;
    }
}
Unit tests using this fake all passed. But the real implementation used a relational database with transaction isolation and row‑level locking. Under concurrent load, two requests occasionally updated the same order simultaneously. The last writer silently won, and some business rules were violated.
The fake repository:
- Had no locking
- Had no transaction boundaries
- Could not reproduce race conditions
This is one of the clearest ways test doubles mask concurrency bugs. The tests gave the illusion that the code handled simultaneous updates correctly. It didn’t.
In 2024–2025, more teams are adding:
- Integration tests against real or containerized databases (e.g., Testcontainers)
- Property-based tests that generate concurrent scenarios
These patterns catch behavior that pure in‑memory fakes will never reveal.
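Short of full integration tests, one cheap improvement is a fake that refuses to lose writes silently. Below is a sketch in an optimistic-concurrency style; the Version field is an invented stand-in for a database row-version column:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class Order
{
    public Guid Id { get; init; }
    public int Version { get; set; }   // invented stand-in for a DB row-version column
    public decimal Total { get; set; }
}

public class VersionedInMemoryOrderRepository
{
    private readonly ConcurrentDictionary<Guid, Order> _orders = new();

    public Task<Order?> GetAsync(Guid id) =>
        Task.FromResult(_orders.TryGetValue(id, out var o) ? Clone(o) : null);

    public Task SaveAsync(Order order)
    {
        _orders.AddOrUpdate(
            order.Id,
            _ => Stamp(order),
            (_, current) =>
            {
                // Reject stale writes instead of silently letting the last writer win.
                if (current.Version != order.Version)
                    throw new InvalidOperationException("Concurrency conflict: stale Version");
                return Stamp(order);
            });
        return Task.CompletedTask;
    }

    private static Order Stamp(Order o) =>
        new() { Id = o.Id, Version = o.Version + 1, Total = o.Total };

    private static Order Clone(Order o) =>
        new() { Id = o.Id, Version = o.Version, Total = o.Total };
}

With this fake, a test that runs two overlapping get-modify-save sequences fails loudly with a conflict instead of passing while the last writer wins.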
3. Stubbed clock that never jumps: time zone and DST failures
Date and time bugs are still responsible for some of the most embarrassing production incidents. One particularly painful failure came from a scheduling system.
The team did the right thing in principle: they injected a Clock interface and used a stubbed clock in tests. But all the tests used only two values:
- A fixed time in January
- A fixed time in June
No tests covered:
- Daylight Saving Time transitions
- Leap years
- Time zone conversions between user locale and server
In production, a batch job scheduled “at midnight” ran an hour late during the DST transition in March. The stubbed clock never simulated that edge case. This is one of the quieter failures: the double is technically correct but behaviorally incomplete.
Better patterns include:
- Generating a variety of date/time cases in tests, including DST transitions (see the sketch after this list)
- Running some tests with real system clocks in a controlled environment
- Using libraries with well‑tested time behavior (e.g., java.time in Java, NodaTime in .NET)
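For the first pattern, here is a minimal sketch using xUnit and .NET’s TimeZoneInfo (the IANA zone ID resolves on Linux and macOS, and on Windows builds with ICU enabled):

using System;
using Xunit;

public class DstScheduleTests
{
    [Fact]
    public void LocalMidnightShiftsByAnHourAcrossSpringForward()
    {
        var zone = TimeZoneInfo.FindSystemTimeZoneById("America/New_York");

        // 2024-03-10: US Eastern clocks jump from 02:00 to 03:00.
        var beforeTransition = new DateTime(2024, 3, 9, 0, 0, 0, DateTimeKind.Unspecified);
        var afterTransition  = new DateTime(2024, 3, 11, 0, 0, 0, DateTimeKind.Unspecified);

        var utcBefore = TimeZoneInfo.ConvertTimeToUtc(beforeTransition, zone);
        var utcAfter  = TimeZoneInfo.ConvertTimeToUtc(afterTransition, zone);

        // Local midnight moves from 05:00Z to 04:00Z. A scheduler that naively
        // adds 24 hours to the previous UTC run time fires an hour late.
        Assert.Equal(5, utcBefore.Hour);
        Assert.Equal(4, utcAfter.Hour);
    }
}

A scheduler tested only against a stubbed clock’s two fixed values would never execute this path.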
Authoritative references like the NIST time services at time.gov underscore how tricky time handling can be, which is exactly why simplistic stubs are risky.
4. Mocked message broker: the missing dead-letter queue
In a microservices architecture, a team used a mock for their message broker (Kafka/RabbitMQ style) in unit tests. The mock implementation was minimal:
- Publish just appended messages to an in‑memory list
- Subscribe iterated over that list
- No delivery failures
- No retry policies
- No dead‑letter queues
In production, messages sometimes failed deserialization due to version mismatches between producers and consumers. The real broker routed these to a dead‑letter queue. The consuming service’s metrics and alerts depended on monitoring that queue.
Because the test doubles never simulated this behavior, several bugs slipped through:
- The consumer didn’t log failed messages correctly
- The alerting configuration wasn’t updated for a new topic
- Operations only discovered the problem days later by manually inspecting logs
This is a subtle example of how mocks for infrastructure can erase critical operational behavior. Modern teams are increasingly using contract testing (for example, Pact) and end‑to‑end tests in ephemeral environments to catch these mismatches instead of relying only on mocks.
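A fake broker doesn’t need to be Kafka, but it can keep the one behavior this incident hinged on. Here is a sketch (all names are illustrative) where failed handlers route messages to an in‑memory dead‑letter list that tests can assert against:

using System;
using System.Collections.Generic;

public class FakeBroker
{
    private readonly Dictionary<string, List<Action<byte[]>>> _handlers = new();

    // Messages whose handlers threw, mirroring the real broker's dead-letter queue.
    public List<(string Topic, byte[] Payload, Exception Error)> DeadLetters { get; } = new();

    public void Subscribe(string topic, Action<byte[]> handler)
    {
        if (!_handlers.TryGetValue(topic, out var list))
            _handlers[topic] = list = new List<Action<byte[]>>();
        list.Add(handler);
    }

    public void Publish(string topic, byte[] payload)
    {
        if (!_handlers.TryGetValue(topic, out var handlers)) return;
        foreach (var handler in handlers)
        {
            try
            {
                handler(payload);
            }
            catch (Exception ex)
            {
                // Poison messages land here, so tests can assert on logging
                // and alerting hooks instead of never exercising that path.
                DeadLetters.Add((topic, payload, ex));
            }
        }
    }
}

A test can then publish an unparseable payload and assert both that DeadLetters contains it and that the consumer logged it, which is exactly what the real suite never checked.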
5. Over-specified mocks: refactor breaks tests, not behavior
Not all test-double failures are about bugs that escape. Some are about tests that fail when the code is perfectly fine.
A frontend team wrote unit tests for a data fetching hook. They used a mocking framework to assert the exact sequence of calls to a lower‑level HTTP utility:
- First call with URL A
- Then call with URL B
- Then call with URL C
Later, they refactored the hook to parallelize requests B and C for performance. Behavior from the user’s perspective was identical, and even more efficient. But the tests failed because the mocks expected a specific call order.
This kind of brittle test is a classic example of failure caused by overusing mocks for implementation details instead of observable behavior. The tests:
- Blocked refactoring
- Added friction to performance improvements
- Provided little real confidence
Current testing trends emphasize behavioral assertions (what the UI renders, what the API returns) rather than strict call‑order verification, especially in 2024’s async‑heavy JavaScript and TypeScript ecosystems.
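The same anti‑pattern is easy to reproduce in any stack. Here is a sketch in C# with Moq and xUnit (IHttpUtility and DashboardLoader are hypothetical): the first test pins call order and would break under the parallelizing refactor; the second asserts only the observable result.

using System.Threading.Tasks;
using Moq;
using Xunit;

public interface IHttpUtility
{
    Task<string> GetAsync(string url);
}

public class DashboardLoader
{
    private readonly IHttpUtility _http;
    public DashboardLoader(IHttpUtility http) => _http = http;

    public async Task<string> LoadAsync()
    {
        var a = await _http.GetAsync("/a");
        var b = await _http.GetAsync("/b"); // a refactor might run /b and /c in parallel
        var c = await _http.GetAsync("/c");
        return $"{a}|{b}|{c}";
    }
}

public class BrittleVsBehavioralTests
{
    [Fact]
    public async Task Brittle_PinsCallOrder()
    {
        var http = new Mock<IHttpUtility>(MockBehavior.Strict);
        var seq = new MockSequence();
        http.InSequence(seq).Setup(h => h.GetAsync("/a")).ReturnsAsync("a");
        http.InSequence(seq).Setup(h => h.GetAsync("/b")).ReturnsAsync("b");
        http.InSequence(seq).Setup(h => h.GetAsync("/c")).ReturnsAsync("c");

        // Passes today, but breaks the moment /b and /c are reordered or
        // parallelized, even though users see identical results.
        Assert.Equal("a|b|c", await new DashboardLoader(http.Object).LoadAsync());
    }

    [Fact]
    public async Task Behavioral_AssertsOnlyTheOutcome()
    {
        var http = new Mock<IHttpUtility>();
        http.Setup(h => h.GetAsync(It.IsAny<string>()))
            .Returns((string url) => Task.FromResult(url.TrimStart('/')));

        // Survives any internal reordering: only the returned result matters.
        Assert.Equal("a|b|c", await new DashboardLoader(http.Object).LoadAsync());
    }
}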
6. Fake authentication service: security bug slips through
Security tests are notoriously tricky. In one system, the team used a fake authentication service in unit tests that:
- Accepted any username/password pair
- Returned a token with every role enabled
- Never expired tokens
Business logic tests used this fake to “simplify” authorization checks. In production, the real identity provider enforced:
- Strict role‑based access control
- Token expiration and refresh
- Multi‑factor authentication for certain roles
A regression introduced a path where a user with a basic role could trigger an admin‑level operation. The unit tests passed because the fake auth service always granted admin‑like tokens. Production logs later revealed unauthorized actions.
This is one of the most worrying kinds of test-double failure because it intersects with security. While full security validation requires more than unit tests, better doubles would:
- Simulate realistic role sets
- Enforce some basic authorization constraints
- Include tests for denial cases, not just success cases
For modern security guidance, organizations often consult resources like the NIST Cybersecurity Framework, which indirectly reinforces the idea that security assumptions should be tested under realistic conditions—not with overly permissive fakes.
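Here is a sketch of what such a double might look like; the user list, credential check, and five‑minute expiry are all illustrative:

using System;
using System.Collections.Generic;

public record FakeToken(string User, IReadOnlySet<string> Roles, DateTime ExpiresUtc);

public class FakeAuthService
{
    // Illustrative role sets: most users are NOT admins.
    private readonly Dictionary<string, string[]> _rolesByUser = new()
    {
        ["alice"] = new[] { "basic" },
        ["bob"]   = new[] { "basic", "admin" },
    };

    public FakeToken? Authenticate(string user, string password)
    {
        // Deny by default: unknown users and empty passwords get no token.
        if (string.IsNullOrEmpty(password) || !_rolesByUser.TryGetValue(user, out var roles))
            return null;
        return new FakeToken(user, new HashSet<string>(roles), DateTime.UtcNow.AddMinutes(5));
    }

    public bool Authorize(FakeToken token, string requiredRole) =>
        DateTime.UtcNow < token.ExpiresUtc && token.Roles.Contains(requiredRole);
}

With this fake, a denial‑path test can assert that authorizing a basic user’s token for an admin role returns false, which is precisely the regression that slipped through.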
7. In-memory cache fake: eviction never happens
Caching bugs often show up only after weeks in production. A backend team used an in‑memory dictionary as a fake cache in unit tests. The fake never:
- Evicted entries
- Enforced size limits
- Simulated distributed cache behavior
The real cache (Redis) had eviction policies and memory limits. Under real traffic, hot keys were evicted and reloaded frequently. A bug in the cache refresh logic caused stale data to be served for certain keys after eviction.
Because the fake cache never evicted anything, tests never triggered the refresh path. This is another practical example of how incomplete behavior in test doubles leads to undetected bugs.
As distributed caching patterns evolve in 2024–2025, more teams are adopting:
- Integration tests with real Redis or Memcached containers
- Load tests that exercise eviction behavior
Both go further than in‑memory fakes that behave like infinite, perfect caches.
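Even a unit-level fake can do better than an infinite dictionary. Here is a sketch of a size‑limited, LRU‑style fake cache (the name and the tiny capacity are illustrative):

using System.Collections.Generic;

public class EvictingFakeCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, TValue> _values = new();
    private readonly LinkedList<TKey> _recency = new(); // front = most recently used

    public EvictingFakeCache(int capacity) => _capacity = capacity;

    public bool TryGet(TKey key, out TValue value)
    {
        if (!_values.TryGetValue(key, out value)) return false;
        Touch(key);
        return true;
    }

    public void Set(TKey key, TValue value)
    {
        if (!_values.ContainsKey(key) && _values.Count >= _capacity)
        {
            // Evict the least recently used key, like a size-limited Redis might.
            var oldest = _recency.Last!.Value;
            _recency.RemoveLast();
            _values.Remove(oldest);
        }
        _values[key] = value;
        Touch(key);
    }

    private void Touch(TKey key)
    {
        _recency.Remove(key); // O(n), which is fine for a test double
        _recency.AddFirst(key);
    }
}

With a capacity of two, inserting a third key forces an eviction, which is enough to drive the refresh‑after‑eviction path that the real incident exposed.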
Patterns behind these test-double failures
If you look across these examples, some patterns emerge. These are the reasons test-double failures feel so familiar across languages and frameworks.
Behavior coverage vs. line coverage
Most teams still optimize for code coverage percentages. But a test that hits a line of code with an unrealistic test double isn’t giving you real confidence.
Across the examples above:
- The retry loop was covered, but never with a real timeout
- The repository methods were covered, but never under contention
- The scheduling logic was covered, but never through DST transitions
High coverage built on naive doubles is a textbook source of false confidence. The code executes, but not under the conditions that matter.
Test doubles that lie about the contract
Many of the worst failures come from doubles that don’t match the real contract of the dependency:
- A mock HTTP client that never times out
- A fake auth service that always grants admin
- A fake broker that never dead‑letters
These doubles violate the behavioral contract of the real system. In 2024, contract testing tools and schema validation (OpenAPI, JSON Schema) are commonly used to reduce this mismatch, especially in microservices.
Over-mocking internals instead of testing outcomes
The over‑specified mock examples include:
- Tests that assert exact call order
- Tests that verify every internal method call on collaborators
These tests:
- Break when you refactor
- Don’t necessarily fail when behavior is wrong
Modern testing advice from seasoned practitioners (for example, in resources from universities like Carnegie Mellon’s Software Engineering Institute) emphasizes testing observable behavior and using test doubles sparingly for external dependencies.
How to avoid repeating these test-double failures
You can’t eliminate all risk, but you can dramatically reduce the number of bugs that slip past your unit tests by treating doubles as a design tool, not a shortcut.
Favor fakes that behave like the real thing
Instead of bare‑bones stubs, build fakes that:
- Enforce basic constraints (auth, validation, roles)
- Simulate failure modes (timeouts, network errors, eviction)
- Respect concurrency where relevant
You don’t need production‑grade implementations, but effective test doubles mimic the important parts of the real system’s behavior.
Combine unit tests with higher-level tests
Most of the worst test-double failures come from relying on unit tests alone. In 2024–2025, mature teams typically:
- Keep unit tests fast and focused on pure logic
- Use integration tests with real infrastructure for critical paths
- Add contract tests between services
- Run smoke tests in staging environments that mirror production
This layered approach catches the kinds of issues that mocks and stubs tend to hide.
Be intentional about what you mock
A simple rule of thumb many teams use now:
- Mock external systems you don’t control (third‑party APIs)
- Prefer in‑process fakes or real implementations for things you do control (your own modules, domain logic)
And when you do mock, focus on behaviors that matter:
- Success and failure paths
- Edge conditions
- Timeouts and retries
Not every internal method call.
FAQ: common questions about test doubles and unit test failures
Q: What are some common test-double failures in modern microservices?
In microservices, common examples include mocked HTTP clients that never time out, fake message brokers without dead‑letter queues, and in‑memory databases that ignore transaction isolation. These doubles make tests pass under conditions that never occur in production, so bugs in retry logic, error handling, and concurrency slip through.
Q: Can you give an example of a good test double versus a bad one?
A good example of a test double might be a fake payment gateway that sometimes declines transactions, sometimes times out, and enforces realistic validation rules. A bad example is a stub that always returns “payment approved” instantly, with no errors, no delays, and no validation. The first one helps you test real behavior; the second just paints everything green.
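In code, the contrast might look like this; the interface, thresholds, and every‑third‑call timeout are made up for illustration:

using System;
using System.Threading.Tasks;

public interface IPaymentGateway
{
    Task<string> ChargeAsync(decimal amount);
}

// Bad: paints everything green.
public class AlwaysApprovesGateway : IPaymentGateway
{
    public Task<string> ChargeAsync(decimal amount) => Task.FromResult("approved");
}

// Better: deterministically exercises decline, timeout, and validation paths.
public class RealisticFakeGateway : IPaymentGateway
{
    private int _calls;

    public Task<string> ChargeAsync(decimal amount)
    {
        if (amount <= 0)
            throw new ArgumentOutOfRangeException(nameof(amount)); // basic validation

        _calls++;
        if (_calls % 3 == 0)
            throw new TimeoutException("simulated gateway timeout");

        return Task.FromResult(amount > 1000m ? "declined" : "approved");
    }
}

Note that the realistic fake is deterministic rather than random, so a failing test is reproducible.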
Q: How do I know if my mocks are too detailed?
If your tests fail every time you refactor internals—even when the public behavior is unchanged—your mocks are probably over‑specified. Another sign is a heavy focus on verifying call counts and order instead of checking outputs, state changes, or user‑visible behavior.
Q: Are there situations where test doubles are a bad idea altogether?
Yes. For core domain logic that doesn’t touch external systems, using real collaborators often leads to clearer, more maintainable tests. For critical integrations (like security, billing, and data persistence), relying only on doubles is risky; you should add integration and end‑to‑end tests with real infrastructure.
Q: How are testing practices around test doubles changing in 2024–2025?
Teams are moving toward:
- Stricter rules on when mocks are allowed
- Wider use of contract testing between services
- More use of containerized infrastructure in CI for realistic integration tests
- Better observability (logs, metrics, traces) to validate behavior in staging and production
These trends directly address the kinds of failures highlighted in the examples above.