Chaos Engineering

Chaos Engineering

February 10, 2024 | 1 Minute Read

Observing a distributed system’s behavior in a controlled experiment is called chaos engineering.

Netflix Chaos Engineering

Netflix created a tool to randomly shut down servers and called it chaos monkey:

Read information about server nodes via the continuous delivery platform
It interacts via the continuous delivery platform to kill server nodes

Then they check if traffic gets routed to another server without affecting users.

how they do chaos engineering:

Create a hypothesis about how the system will behave during a failure
Run a small test to introduce failure - switch off the server or change the network configuration
Observe the system’s behavior & measure the failure impact
Automate fix for the problem.

Benefits

Run Canaries on production environment by introducing failures
Evaluate impact & frequency - server crash, latency, response time
Minimize blast radius that helps to avoid real failures.