Sometimes I think about how there are computer scientists who worry about proving correctness.
And then there are two of our dev teams that have just accepted that their applications crash from memory leaks 33 and 63 times a week, respectively.

There's no way this isn't impacting customers, but apparently not enough for them to complain, which is why there's no pressure to fix it.

I created an elaborate alarm just to catch this. And every single time it happens, there's an alert in our alerting system for a full hour. Everyone knows these applications crash _all the time_.

But check out this cool graph it gives me

Every dot on the baseline at 2 is an application instance getting OOM-killed. And then we have those dots up around 7. Imagine crashing 6 times in one hour.

I'm pretty sure all the dots are concentrated in the last 2 days because I force-restarted all the applications at the same time at the start of this week.

@rune my graphs only look like that when I have a lot of deployments, and the temporary containers, which I have yet to work out how to filter from my data, fill up the graph with dots.

