Entropy and Monitoring Systems

Update 2 March 2022:

Since originally writing this post in 2012, I have come to the realisation that a fascination with entropy availability is not helpful, and that my own understanding of entropy and how it relates to /dev/random was flawed.

See Myths about urandom, which helped me understand how /dev/random actually works, and a more recent update on LWN about improvements to the Linux random-number devices.

The discussion below, about how executing lots of processes can reduce the “available entropy” and thus cause things to stall, appears to be valid only for older Linux kernels.

I use munin to monitor various aspects of my servers, and one of the things it monitors for me is the amount of entropy available. On both my current server and my previous one I’ve noticed something unusual here:

According to munin, I’m almost perpetually running out of entropy. Munin monitors the available entropy by checking the value of /proc/sys/kernel/random/entropy_avail, which is the standard way you’d check it. My machine has several VMs running, and hosts a few services that use entropy at various times (imaps, ssmtp or smtp+tls, ssh, https), so it’s not unreasonable that I may have been entropy starved. If my entropy levels are always around the 160 mark, it’s likely that at any given time I’m totally starved of entropy, so anything using encryption will stall a bit.
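For reference, reading that value takes only a few lines. This is a minimal sketch in Python, not what the munin plugin actually does; the function name read_entropy_avail and the path parameter are my own, added so the reader can be pointed at a test file.

```python
from pathlib import Path

# Kernel-provided estimate of the entropy available, in bits.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path=ENTROPY_AVAIL):
    """Return the current entropy estimate as an integer number of bits.

    The file holds a single decimal number followed by a newline.
    Reading it in-process avoids starting any helper program.
    """
    return int(Path(path).read_text().strip())
```

On a Linux machine, print(read_entropy_avail()) shows the same number that cat would print for that file.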

I had a brief look into various entropy sources, such as timer_entropyd or haveged, but none of them seemed to help. I’d seen several references to Simtec’s entropykey, which looked very promising, so I ordered one from the UK, which arrived a week or so ago.

I’ve yet to arrange a trip to the datacentre to install it, however, and after a bit of poking around today I’m not so sure it’s as desperately needed as I thought.

I randomly checked on the contents of /proc/sys/kernel/random/entropy_avail, just to see what it was like. There were over 3000 bits of entropy present. Very odd. I repeated this several times, and watched the available entropy decrease from over 3000 bits down to around 150 or so, the same as in my munin graph above. I repeated this about a quarter of an hour later, with the same results – over 3000 bits of entropy, rapidly decreasing to very little.

After a bit of further digging, I found this blog post, which mentioned that creating a process uses a small amount of entropy. The author of that post was seeing problems with his entropy pool not staying full, which sounds like what I was seeing. I’m still not clear on what requires entropy though, as some of my systems at work clearly don’t deplete the entropy pool during process creation.

So I set up some different monitoring: checking the value of entropy_avail every minute, using a separate script. The graph below shows the results:
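The script itself isn’t reproduced here, so what follows is only a sketch of how such a once-a-minute check could look in Python. The function name sample_entropy and its interval, count and sleep parameters are assumptions of mine. It reads the file directly rather than shelling out to cat or awk, so the sampler should not itself start extra processes while it measures.

```python
import time
from datetime import datetime
from pathlib import Path

ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def sample_entropy(path=ENTROPY_AVAIL, interval=60, count=10, sleep=time.sleep):
    """Yield (timestamp, bits) pairs, one every `interval` seconds.

    `count` limits how many samples are taken; `sleep` can be swapped
    out (for example in tests) so the loop does not really wait.
    """
    for i in range(count):
        bits = int(Path(path).read_text().strip())
        yield datetime.now().isoformat(timespec="seconds"), bits
        if i < count - 1:  # no need to wait after the final sample
            sleep(interval)


def log_samples(samples, out):
    """Write each sample as 'timestamp bits' on its own line."""
    for stamp, bits in samples:
        out.write(f"{stamp} {bits}\n")
```

Something like log_samples(sample_entropy(count=30), sys.stdout) (after import sys) would print half an hour of readings, ready to feed into whatever graphing tool is at hand.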

Clearly, entropy is normally very good, but is dropping down to very low levels every 5 minutes. It replenishes just fine in the intervening 5 minutes however, which suggests that I don’t really have a problem with entropy creation, just with using it too quickly.
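The 5-minute pattern is easy to spot by eye on a graph, but it can also be checked from the raw samples. Here is a rough sketch, assuming the readings are available as (minute, bits) pairs; the function name dip_intervals and the threshold of 500 bits are arbitrary choices of mine, not anything munin or the kernel defines.

```python
def dip_intervals(samples, threshold=500):
    """Return the gaps, in minutes, between the starts of successive dips.

    `samples` is a sequence of (minute, bits) pairs in time order.
    A dip starts when a reading falls below `threshold` after a
    reading that was at or above it (or at the very first sample).
    """
    dip_starts = []
    in_dip = False
    for minute, bits in samples:
        below = bits < threshold
        if below and not in_dip:
            dip_starts.append(minute)
        in_dip = below
    # Pair each dip start with the next one and take the difference.
    return [later - earlier for earlier, later in zip(dip_starts, dip_starts[1:])]
```

If every gap it returns is close to 5, the dips line up with something that runs on a 5-minute schedule.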

As for the question, “why is my entropy running out so fast?”, the answer is quite simple: Munin. On my host machine, munin runs around 50 plugins, each of which generally calls other processes such as grep, awk, sed, tr, etc. I don’t have exact figures on how many processes were being kicked off every 5 minutes, but I wouldn’t be surprised to find it was hundreds, all of which used a little bit of entropy.
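One way to test that theory on a given machine is to read entropy_avail, start a batch of short-lived processes, and read it again. The sketch below does that in Python; the function name entropy_drop_after_spawning, the default of 200 processes and the use of the true command are all assumptions for illustration. As the update at the top of this post notes, newer kernels are unlikely to show any meaningful drop.

```python
import subprocess
from pathlib import Path

ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_bits(path=ENTROPY_AVAIL):
    """Read the entropy estimate in-process, without spawning anything."""
    return int(Path(path).read_text().strip())


def entropy_drop_after_spawning(n=200, command=("true",), path=ENTROPY_AVAIL):
    """Return (before, after, drop) around running `command` n times.

    `command` should be something trivial that exits immediately, so
    that process creation is the main thing being exercised. Other
    activity on the machine will also move the number, so treat the
    result as a rough indication only.
    """
    before = read_bits(path)
    for _ in range(n):
        subprocess.run(list(command), check=True)
    after = read_bits(path)
    return before, after, before - after
```

Running it a few times with different values of n, and comparing against n=0 as a baseline, gives a rough feel for whether process creation is what empties the pool.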

I’ll still install the EntropyKey, and maybe it’ll help my pool recover quicker.