We have a client that has a standalone ESX 4.0 Server running on an HP ML370 G5 with the redundant fan kit. This server has been in production for about 18 months without any issues.

Recently the fans in the server began running at 100% for a few seconds and then dropping back to 20%. This occurred randomly and didn’t seem tied to server activity or utilization. We installed the latest firmware on the server along with all of the latest patches for vSphere 4.0, but the problem persisted. We called HP Technical Support and they sent out a new motherboard, hoping that this would solve the problem. After the motherboard was replaced, the problem was still there. A quick Google search suggested that replacing the power backplane board (the board that the power supplies plug into) often fixes this kind of fan problem. We had another identical HP ML370 G5 server that wasn’t having any fan problems, so we swapped the backplane boards between the two servers to see if the problem followed the board. It didn’t; the problem stayed with the original server.

As you may know, the fan speed and server temperature can be monitored by the Integrated Lights-Out (iLO) board on the ML370 G5. Bringing up the iLO web interface and reviewing the CPU temperatures while the fans ran at 100% showed that Temp 2 in CPU1, Temp 3 in CPU1, Temp 5 in CPU2 and Temp 6 in CPU2 would jump from 30C to 74C within a few seconds. The ambient temperature stayed constant at 30C. A CPU cannot realistically heat up or cool down by 44C in a few seconds, so the CPU sensors were reporting incorrect temperatures.
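The same sanity check can be applied to logged readings rather than by eye. The following is only a minimal sketch, assuming you have already collected (timestamp, temperature) pairs from some monitoring source; it does not talk to iLO itself. The function name, the 15C threshold and the sample timestamps are hypothetical and chosen for illustration; only the 30C and 74C values come from what we observed.

```python
from typing import List, Tuple

# Hypothetical threshold: the largest change (in degrees C) between two
# consecutive readings that we still treat as physically plausible.
MAX_PLAUSIBLE_STEP_C = 15.0


def find_implausible_jumps(
    readings: List[Tuple[float, float]],
    max_step_c: float = MAX_PLAUSIBLE_STEP_C,
) -> List[Tuple[float, float, float]]:
    """Return (timestamp, previous_temp, current_temp) for each suspicious jump.

    `readings` is a list of (timestamp_seconds, temperature_c) pairs in time order.
    """
    suspicious = []
    # Compare each reading with the one immediately before it.
    for (_, prev_temp), (ts, temp) in zip(readings, readings[1:]):
        if abs(temp - prev_temp) > max_step_c:
            suspicious.append((ts, prev_temp, temp))
    return suspicious


# Illustrative samples: the 5-second spacing is made up, the 30C and 74C
# values match the readings described in this post.
cpu1_temp2 = [(0, 30.0), (5, 30.0), (10, 74.0), (15, 31.0), (20, 30.0)]
ambient = [(0, 30.0), (5, 30.0), (10, 30.0), (15, 30.0), (20, 30.0)]

print(find_implausible_jumps(cpu1_temp2))  # flags the jump up and the drop back
print(find_implausible_jumps(ambient))     # flags nothing
```

A CPU sensor that is flagged while the ambient sensor stays flat points at the sensor (or the component it lives in) rather than at a real cooling problem.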

Armed with this new troubleshooting information, we called HP again and had two new CPUs sent out. Evidently the CPU temperature sensors are an integral part of the CPU. After both CPUs were swapped out, the problem was solved! A week later, the fan speed remains at 20% and has not spiked once. If you run into fan problems on an HP server, bring up the iLO interface and review the temperature readings. It could save you some troubleshooting time.
