Server freezing under heavy cpu / disk / network load

Asked by [ Editor ] , Edited by Davide [ Editor ]

Hello, I would like to ask for help, since I am feeling a bit clueless here.

I recently bought a low end dedicated machine that was supposed to host
some services: squid, proftpd and rtorrent.

I installed Debian lenny, immediately upgraded to squeeze, and
configured the services. I started rtorrent, but once the machine
reaches heavy load (> 10MBps network traffic, maxed-out CPU), it holds
for a while, then all network connections drop and I have to order a hard
reset to bring it back online.

I thought it was a misconfiguration issue, so I tried reconfiguring the
server and then installing Ubuntu 10.04 on it, but I’m getting the same results.

I had a look at /var/log/kern.log, and on Ubuntu I am seeing some
“Clocksource tsc unstable” messages right before the machine crashes.

I can see the same kind of messages on squeeze as well, just not as
close to the reboots as they were on Ubuntu. Google tells me they might
have something to do with CPU frequency scaling. There are loads of
reports from users like me who are experiencing random freezes, but
there seems to be no clear answer: people solved the issue with video
card driver updates, replacing bad hardware, changing the frequency
scaling governor, and so on.
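
To see how often the message shows up and how close it is to each crash, the log can be searched directly. This is only a rough sketch: the helper name `count_tsc_warnings` is made up for illustration, and the default path assumes the stock Debian/Ubuntu syslog setup, where kernel messages end up in /var/log/kern.log.

```shell
#!/bin/sh
# Count "Clocksource tsc unstable" lines in a kernel log file.
# ASSUMPTION: the default path matches the stock Debian/Ubuntu syslog
# configuration; pass another file (e.g. a rotated log) as first argument.
count_tsc_warnings() {
    log_file="${1:-/var/log/kern.log}"
    # grep -c prints the number of matching lines, -i ignores case.
    # grep exits non-zero on zero matches, so "|| true" keeps that quiet.
    grep -ci 'clocksource tsc unstable' "$log_file" 2>/dev/null || true
}

# Example (not run automatically):
#   count_tsc_warnings
#   grep -i -B 5 'clocksource tsc unstable' /var/log/kern.log   # with context
```

The second commented command prints the five lines before each hit, which may show what the kernel was doing just before it gave up on the TSC.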

So far I have only played around with the frequency scaling governor:
setting it to “performance” seems to freeze the machine more quickly than
the default “ondemand”.
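
For reference, switching governors can be done through the cpufreq sysfs files. The sketch below is one way to do it, not necessarily how it was done here; the function names `show_governor` and `set_governor` are hypothetical, and the default directory assumes a single-core machine with cpufreq support loaded.

```shell
#!/bin/sh
# Inspect and change the cpufreq governor through sysfs.
# ASSUMPTION: cpu0 is the only core and a cpufreq driver is loaded.
# Function names are illustrative only.
show_governor() {
    cpufreq_dir="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
    echo "current:   $(cat "$cpufreq_dir/scaling_governor")"
    echo "available: $(cat "$cpufreq_dir/scaling_available_governors")"
}

set_governor() {
    governor="$1"
    cpufreq_dir="${2:-/sys/devices/system/cpu/cpu0/cpufreq}"
    # Refuse names the kernel does not list as available.
    if ! grep -qw -- "$governor" "$cpufreq_dir/scaling_available_governors"; then
        echo "governor '$governor' is not available" >&2
        return 1
    fi
    # Writing to the real sysfs file needs root.
    echo "$governor" > "$cpufreq_dir/scaling_governor"
}

# Example (as root, not run automatically):
#   show_governor
#   set_governor powersave
```

Pinning the CPU to one frequency with “powersave” or “performance” and comparing how long the machine survives under the same load is a cheap way to test whether frequency transitions are involved at all.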

Here are the cpu specs of the machine:
# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 39
model name      : AMD Athlon™ 64 Processor 3700+
stepping        : 1
cpu MHz         : 2200.000
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt
lm 3dnowext 3dnow up pni lahf_lm
bogomips        : 4398.97
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
conservative userspace powersave ondemand performance

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
1000000 1800000 2000000 2200000

# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm

I asked the datacenter support to perform a hardware check and they
tested the machine for 8 hours without errors.

Now, how can I find out what’s going on with this server? I’m pretty
sure it’s faulty hardware, but I have no proof to show to the datacenter
support.

I am currently running squeeze with the 2.6.32-5-686-bigmem kernel. The
machine has 1024MB of RAM and 2x160GB SATA HDDs. The NIC is a 100Mbit
Realtek one, with proper drivers from the firmware-realtek Debian package.

I would love to have some opinions on how to deal with this.

helmut
-

Slightly OT: 1024MB RAM means that a bigmem kernel is not required.

darkrow
-
Yeah, I was wondering about that too. I didn’t select bigmem, but I guess it doesn’t really matter, does it?

2 answers

1

helmut [ Editor ]

There is probably no straightforward answer to this question. The first task, obviously, is to find out what is going wrong. In the cases I have experienced so far, it helped a lot to install a monitoring solution.

I can recommend munin (packages munin munin-node munin-plugins-extra). It collects data every five minutes and draws nice plots. It comes with plugins for monitoring temperatures, disks (SMART), and different aspects of load. Note that not all plugins are configured automatically.

Another tool designed for this task is collectd.

Once you have a monitoring solution installed, you should try to reproduce the crashes and look at the graphs. In most cases you will find an oddity that can serve as a starting point to dig further.
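
Since a five-minute sampling interval can miss the last moments before a freeze, a small logging loop with a shorter interval may help alongside the graphs. This is a minimal sketch and not a replacement for munin or collectd; the function name `sample_once`, the ten-second interval and the log path in the usage comment are all arbitrary choices made for illustration.

```shell
#!/bin/sh
# Print one timestamped sample of load average and free memory.
# ASSUMPTION: a Linux /proc layout; the directory is a parameter only so
# the function can be pointed at a copy for testing.
sample_once() {
    proc_dir="${1:-/proc}"
    # First three fields of loadavg: 1, 5 and 15 minute load averages.
    load=$(cut -d' ' -f1-3 "$proc_dir/loadavg")
    # MemFree is reported in kB.
    mem_free=$(awk '/^MemFree:/ {print $2}' "$proc_dir/meminfo")
    echo "$(date '+%Y-%m-%d %H:%M:%S') load=$load memfree_kb=$mem_free"
}

# Example loop (hypothetical log path, not run automatically):
#   while sample_once >> /var/log/load-samples.log; do
#       sync        # flush to disk so the last sample survives a hard reset
#       sleep 10
#   done
```

After a hard reset, the tail of the log shows the state a few seconds before the freeze, for example whether free memory had collapsed or the load average was climbing.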

darkrow
-

Crashes occur under high CPU/RAM/disk load when the network is maxed out. A linear FTP transfer works fine at maxed-out bandwidth, but the machine can’t take rtorrent at full bandwidth. I limited the rtorrent bandwidth usage to 5MBps total download and it hasn’t crashed so far, but this is so weird! I have never seen a machine act like this :(

helmut
-
Maybe it is related to bad handling of out-of-memory situations? Try running your torrent client with the ulimit options -d, -s or -m.
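
ulimit is a shell builtin, so it is documented in the shell's own manual page (bash(1) or dash(1)); the limits themselves are described in setrlimit(2). -d caps the data segment, -s the stack and -m the resident set size, all in kB. Note that, according to setrlimit(2), the RSS limit behind -m is not enforced by current Linux kernels, so -d is probably the one to try first. A rough sketch follows; the wrapper name `run_limited` and the 512MB figure are made up for illustration.

```shell
#!/bin/sh
# Run a command with a cap on its data segment size.
# ASSUMPTION: the shell's ulimit builtin supports -d (bash and dash do).
# The wrapper name is illustrative only.
run_limited() {
    limit_kb="$1"
    shift
    (
        # The limit applies to this subshell and everything it starts,
        # so the calling shell stays unrestricted.
        ulimit -d "$limit_kb" || exit 1
        exec "$@"
    )
}

# Example (not run automatically): allow rtorrent about 512MB of data.
#   run_limited 524288 rtorrent
```

If the client then dies with allocation errors instead of the whole machine freezing, that points toward memory pressure rather than hardware.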

darkrow
-
Do you have any documentation on the matter? Currently max_memory_usage is set to 860/1024. What do the other options you are suggesting do?

0

depaloan [ Editor ]

Have you tried to reproduce the issue with another BitTorrent client? There is a command line version of Transmission, if you want to try it.

On my home connection, running rtorrent without any download limit makes it impossible for me to surf the Web (or to connect through SSH).

darkrow
-

I will probably try that in a couple of days, thanks for the suggestion.
