System has been very slow for a few months (complex setup, RAID)

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

Hello.

I installed the machine in 2010. It was fast and good.

In 2014 I changed the disks to get more space. I moved files and data without reinstalling any software. But I changed one detail: the HOME partition went from EXT4 to ZFS. I now have 5 HDDs.

Since that disk change, the system got a bit slower. But it is now significantly slower than before, and I don't understand why.

Long story short: Gentoo is on RAID6 over 5 disks. HOME is on ZFS raidz2 over 5. BIG (say, /mnt/tmp) is on a more complex setup; but in short, it's also raid6 over 5 disks.

1: all apps are getting slower and slower, to the point of being deadly slow. Rox-filer used to open a folder with 200 pictures in 3 to 5s. Now it needs about 1s to generate the preview for each picture. That's about 3 min for a 200-picture folder, while it used to take less than 10s.

2: some apps freeze for a long time; E17 shows blinking red decorations

3: sometimes the whole X freezes (the mouse won't move) for 20s, or up to 8 min.

When things are slow or frozen, the load increases hugely (from 0.6 on average to 2, 5, or [...]). But the CPU usually remains 75% or even 92% idle. And the HDD LED blinks very slowly, about 3 blinks per second.
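A high load average with a mostly idle CPU is the usual signature of tasks stuck in uninterruptible sleep (state D), which Linux counts in the load even though they burn no CPU. The following is only a minimal sketch for spotting such tasks by reading /proc/<pid>/stat; the function names are made up for illustration, and it looks at each process's main thread only, not at every thread.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep (D)."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were reading, or unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Running `print(blocked_tasks())` a few times during a freeze should show which processes are waiting on I/O, which narrows down whether the mail client, the file manager, or something else is the one hammering the disks.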

Keruskerfuerst · Posted: Fri Dec 04, 2015 7:28 pm Post subject:

Detailed hardware info?

eccerr0r · Posted: Fri Dec 04, 2015 7:39 pm Post subject:

And what kernel version?
Why can't you compile the needed code into your kernel for iotop? It should be there in recent kernels...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

eccerr0r · Posted: Fri Dec 04, 2015 8:56 pm Post subject:

My only guess now is fragmentation, if your disk has a lot of turnover from bittorrenting or something...?

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

TigerJr · Guru Joined: 19 Jun 2007 Posts: 540

If you have a Xen kernel, then you can have Xen guests; many guests mean many IOPS. Why can't you debug I/O rates with SNMP or another monitoring system? Or even with sarg and a shell script, MRTG, or gnuplot with a simple web server, to find out what is causing the high I/O rate or the latency bottleneck, or whether the RAID has degraded due to a disk failure?
_________________
Do not use gentoo, it die

eccerr0r · Posted: Sat Dec 05, 2015 1:07 am Post subject:

Keruskerfuerst · Posted: Sat Dec 05, 2015 8:06 am Post subject:

ZFS has a fragmentation program.

Detailed hardware info means:
CPU(s)
Mainboard
RAM; type
Graphics card
Harddisk controller
Harddisks (HDD or SSD)

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

TigerJr · Guru Joined: 19 Jun 2007 Posts: 540

mdadm? You didn't use hardware RAID???

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

I had an idea.

There is one thing that grows with time: my Gmail box. Each email sent is copied into the Sent folder locally and, because the SMTP relay is Gmail, it also lands in the Sent folder of TWO Gmail accounts. I am cleaning the online account of the box used for outgoing messages (the account configured for the SMTP relay). The account (on local disk, as cache) was 18G and is now 5G. But there are still surprises. After heavy cleaning, Thunderbird tells me the AllMail folder has a "size on disk 65MB", but in a terminal, after compacting, it is more like 2.5G.

I have yet to understand why Thunderbird thinks that 2.5GB is 65MB ...
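To cross-check what Thunderbird reports against what is really on disk, the folder can be measured independently. This is only a rough sketch: the function name `largest_files` is invented for the example, the profile path in the usage note is hypothetical, and it sums apparent file sizes (`st_size`), so sparse or compressed files on ZFS may occupy less than reported.

```python
import os
import stat


def largest_files(root, top=10):
    """Return (total_bytes, [(size, path), ...]) for regular files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable
            if stat.S_ISREG(st.st_mode):
                sizes.append((st.st_size, path))
    sizes.sort(reverse=True)  # biggest files first
    total = sum(size for size, _ in sizes)
    return total, sizes[:top]
```

Called as `largest_files("/home/user/.thunderbird")` (path assumed here), it returns the total and the biggest files, which makes it easy to see which mailbox file accounts for the gap between 65MB and 2.5G.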

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

It took 315 seconds to empty the bin, which contained 132,862 messages ... from Gmail Webmail.

Even the largest computer in the world had trouble flushing my account. No wonder my small desktop was slow managing it.

I never saw Google spend more than 5s on any request.

One account left ... but I don't know how to clean it. I have too many messages, in too many folders, to make a copy of it.
_________________
DEMAINE Benoît-Pierre (aka DoubleHP ) http://www.demaine.info/
>o_/ Coin coin coin \_o<
to contact me (MSN,ICQ, JABBER, Skype ... ) http://benoit.demaine.info/contact.png

schorsch_76 · Guru Joined: 19 Jun 2012 Posts: 452

It doesn't help to cry. If you want to do something, do it.

Take a full backup and keep your current kernel as a fallback. Then try a newer kernel and see whether things improve. If you can't do a full backup, then back up the system and keep your data safe.

It is always a bad sign if the one who installed/built something doesn't want to touch it because it could break.

_________________
// valid again: I forgot about the git access. Now 1.2GB big. Start: 2015-06-25
git daily portage tree
Web: https://github.com/schorsch1976/portage
git clone https://github.com/schorsch1976/portage

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

TigerJr · Guru Joined: 19 Jun 2007 Posts: 540

Keruskerfuerst · Posted: Sat Dec 05, 2015 4:14 pm Post subject:

1. You should update the BIOS of the mainboard. Here: http://www.gigabyte.com/products/product-page.aspx?pid=3154#bios
2. I only see one HDD in your setup.
3. ZFS does fragment and there is a utility to defragment the partitions.

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

schorsch_76 · Guru Joined: 19 Jun 2012 Posts: 452

How about perf? [1]

[1] https://perf.wiki.kernel.org/index.php/Main_Page

Do you at least have the source and config of your kernel? With that you could include the functions for iotop.

Without iotop it is difficult to trace I/O troubles. You don't even know when the HDD runs ... (no LED).

TigerJr · Guru Joined: 19 Jun 2007 Posts: 540

To analyze this we need the same thing you need: information about your running system. But the information must be helpful. You say that none of the information you monitor is helpful, so now I don't know what to ask for. IOPS monitoring when the error appears, CPU load, memory load, /proc/mdstat ... but you say there is nothing useful.

MRTG is not so heavy (alternatives: PRTG, Cacti, Zabbix, watsup, and even gnuplot) and can be run from a crontab script for each graph. It is quite easy: it generates HTML pages with PNG images that you view through an HTTP web server. You can collect all the information you need with MRTG (disk load, network load, CPU load, IOPS, processes, memory, swap). If a server runs into a DoS problem, you can see when the error appeared, what the indications were before the problem appeared, and even what the source of the DoS was. That is good for diagnosis. Watching only the LED gives you very little information, and there is no LED history to tell you what the LED rate was an hour ago or yesterday.
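As a lighter stand-in for a full MRTG setup, the same kind of history can be collected by sampling /proc/diskstats from cron and appending to a CSV file that gnuplot can read later. The sketch below is an illustration only, not an MRTG configuration: the function names and the CSV layout are invented, and it assumes the standard diskstats field order (reads completed, writes completed, and milliseconds spent doing I/O in fields 1, 5 and 10 after the device name).

```python
import csv
import time


def parse_diskstats(text):
    """Map device name -> (reads_completed, writes_completed, ms_doing_io)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # short or malformed line
        stats[parts[2]] = (int(parts[3]), int(parts[7]), int(parts[12]))
    return stats


def rates(before, after, interval_s):
    """Map device -> (reads/s, writes/s, percent of the interval spent busy)."""
    out = {}
    for dev, (r1, w1, busy1) in after.items():
        if dev not in before:
            continue
        r0, w0, busy0 = before[dev]
        out[dev] = (
            (r1 - r0) / interval_s,
            (w1 - w0) / interval_s,
            100.0 * (busy1 - busy0) / (interval_s * 1000.0),
        )
    return out


def log_sample(csv_path, interval_s=10):
    """Take two snapshots interval_s apart and append one CSV row per device."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read())
    now = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        for dev, (rps, wps, busy) in sorted(rates(before, after, interval_s).items()):
            writer.writerow([now, dev, f"{rps:.1f}", f"{wps:.1f}", f"{busy:.1f}"])
```

A disk that sits near 100% busy while delivering only a handful of operations per second would match the "load is high, LED barely blinks" symptom and point at one slow member rather than at the CPU.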

Did you check your disks for bad blocks?

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

schorsch_76 · Guru Joined: 19 Jun 2012 Posts: 452

You need to set the kernel options according to [1]

CONFIG_TASKSTATS
CONFIG_TASK_DELAY_ACCT
CONFIG_TASK_IO_ACCOUNTING

[1] http://linux.die.net/man/1/iotop

These changes would require rebuilding your kernel and your out-of-tree kernel modules (maybe ZFS). As a safety net, back up your kernel, initrd and the modules folder.
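Before rebuilding anything, it may be worth checking which of the three options the running kernel already has. A minimal sketch, assuming the config is available either as /proc/config.gz (which only exists when the kernel was built with CONFIG_IKCONFIG_PROC) or as a plain .config file in the kernel source tree; the helper names are invented for the example:

```python
import gzip

REQUIRED = (
    "CONFIG_TASKSTATS",
    "CONFIG_TASK_DELAY_ACCT",
    "CONFIG_TASK_IO_ACCOUNTING",
)


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


def read_kernel_config(path="/proc/config.gz"):
    """Read a kernel config, transparently handling the gzipped /proc copy."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return f.read()
```

`missing_options(read_kernel_config())` returning an empty list would mean that the kernel side is fine and the iotop problem lies elsewhere; otherwise the list names exactly what to enable before the rebuild.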

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

schorsch_76 · Guru Joined: 19 Jun 2012 Posts: 452

You see, when I do something like this,