tallica Apprentice
Joined: 27 Jul 2007 Posts: 152 Location: Lublin, POL
|
Posted: Tue Nov 23, 2010 8:52 am Post subject: |
|
|
Looks like there are some problems with patch v4: http://lkml.org/lkml/2010/11/21/41 _________________ Gentoo ~AMD64 | Audacious |
|
|
|
kernelOfTruth Watchman
Joined: 20 Dec 2005 Posts: 6111 Location: Vienna, Austria; Germany; hello world :)
|
|
|
|
pste Tux's lil' helper
Joined: 14 Dec 2004 Posts: 106
|
Posted: Tue Mar 01, 2011 6:44 am Post subject: What about 64-bit kernels and I/O... |
|
|
I think this is a bug, an inconsistency, or an unfortunate system design that has been around for quite a while. It annoys me a lot, not least because it pokes a hole in the hallmark of Linux (speed and stability!) and it never seems to get fixed. Unfortunately I'm in no position, in terms of either time or knowledge, to pursue this problem myself, so all I can do is provide background, some additional information and my experiences, and hope that someone clever can get something out of it and accept the challenge!
Quick summary:
Probably since kernel 2.6.18 (googling seems to converge on this kernel) there has been some kind of problem with I/O (disk I/O, ...??) on 64-bit kernels, resulting in high CPU load and a lagging or even hanging system. The problem seems to begin when I/O reaches a certain volume, such as when copying many large files (taking a backup...) or doing many things concurrently, like copying files while rsync'ing through a VPN tunnel (high CPU load in general). It seems to have something to do with the I/O scheduler, but I experience problems even with the "simplest" deadline scheduler. For me, zen-sources also works slightly better than gentoo-sources.
It's not a hardware problem, because it works fine with Windows (which is kind of extra annoying, isn't it?), and my impression is that there's no problem with 32-bit kernels (I haven't tested thoroughly, but my home servers, both running 32-bit gentoo-sources with several USB drives on USB hubs, seem to work flawlessly). The external USB drives I'm using are of different brands and models, and all show the same behavior. I've tried -a lot- of different kernel settings, including many minimal ones, but I cannot find a pattern.
What happens?
Personally I think it's connected with USB and USB hard drives (that is at least a reliable way to make the problem show!). Every time I make a backup, copying runs at 20MB/s or more for a while, but then it starts to slow down. I cannot say exactly when, but it is either when the copied size reaches, say, 8GB (just a guess, sometimes more), or when the file count gets big (I have no number). Then the transfer speed falls to a few MB/s (2-5MB/s perhaps), the system starts to lag and the CPU is at 100%. Often (not always) I get something about "reset usb-device" in the log, and sometimes "I/O-error on device", forcing me to shut down the drives and the computer, start over, and fsck. Occasionally I get a complete system lock-up (my computer freezes and all the LEDs on the keyboard light up).
Making backups between two USB drives on the same USB hub seems to create the most problems, including total system hang-ups.
Hopefully someone who thinks this needs to be fixed can make something out of this and start digging. I'm cheering loudly in that case!
Good luck!
/pste |
|
|
|
idella4 Retired Dev
Joined: 09 Jun 2006 Posts: 1600 Location: Australia, Perth
|
Posted: Tue Mar 01, 2011 9:48 am Post subject: |
|
|
pste,
That's very general. Use top and iotop, and set up the conditions under which it occurs. You can at least capture some snapshots of the system state with the tops. Then the dmesg content. We need some sort of baseline. _________________ idella4@aus |
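A minimal sketch of how such a baseline could be captured, assuming a POSIX shell. The function name `snapshot` and the output directory are made up for illustration; iotop is only called if it is installed, and both iotop and dmesg may need root to return anything.

```shell
#!/bin/sh
# Hypothetical helper: write one snapshot of the system state into a directory
# and print the name of the file that was written.
snapshot() {
    out_dir=$1
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$out_dir" || return 1
    {
        echo "== load average =="
        cat /proc/loadavg 2>/dev/null
        echo "== top (batch mode, one iteration) =="
        top -b -n 1 2>/dev/null | head -n 25
        echo "== iotop (batch mode, only tasks doing I/O) =="
        if command -v iotop >/dev/null 2>&1; then
            iotop -b -o -n 1 2>/dev/null | head -n 25
        fi
        echo "== dmesg (last 50 lines) =="
        dmesg 2>/dev/null | tail -n 50
    } > "$out_dir/snapshot-$stamp.txt"
    echo "$out_dir/snapshot-$stamp.txt"
}

# Example: take a snapshot before the copy starts and another once the lag shows.
# snapshot /tmp/io-baseline
```

Writing the snapshots to a drive that is not part of the copy should make it more likely that they survive a lock-up, although that is not guaranteed.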
|
|
|
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
|
Posted: Tue Mar 01, 2011 9:59 am Post subject: Re: What about 64-bit kernels and I/O... |
|
|
pste wrote: | Making backups between two usb-drives on the same usb-hub, seems to create most problems |
That's a total bottleneck on the hardware side though. Even in ideal conditions you shouldn't see more than 10MB/s transfer speeds for usb to usb especially with a hub involved. You're talking to both disks on a line that can ideally transfer 40MB/s, that means 20MB/s for each drive, then in comes the protocol overhead, filesystem overhead, hub overhead, context switches overhead, and you end up with extremely slow speed as is typical for USB... Add unreliable hardware to that (such as an overheating usb hub) and you're in data corruption land...
Performance issues in the kernel are a possibility and happen all the time, but if you also get strange stuff in dmesg, it's much more likely that your hardware is the culprit somehow. |
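The arithmetic behind that estimate can be sketched in shell. The 40MB/s bus figure and the two drives come from the post above; the 50% overhead figure is purely an assumption, chosen only to show how the per-drive share shrinks towards 10MB/s.

```shell
# Illustrative numbers only; overhead_pct is an assumption, not a measurement.
bus_mb_s=40        # practical throughput of the shared USB line, as quoted above
drives=2           # source and target disk share the same line
overhead_pct=50    # assumed combined protocol/filesystem/hub overhead

per_drive=$((bus_mb_s / drives))
effective=$((per_drive * (100 - overhead_pct) / 100))

echo "per-drive share: ${per_drive} MB/s, after overhead: ${effective} MB/s"
# prints: per-drive share: 20 MB/s, after overhead: 10 MB/s
```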
|
|
|
ppurka Advocate
Joined: 26 Dec 2004 Posts: 3256
|
|
|
|
tomk Bodhisattva
Joined: 23 Sep 2003 Posts: 7221 Location: Sat in front of my computer
|
|
|
|
pste Tux's lil' helper
Joined: 14 Dec 2004 Posts: 106
|
Posted: Tue Mar 01, 2011 11:17 am Post subject: |
|
|
ppurka - yes, that (this) thread is one of the sources that made me say this is a long-standing problem! Google gives you more...
frostschutz - yes, I know that the setup with USB drives is a hardware bottleneck, but I believe that should mean nothing more than that the backup takes a long time. It should -not- make the entire system lag or hang! I think this is caused by some kind of race condition that occurs in 64-bit kernels... And NO, it's not the hardware: I wrote above that the same setup works fine in Windows and (similar setups) with 32-bit kernels! But yes, it may be hardware related in the sense of (kernel) driver problems, perhaps? I agree that overheating is a possible explanation for the I/O-error situations, but I find it strange that this differs between OSes (or kernel types). Furthermore, the problem also occurs without the hub (e.g. copying from the system drive to a USB drive, or between USB drives on different USB ports of the computer). I gave that example because my impression is that it's the quickest way to create problems...
idella4 - sure, I'll try to capture something, although not today... (the iotop emerge told me I need to recompile the kernel with a few new flags, but I need my computer running a while longer...) A problem, though, is that for the worst case (the most interesting one) I must try to trigger one of these total lock-ups and then hand-copy (or photograph) the top and dmesg screens, precisely because the system is frozen, and it feels a little risky to recover the filesystem(s) every time it hangs... A concrete example (for anyone to try): start an rsync -avh --progress /home /media/your-usb-drive/ (or similar) and wait (of course, /home must be many GB large!). I'm doing precisely this at the moment! For me this command keeps showing about 15-20MB/s for every file. But after a while the system starts to lag (rsync keeps running at the same speed though). If I then, for instance, try starting a movie in vlc (which has to read a big file from the hard drive), this is (naturally) really slow, but sometimes the movie hangs, and closing vlc takes about 5 minutes! When the sync is finished, the system is back to normal responsiveness...
Thanks for the response!
/pste |
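While the rsync test above is running, it may help to watch how much data is waiting to be written back. A minimal sketch, assuming a Linux /proc/meminfo; the function name `dirty_stats` is made up. Large Dirty/Writeback values that coincide with the lag would only hint at writeback pressure; they would not prove it is the cause.

```shell
# Print the Dirty and Writeback counters (in kB) from a meminfo-style file.
# The file argument defaults to /proc/meminfo.
dirty_stats() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }' "${1:-/proc/meminfo}"
}

# Sample once per second in a second terminal while the copy is running:
# while sleep 1; do dirty_stats; done
```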
|
|
|
joeklow n00b
Joined: 23 Jan 2011 Posts: 46
|
Posted: Tue Mar 01, 2011 6:29 pm Post subject: |
|
|
Reporting on 2.6.36 ck-sources running on a multicore Phenom II.
Recompiled this kernel, switching from the deprecated SATA support ("ATA/ATAPI support") to the new one (serial ATA/PATA drivers).
I/O scheduler: BFQ (was CFQ)
Profile: Desktop (was Server)
CPU scheduler: CFS+autogroups
Timer: 1000 (was 200)
Also, /etc/init.d/local.start has the following to disable the drive's write cache (stupid XFS loves to flush data once an hour, and it would be stupid to let the flushed data sit in the cache).
Quote: |
hdparm -W0 /dev/sda
|
Now I can run emerge -u world on the host and in a virtual machine, and run a Windows virtual machine, all simultaneously, and the remaining resources are sufficient for far better responsiveness (I can surf/code).
Without those tricks I was unable to do anything while merging something, and the system was almost unresponsive while emerge --sync'ing. |
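Before comparing results between machines, it can be worth confirming which I/O scheduler is actually active. A minimal sketch, assuming the usual sysfs layout in which the active entry is shown in square brackets; the function name `active_scheduler` is made up.

```shell
# Print the active I/O scheduler from a sysfs "scheduler" file.
# The file lists all available schedulers with the active one in brackets,
# e.g. "noop deadline [cfq]".
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Example: active_scheduler /sys/block/sda/queue/scheduler
# The drive's write-cache state can be queried with: hdparm -W /dev/sda
```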
|
|
|
Yamakuzure Advocate
Joined: 21 Jun 2006 Posts: 2305 Location: Adendorf, Germany
|
Posted: Wed Mar 02, 2011 4:32 pm Post subject: |
|
|
Huh? I haven't had any lag since gentoo-sources-2.6.36-rsomething and with gentoo-sources-2.6.37 (okay, with cgroups hack) I have no lag even if I do a huge parallel merge (load between 25 and 40 on an i7 Dualcore laptop with HT) and have VMWare with WindowsXP open.
Is it just this cgroups stuff? I am basically using what is described here: https://forums.gentoo.org/viewtopic-t-852922.html
(And no, I do not have any problems with Amarok, DragonPlayer or any other multi media stuff) _________________ Edited 220,176 times by Yamakuzure |
|
|
|
devsk Advocate
Joined: 24 Oct 2003 Posts: 3003 Location: Bay Area, CA
|
Posted: Tue Mar 29, 2011 7:19 am Post subject: |
|
|
2.6.38 with AUTOGROUP helps a lot with this issue. |
|
|
|
devsk Advocate
Joined: 24 Oct 2003 Posts: 3003 Location: Bay Area, CA
|
Posted: Tue Apr 19, 2011 6:34 am Post subject: |
|
|
Any news on this front? Does AUTOGROUP help people with this issue? Or is this a non-issue now? |
|
|
|
kernelOfTruth Watchman
Joined: 20 Dec 2005 Posts: 6111 Location: Vienna, Austria; Germany; hello world :)
|
|
|
|
TimeManx n00b
Joined: 11 Jul 2011 Posts: 55
|
Posted: Sun Jan 01, 2012 11:02 am Post subject: |
|
|
I've configured 3.1.6 with autogroup, cfq, zram (128 MB as swap), zcache, transparent huge pages (madvise), memory compaction and a preemptible kernel on a system with 2 GB of RAM. The system is quite responsive in the first half hour after boot, but the performance keeps deteriorating.
Also, copying large amounts of data from one drive to another proceeds only sporadically, and the drives become inaccessible during that time, which causes dolphin to freeze. |
|
|
|
TinheadNed Guru
Joined: 05 Apr 2003 Posts: 339 Location: Farnborough, UK
|
|
|
|
Holysword l33t
Joined: 19 Nov 2006 Posts: 946 Location: Greece
|
Posted: Fri Feb 17, 2012 6:05 pm Post subject: |
|
|
Is this problem still around? I'm suffering from unresponsiveness very often; even stupid facebook flash games can bring my i7 with 4GB down, and let's say I've got infinite swap.
I've tried to blame the kernel (it was zen-sources-something, I can't remember which, but I updated it no more than 2 weeks ago), but gentoo-sources also freezes/slows down. I was using BFQ+BFS, and now I'm on CFQ+CFS; same. SLUB to SLAB? Same. Since BFQ is incompatible with cgroups, I was not using cgroups initially. Now I am using cgroups+autogroups. Nothing. My main system lives on a ReiserFS3 partition, not ext4, though.
I wouldn't just call this annoying; it is becoming dangerous for me, since at least once a day my system crashes hopelessly and I have to hard-reboot it. I have lost a couple of files so far. I was considering a depclean + emerge -e world, but now I am wondering if it is worth trying. 4 months ago I didn't have this problem (I was using zen-sources back then), and some of you seem to have had this for years... _________________ "Nolite arbitrari quia venerim mittere pacem in terram non veni pacem mittere sed gladium" (Yeshua Ha Mashiach) |
|
|
|
depontius Advocate
Joined: 05 May 2004 Posts: 3526
|
Posted: Fri Feb 17, 2012 6:38 pm Post subject: |
|
|
Are you running /home from a network drive?
My performance problems were from /home being mounted on nfsv4, and were related to firefox and its sqlite sync() behavior. A year or two back I moved .mozilla and .thunderbird to local disk, then symlinked the nfs-mounted .mozilla and .thunderbird directories to the local ones. Problem gone.
Some time after moving that system to 3.2.x I saw the notice of improved responsiveness, and tried moving .mozilla back to nfs. My performance problems came back, though they didn't seem quite as bad. The other night I moved .mozilla back to local disk.
Other than that, I'm happy and even with that I wasn't having problems with crashing. Have you tried memtest86+? _________________ .sigs waste space and bandwidth |
|
|
|
Holysword l33t
Joined: 19 Nov 2006 Posts: 946 Location: Greece
|
Posted: Fri Feb 17, 2012 6:55 pm Post subject: |
|
|
depontius wrote: | Are you running /home from a network drive?
My performance problems were from /home being mounted on nfsv4, and were related to firefox and its sqlite sync() behavior. A year or two back I moved .mozilla and .thunderbird to local disk, then symlinked the nfs-mounted .mozilla and .thunderbird directories to the local ones. Problem gone.
Some time after moving that system to 3.2.x I saw the notice of improved responsiveness, and tried moving .mozilla back to nfs. My performance problems came back, though they didn't seem quite as bad. The other night I moved .mozilla back to local disk.
Other than that, I'm happy and even with that I wasn't having problems with crashing. Have you tried memtest86+? |
No, everything is local. I also use chrome, not firefox, but have tested a few times with firefox, it seems to have the same behaviour. _________________ "Nolite arbitrari quia venerim mittere pacem in terram non veni pacem mittere sed gladium" (Yeshua Ha Mashiach) |
|
|
|
Holysword l33t
Joined: 19 Nov 2006 Posts: 946 Location: Greece
|
Posted: Fri Feb 24, 2012 2:16 pm Post subject: |
|
|
Okay folks... it seems it was both my kernel (it was zen-sources with BFS+BFQ), as seen in this thread, and a memory leak problem with my wm (as seen here). _________________ "Nolite arbitrari quia venerim mittere pacem in terram non veni pacem mittere sed gladium" (Yeshua Ha Mashiach) |
|
|
|
xman1 n00b
Joined: 11 Apr 2004 Posts: 58
|
Posted: Thu Apr 26, 2012 4:14 pm Post subject: |
|
|
Has this been solved yet? I had these same issues and it turned out my Western Digital hard drive has a bug with APM. Pop into PM-utils default config and set APM to 255 to disable it and all works well now.
You can also do this with hdparm:
Code: | hdparm -B 255 /dev/sda |
Maybe this will help someone as the pauses are quite annoying.
-X
PS. I forgot to mention the pauses were affecting things system wide. The whole system would wait on the APM bug. Thanks WD. |
|
|
|
smlbstcbr n00b
Joined: 08 Apr 2006 Posts: 51
|
Posted: Wed Jun 27, 2012 4:52 pm Post subject: |
|
|
Bump. I still have those issues in 3.3.8-gentoo. |
|
|
|
kernelOfTruth Watchman
Joined: 20 Dec 2005 Posts: 6111 Location: Vienna, Austria; Germany; hello world :)
|
Posted: Wed Jul 04, 2012 10:37 am Post subject: |
|
|
Overall, we're trading some throughput for interactivity & responsiveness.
Those of you having the problems could give the following tweaks a try:
Code: | # select CFQ for sda, then tune it (run as root)
echo cfq > /sys/block/sda/queue/scheduler
# request expiry deadlines in ms for async and sync requests
echo 10000 > /sys/block/sda/queue/iosched/fifo_expire_async
echo 250 > /sys/block/sda/queue/iosched/fifo_expire_sync
echo 80 > /sys/block/sda/queue/iosched/slice_async
echo 1 > /sys/block/sda/queue/iosched/low_latency
echo 6 > /sys/block/sda/queue/iosched/quantum
echo 5 > /sys/block/sda/queue/iosched/slice_async_rq
echo 3 > /sys/block/sda/queue/iosched/slice_idle
echo 100 > /sys/block/sda/queue/iosched/slice_sync
# set the drive's acoustic management to 254 (fastest, loudest)
hdparm -q -M 254 /dev/sda |
(source: http://unix.stackexchange.com/questions/30286/can-i-configure-my-linux-system-for-more-aggressive-file-system-caching)
I'm currently using all except the last one
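Before changing any of these, it may be useful to record the current values so they can be restored later. A minimal sketch; the function name `show_iosched` is made up, and it simply prints whatever tunables the active scheduler exposes under the given queue directory.

```shell
# Print every tunable under <queue-dir>/iosched as "name = value".
# Pass the queue directory of the disk, e.g. /sys/block/sda/queue.
show_iosched() {
    for f in "$1"/iosched/*; do
        [ -f "$f" ] || continue
        printf '%s = %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Example: show_iosched /sys/block/sda/queue > /root/iosched-before.txt
```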
in addition I'm using ck2 patchset for 3.4* kernel, patched BFS cpu scheduler up to version 424
and added Chen's O(1) tweak:
http://pastebin.com/ixw9PXAw
(thread: http://phoronix.com/forums/showthread.php?71658-RIFS-ES-Linux-Kernel-Scheduler-Released/page7 )
this helps A LOT
edit:
some additional stuff
when your system uses swap heavily raise page-cluster:
Code: | echo "12" > /proc/sys/vm/page-cluster |
or
Code: | echo "10" > /proc/sys/vm/page-cluster |
helps with interactivity issues for me
keep swapping low if possible:
Code: | echo "15" > /proc/sys/vm/swappiness |
Con Kolivas afaik recommends 10
Code: | echo "10" > /proc/sys/vm/swappiness |
keep
dirty_background_ratio and dirty_ratio low
Code: | echo "5" > /proc/sys/vm/dirty_background_ratio |
and
Code: | echo "9" > /proc/sys/vm/dirty_ratio |
also make sure that pdflush/bdflush don't write out stuff too rarely
Code: | echo "300" > /proc/sys/vm/dirty_writeback_centisecs |
300 means 3 seconds; the kernel default is 500 (5 seconds), and afaik powertop and other tools recommend 1500 (15 seconds)
edit:
added some settings I'm currently playing around with
edit2:
set
Code: | echo "300" > /proc/sys/vm/dirty_writeback_centisecs |
instead of 500; that seems to reduce stalls _________________ https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa
Hardcore Gentoo Linux user since 2004
Last edited by kernelOfTruth on Sat Jul 07, 2012 11:40 am; edited 1 time in total |
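The vm settings from this post could be collected into one small script. This is only a sketch: the function name `apply_vm_settings` and the `APPLY` switch are made up, swappiness uses the 10 attributed to Con Kolivas rather than 15, and page-cluster is left out because the post gives two candidate values. By default nothing is written; the script only prints what it would do.

```shell
# Apply (or, by default, only print) the vm settings discussed in this post.
# Set APPLY=1 and run as root to actually write them.
apply_vm_settings() {
    root=${1:-/proc/sys/vm}
    while read -r name value; do
        if [ "${APPLY:-0}" = 1 ]; then
            echo "$value" > "$root/$name"
        else
            echo "would write $value to $root/$name"
        fi
    done <<EOF
swappiness 10
dirty_background_ratio 5
dirty_ratio 9
dirty_writeback_centisecs 300
EOF
}

# Dry run:        apply_vm_settings
# Apply as root:  APPLY=1 apply_vm_settings
```

Values written this way do not survive a reboot, so anything that turns out to help would still need to go into a boot-time script or sysctl configuration.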
|
|
|
smlbstcbr n00b
Joined: 08 Apr 2006 Posts: 51
|
Posted: Sat Jul 07, 2012 2:09 am Post subject: |
|
|
I'll see how that works in my machine. How unfortunate to have such issues in the Gentoo Kernel. It seems to me that it has slowed since the change to 3.XX kernels. |
|
|
|
kernelOfTruth Watchman
Joined: 20 Dec 2005 Posts: 6111 Location: Vienna, Austria; Germany; hello world :)
|
|
|
|
smlbstcbr n00b
Joined: 08 Apr 2006 Posts: 51
|
Posted: Sun Jul 08, 2012 3:24 pm Post subject: |
|
|
Well, I'm trying your tweaks (thank you for posting them). There's a slight improvement, but it's still not as smooth as it used to be.
EDIT: I have been using a value of 200 for the last parameter and my system has improved significantly, though there's still some lag when swapping windows or opening some documents. |
|
|
|
|
|
|
|