Gentoo Forums
SATA Drives + Heavy Usage = Unusable System
kernelOfTruth
Watchman

Joined: 20 Dec 2005
Posts: 6111
Location: Vienna, Austria; Germany; hello world :)

Posted: Fri Apr 18, 2008 7:30 pm

Quote:
I'm also going to experiment with different I/O schedulers on this drive to see if it helps or hurts. I've been using CFQ.


++

That should fix most of the "hangs" and hiccups :) (I went from deadline -> anticipatory -> cfq.) CFQ is the best, almost no hangs. Also try disabling fair group scheduling (CPU scheduler) :wink:

Disabling NCQ might also be an idea :idea:
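
For anyone following along, here is a rough sketch of how the scheduler and NCQ settings mentioned above can be inspected. It assumes the disk is sda and that the kernel exposes the usual sysfs files /sys/block/sda/queue/scheduler and /sys/block/sda/device/queue_depth. The helper name active_scheduler is made up for illustration.

```shell
#!/bin/sh
# Print the active I/O scheduler from a sysfs scheduler file.
# The kernel lists all available schedulers and marks the active one
# in brackets, e.g. "noop anticipatory deadline [cfq]".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

dev=sda   # assumption: adjust to your disk
sched_file=/sys/block/$dev/queue/scheduler
depth_file=/sys/block/$dev/device/queue_depth

if [ -r "$sched_file" ]; then
    echo "scheduler: $(active_scheduler "$sched_file")"
fi
if [ -r "$depth_file" ]; then
    # A queue depth of 1 means NCQ is effectively switched off.
    echo "queue_depth: $(cat "$depth_file")"
fi

# To change them at runtime (as root), something like:
#   echo deadline > /sys/block/sda/queue/scheduler
#   echo 1 > /sys/block/sda/device/queue_depth
```

Both settings are per disk and are lost on reboot, so they are easy to try without committing to anything.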
_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa

Hardcore Gentoo Linux user since 2004 :D

paulbiz
Guru

Joined: 01 Feb 2004
Posts: 508
Location: St. Louis, Missouri, USA

Posted: Sat Apr 19, 2008 1:48 am

Well, CFQ is what I was using when experiencing the problems. Now I'm trying deadline. I tried NCQ enabled and no NCQ and there was no difference.

I had fair group scheduling disabled before, too. I just turned it on during the latest kernel build to see if it helps. Glad to know I'm moving in the wrong direction with all of my changes! hehe.

I'm also running latencytop now to see what it does and if it can tell me anything special when this happens. I'm really not sure what it is telling me or how often it refreshes or anything. It is somewhat thin on documentation. :P

paulbiz
Guru

Joined: 01 Feb 2004
Posts: 508
Location: St. Louis, Missouri, USA

Posted: Sat Apr 19, 2008 1:56 am

Using the above-mentioned config, I just encountered the symptom again, so the CPU and I/O scheduler changes didn't help. Here is what latencytop showed during this time:

During UI freezes with heavy disk activity latencytop said this:
Code:
Cause                                               Maximum          Average
fsync() on a file                                 2474.1 msec        173.0 msec
generic_file_llseek vfs_llseek sys_lseek system_ca 2075.3 msec        741.6 msec
Writing buffer to disk (synchronous)              289.5 msec         87.8 msec
Creating block layer request                      100.1 msec         45.0 msec
EXT3 Creating a file                               28.0 msec         28.0 msec
Waiting for TTY input                              23.4 msec          0.3 msec
FCNTL system call                                  19.4 msec         19.4 msec
do_select core_sys_select sys_select system_call_a  5.0 msec          0.3 msec
Waiting for event (poll)                            5.0 msec          0.5 msec


During normal UI functionality with heavy disk activity it said this:
Code:
Cause                                               Maximum          Average
fsync() on a file                                 636.3 msec         70.9 msec
Writing buffer to disk (synchronous)               13.7 msec          4.4 msec
Reading EXT3 block bitmaps                         10.0 msec         10.0 msec
Waiting for event (poll)                            5.0 msec          0.5 msec
Userspace lock contention                           5.0 msec          0.4 msec
do_select core_sys_select sys_select system_call_a  5.0 msec          0.2 msec
do_select compat_core_sys_select compat_sys_select  5.0 msec          1.0 msec
Writing data to TTY                                 1.2 msec          0.2 msec
Waiting for TTY input                               0.8 msec          0.1 msec


So I don't think that really tells us anything we didn't already assume. It is waiting for the disk stuff.

Dairinin
n00b

Joined: 03 Feb 2008
Posts: 64
Location: MSK, RF

Posted: Sat Apr 19, 2008 7:58 am

Gigabyte mobos have a "feature" that switches some ports to legacy mode even when you use AHCI in general. It is there to let Windows XP users set up the system without an F6 floppy.

In the dmesg output on the first page you can see that ahci is not used on all ports. This behavior can be turned off in the BIOS; try that and see what happens. In fact, you do not need generic IDE at all with modern Intel chipsets (>P965).
Also, you most probably do not need PCI hotplug, so it's better to turn it off.
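
A quick way to see which driver each port ended up on, as a sketch only: it assumes the kernel exposes /sys/class/scsi_host/hostN/proc_name (the low-level driver name for each host), and list_host_drivers is just a name picked for this example. Grepping dmesg for ahci and ata_piix gives similar information.

```shell
#!/bin/sh
# Print "hostN driver" for every SCSI/ATA host under a sysfs root.
# proc_name holds the low-level driver name (ahci, ata_piix, ...).
list_host_drivers() {
    root=${1:-/sys/class/scsi_host}
    for h in "$root"/host*; do
        # Skip entries without a readable proc_name (or an unmatched glob).
        [ -r "$h/proc_name" ] || continue
        echo "$(basename "$h") $(cat "$h/proc_name")"
    done
}

list_host_drivers
```

If some hosts show ata_piix (or a generic IDE driver) while others show ahci, the board is probably still running part of the controller in legacy mode.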

Gentree
Watchman

Joined: 01 Jul 2003
Posts: 5350
Location: France, Old Europe

Posted: Sat Apr 19, 2008 10:03 pm

I also have a SATA drive on sata-sil (not sata-sil24). It slugs the system badly just by being connected. I use it for backup, then unplug it. :evil:
_________________
Linux, because I'd rather own a free OS than steal one that's not worth paying for.
Gentoo because I'm a masochist
AthlonXP-M on A7N8X. Portage ~x86

_pi
n00b

Joined: 06 Apr 2007
Posts: 23

Posted: Sun Apr 20, 2008 4:46 am

Also worth noting: I have tried my disks on two different controllers, both onboard, one a JMicron and the other ata_piix. Both have this problem. Yeah, Gigabyte has that AHCI "feature" etc. I've played with it and haven't felt any difference; however, that was on one controller only. I'll try the other to see what happens.

Also, about NCQ: this doesn't affect me because ata_piix has no NCQ support.

_pi
n00b

Joined: 06 Apr 2007
Posts: 23

Posted: Mon Apr 21, 2008 7:48 pm

I turned on AHCI and all disk features to their max and I get better hdparm timings; however, the lag still exists...

neuron
Advocate

Joined: 28 May 2002
Posts: 2371

Posted: Tue Apr 22, 2008 11:05 am

Subscribing to this.

Could people try latencytop? I'm seeing issues mostly with programs doing fsync, which causes huge spikes for me. Pidgin, for example, does this far too often. fsync on ext3 syncs everything to disk, not just the one file in cache, so any I/O bottlenecks are made very visible by fsyncs.
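
If you don't have latencytop, a crude probe along these lines might still show the spikes. It is only a sketch: it relies on GNU dd's conv=fsync (which calls fsync() on the output file before dd exits) and on GNU date's %N, and the function name fsync_probe_ms is invented here. The number includes process startup overhead, so only large values mean anything.

```shell
#!/bin/sh
# Time one small write that is forced to disk, in milliseconds.
# Writes a 4k scratch file into the given directory and removes it again.
fsync_probe_ms() {
    dir=${1:-.}
    f="$dir/fsync_probe.$$"
    start=$(date +%s%N)      # nanoseconds since the epoch (GNU date)
    dd if=/dev/zero of="$f" bs=4k count=1 conv=fsync 2>/dev/null
    end=$(date +%s%N)
    rm -f "$f"
    echo $(( (end - start) / 1000000 ))
}

fsync_probe_ms "${TMPDIR:-/tmp}"
```

Run it in a loop against a directory on the affected filesystem while the heavy I/O is going on, for example `while true; do fsync_probe_ms /home; sleep 1; done`, and watch for multi-second values.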

paulbiz
Guru

Joined: 01 Feb 2004
Posts: 508
Location: St. Louis, Missouri, USA

Posted: Tue Apr 22, 2008 1:00 pm

Alright, last night I migrated everything over to a new drive, and aside from being quieter and faster, the problem still exists. :(

neuron, I posted my latencytop results earlier in the thread, with what sounds like the same results as you. Also, if I type "sync" at a shell prompt during this slowdown, it can sometimes take MINUTES to complete. Now, part of that could be my huge cache (I have 8 GB of RAM), but the delays happen even on a freshly booted system with hardly any cache in use.
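
To see how much data is actually waiting to be written while a sync hangs, the Dirty and Writeback counters in /proc/meminfo can be watched. A small sketch (dirty_status is a made-up helper name; the optional argument only exists so the function can be pointed at a saved copy of meminfo):

```shell
#!/bin/sh
# Print the Dirty and Writeback counters (in kB) from a meminfo-style file.
dirty_status() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }' \
        "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    dirty_status
fi
```

If Dirty stays small while sync still takes minutes, the delay is more likely in how the writes are committed than in the sheer amount of cached data.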

The problem is not necessarily caused by huge amounts of data being written, but rather it seems the frequency of writes. If there are tons of separate tiny reads/writes going on it is worse than one large write.

For me, it is at its worst when I am downloading usenet headers with BNR2, a multi-threaded newsreader. I have 15 connections to news servers all downloading headers at once. On my old computer (slow IDE hard drives, P4 2.8 GHz CPU, 1 GB RAM) this was no problem; it went smoothly and the system was perfectly usable. On my new system (fast SATA hard drives, C2D E6600 overclocked to 3 GHz, 8 GB RAM) it causes this UI lockup, and the actual header updates take LONGER than they did on the older, slower system. Sometimes the wait for the disk is so bad that the connection to the news server is dropped by an idle timeout in the middle of downloading the headers.

Sujao
l33t

Joined: 25 Sep 2004
Posts: 677
Location: Germany

Posted: Tue Apr 22, 2008 4:36 pm

It seems I have the same problem: https://forums.gentoo.org/viewtopic-p-5070001.html#507000

Can we blame the chipset manufacturer for not publishing specifications, or is it the kernel people's fault? In the former case I would like to write a "hate mail" :x

kernelOfTruth
Watchman

Joined: 20 Dec 2005
Posts: 6111
Location: Vienna, Austria; Germany; hello world :)

Posted: Tue Apr 22, 2008 4:47 pm

Have you guys tried tuning the VFS?

Try out the following for test purposes :idea:

Code:
echo 66 > /proc/sys/vm/mapped
echo 3000 > /proc/sys/vm/dirty_expire_centisecs
echo 3000  > /proc/sys/vm/dirty_writeback_centisecs
echo 10   > /proc/sys/vm/dirty_background_ratio
echo 95   > /proc/sys/vm/dirty_ratio
echo 100000 > /proc/sys/vm/vfs_cache_pressure


paulbiz
Guru

Joined: 01 Feb 2004
Posts: 508
Location: St. Louis, Missouri, USA

Posted: Tue Apr 22, 2008 6:36 pm

kernelOfTruth wrote:
Have you guys tried tuning the VFS?

Try out the following for test purposes :idea:

Code:
echo 66 > /proc/sys/vm/mapped
echo 3000 > /proc/sys/vm/dirty_expire_centisecs
echo 3000  > /proc/sys/vm/dirty_writeback_centisecs
echo 10   > /proc/sys/vm/dirty_background_ratio
echo 95   > /proc/sys/vm/dirty_ratio
echo 100000 > /proc/sys/vm/vfs_cache_pressure


I will try those when I get home from work. My current settings:

Code:
/proc/sys/vm/mapped - does not exist
/proc/sys/vm/dirty_expire_centisecs - 2999
/proc/sys/vm/dirty_writeback_centisecs - 1499
/proc/sys/vm/dirty_background_ratio - 5
/proc/sys/vm/dirty_ratio - 10
/proc/sys/vm/vfs_cache_pressure - 100
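
Before experimenting, it may be worth recording the current values so they can be put back afterwards. A minimal sketch, assuming the tunables live under /proc/sys/vm as listed above; the function names save_vm_settings and restore_vm_settings are invented for this example, and files that do not exist on a given kernel (like mapped here) are simply skipped.

```shell
#!/bin/sh
# Print the current values of a few /proc/sys/vm tunables as
# "name value" lines. Missing files are skipped.
save_vm_settings() {
    dir=${1:-/proc/sys/vm}
    for name in dirty_expire_centisecs dirty_writeback_centisecs \
                dirty_background_ratio dirty_ratio vfs_cache_pressure; do
        [ -r "$dir/$name" ] && echo "$name $(cat "$dir/$name")"
    done
    return 0
}

# Write saved "name value" lines back (needs root for the real /proc/sys/vm).
# $1 = file produced by save_vm_settings, $2 = optional target directory.
restore_vm_settings() {
    dir=${2:-/proc/sys/vm}
    while read -r name value; do
        echo "$value" > "$dir/$name"
    done < "$1"
}

# Usage:
#   save_vm_settings > /tmp/vm.before
#   ... try the new values ...
#   restore_vm_settings /tmp/vm.before
```

A reboot also resets everything, since values written to /proc/sys are not persistent.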

_pi
n00b

Joined: 06 Apr 2007
Posts: 23

Posted: Tue Apr 22, 2008 6:47 pm

For some reason, the only thing latencytop shows for every program, even under heavy use, is waiting for CPU... What kernel config do you need for it?

Also, those VFS options slowed down desktop use a lot.

paulbiz
Guru

Joined: 01 Feb 2004
Posts: 508
Location: St. Louis, Missouri, USA

Posted: Tue Apr 22, 2008 7:54 pm

_pi wrote:
For some reason, the only thing latencytop shows for every program, even under heavy use, is waiting for CPU... What kernel config do you need for it?

Also, those VFS options slowed down desktop use a lot.


If you enabled it in kernel config (it is a new option in 2.6.25 kernel), it still does not enable itself at runtime. You have to turn it on with:

Code:
sudo sysctl -w kernel.latencytop=1


Then latencytop should show all the info.
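
To check whether the kernel side is switched on, something like the following sketch may help. It assumes the sysctl shows up as /proc/sys/kernel/latencytop when the option is built in; the helper name latencytop_state is made up.

```shell
#!/bin/sh
# Report whether latencytop collection is available and switched on.
# The optional argument lets the check be pointed at another file.
latencytop_state() {
    f=${1:-/proc/sys/kernel/latencytop}
    if [ ! -r "$f" ]; then
        echo "not available"     # kernel built without the option
    elif [ "$(cat "$f")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"          # built in, but not turned on at runtime
    fi
}

latencytop_state
```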

_pi
n00b

Joined: 06 Apr 2007
Posts: 23

Posted: Tue Apr 22, 2008 8:13 pm

Yeah, I did that; I didn't know you had to set that value. Also, I urge you to try a vanilla kernel. It seems the Gentoo patch set slows things down visibly, at least for me.

_pi
n00b

Joined: 06 Apr 2007
Posts: 23

Posted: Tue Apr 22, 2008 8:15 pm

Code:
Cause                                               Maximum          Average
fsync() on a file                                 130.8 msec         36.6 msec
Writing buffer to disk (synchronous)               87.1 msec          5.1 msec
Creating block layer request                       52.6 msec          5.9 msec
Writing a page to disk                             29.7 msec          3.2 msec
Page fault                                          8.9 msec          4.9 msec
Waiting for event (poll)                            5.0 msec          0.4 msec
do_select core_sys_select sys_select sysenter_past  5.0 msec          0.7 msec
sys_epoll_wait sysenter_past_esp                    5.0 msec          1.6 msec
Userspace lock contention                           4.9 msec          1.0 msec

Something is definitely wrong here. This is "normal load"

paulbiz
Guru

Joined: 01 Feb 2004
Posts: 508
Location: St. Louis, Missouri, USA

Posted: Tue Apr 22, 2008 8:21 pm

Doing some googling, it seems this is a known problem with the combination of SATA + ext3 + fsync. Apparently it has something to do with the way ext3 implements fsync, and it may not be a problem with a different filesystem (or maybe it is a problem with all journaling filesystems?). Due to the size of the disks, I simply cannot afford the days a fsck could take on a non-journaled FS.

I might try to rotate my disks again tonight and switch from ext3 to reiser or xfs to see if it helps.

We are all (who are having this problem) using ext3 here, right?

_pi
n00b

Joined: 06 Apr 2007
Posts: 23

Posted: Tue Apr 22, 2008 8:32 pm

I am using ext3. Could I get links? I have a massive amount of data, so I want to look at any possible workarounds.

kernelOfTruth
Watchman

Joined: 20 Dec 2005
Posts: 6111
Location: Vienna, Austria; Germany; hello world :)

Posted: Tue Apr 22, 2008 8:49 pm

Which mount options do you guys use for your ext3 partitions?

Default(s)?

You could try commit=120, data=writeback or something like that to see if it improves anything (which of course isn't a solution for 24/7 usage, for data safety's sake).

Here is some stuff for reading:

lkml.org fsync ext3

_pi
n00b

Joined: 06 Apr 2007
Posts: 23

Posted: Tue Apr 22, 2008 8:51 pm

/dev/sda1 / ext3 defaults,atime 0 1
/dev/sda2 /home/ ext3 defaults,atime 0 1
/dev/sda3 /boot ext2 defaults,atime 1 1

paulbiz
Guru

Joined: 01 Feb 2004
Posts: 508
Location: St. Louis, Missouri, USA

Posted: Tue Apr 22, 2008 8:57 pm

I just had an idea. I can mount ext3 as ext2. That would be a fast way to test whether it is a problem with ext3's journaling without actually reformatting. :) I'll try it in 3 hours when I'm home. :)

Here's my relevant portion of fstab:
Code:
/dev/sda1        /boot   ext2    defaults                1 2
/dev/sda5        /       ext3    defaults,commit=300,noatime,nodiratime          0 1
/dev/sda6       /home   ext3    defaults,commit=300,noatime,nodiratime

_pi
n00b

Joined: 06 Apr 2007
Posts: 23

Posted: Tue Apr 22, 2008 9:01 pm

The fstab options kernelOfTruth gave me worked quite well with what I have. Everything writes perfectly, and I can unrar things in the background etc. without lag on the desktop. Latencytop never shows fsync exceeding 200 msec. However, not all of the lag is gone.
Back to top
View user's profile Send private message
paulbiz
Guru
Guru


Joined: 01 Feb 2004
Posts: 508
Location: St. Louis, Missouri, USA

PostPosted: Tue Apr 22, 2008 9:13 pm    Post subject: Reply with quote

I think that lends more credibility to the theory that the ext3 journaling is at the center of the problem. The option "data=writeback", from what I understand, causes the journaling and data to get written out of order, which makes things quite fast, but can be a nightmare if your system crashes because the data and the journal may not match, resulting in files containing the wrong contents and things like that. (which somewhat defeats the purpose of journaling from a data-integrity standpoint). I guess that's why KoT said it's not a solution for safety's sake. :)
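
For comparing results, it helps to confirm which options a filesystem is really mounted with rather than trusting fstab. A small sketch that reads /proc/mounts (ext3_mount_options is a made-up name, and the optional argument only exists so the function can be run against a saved copy):

```shell
#!/bin/sh
# Print "mountpoint options" for every ext3 filesystem in a mounts-style file.
# Fields in /proc/mounts: device, mountpoint, fstype, options, dump, pass.
ext3_mount_options() {
    awk '$3 == "ext3" { print $2, $4 }' "${1:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    ext3_mount_options
fi
```

As far as I know, ext3 will not switch the data= mode on a plain remount, so for the root filesystem the option has to be in effect at boot, for example via rootflags=data=writeback on the kernel command line. Treat that as something to verify on your own setup.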

paulbiz
Guru

Joined: 01 Feb 2004
Posts: 508
Location: St. Louis, Missouri, USA

Posted: Tue Apr 22, 2008 9:18 pm

_pi wrote:
I am using ext3. Could I get links? I have a massive amount of data, so I want to look at any possible workarounds.


Here is a page with discussion by Linus himself regarding what appears to be this same problem:

http://kerneltrap.org/node/14148

It seems he is not a fan of ext3. I quote: "I hate hate hate it. It's totally unusable, imnsho." :P

_pi
n00b

Joined: 06 Apr 2007
Posts: 23

Posted: Tue Apr 22, 2008 9:27 pm

You're right. I took off data=writeback, and even with noatime and nodiratime it pretty much goes back to the way it was. >.>

All times are GMT
Page 2 of 3