Gentoo Forums
SATA: Slow cached reads revisited
Gentoo Forums Forum Index » Kernel & Hardware
kristoffer
Tux's lil' helper


Joined: 05 Oct 2003
Posts: 85

Posted: Tue May 15, 2007 1:17 pm    Post subject: SATA: Slow cached reads revisited

Like the unfortunate author of this thread, I get quite low cached-read results with my SATA drive:
Code:
$ hdparm -tT /dev/sda
/dev/sda:
 Timing cached reads:   1404 MB in  2.00 seconds = 701.84 MB/sec
 Timing buffered disk reads:  176 MB in  3.00 seconds =  58.57 MB/sec

As I understood the other thread, this has less to do with the actual hard drive and more to do with the communication between memory and CPU. As you may notice, I do get better numbers than the original poster, but I have confirmed with memtest86+ that my two identical PC-3200 memory sticks (with DDR400 chips) really run at full speed, i.e. 200 MHz with dual channel working. Shouldn't that give me a memory bandwidth of around 3 GB/s, which is what hdparm should report for cached reads? I tend to see people post cached-read results around that speed, so I guess I should, or am I mistaken (if so, please enlighten me!)? What could be wrong?
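For reference, the theoretical peak bandwidth of a DDR memory bus is transfer rate × bus width × number of channels. The sketch below assumes a 64-bit (8-byte) channel, as on PC-3200 DIMMs, and the helper names `peak_mb_per_s` and `fraction_of_peak` are made up for illustration. By this arithmetic one PC-3200 channel peaks at 3200 MB/s and dual channel at 6400 MB/s; these are bus limits, and a copy loop such as a cached read typically achieves noticeably less.

```c
#include <assert.h>

/* Theoretical peak bandwidth in MB/s (1 MB = 10^6 bytes, as DIMM ratings use).
 * mt_per_s:  millions of transfers per second (DDR400 = 400, i.e. 200 MHz x 2)
 * bus_bytes: width of one channel in bytes (assumed 8 for a 64-bit DDR channel)
 * channels:  number of populated channels (2 for dual channel) */
double peak_mb_per_s(double mt_per_s, double bus_bytes, int channels)
{
    assert(mt_per_s > 0.0 && bus_bytes > 0.0 && channels > 0);
    return mt_per_s * bus_bytes * channels;
}

/* Fraction of a theoretical peak that a measured figure represents. */
double fraction_of_peak(double measured_mb_per_s, double peak)
{
    assert(peak > 0.0);
    return measured_mb_per_s / peak;
}
```

For example, `fraction_of_peak(800.0, peak_mb_per_s(400.0, 8.0, 2))` evaluates to 0.125, i.e. a ~800 MB/s cached-read figure is one eighth of the dual-channel peak.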
_________________
I have to return some videotapes
Desintegr
l33t


Joined: 25 Mar 2004
Posts: 863
Location: France - Orléans

Posted: Tue May 15, 2007 1:47 pm

I have an AMD64 3000+ with 2x512MB Kingston DDR 400 (dual channel enabled) and I get the same result as you (~800 MB/s).
My chipset is an nForce3 Ultra (Gigabyte K8NS-939).

No problem with disk speed (Maxtor SATA): ~50-55 MB/s

I also tried Ubuntu Feisty (32-bit) and got ~800 MB/s too.

I've also found an interesting thread: http://www.mail-archive.com/debian-amd64@lists.debian.org/msg21903.html
_________________
Gentoo ~AMD64
Hoc Volo, Sic Jubeo !
Mon wiki : http://desintegr.free.fr
kristoffer
Tux's lil' helper


Joined: 05 Oct 2003
Posts: 85

Posted: Tue May 15, 2007 5:03 pm

The result I posted earlier (701.84 MB/sec cached) was measured while a couple of disk-heavy processes were running; without them it's slightly above 800 MB/s. However, I get exactly the same results with Debian 4.0. My much weaker laptop, a Dell Latitude X1 with two unmatched memory sticks (256 MB and 1024 MB) and thus no dual channel, gets something similar.

If it's of any importance, I run an Athlon64 3200 (2000 MHz) with two paired PC-3200/DDR400 512 MB memory modules (Corsair value) with working dual channel on an ASUS A8V Deluxe motherboard.
_________________
I have to return some videotapes
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9883
Location: almost Mile High in the USA

Posted: Tue May 15, 2007 6:33 pm

Just want to make sure... what version of hdparm and what CFLAGS? Can you run the SAME hdparm binary in both situations?

IIRC there were some hdparm changes in some versions, plus since this is a CPU/RAM benchmark, optimization will come into play.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
kristoffer
Tux's lil' helper


Joined: 05 Oct 2003
Posts: 85

Posted: Tue May 15, 2007 8:56 pm

eccerr0r wrote:
Just want to make sure... what version of hdparm and what CFLAGS? Can you run the SAME hdparm binary in both situations?

IIRC there were some hdparm changes in some versions, plus since this is a CPU/RAM benchmark, optimization will come into play.

I'm using hdparm-6.9 with very sane CFLAGS, namely "-march=k8 -pipe -O2". But are you positive that the compiler optimisations will increase memory throughput by a factor in the range 2 to 5? That would definitely surprise me.

As a side note I read the following in the link that Desintegr provided:
Quote:
The changelog for hdparm v6.9 has:
"fix X2 over-reporting of -T results"

I don't think that affects me since my CPU is single-core, but perhaps it can explain things for others who have a similar problem. Or could it be that the 3000+ MB/s reports I've seen are affected, and should be halved? Anyway, dual-channel DDR400 should do better than ~800 MB/s, so I still see this as a problem.
_________________
I have to return some videotapes
Desintegr
l33t


Joined: 25 Mar 2004
Posts: 863
Location: France - Orléans

Posted: Tue May 15, 2007 9:02 pm

kristoffer wrote:

As a side note I read the following in the link that Desintegr provided:
Quote:
The changelog for hdparm v6.9 has:
"fix X2 over-reporting of -T results"

I don't think that affects me since my CPU is single-core


X2 doesn't mean Dual-Core. It means « times 2 » like in 2x3=6.

You should try other benchmark tools.
_________________
Gentoo ~AMD64
Hoc Volo, Sic Jubeo !
Mon wiki : http://desintegr.free.fr
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9883
Location: almost Mile High in the USA

Posted: Tue May 15, 2007 9:19 pm

kristoffer wrote:
I'm using hdparm-6.9 with very sane CFLAGS, namely "-march=k8 -pipe -O2". But are you positive that the compiler optimisations will increase memory throughput by a factor in the range 2 to 5? That would definitely surprise me.

Yes. A poorly compiled loop can have a detrimental effect on its spin time. Poor prefetching, whether explicitly inserted by the compiler or the result of memory optimizations that don't understand the architecture, can reduce speed greatly.
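To illustrate how much the shape of a loop can matter, here are two ways to write the same copy: one byte per iteration, and eight bytes per iteration. This is only a sketch with hypothetical function names, and at -O2/-O3 gcc may vectorize the byte loop or replace it with a memcpy call, so the only reliable answer is to time both under the exact CFLAGS in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive copy: one load and one store per byte. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Wider copy: moves 8 bytes per iteration, then finishes the tail bytewise.
 * Going through memcpy on a uint64_t avoids alignment and strict-aliasing
 * problems; optimizing compilers typically turn these small fixed-size
 * memcpy calls into single 64-bit moves. */
void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t i = 0;
    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, src + i, sizeof w);
        memcpy(dst + i, &w, sizeof w);
    }
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Both functions produce identical results; any speed difference between them comes purely from how the loop is compiled and executed.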

The best is to check all other things being equal. I just wanted to make sure you didn't run gentoo's hdparm and compared it to ubuntu's hdparm with their respective kernels.

Also wanted to make sure we're not chasing a phantom issue... This is a synthetic benchmark issue, and it apparently hasn't been shown to be a real memory performance issue. If you see real software running at half speed, then it needs to be looked at. Does it take twice as long to rotate a large bitmap image (which is a memory-intensive operation)?
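A real-world check along these lines could look like the sketch below: rotate a large 8-bit bitmap by 90 degrees, time it, and compare the timing across machines or builds. `rotate90_cw` and `time_rotate` are hypothetical names; the timing uses standard `clock()`, which measures CPU time rather than wall time, and the checksum exists only so the compiler can't discard the rotated image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Rotate an 8-bit grayscale image 90 degrees clockwise.
 * src is w pixels wide and h pixels tall (row-major); dst must hold w*h
 * pixels and becomes h wide and w tall. Source pixel (x, y) ends up at
 * dst row x, column h - 1 - y, so dst is written with a stride of h. */
void rotate90_cw(unsigned char *dst, const unsigned char *src, size_t w, size_t h)
{
    assert(dst != NULL && src != NULL);
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            dst[x * h + (h - 1 - y)] = src[y * w + x];
}

/* Rotate a w x h image of pseudo-random pixels once and return the CPU time
 * in seconds (via clock()), or -1.0 if allocation fails. The sum of the
 * rotated pixels is stored in *checksum so the work is actually used. */
double time_rotate(size_t w, size_t h, unsigned long *checksum)
{
    unsigned char *src = malloc(w * h);
    unsigned char *dst = malloc(w * h);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    for (size_t i = 0; i < w * h; i++)
        src[i] = (unsigned char)rand();

    clock_t t0 = clock();
    rotate90_cw(dst, src, w, h);
    clock_t t1 = clock();

    unsigned long sum = 0;
    for (size_t i = 0; i < w * h; i++)
        sum += dst[i];
    *checksum = sum;

    free(src);
    free(dst);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

For instance, `time_rotate(8192, 8192, &sum)` rotates a 64 MB image; the column-wise writes (stride `h`) make it deliberately cache-unfriendly, so memory performance dominates the result.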
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
kristoffer
Tux's lil' helper


Joined: 05 Oct 2003
Posts: 85

Posted: Tue May 15, 2007 9:39 pm

Desintegr wrote:
X2 doesn't mean Dual-Core. It means « times 2 » like in 2x3=6.

I thought that in this context, X2 referred to X2 as in "AMD Athlon64 X2 Dual-Core" and similar. Are you sure about your interpretation?
_________________
I have to return some videotapes
Desintegr
l33t


Joined: 25 Mar 2004
Posts: 863
Location: France - Orléans

Posted: Tue May 15, 2007 9:44 pm

kristoffer wrote:
Desintegr wrote:
X2 doesn't mean Dual-Core. It means « times 2 » like in 2x3=6.

I thought that in this context, X2 referred to X2 as in "AMD Athlon64 X2 Dual-Core" and similar. Are you sure about your interpretation?


Use hdparm from before the patch and you'll get ~1400 MB/s; use it after and you'll get ~700 MB/s.
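When comparing -T numbers posted with different hdparm versions, it helps to put them on the same scale first. A minimal sketch with a hypothetical helper; it assumes, based only on the "fix X2 over-reporting of -T results" changelog entry quoted in this thread, that every release before 6.9 reported cached reads at twice the real rate.

```c
#include <assert.h>

/* Hypothetical helper: put hdparm -T figures from different hdparm versions
 * on the same scale. ASSUMPTION: every release before 6.9 over-reported
 * cached reads by a factor of 2 (v6.9 changelog: "fix X2 over-reporting of
 * -T results"); results from 6.9 and later are taken as-is. */
double normalize_cached_mb_s(double reported, int major, int minor)
{
    assert(reported >= 0.0);
    int before_fix = major < 6 || (major == 6 && minor < 9);
    return before_fix ? reported / 2.0 : reported;
}
```

For example, a 3000 MB/s figure from hdparm 6.6 becomes 1500 MB/s, while a 6.9 result is returned unchanged.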
_________________
Gentoo ~AMD64
Hoc Volo, Sic Jubeo !
Mon wiki : http://desintegr.free.fr
albright
Advocate


Joined: 16 Nov 2003
Posts: 2588
Location: Near Toronto

Posted: Wed May 16, 2007 12:47 pm

This is definitely hdparm: I noticed the drop in -T results
on both a Centrino laptop and an AMD X2 desktop after
an hdparm upgrade ...

Here are my current results, first for the laptop:

Code:
/dev/hda:
 Timing cached reads:   1242 MB in  2.00 seconds = 620.81 MB/sec
 Timing buffered disk reads:  108 MB in  3.02 seconds =  35.79 MB/sec


and for amd:

Code:
/dev/sda:
 Timing cached reads:   1516 MB in  2.00 seconds = 757.79 MB/sec
 Timing buffered disk reads:  192 MB in  3.06 seconds =  62.64 MB/sec


The -T reading used to be twice as fast :(
But I didn't notice any real difference ;)
_________________
.... there is nothing - absolutely nothing - half so much worth
doing as simply messing about with Linux ...
(apologies to Kenneth Grahame)
kristoffer
Tux's lil' helper


Joined: 05 Oct 2003
Posts: 85

Posted: Wed May 16, 2007 1:49 pm

eccerr0r wrote:
A poorly compiled loop can have a detrimental effect on its spin time. Poor prefetching, whether explicitly inserted by the compiler or the result of memory optimizations that don't understand the architecture, can reduce speed greatly.

I know that optimisations matter quite a lot (I have in fact written a C compiler, so I know some of the techniques used), but can they improve things by a factor of 2 or more? Also, since gcc is no toy compiler, I'm quite sure it's aware of the underlying amd64 architecture. Or maybe you were referring to a situation where I used completely wrong CFLAGS, including the wrong -march? Anyway, I tried recompiling hdparm without -O2 and with -O3 and didn't notice any difference, so CFLAGS don't seem to be the issue here.

Desintegr wrote:
Use hdparm from before the patch and you'll get ~1400 MB/s; use it after and you'll get ~700 MB/s.

I stand corrected. Seems like a weird "bug" or whatever, though.

Still, there is no explanation for why some DDR400 users get 3000+ MB/s results. Even if that was with <=hdparm-6.6, the corrected speed would be around 1500 MB/s, which is still twice as fast as most people in this thread. Is that simply due to recent improvements in motherboard memory architecture? I mean, my board is 3 years old, so there's definitely some room for improvement there. Perhaps my board can't use the full speed of DDR400? Seems stupid, and I doubt it, but I have no clue otherwise.
_________________
I have to return some videotapes
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9883
Location: almost Mile High in the USA

Posted: Wed May 16, 2007 3:16 pm

kristoffer wrote:
I know that optimisations matter quite a lot (I have in fact written a C compiler, so I know some of the techniques used), but can they improve things by a factor of 2 or more? Also, since gcc is no toy compiler, I'm quite sure it's aware of the underlying amd64 architecture. Or maybe you were referring to a situation where I used completely wrong CFLAGS, including the wrong -march? Anyway, I tried recompiling hdparm without -O2 and with -O3 and didn't notice any difference, so CFLAGS don't seem to be the issue here.

The question is whether there are two binaries out there, both executable on the target platform but with the "wrong" optimization. How about -Os vs -O2, and -march=k8 vs -march=nocona? I can't say that these will definitely cause poor optimization and degraded performance, but the possibility exists and it cannot be discounted as a source. Look at the P4 and how poorly it runs existing code; if you tune code for that architecture, it performs much better. It's even worse on Itanium: compare gcc with Intel's compiler and gcc tends to lose. Granted, you were likely using the best available options, but how should I know? I'm describing a generic problem, not offering a specific solution: do binary-for-binary comparisons, and always compare versions and compile options.
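One way to make a binary-for-binary comparison less guesswork is to have a test program report how it was built. A minimal sketch, assuming gcc: `__OPTIMIZE__`, `__OPTIMIZE_SIZE__`, `__VERSION__` and the per-CPU macros such as `__k8__` and `__nocona__` are gcc predefined macros, other compilers may differ, and the function name `build_flags` is made up.

```c
#include <assert.h>
#include <stdio.h>

/* Describe how this translation unit was compiled, using gcc's predefined
 * macros: __OPTIMIZE_SIZE__ (-Os), __OPTIMIZE__ (any optimizing -O level),
 * and CPU macros set by -march=k8 (__k8__) or -march=nocona (__nocona__).
 * Writes a short summary into buf and returns buf. */
const char *build_flags(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    const char *opt =
#if defined(__OPTIMIZE_SIZE__)
        "-Os";
#elif defined(__OPTIMIZE__)
        "-O1 or higher";
#else
        "-O0";
#endif
    const char *arch =
#if defined(__k8__)
        "k8";
#elif defined(__nocona__)
        "nocona";
#else
        "other/generic";
#endif
#ifdef __VERSION__
    snprintf(buf, len, "compiler %s, opt %s, arch %s", __VERSION__, opt, arch);
#else
    snprintf(buf, len, "opt %s, arch %s", opt, arch);
#endif
    return buf;
}
```

Compiling the same file with the two sets of CFLAGS and printing the result next to the benchmark figure makes it obvious which build produced which number.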

So it does look like you were chasing a phantom problem after all. Next time, use a "real" benchmark, such as code you run day-to-day, before claiming a problem. Sounds like you're some sort of CS student; you should go and read the hdparm source code and see what it's actually doing. My guess is that it's doing incomplete transfers to the disk controller due to word or transfer size limits, and hence not measuring your true memory speed, but I'm too lazy to read the code since I don't really care...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
kristoffer
Tux's lil' helper


Joined: 05 Oct 2003
Posts: 85

Posted: Wed May 16, 2007 5:13 pm

eccerr0r wrote:
The question is whether there are two binaries out there, both executable on the target platform but with the "wrong" optimization. How about -Os vs -O2, and -march=k8 vs -march=nocona? I can't say that these will definitely cause poor optimization and degraded performance, but the possibility exists and it cannot be discounted as a source. Look at the P4 and how poorly it runs existing code; if you tune code for that architecture, it performs much better. It's even worse on Itanium: compare gcc with Intel's compiler and gcc tends to lose. Granted, you were likely using the best available options, but how should I know? I'm describing a generic problem, not offering a specific solution: do binary-for-binary comparisons, and always compare versions and compile options.

So it does look like you were chasing a phantom problem after all. Next time, use a "real" benchmark, such as code you run day-to-day, before claiming a problem. Sounds like you're some sort of CS student; you should go and read the hdparm source code and see what it's actually doing. My guess is that it's doing incomplete transfers to the disk controller due to word or transfer size limits, and hence not measuring your true memory speed, but I'm too lazy to read the code since I don't really care...

Sure, but I'm still under the impression that something is wrong and that it's not entirely hdparm's fault. I'm not interested in an argument or wherever this is headed; I'm simply interested in why some people get such high results on that test compared to my rig. Anyway, I took a quick look at the source as you suggested and, not surprisingly, it's pretty straightforward:

For -T (cached reads), hdparm allocates 2 MB of shared memory as a buffer, then reads (with the read() syscall) the first 2 MB of data from the specified device into that buffer, which I guess is done to put the data in the cache right before the timer starts. After that the timer is started and the same read is repeated as fast as possible for 2 seconds. The number of iterations is multiplied by 2 MB and divided by the 2 seconds, giving the reported speed for cached reads. The time taken by overhead operations like lseek() and the timer itself is taken into account as well.
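The procedure described above can be sketched in C roughly as follows. This is not hdparm's code: it uses malloc() instead of shared memory, does not subtract the lseek()/timer overhead that hdparm accounts for, counts 1 MB as 2^20 bytes, and the function names are made up. What it does show is that the quantity being measured is essentially how fast the kernel can copy already-cached pages into a user buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Size of the buffer that is read repeatedly: 2 MB, per the description. */
#define TIMING_BUF_BYTES (2u * 1024u * 1024u)

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Read exactly TIMING_BUF_BYTES from the start of fd into buf; 0 on success. */
static int read_first_block(int fd, char *buf)
{
    if (lseek(fd, 0, SEEK_SET) != 0)
        return -1;
    return read(fd, buf, TIMING_BUF_BYTES) == (ssize_t)TIMING_BUF_BYTES ? 0 : -1;
}

/* One untimed priming read pulls the first 2 MB of fd into the page cache,
 * then the same 2 MB is re-read in a loop for about `seconds`.
 * Returns MB/s, or -1.0 on error (including fd holding less than 2 MB). */
double cached_read_mb_per_s(int fd, double seconds)
{
    assert(seconds > 0.0);
    char *buf = malloc(TIMING_BUF_BYTES);
    if (buf == NULL)
        return -1.0;

    if (read_first_block(fd, buf) != 0) {   /* priming read, not timed */
        free(buf);
        return -1.0;
    }

    unsigned long iterations = 0;
    double start = now_seconds();
    double elapsed;
    do {
        if (read_first_block(fd, buf) != 0) {
            free(buf);
            return -1.0;
        }
        iterations++;
        elapsed = now_seconds() - start;
    } while (elapsed < seconds);

    free(buf);
    return (double)iterations * (TIMING_BUF_BYTES / (1024.0 * 1024.0)) / elapsed;
}
```

The reported figure is plain division: in the first post, 1404 MB / 2.00 s would be 702 MB/s, and the printed 701.84 MB/sec is consistent with a measured time slightly over 2 s that is shown rounded to 2.00.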

I'm definitely no pro on these low-level things, but I would appreciate it if anyone could point out any sources of error in that measurement with respect to a modern computer architecture and the Linux kernel's buffer cache for disk reads.

A problem with benchmarks is that they have to be compared to something. I don't know what to compare my hdparm results to except other people's results, some of which are surprisingly much higher than mine on very similar hardware. Sure, I can also compare them across different versions of software and kernels on my computer, which I have done with no alarming differences (except for the x2 issue with older versions of hdparm, which is now sorted out). What I'm interested in is learning whether Linux handles my hard drives efficiently, and unless someone has an idea why hdparm's method of testing is wrong, I don't know whether I'm chasing a phantom or not. The fact that the theoretical speed of DDR400 is 4 times higher than what hdparm measures on my Linux kernel (and what I see other people reporting) is enough to make me skeptical.

I will look into some other means of benchmarking, which might tell me that my disk caching is working properly and that hdparm is somehow doing its thing wrong. But still, for all I know, other programs I use might access the same disk areas the way hdparm does and thus suffer from the same penalties. If that's the case, I would be interested in sorting out this hdparm thing regardless of other benchmarks. In fact, I sometimes feel that my Gentoo system isn't as snappy as it should be, especially when it comes to starting new processes, so I thought something like this could be causing it.
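As a cross-check that doesn't go through the block layer at all, memcpy() throughput between two large buffers gives a rough idea of what one core can copy. Since a cached read() is, roughly, the kernel copying pages from the page cache into the user buffer, this is arguably a closer comparison for hdparm -T than the bus's theoretical peak. A sketch only: the function name and sizes are arbitrary, and reporting bytes copied (rather than bytes read plus written) is a convention chosen here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copies a `bytes`-sized buffer `reps` times with memcpy and returns MB/s of
 * data copied (1 MB = 2^20 bytes). Each byte copied is both read and
 * written, so bus traffic is roughly twice this figure. Returns -1.0 if
 * allocation fails or the timer shows no elapsed time. */
double memcpy_mb_per_s(size_t bytes, int reps)
{
    assert(bytes > 0 && reps > 0);
    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, bytes);   /* touch the pages so page faults aren't timed */
    memset(dst, 0, bytes);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++) {
        memcpy(dst, src, bytes);
        src[(size_t)r % bytes] ^= 1;   /* make successive copies differ */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile unsigned char sink = dst[bytes / 2];   /* keep the result live */
    (void)sink;
    free(src);
    free(dst);

    double secs = (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)bytes * reps / (1024.0 * 1024.0) / secs : -1.0;
}
```

Calling it with buffers much larger than the CPU caches (say 64 MB and a few dozen repetitions) gives a number that can be set next to the hdparm -T result on the same machine.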
_________________
I have to return some videotapes
All times are GMT