yottabit Guru
Joined: 11 Nov 2002 Posts: 313 Location: Columbus, Ohio, US
Posted: Thu Mar 10, 2005 7:29 am Post subject: Prohibit Kernel File Caching? -- Legit Question, REALLY!!
First off: no yelling at me for asking the question. I have my purposes.
Is there a way (perhaps through /proc or with mount options), without hacking the kernel, to stop the kernel from caching reads & writes? Ideally I'd be able to specify on a mount-by-mount basis...
(Because some will want to know: the kernel is honestly slowing down disk reads to the point of degrading performance. The kernel seems to place a higher priority on reading as much of the file as possible into memory before actually serving the data to the requestor... It's crazy!)
Anyone? Bueller? Bueller? _________________ Play The Hitchhiker's Guide to the Galaxy!
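(For context, a hedged sketch of what a stock 2.6 kernel offered at the time: the sync mount option forces synchronous writes but does not disable read caching, and truly bypassing the page cache requires the application itself to open files with O_DIRECT - which is exactly what Iozone's -I flag, used later in this thread, asks for. The mount point is the one that appears further down.)
Code: | # Per-mount: force synchronous writes (does not stop the read cache):
mount -o remount,sync /mnt/bigarray
# There is no generic per-mount "no page cache" switch on 2.6; an application must
# request it itself by opening files with O_DIRECT (e.g. Iozone's -I option). |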
adaptr Watchman
Joined: 06 Oct 2002 Posts: 6730 Location: Rotterdam, Netherlands
Posted: Thu Mar 10, 2005 10:39 am
If your kernel is causing disk performance degradation - and the CPU is not a 486 or something that slow - then the problem is really with your kernel, not with Linux...
Or the disk controller is dodgy, or simply broken.
What kind of throughput do you get from the drive(s)?
EDIT: for one, your statement simply makes no practical sense - the kernel does not cache "anything it can" from a file; it caches blocks, whose size is determined by the maximum amount of data that can be transferred in one operation.
It is not possible to read faster by reading less, so reading one byte will not be any faster than reading the maximum block size (usually 128K for modern drives, i.e. 256 sectors).
Second, the notion that processing cached disk contents slows down the system is rather funny - the kernel disk cache is on the order of 1000 times as fast as the actual disks can ever be.
There are kernel (sysctl) settings for this, but your post implies that you have set these to non-standard values rather than that they could or should be improved from the defaults... _________________ >>> emerge (3 of 7) mcse/70-293 to /
Essential tools: gentoolkit eix profuse screen
Last edited by adaptr on Thu Mar 10, 2005 10:47 am; edited 1 time in total
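(A hedged sketch of how the throughput question could be answered, and of the sysctl settings alluded to above; the device name is assumed from later posts, and the exact knob names should be treated as typical for a 2.6 kernel rather than definitive.)
Code: | # Raw sequential read throughput, cached and uncached:
hdparm -tT /dev/sdd
# The page-cache / writeback tunables live under /proc/sys/vm, e.g.:
sysctl vm.dirty_ratio vm.dirty_background_ratio vm.swappiness |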
yottabit Guru
Joined: 11 Nov 2002 Posts: 313 Location: Columbus, Ohio, US
Posted: Thu Mar 10, 2005 10:46 am
The degradation is general I/O, not limited specifically to the disk. The contributing factors could be highmem kernel support, the choice of kernel I/O scheduler (deadline at the moment, which seems reasonable), the interface driver (Promise), the controller (Promise S150 TX4), the drives (Hitachi 7K250 250 GB SATA w/ 8 MB cache), the striping size (no striping and 32k tested so far), and the filesystem (ReiserFS 3.6 and ext2 tested so far).
Yes, that is a lot of variables. But between watching top and the Iozone results, it is quite evident that the performance bottleneck disappears when passing the -I option to Iozone, which bypasses the kernel cache.
You can reference my on-going struggle in this thread, but here I'd like to concentrate on finding a way to disable caching on the relevant filesystem, in case my testing doesn't produce a clear winner that interoperates with the kernel's cache in a friendlier manner.
EDIT: After reading your edit I guess I should clarify. The kernel must know the I/O request is for sequential data in a mammoth file, and it seems to 'anticipate' the next requests (a logical thing to do, since they're 100% predictable in this case) by filling its cache at a higher priority than simply passing the data to the requesting application. I can see an interesting pattern in NIC utilization (to the workstation requesting the data): utilization starts very low while top shows the kernel cache growing like mad, then network utilization peaks much higher, and after a short while the pattern repeats. With Iozone's -I disabling the kernel caching mechanism (by I/O call type), the performance difference is dramatic. _________________ Play The Hitchhiker's Guide to the Galaxy!
Last edited by yottabit on Thu Mar 10, 2005 10:55 am; edited 1 time in total
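(A hedged aside: the 'anticipation' described above is largely the block layer's per-device readahead, which can be tuned or switched off without touching the kernel; the device name is assumed from later posts in the thread.)
Code: | # Readahead is set per block device, in 512-byte sectors (256 = 128 KB is the usual 2.6 default):
blockdev --getra /dev/sdd
blockdev --setra 0 /dev/sdd   # effectively disables readahead for this device
# hdparm exposes the same setting on most drives:
hdparm -a 0 /dev/sdd |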
adaptr Watchman
Joined: 06 Oct 2002 Posts: 6730 Location: Rotterdam, Netherlands
Posted: Thu Mar 10, 2005 10:53 am
Well, maybe you have already tested this to the limit (or think you have), but the issue is really quite simple:
There are two factors: pure disk I/O, and pure virtual (page) memory I/O. The latter has slightly more overhead than raw RAM access, but not so much that you'd notice - the entire kernel runs on the virtual memory subsystem, after all.
These two are bound together by the caching code in the kernel.
I am curious exactly what numbers you got, because if the kernel's memory caching of disk accesses really slows things down, then you have encountered a serious issue indeed: a relatively arbitrary piece of code right inside the kernel that can, under some circumstances, cause a 100-fold or greater throughput degradation.
Not good, and very, very unlikely.
Hence my questions - have you examined all the angles?
EDIT: having read your edit I understand a little better what you mean.
Still, the difference in access speed between any I/O system and pure memory I/O is more than 100-fold in any case - if the kernel's caching code were to dump the entire free RAM at once you would not even be able to see this, since modern DRAM easily achieves transfer speeds of gigabytes per second, virtual memory overhead included.
This means that there is more complex logic at work with the discrepancy (stair-stepping) you observe in top, and I think it indicates that either end of the caching process is actually waiting for something before taking the next step - which, like you said, is completely unnecessary when caching linear blocks from large files; just assign a DMA transfer and forget about it. _________________ >>> emerge (3 of 7) mcse/70-293 to /
Essential tools: gentoolkit eix profuse screen
Last edited by adaptr on Thu Mar 10, 2005 11:00 am; edited 1 time in total
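(For anyone wanting to watch the stair-stepping described above without relying on top alone, a minimal sketch using standard tools; iostat assumes the sysstat package is installed.)
Code: | # Memory, cache and block-I/O columns, one sample per second:
vmstat 1
# Per-device utilisation and throughput, if sysstat is available:
iostat -x 1 |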
yottabit Guru
Joined: 11 Nov 2002 Posts: 313 Location: Columbus, Ohio, US
Posted: Thu Mar 10, 2005 10:58 am
As soon as my current Iozone test on Reiser4 completes (15-30 minutes) I'll run an Iozone test with ext2 (format optimized for large files) with cache enabled and with cache disabled, and post the results.
The other testing based on NIC utilization and watching top is a little more difficult to put into reliable numbers. _________________ Play The Hitchhiker's Guide to the Galaxy!
yottabit Guru
Joined: 11 Nov 2002 Posts: 313 Location: Columbus, Ohio, US
Posted: Fri Mar 11, 2005 7:50 pm
Okay, sorry it was a little longer than 15-30 minutes (more like 2 days), hehe. But I had a disastrous experience with ReiserFS 4 Beta that warranted some documentation. See this thread if you're interested.
ext2 format options (I mounted to /mnt/bigarray but it's not an array, just FYI):
Code: | hal root # time mke2fs -T largefile4 -v /dev/sdd1
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
59648 inodes, 61049000 blocks
3052450 blocks (5.00%) reserved for the super user
First data block=0
1864 block groups
32768 blocks per group, 32768 fragments per group
32 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 23 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
real 0m11.298s
user 0m0.035s
sys 0m0.148s
hal root # time mount -v /dev/sdd1 /mnt/bigarray -t ext2
/dev/sdd1 on /mnt/bigarray type ext2 (rw)
real 0m0.005s
user 0m0.000s
sys 0m0.002s
hal root # df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md1 77G 43G 34G 56% /
none 506M 0 506M 0% /dev/shm
/dev/sdd1 233G 20K 222G 1% /mnt/bigarray |
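(A hedged note on the -T largefile4 switch used above: it corresponds to roughly one inode per 4 MiB of data - the inode and block counts in the output bear this out - so an explicit equivalent would look something like this.)
Code: | # Approximate explicit equivalent of -T largefile4 (one inode per 4 MiB of data):
mke2fs -i 4194304 -v /dev/sdd1 |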
Now that the boring stuff is out of the way, onto the data gathered by Iozone.
The options used for Iozone were:
Code: | # With -I to disable kernel caching
iozone -a -i 0 -i 1 -I -M -o -p -g 2g -b ~/iozone/ext2-native.xls
# Without -I to enable default kernel caching:
iozone -a -i 0 -i 1 -M -o -p -g 2g -b ~/iozone/ext2-native-caching.xls |
With caching disabled, the 64 MB record-length first-write performance on the 2 GB file was 35.797 MB/s, and the first-read performance was 59.364 MB/s.
With caching enabled, the 64 MB record-length first-write performance on the 2 GB file was 18.057 MB/s - about half as fast! The first-read performance was 41.932 MB/s, also slower.
The full data is available for download here.
Comments? _________________ Play The Hitchhiker's Guide to the Galaxy!
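(A hedged way to sanity-check the Iozone numbers with nothing but dd: oflag=direct and iflag=direct need a reasonably recent coreutils and O_DIRECT support in the filesystem, and the file name below is made up for the example.)
Code: | # Buffered (page-cached) write and read of a 2 GB test file:
dd if=/dev/zero of=/mnt/bigarray/ddtest bs=64M count=32
dd if=/mnt/bigarray/ddtest of=/dev/null bs=64M
# The same transfers bypassing the page cache via O_DIRECT:
dd if=/dev/zero of=/mnt/bigarray/ddtest bs=64M count=32 oflag=direct
dd if=/mnt/bigarray/ddtest of=/dev/null bs=64M iflag=direct |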