Gentoo Forums
Poor NFS performance on one machine

number_nine
Tux's lil' helper

Joined: 05 May 2005
Posts: 136

Posted: Wed Jul 04, 2007 10:06 pm    Post subject: Poor NFS performance on one machine

This is kind of a continuation of this thread. Basically, NFS transfers are extremely slow to one computer only.

I have a gigabit LAN with a number of computers connected (all running Linux). One of these machines is a fileserver, and one is a media PC. I used the net-analyzer/nttcp program to verify that gigabit speeds are being achieved between all machines. I.e., I believe this rules out flaky hardware, bad drivers, poor cables and the switch as a source of problems.

Machine #1:
Code:
fileserver $ nttcp -T machine1
     Bytes  Real s   CPU s Real-MBit/s  CPU-MBit/s   Calls  Real-C/s   CPU-C/s
     l  8388608    0.08    0.08    837.2073    871.6682    2048  25549.54   26601.2
     1  8388608    0.08    0.03    836.5603   2237.3350    2413  30079.78   80446.7



Machine #2:
Code:
fileserver $ nttcp -T machine2
     Bytes  Real s   CPU s Real-MBit/s  CPU-MBit/s   Calls  Real-C/s   CPU-C/s
     l  8388608    0.09    0.09    741.5672    737.5654    2048  22630.83   22508.7
     1  8388608    0.09    0.02    741.0513   4194.0419    5889  65029.43  368039.5


Media PC:
Code:
fileserver $ nttcp -T media_pc
     Bytes  Real s   CPU s Real-MBit/s  CPU-MBit/s   Calls  Real-C/s   CPU-C/s
     l  8388608    0.08    0.06    853.6503   1065.3723    2048  26051.34   32512.6
     1  8388608    0.08    0.07    842.0712   1016.9551    2058  25823.45   31186.5



Next, I used time and dd to write an enormous file (2x RAM) from /dev/zero to an NFS-mounted share, then read that file back from the share to /dev/null. Here are the results:

Machine #1:
Code:
 $ time dd if=/dev/zero of=./testfile bs=16k count=262912
4307550208 bytes (4.3 GB) copied, 192.195 s, 22.4 MB/s
real    3m12.255s

$ time dd of=/dev/null if=./testfile bs=16k
4307550208 bytes (4.3 GB) copied, 95.6154 s, 45.1 MB/s
real    1m35.634s


Machine #2:
Code:
$ time dd if=/dev/zero of=./testfile bs=16k count=262912
4307550208 bytes (4.3 GB) copied, 211.494 seconds, 20.4 MB/s
real    3m31.514s

$ time dd of=/dev/null if=./testfile bs=16k
4307550208 bytes (4.3 GB) copied, 77.4161 seconds, 55.6 MB/s
real    1m17.421s


Fileserver (here the copying is local, i.e. no network involved):
Code:
$ time dd if=/dev/zero of=./testfile bs=16k count=262912
4307550208 bytes (4.3 GB) copied, 49.1751 s, 87.6 MB/s
real    0m49.206s

$ time dd of=/dev/null if=./testfile bs=16k
4307550208 bytes (4.3 GB) copied, 23.8136 s, 181 MB/s
real    0m23.817s


Media PC:
Code:
$ time dd if=/dev/zero of=./testfile bs=16k count=131456
2153775104 bytes (2.2 GB) copied, 110.67 s, 19.5 MB/s
real    1m50.781s

$ time dd if=./testfile of=/dev/null bs=16k
2153775104 bytes (2.2 GB) copied, 55.7257 s, 38.6 MB/s
real    0m55.728s


So the Media PC is slightly slower, but not alarmingly so.
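
The dd counts above are sized so that the test file is about twice the client's RAM, which keeps the page cache from holding the whole file. A minimal sketch of deriving such a count follows. `dd_count_for_ram` is a made-up helper name, and the real MemTotal values of these machines are not given in the thread, so any input you feed it is an assumption.

```shell
# Hypothetical helper: number of dd blocks needed to write 2x RAM.
# $1 = RAM in kB (e.g. MemTotal from /proc/meminfo), $2 = dd block size in KiB.
dd_count_for_ram() {
    echo $(( 2 * $1 / $2 ))
}

# Read MemTotal on the local machine (Linux only); falls back to 0 elsewhere.
mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null || true)
echo "count for this box with bs=16k: $(dd_count_for_ram "${mem_kb:-0}" 16)"
```

For example, an input of 1051648 kB with 16 KiB blocks gives the count of 131456 used on the media PC; that input was chosen to match the count and is not a measured value.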

Now I do a more "real-world" test: copy an actual 2.4 GB file from the NFS share to a local folder.

Machine #1:
Code:
$ time cp /nfs/share/hugefile /tmp/
real    1m8.703s
# => 36.14 MB/s


Machine #2:
Code:
$ time cp /nfs/share/hugefile /tmp/
real    0m55.509s
# => 44.68 MB/s


Media PC:
Code:
$ time cp /nfs/share/hugefile /tmp/
real    2m34.234s
# => 15.95 MB/s


So when doing a "real" file copy from an NFS share to the local drive, the media PC is twice as slow as the others! That doesn't make any sense to me.
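
For anyone repeating these measurements, the MB/s figures can be derived from a byte count and the elapsed time. The sketch below assumes 1 MB = 10^6 bytes, which is the convention dd uses in its summary line; `to_seconds` and `mb_per_s` are made-up helper names.

```shell
# Convert a `time`-style duration such as "2m34.234s" to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Throughput in MB/s (1 MB = 10^6 bytes) for $1 bytes moved in $2 seconds.
mb_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

to_seconds 2m34.234s            # media PC cp time from the post
mb_per_s 4307550208 192.195     # Machine #1 dd write figures from the post
```

The second call reproduces the 22.4 MB/s that dd itself reported for Machine #1. The exact byte size of the 2.4 GB file is not given, so the cp figures cannot be recomputed exactly here.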

Also, the NFS mount options I use for the media PC are the same as on the other two computers.

Any ideas?

Thanks!

cyrillic
Watchman

Joined: 19 Feb 2003
Posts: 7313
Location: Groton, Massachusetts USA

Posted: Wed Jul 04, 2007 11:46 pm

It looks like your Media PC has a slower hard drive than your other machines do.

Try "hdparm -tT" on several of your machines.

number_nine
Tux's lil' helper

Joined: 05 May 2005
Posts: 136

Posted: Thu Jul 05, 2007 12:13 am

cyrillic wrote:
It looks like your Media PC has a slower hard drive than your other machines do.

Try "hdparm -tT" on several of your machines.


My Media PC's disk performance seems consistent with the others:

Machine #1:
Code:
$ hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   5830 MB in  2.00 seconds = 2915.58 MB/sec
 Timing buffered disk reads:  206 MB in  3.02 seconds =  68.22 MB/sec


Machine #2:
Code:
$ hdparm -tT /dev/hda

/dev/hda:
 Timing cached reads:   1288 MB in  2.00 seconds = 644.05 MB/sec
 Timing buffered disk reads:  176 MB in  3.00 seconds =  58.66 MB/sec


Fileserver:
Code:
$ hdparm -tT /dev/hda

/dev/hda:
 Timing cached reads:   920 MB in  2.00 seconds = 460.24 MB/sec
 Timing buffered disk reads:  146 MB in  3.03 seconds =  48.18 MB/sec


Media PC:
Code:

$ hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   2036 MB in  2.00 seconds = 1017.87 MB/sec
 Timing buffered disk reads:  188 MB in  3.02 seconds =  62.26 MB/sec


But that hdparm command only tests read speed. In my scenario, I am reading from NFS, and writing to the local disk. So I tried the "dd" test on the Media PC:

Code:
media_pc:/tmp $ time dd if=/dev/zero of=./testfile bs=16k count=131456
2153775104 bytes (2.2 GB) copied, 96.6191 s, 22.3 MB/s

real    1m36.639s


That's not quite as slow as writing a file copied from an NFS share (15 MB/s), but still pretty slow.

Any ideas why the write speed might be so slow on this disk?

Just for kicks, I used dd to do a read test as well:
Code:
media_pc:/tmp $ time dd of=/dev/null if=./testfile bs=16k
2153775104 bytes (2.2 GB) copied, 87.3624 s, 24.7 MB/s

real    1m27.449s


Those results aren't consistent with hdparm -tT.

Here's the output of hdparm -I on the media PC:
Code:
$ hdparm -I /dev/sda
/dev/sda:

ATA device, with non-removable media
        Model Number:       WDC WD3200KS-00PFB0                     
        Serial Number:      WD-WCAPD3145103
        Firmware Revision:  21.00M21
Standards:
        Supported: 7 6 5 4
        Likely used: 7
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  625142448
        device size with M = 1024*1024:      305245 MBytes
        device size with M = 1000*1000:      320072 MBytes (320 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 1
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
                Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    SATA-I signaling speed (1.5Gb/s)
           *    SATA-II signaling speed (3.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
                DMA Setup Auto-Activate optimization
           *    Software settings preservation
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase
Checksum: correct


Anything I should tweak to improve this drive's performance?

cyrillic
Watchman

Joined: 19 Feb 2003
Posts: 7313
Location: Groton, Massachusetts USA

Posted: Thu Jul 05, 2007 12:53 am

number_nine wrote:
Any ideas why the write speed might be so slow on this disk?

It could be caused by a number of things, including:
    Choice of filesystem type
    Fragmentation
    Almost full filesystem
    Hardware characteristics ...

number_nine
Tux's lil' helper

Joined: 05 May 2005
Posts: 136

Posted: Thu Jul 05, 2007 1:23 am

cyrillic wrote:
number_nine wrote:
Any ideas why the write speed might be so slow on this disk?

It could be caused by a number of things, including:
    Choice of filesystem type
    Fragmentation
    Almost full filesystem
    Hardware characteristics ...


Regarding the first three points: I was using /tmp for testing (which is its own partition). It's using ext3. This partition shouldn't be too fragmented, as it's virtually empty:
Code:
media_pc $ df -h           
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda6             3.8G   65M  3.6G   2% /tmp


These characteristics are similar to machines 1 and 2 (also using ext3, practically empty).

So I think that leaves hardware characteristics. The board is an Abit NF-M2 nView, which uses the nVidia GeForce 6150 / NF430 chipset. The drive itself is a Western Digital 320 GB SATA (see hdparm -I output above). This is fairly modern hardware, so I'd expect it to be as fast as any of my other machines.

Here's the output of lspci on the media pc:
Code:
media_pc $ lspci
00:00.0 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.1 RAM memory: nVidia Corporation C51 Memory Controller 0 (rev a2)
00:00.2 RAM memory: nVidia Corporation C51 Memory Controller 1 (rev a2)
00:00.3 RAM memory: nVidia Corporation C51 Memory Controller 5 (rev a2)
00:00.4 RAM memory: nVidia Corporation C51 Memory Controller 4 (rev a2)
00:00.5 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.6 RAM memory: nVidia Corporation C51 Memory Controller 3 (rev a2)
00:00.7 RAM memory: nVidia Corporation C51 Memory Controller 2 (rev a2)
00:05.0 VGA compatible controller: nVidia Corporation C51PV [GeForce 6150] (rev a2)
00:09.0 RAM memory: nVidia Corporation MCP51 Host Bridge (rev a2)
00:0a.0 ISA bridge: nVidia Corporation MCP51 LPC Bridge (rev a2)
00:0a.1 SMBus: nVidia Corporation MCP51 SMBus (rev a2)
00:0a.2 RAM memory: nVidia Corporation MCP51 Memory Controller 0 (rev a2)
00:0b.0 USB Controller: nVidia Corporation MCP51 USB Controller (rev a2)
00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a2)
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2)
00:10.1 Audio device: nVidia Corporation MCP51 High Definition Audio (rev a2)
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
03:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
03:0a.0 Multimedia video controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder (rev 05)
03:0a.1 Multimedia controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder [Audio Port] (rev 05)
03:0a.2 Multimedia controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder [MPEG Port] (rev 05)
03:0a.4 Multimedia controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder [IR Port] (rev 05)


Any other thoughts or ideas?
Thanks again!

number_nine
Tux's lil' helper

Joined: 05 May 2005
Posts: 136

Posted: Sat Jul 07, 2007 10:56 pm

Bump... any ideas? It looks to me like the hardware is the limiting factor here, but I'd like to do some more tests to determine that for sure.

Thanks!

flybynite
l33t

Joined: 06 Dec 2002
Posts: 620

Posted: Mon Jul 09, 2007 6:31 am

number_nine wrote:

So I think that leaves hardware characteristics.


How about the hardware? From the count in your dd command, it seems the media PC has half the memory of the other two boxes? What about a CPU speed comparison? Do some boxes have RAID drives?

Any chance the media pc is 64bit gentoo? There is a mysterious disk slowdown going on for that arch.

I read one of your other threads, my x2 3800 bogomips are the same for both cores at ~ 4020 - why aren't yours?


Also for those size files have you looked into XFS or JFS etc?

number_nine
Tux's lil' helper

Joined: 05 May 2005
Posts: 136

Posted: Mon Jul 09, 2007 12:42 pm

flybynite wrote:
How about the hardware? From the count in your dd command, it seems the media PC has half the memory of the other two boxes? What about a CPU speed comparison? Do some boxes have RAID drives?


Right, the media PC has 1 GB (2x512), and the other boxes all have 2 GB. Do you think that's significant, though?

The media PC is an Athlon64 X2 3800 (2.0 GHz, socket AM2).

The file server is an Athlon64 X2 3600 (1.9 GHz, though I actually have it running under the "powersave" CPU frequency governor, so each core is running at 1.0 GHz).

Machine #1 is a Core2Duo E6600.

Machine #2 is an Athlon64 X2 3800 (2.0 GHz, socket 939).

Only the fileserver has RAID (RAID 5 for the data store). All other machines are single disk configurations (no LVM or anything like that either).

flybynite wrote:
Any chance the media pc is 64bit gentoo? There is a mysterious disk slowdown going on for that arch.


Yes, although so are all other boxes except Machine #2 (which is 32bit Ubuntu).

Do you have a link to more info on this?

flybynite wrote:
I read one of your other threads, my x2 3800 bogomips are the same for both cores at ~ 4020 - why aren't yours?


I just replied to that thread not too long ago. Originally, the media PC was overclocked a bit, and I believe that was causing the bogomips discrepancy. I've since restored the CPU to factory speeds, and now each core's bogomips is virtually equal. However, the disk performance issue remains.

flybynite wrote:
Also for those size files have you looked into XFS or JFS etc?


If I can't come to a conclusion on this problem, I might give XFS a try.

Thank you!

Akkara
Bodhisattva

Joined: 28 Mar 2006
Posts: 6702
Location: &akkara

Posted: Mon Jul 09, 2007 1:09 pm

Where, physically on the disk, is your tmp partition that you tested the dd into?

I had noticed that hard disks write from the outside in, meaning sector 0 is on the largest circumference track and the last sector is on the innermost track. And that I/O to the inner parts of the disk can be as much as half the speed of I/O to the outside of the disk. hdparm tests the outside as far as I know.
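
One rough way to check this on a given drive is to time sequential reads at different offsets. This is a minimal sketch, assuming dd and root access when pointed at a raw device. `read_zone` is a made-up helper name, and the offsets in the comments are only illustrative. Repeated runs over the same region can be served from the kernel's cache, so only the first read of each region is meaningful.

```shell
# Hypothetical helper: read $3 MiB starting $2 MiB into $1 and print
# dd's summary line (bytes copied, elapsed time, rate).
read_zone() {
    dd if="$1" of=/dev/null bs=1M skip="$2" count="$3" 2>&1 | tail -n 1
}

# Example against the raw disk (needs root; device name taken from the thread):
#   read_zone /dev/sda 0 256         # start of the disk
#   read_zone /dev/sda 300000 256    # near the end of a 320 GB drive
```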

number_nine
Tux's lil' helper

Joined: 05 May 2005
Posts: 136

Posted: Mon Jul 09, 2007 1:19 pm

Akkara wrote:
Where, physically on the disk, is your tmp partition that you tested the dd into?

I had noticed that hard disks write from the outside in, meaning sector 0 is on the largest circumference track and the last sector is on the innermost track. And that I/O to the inner parts of the disk can be as much as half the speed of I/O to the outside of the disk. hdparm tests the outside as far as I know.


I was wondering about that myself. The disk is a 320 GB drive, with the first (outermost) 77 or so GB used for system data (including /, /boot, /var, /tmp, /usr). The remaining 240 GB is for /home. So /tmp isn't on the fastest part of the disk, but it should be on the "faster" 25% of the disk.

Code:

Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          17      136521   83  Linux
/dev/sda2              18         516     4008217+  82  Linux swap / Solaris
/dev/sda3             517        1015     4008217+  83  Linux
/dev/sda4            1016       38913   304415685    5  Extended
/dev/sda5            1016        2012     8008371   83  Linux
/dev/sda6            2013        2511     4008186   83  Linux
/dev/sda7            2512        4504    16008741   83  Linux
/dev/sda8            4505       38913   276390261   83  Linux

$ mount | grep sda
/dev/sda3 on / type ext3 (rw,noatime)
/dev/sda5 on /var type ext3 (rw,noatime)
/dev/sda6 on /tmp type ext3 (rw,noatime)
/dev/sda7 on /usr type ext3 (rw,noatime)
/dev/sda8 on /home type ext3 (rw,noatime)
/dev/sda1 on /boot type ext2 (rw,noatime)
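
Taking the fdisk listing at face value, the position of a partition can be expressed as a percentage of the disk's cylinders. The sketch below assumes cylinders map linearly from the start of the disk, which is a simplification; `partition_span` is a made-up helper name.

```shell
# Hypothetical helper: where a partition sits on the disk, as a percentage
# of total cylinders. $1 = start cylinder, $2 = end cylinder, $3 = total.
partition_span() {
    awk -v s="$1" -v e="$2" -v t="$3" \
        'BEGIN { printf "%.1f%% to %.1f%%\n", (s - 1) * 100 / t, e * 100 / t }'
}

partition_span 2013 2511 38913   # /dev/sda6 (/tmp) from the listing above
```

For /dev/sda6 this prints "5.2% to 6.5%", so by this measure /tmp sits well inside the first tenth of the drive.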

flybynite
l33t

Joined: 06 Dec 2002
Posts: 620

Posted: Tue Jul 10, 2007 6:03 am

number_nine wrote:

Right, the media PC has 1 GB (2x512), and the other boxes all have 2 GB. Do you think that's significant, though?


The disk cache seriously skews any speed readings. Since ext3 is notoriously slow with huge files, the larger mem machines may cache all the metadata for the disk while the media pc has to page disk structures in and out for example.

The disk transfer test you did will show the transfer finished even though there is still disk cache not written to the physical medium yet. You probably noticed the transfer complete while the disk light was still on. The disk cache is larger with more memory.

This could explain why, when each box was writing to the NFS share, the transfers were 22, 20 and 19.5 MB/s for upload and 45, 55 and 38 MB/s for download. Upload was the same everywhere since the RAID 5 write is the limit. The RAID 5 read is fast, so for the download the receiving PC was the limit. The media PC was slower, but probably within the norm given miscellaneous variables and less memory/disk cache.

Let's recap.
A. The network transfer test showed similar performance on every box, so it's not the network itself.
B. With RAID 5 on the fileserver, writing is slower, so the server limits those transfers more than the source does. This test really only shows that every sending box can exceed the RAID 5 write performance.
C. Then there is the copy from the server to the media PC, which is what you really want, and it runs at half the speed of the other boxes.

What's different about this case is that the RAID 5 on the server is sending data faster than a single disk can write. You're using the network and the disk simultaneously.

1. NFS could be a player. The kernel versions are probably not the same on all machines. The NFS code does change, so even if the share is configured with the same options, it might actually be mounted differently because default options change between kernel versions.

2. You've got an interrupt storm. Gigabit can overwhelm the PC backbone; check for mitigation options in the kernel. The skge driver, for example, has a feature to coalesce interrupts just to mitigate this problem.

Are your network card and disk drive sharing an interrupt? This could slow down both drivers when they are used together, but it won't be noticed when they are tested separately, as in the simple network test and the simple disk test. You may need to disable unused devices such as the parallel/serial ports or onboard sound, move cards, or deactivate/reactivate/reboot to shuffle the interrupts so the disk and network don't share. Video cards also hog interrupts and should not share with the network or the disk. It took about 20 reboots to shuffle the interrupts in my myth box.
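
For point 1, the options a client actually ended up with (as opposed to what was requested in fstab) show up in /proc/mounts. This is a minimal sketch that pulls out rsize and wsize for comparison between machines. `nfs_io_sizes` is a made-up helper name, and the sample line is invented for illustration; it is not taken from any of the machines in this thread.

```shell
# Print mount point, rsize and wsize for every NFS entry in a
# /proc/mounts-style listing read from stdin.
nfs_io_sizes() {
    awk '$3 ~ /^nfs/ {
        rsize = wsize = "?"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] ~ /^rsize=/) rsize = substr(opts[i], 7)
            if (opts[i] ~ /^wsize=/) wsize = substr(opts[i], 7)
        }
        print $2, rsize, wsize
    }'
}

# On a live client: nfs_io_sizes < /proc/mounts
# Made-up sample line, for illustration only:
echo 'fileserver:/export /nfs/share nfs rw,rsize=32768,wsize=32768,hard,proto=tcp 0 0' | nfs_io_sizes
```

Running it on each client and comparing the output would show whether the media PC negotiated different transfer sizes from the others.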


number_nine wrote:

flybynite wrote:
Any chance the media pc is 64bit gentoo? There is a mysterious disk slowdown going on for that arch.

Yes, although so are all other boxes except Machine #2 (which is 32bit Ubuntu).
Do you have a link to more info on this?


Here is some gentoo discussion.
https://forums.gentoo.org/viewtopic-t-482731.html

I've seen it in my box, taking 10 seconds to move the mouse across the screen. I've changed some things and haven't seen it noticeably since the changes. It seemed that anything that hit the disk hard saturated the system and slowed all other processes to a crawl.


number_nine wrote:

flybynite wrote:
Also for those size files have you looked into XFS or JFS etc?


If I can't come to a conclusion on this problem, I might give XFS a try.


XFS or JFS won't fix your problem, but will probably improve performance on all machines with large files. Deleting a multi-gigabyte file under ext3 could freeze your system for almost 5 seconds, probably longer than permissible when dealing with video. Either XFS or JFS will do that delete in under 1 second. Yes, there are band-aids that hide the problem, but you sound like you have the know-how to fix the problem. But I understand it isn't always convenient, or other factors can make ext3 a good overall choice even though it struggles with gigabyte-size files.

number_nine
Tux's lil' helper

Joined: 05 May 2005
Posts: 136

Posted: Tue Jul 10, 2007 1:49 pm

Wow, lots of info, thank you!

flybynite wrote:
number_nine wrote:

Right, the media PC has 1 GB (2x512), and the other boxes all have 2 GB. Do you think that's significant, though?


The disk cache seriously skews any speed readings. Since ext3 is notoriously slow with huge files, the larger mem machines may cache all the metadata for the disk while the media pc has to page disk structures in and out for example.


You're talking about the kernel's disk cache, right (and not the drive's actual hardware cache)?

That's interesting. Based on the thread you linked below, I started playing with the sys-apps/dstat program. I copied my test 2.4 GB file from the NFS share to the media pc's /tmp folder while running dstat. The first transfer took over four minutes, which is about 10 MB/s. I deleted the file from /tmp, and did the copy again and it took about two minutes (as I previously reported). So there was definitely a cache effect in that test, though I'm not sure which cache(s) were used.

Anyway, the dstat output was definitely interesting on the first run, but because I'm still new to the tool, I can't say exactly what was interesting... Point is, I need to do some more testing with dstat. :)

flybynite wrote:
The disk transfer test you did will show the transfer finished even though there is still disk cache not written to the physical medium yet. You probably noticed the transfer complete while the disk light was still on. The disk cache is larger with more memory.


Hmm. Maybe I should try upping the RAM in the media PC, just to see what happens.

FWIW, I've been doing most of this testing remotely, so haven't been able to see the disk light. :) But I think dstat will work as a disk light of sorts.

flybynite wrote:
1. NFS could be a player. The kernel versions are probably not the same on all machines. The NFS code does change, so even if the share is configured with the same options, it might actually be mounted differently because default options change between kernel versions.


Actually, by sheer luck, kernel versions are the same on the server, media PC, and machine #1 (2.6.19-gentoo-r5). However, the kernel configs are different for each machine.

flybynite wrote:
2. You've got an interrupt storm. Gigabit can overwhelm the PC backbone; check for mitigation options in the kernel. The skge driver, for example, has a feature to coalesce interrupts just to mitigate this problem.

Are your network card and disk drive sharing an interrupt? This could slow down both drivers when they are used together, but it won't be noticed when they are tested separately, as in the simple network test and the simple disk test. You may need to disable unused devices such as the parallel/serial ports or onboard sound, move cards, or deactivate/reactivate/reboot to shuffle the interrupts so the disk and network don't share. Video cards also hog interrupts and should not share with the network or the disk. It took about 20 reboots to shuffle the interrupts in my myth box.


Again, by luck, my interrupts look pretty evenly distributed:
Code:
$ cat /proc/interrupts
           CPU0       CPU1
  0:  273023560  229759134    XT-PIC-XT        timer
  1:          0          2   IO-APIC-edge      i8042
  9:          0          5   IO-APIC-fasteoi   acpi
 12:          0          4   IO-APIC-edge      i8042
 14:        158      54000   IO-APIC-edge      ide0
 16:      38852   15500201   IO-APIC-fasteoi   nvidia
 18:       2486     565445   IO-APIC-fasteoi   cx88[0], cx88[0], cx88[0]
 19:          0          3   IO-APIC-fasteoi   ohci1394
 20:          0          3   IO-APIC-fasteoi   ehci_hcd:usb1
 21:          0          0   IO-APIC-fasteoi   libata
 22:      17937    2875256   IO-APIC-fasteoi   libata, HDA Intel
 23:     132714  105589916   IO-APIC-fasteoi   ohci_hcd:usb2, eth0
NMI:     502496     502302
LOC:  502703331  502703289
ERR:          0


Even though none of the biggest interrupt generators are sharing an IRQ, could the sheer number of interrupts be overwhelming my system?
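
To make the sharing question easier to check, the sketch below lists IRQs whose device column names more than one handler. It simply looks for a comma in the line, which is an assumption about the /proc/interrupts layout rather than a robust parser; `shared_irqs` is a made-up helper name.

```shell
# List IRQ numbers from /proc/interrupts-style input (stdin) whose device
# column names more than one handler, i.e. contains a comma.
shared_irqs() {
    awk '$1 ~ /^[0-9]+:$/ && /,/ { sub(/:/, "", $1); print $1 }'
}

# On a live system: shared_irqs < /proc/interrupts
# Three lines copied from the listing above:
shared_irqs <<'EOF'
 16:      38852   15500201   IO-APIC-fasteoi   nvidia
 22:      17937    2875256   IO-APIC-fasteoi   libata, HDA Intel
 23:     132714  105589916   IO-APIC-fasteoi   ohci_hcd:usb2, eth0
EOF
```

In the listing above, eth0 does share IRQ 23 with ohci_hcd:usb2, and one libata entry shares IRQ 22 with HDA Intel. Whether those partners generate enough interrupts to matter is not shown by the counters alone.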

Another thought: is it possible that the PCI bus is being saturated? If the disk controller and NIC are both attached to the PCI bus, maybe that's the problem. Is there a way to tell which devices are connected to the PCI bus versus the PCI express bus?
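
On the question of telling buses apart: `lspci -t` prints the device tree, and `lspci -vv` lists an "Express" capability for PCIe devices. As a quick first look, the sketch below just counts devices per bus number in plain lspci output. `devices_per_bus` is a made-up helper name, and a bus number by itself does not prove what kind of bus a device sits on.

```shell
# Count devices per PCI bus number in `lspci` output read from stdin.
devices_per_bus() {
    awk -F: '/^[0-9a-f][0-9a-f]:/ { n[$1]++ } END { for (b in n) print b, n[b] }' | sort
}

# On a live system: lspci | devices_per_bus
# Three lines copied from the lspci listing earlier in the thread:
devices_per_bus <<'EOF'
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1)
03:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
EOF
```

In the earlier listing the SATA and Ethernet controllers are on bus 00, while the FireWire and Conexant devices are on bus 03. That suggests, but does not confirm, that the NIC and disk controller are not behind the PCI bridge at 00:10.0.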

I've got an Intel PCIe Gigabit NIC on hand. That gives me an extra card with which to tweak kernel config params as well as see if PCI vs PCIe makes any difference.

flybynite wrote:
I've seen it in my box, taking 10 seconds to move the mouse across the screen. I've changed some things and haven't seen it noticeably since the changes. It seemed that anything that hit the disk hard saturated the system and slowed all other processes to a crawl.


That's some more testing I can do (i.e., see if the box grinds to a crawl when doing heavy disk IO).

flybynite wrote:
XFS or JFS won't fix your problem, but will probably improve performance on all machines with large files. Deleting a multi-gigabyte file under ext3 could freeze your system for almost 5 seconds, probably longer than permissible when dealing with video. Either XFS or JFS will do that delete in under 1 second. Yes, there are band-aids that hide the problem, but you sound like you have the know-how to fix the problem. But I understand it isn't always convenient, or other factors can make ext3 a good overall choice even though it struggles with gigabyte-size files.


I agree. The first time I built a MythTV box, I used XFS. I had some reason for switching to ext3, but can't recall now (probably means it wasn't a very good reason!) :)

Thanks again for all the help! Gives me lots of stuff to try.

Akkara
Bodhisattva

Joined: 28 Mar 2006
Posts: 6702
Location: &akkara

Posted: Wed Jul 11, 2007 12:07 am

One way to get a better idea what your write performance is without the cache holding part of your data to write later, is to do it between syncs:
Code:
sync
time cp /your/big/file /to/where/you/want/it
time sync


The first sync writes out anything pending from before. The sum of the cp time and the second sync then gives a good idea of what the total time to get it to disk is.
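
A related variation, for the dd-based write tests used earlier in the thread, is to have dd flush the file before it reports its rate. This is a different technique from the sync/cp/sync sequence above, not a replacement for it. The sketch assumes a dd that supports conv=fsync (GNU dd does); `timed_write` and the `ddtest` file name are made up for illustration.

```shell
# Hypothetical helper: write $2 MiB of zeros to $1/ddtest and have dd flush
# the data to disk before it reports, so the rate includes the final flush.
timed_write() {
    dd if=/dev/zero of="$1/ddtest" bs=1M count="$2" conv=fsync 2>&1 | tail -n 1
}

dir=$(mktemp -d)
timed_write "$dir" 4        # tiny size just to show the call
ls -l "$dir/ddtest"
rm -rf "$dir"
```

For a real measurement the size should be well above the machine's RAM, as in the original tests, and the target directory should be on the disk under test.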

markp2000
n00b

Joined: 30 Oct 2004
Posts: 41

Posted: Wed Jul 11, 2007 5:48 am

I have followed you from the other post. I have read the above and I still feel you are having NIC issues.

http://mysettopbox.tv/phpBB2/viewtopic.php?t=8947&highlight=

This is a post I ran through and then discovered or solved the problem with a new NIC.

Mark
All times are GMT