Weird behavior with iozone, ext4, 2.6.30-r5, megaraid_sas.
jba (n00b; Joined: 22 Sep 2005; Posts: 35; Location: New York, NY)
Posted: Thu Oct 29, 2009 5:30 am

Hi all,

Fairly weird problem. I'm benchmarking a new server that will run Postgres: a Dell R710 with 48 GB of RAM and dual Intel X5570 CPUs. One of the bigger boxes I've had Gentoo on.

It's connected to a Dell MD1120 via a PERC 6/E SAS RAID controller, with 13 15k RPM drives in the enclosure (12 in a RAID 10 set plus a hot spare).

I'm running 2.6.30-gentoo-r5 with a pretty normal config, and in general the system seems stable.

Across the benchmarks I've been throwing at it, I've run into a weird issue as they finish.

Here's a sample IOzone run with a 64k stripe size:

Code:

db3 data # iozone -i 0 -i 1 -i 2 -i 8 -s 10g -T -t 8
   Iozone: Performance Test of File I/O
           Version $Revision: 3.242 $
      Compiled for 64 bit mode.
      Build: linux-AMD64

   Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                Randy Dunlap, Mark Montague, Dan Million,
                Jean-Marc Zucconi, Jeff Blomberg,
                Erik Habbinga, Kris Strecker, Walter Wong.

   Run began: Thu Oct 29 00:50:20 2009

   File size set to 10485760 KB
   Command line used: iozone -i 0 -i 1 -i 2 -i 8 -s 10g -T -t 8
   Output is in Kbytes/sec
   Time Resolution = 0.000001 seconds.
   Processor cache size set to 1024 Kbytes.
   Processor cache line size set to 32 bytes.
   File stride size set to 17 * record size.
   Throughput test with 8 threads
   Each thread writes a 10485760 Kbyte file in 4 Kbyte records

   Children see throughput for  8 initial writers    =  588562.83 KB/sec
   Parent sees throughput for  8 initial writers    =  474765.53 KB/sec
   Min throughput per thread          =   70407.44 KB/sec
   Max throughput per thread          =   75926.89 KB/sec
   Avg throughput per thread          =   73570.35 KB/sec
   Min xfer                = 9727024.00 KB

   Children see throughput for  8 rewriters    =  578675.23 KB/sec
   Parent sees throughput for  8 rewriters    =  513802.44 KB/sec
   Min throughput per thread          =   71805.98 KB/sec
   Max throughput per thread          =   72727.81 KB/sec
   Avg throughput per thread          =   72334.40 KB/sec
   Min xfer                = 10352288.00 KB

   Children see throughput for  8 readers       =  231325.92 KB/sec
   Parent sees throughput for  8 readers       =  231319.97 KB/sec
   Min throughput per thread          =   28134.68 KB/sec
   Max throughput per thread          =   30442.98 KB/sec
   Avg throughput per thread          =   28915.74 KB/sec
   Min xfer                = 9690864.00 KB

   Children see throughput for 8 re-readers    =  232281.25 KB/sec
   Parent sees throughput for 8 re-readers    =  232276.11 KB/sec
   Min throughput per thread          =   28333.96 KB/sec
   Max throughput per thread          =   29996.86 KB/sec
   Avg throughput per thread          =   29035.16 KB/sec
   Min xfer                = 9904752.00 KB


Not bad numbers (the fact that reads come out slower than writes is a separate issue), but as the last test finishes the system gets very slow, and I start to see this in dstat:

Code:

usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0  93   6   0   0|  12M    0 | 560B  428B|   0     0 |5621  5083
  0   0  94   6   0   0|  12M   32k| 696B 1128B|   0     0 |5638  5042
  0   0  93   6   0   0|  12M    0 | 560B  492B|   0     0 |5698  5117
  0   0  93   7   0   0|  13M    0 |1132B  428B|   0     0 |5783  5188
  0   0  94   6   0   0|  12M    0 |1211B  428B|   0     0 |5638  5050
  0   0  93   7   0   0|  12M    0 | 876B  428B|   0     0 |5666  5131
  0   0  93   6   0   0|  12M    0 | 706B  428B|   0     0 |5611  5014
  0   0  93   6   0   0|  12M    0 | 684B  428B|   0     0 |5652  5098
  0   0  93   7   0   0|  12M    0 | 816B  428B|   0     0 |5662  5125
  0   0  94   6   0   0|  12M    0 | 684B  428B|   0     0 |5664  5125
  0   0  92   7   0   0|  12M    0 | 368B  428B|   0     0 |5668  5016
  0   0  94   6   0   0|  12M    0 | 620B  428B|   0     0 |5639  5089
  0   0  92   7   0   0|  12M    0 | 706B  428B|   0     0 |5670  5079
  0   0  94   6   0   0|  12M    0 | 766B  428B|   0     0 |5675  5122
  0   0  94   6   0   0|  12M    0 | 624B  428B|   0     0 |5672  5129
  0   0  93   7   0   0|  12M    0 | 620B  428B|   0     0 |5628  5022
  0   0  93   6   0   0|  12M    0 | 368B  428B|   0     0 |5678  5136
  0   0  94   6   0   0|  12M    0 | 556B  428B|   0     0 |5730  5126
  0   0  92   7   0   0|  12M    0 | 624B  428B|   0     0 |5688  5164
  0   0  94   6   0   0|  12M    0 | 620B  428B|   0     0 |5626  5090
  0   0  93   7   0   0|  12M    0 | 496B  428B|   0     0 |5633  5077
  0   0  93   6   0   0|  12M    0 | 894B  428B|   0     0 |5684  5094
  0   0  94   6   0   0|  12M    0 | 578B  428B|   0     0 |5671  5103
  0   0  93   7   0   0|  12M    0 | 906B  428B|   0     0 |5687  5128
  0   0  94   6   0   0|  12M    0 |1072B  428B|   0     0 |5724  5113
  0   0  94   6   0   0|  12M    0 | 812B  428B|   0     0 |5640  5080
  0   0  93   7   0   0|  12M    0 | 752B  428B|   0     0 |5671  5125
  0   0  93   6   0   0|  12M    0 | 748B  428B|   0     0 |5723  5177
  0   0  93   6   0   0|  12M    0 | 624B  428B|   0     0 |5633  5082
  0   0  93   7   0   0|  12M    0 | 958B  428B|   0     0 |5622  5080
  0   0  94   6   0   0|  11M    0 | 560B  428B|   0     0 |5579  4952
  0   0  92   8   0   0|  12M    0 | 574B  428B|   0     0 |5677  5057
  0   0  94   6   0   0|  12M    0 | 496B  428B|   0     0 |5628  5056
  0   0  93   7   0   0|  12M    0 | 876B  428B|   0     0 |5639  5051


Almost perfectly consistent 12 MB/sec of reads, and any access to the tested volume becomes extremely laggy, to the point where the system almost locks up. Even an 'ls' on that volume can take 10 minutes to complete, yet iowait stays under 10%. (wtf?)

Eventually the iozone process exits, but many minutes (hours?) after the last test has already finished.

If I strace the iozone process, I can see it doing tons of lseeks and reads:

Code:

[pid  4710] <... read resumed> "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4710] lseek(6, 7012470784, SEEK_SET) = 7012470784
[pid  4710] read(6, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4710] lseek(6, 5325385728, SEEK_SET) = 5325385728
[pid  4710] read(6, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4710] lseek(6, 2260058112, SEEK_SET) = 2260058112
[pid  4710] read(6,  <unfinished ...>
[pid  4708] <... read resumed> "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4708] lseek(5, 9403219968, SEEK_SET) = 9403219968
[pid  4708] read(5, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4708] lseek(5, 2561925120, SEEK_SET) = 2561925120
[pid  4708] read(5, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4708] lseek(5, 2125836288, SEEK_SET) = 2125836288
[pid  4708] read(5,  <unfinished ...>
[pid  4715] <... read resumed> "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4715] lseek(10, 9672151040, SEEK_SET) = 9672151040
[pid  4715] read(10, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4715] lseek(10, 3074637824, SEEK_SET) = 3074637824
[pid  4715] read(10,  <unfinished ...>
[pid  4713] <... read resumed> "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4713] lseek(7, 5254766592, SEEK_SET) = 5254766592
[pid  4713] read(7, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4713] lseek(7, 1041141760, SEEK_SET) = 1041141760
[pid  4713] read(7,  <unfinished ...>
[pid  4709] <... read resumed> "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4709] lseek(4, 2355765248, SEEK_SET) = 2355765248
[pid  4709] read(4,  <unfinished ...>
[pid  4711] <... read resumed> "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4711] lseek(3, 632438784, SEEK_SET) = 632438784
[pid  4711] read(3,  <unfinished ...>
[pid  4712] <... read resumed> "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4712] lseek(8, 8557805568, SEEK_SET) = 8557805568
[pid  4712] read(8, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4712] lseek(8, 8371425280, SEEK_SET) = 8371425280
[pid  4712] read(8, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4712] lseek(8, 4976214016, SEEK_SET) = 4976214016
[pid  4712] read(8, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4712] lseek(8, 1170989056, SEEK_SET) = 1170989056
[pid  4712] read(8,  <unfinished ...>
[pid  4714] <... read resumed> "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4714] lseek(9, 2156314624, SEEK_SET) = 2156314624
[pid  4714] read(9, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4714] lseek(9, 9273806848, SEEK_SET) = 9273806848
[pid  4714] read(9, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4714] lseek(9, 4972687360, SEEK_SET) = 4972687360
[pid  4714] read(9, "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4714] lseek(9, 2344914944, SEEK_SET) = 2344914944
[pid  4714] read(9,  <unfinished ...>
[pid  4715] <... read resumed> "\245\245\245\245\245\245\245\245\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid  4715] lseek(10, 72925184, SEEK_SET) = 72925184
[pid  4715] read(10,  <unfinished ...>


Nothing there explains why the system is so lagged.

I haven't tested with other filesystems yet; maybe it's an ext4 bug, but that seems dubious.
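
In the meantime my best guess is that writeback is getting stuck somehow, so next run I'm going to watch the dirty page counters and the block queue while the box is in that state. Not at all sure these are the right knobs, and the device name below is just a placeholder for whatever the PERC volume shows up as:

Code:

# how much dirty data is waiting to be flushed, refreshed every second
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

# current writeback thresholds
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio

# elevator and queue depth for the raid volume (sdb is a placeholder)
cat /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/nr_requests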

Any ideas where else I can look?

TIA, as always.
jba
Posted: Thu Oct 29, 2009 9:25 pm

An update: the problem appears to be extremely slow flushing of filesystem buffers. The RAID device itself is extremely fast, so I don't quite understand this:

Code:

db3 foo # time dd if=/dev/zero of=bigfile bs=8k count=44779372 ; time sync
44779372+0 records in
44779372+0 records out
366832615424 bytes (367 GB) copied, 657.944 s, 558 MB/s

real   10m58.016s
user   0m5.992s
sys   7m7.905s

real   12m8.595s
user   0m0.000s
sys   12m8.202s


Keep in mind the box has 48 GB of RAM, and the write is 367 GB, so page-cache buffering shouldn't inflate the initial dd throughput all that much. What I can't explain is the sync afterwards needing another 12 minutes on top of that. Any ideas? Anyone!? Bueller?! :-)
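
Next round I'll probably fold the flush into the dd timing and also try direct I/O to take the page cache out of the picture entirely. Just a sketch of what I have in mind; the direct-I/O block size and count are arbitrary:

Code:

# dd doesn't report until the data has actually been flushed, so this one
# number should capture write + writeback together
time dd if=/dev/zero of=bigfile bs=8k count=44779372 conv=fdatasync

# bypass the page cache (O_DIRECT); bigger block size since 8k direct
# writes would crawl
time dd if=/dev/zero of=bigfile_direct bs=1M count=100000 oflag=direct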
jba
Posted: Fri Oct 30, 2009 12:51 am

Wow, this looks like a pretty serious regression with ext4. Re-formatted the volume with XFS:

Code:

[root@db3][08:07:51][/data]> time dd if=/dev/zero of=bigfile bs=8k count=44779372 ; time sync
44779372+0 records in
44779372+0 records out
366832615424 bytes (367 GB) copied, 660.52 s, 555 MB/s

real   11m7.174s
user   0m5.829s
sys   7m20.474s

real   0m23.431s
user   0m0.000s
sys   0m4.006s


That's much more like it: the sync finishes in 23 seconds instead of 12 minutes. Going to try other kernel versions now.
jba
Posted: Fri Oct 30, 2009 1:17 am

Oh yeah, confirmed a massive ext4 regression. With the volume formatted as ext4 again:

Code:

[root@db3][08:54:21][/data]> time dd if=/dev/zero of=bigfile bs=8k count=44779372 ; time sync
44779372+0 records in
44779372+0 records out
366832615424 bytes (367 GB) copied, 650.501 s, 564 MB/s

real   10m50.589s
user   0m6.356s
sys   7m16.458s

real   10m28.523s
user   0m0.000s
sys   10m28.211s


Pretty giant fail. I wonder if this is worthy of a Gentoo kernel bug report.
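
If I do file one, I'll grab the mount options and any ext4/journal messages to attach. Roughly along these lines; /dev/sdb1 is a placeholder for whatever the PERC volume actually is:

Code:

# exact mount options in effect for the test volume
grep /data /proc/mounts

# any ext4 / journal complaints from the kernel
dmesg | grep -iE 'ext4|jbd2'

# filesystem features and journal settings
tune2fs -l /dev/sdb1 | grep -iE 'features|journal'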
jba
Posted: Fri Oct 30, 2009 4:09 am

Btw, downgrading to 2.6.28-hardened fixed this issue.

With 2.6.28-hardened and ext4:

Code:

[root@db3][09:58:34][/data]> time dd if=/dev/zero of=bigfile bs=8k count=44779372 ; time sync
44779372+0 records in
44779372+0 records out
366832615424 bytes (367 GB) copied, 666.51 s, 550 MB/s

real   11m6.589s
user   0m7.502s
sys   7m24.853s

real   0m6.865s
user   0m0.000s
sys   0m0.001s
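
For anyone else who hits this and can't just downgrade, one workaround I may still try on 2.6.30 is forcing background writeback to kick in much earlier, so dirty pages never pile up in the first place. Pure guesswork on the values:

Code:

# flush in the background almost immediately, and cap dirty pages at 5%
# of RAM; the numbers here are a shot in the dark
sysctl -w vm.dirty_background_ratio=1
sysctl -w vm.dirty_ratio=5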

