petero n00b

Joined: 10 Jan 2013 Posts: 5
Posted: Thu Jan 10, 2013 5:44 pm Post subject: ls causes segmentation fault on drbd + ocfs2
Hello,
I've been trying to get drbd + ocfs2 working by following this guide: http://webcache.googleusercontent.com/search?q=cache:njQEooenU8cJ:en.gentoo-wiki.com/wiki/Active-active_DRBD_with_OCFS2+&cd=1&hl=en&ct=clnk
For the most part it works. However, when I create a directory containing symlinks to other directories inside the drbd partition and then run ls on that directory, it ends with a segmentation fault (or sometimes it just hangs), and I find this in the log:
kernel: general protection fault: 0000 [#1] SMP
...
kernel: Pid: 17098, comm: ls Not tainted 3.5.7-gentoo #5 VMware, Inc. VMware Virtual Platform
...
kernel: Call Trace:
kernel: [<ffffffff8125388a>] ocfs2_fast_symlink_readpage+0xde/0x15c
kernel: [<ffffffff8108c494>] ? add_to_page_cache_lru+0x2f/0x39
kernel: [<ffffffff8108c602>] do_read_cache_page+0x8e/0x13c
kernel: [<ffffffff812537ac>] ? ocfs2_unblock_signals+0x1c/0x1c
kernel: [<ffffffff8108c6ea>] read_cache_page_async+0x17/0x19
kernel: [<ffffffff8108c6f5>] read_cache_page+0x9/0x13
kernel: [<ffffffff810c1127>] page_getlink.clone.29+0x28/0x82
kernel: [<ffffffff810c11a2>] page_follow_link_light+0x21/0x34
kernel: [<ffffffff810bfbce>] generic_readlink+0x3a/0x97
kernel: [<ffffffff810bacb1>] sys_readlinkat+0x76/0x94
kernel: [<ffffffff810bace5>] sys_readlink+0x16/0x18
kernel: [<ffffffff81488262>] system_call_fastpath+0x16/0x1b
kernel: Code: d1 48 8d 44 0a ff 40 38 30 74 0a 48 ff c8 48 39 d0 73 f3 31 c0 c9 c3 55 48 89 f8 48 89 e5 eb 03 48 ff c0 48 85 f6 74 08 48 ff ce <80> 38 00 75 f0 48 29 f8 c9 c3 55 31 c0 48 89 e5 eb 17 44 38 c1
kernel: RIP [<ffffffff812e6786>] strnlen+0x14/0x1e
kernel: RSP <ffff88031f8c1d08>
kernel: ---[ end trace add4a6818eca9284 ]---
My kernel is 3.5.7-gentoo x64.
Initially I was getting warnings that the drbd kernel-space version (8.3.13) doesn't match the drbd tools user-space version (8.3.11), but I continued installing anyway and got it to the state where everything worked except the symlinks.
So then I tried installing version 8.3.13 of the drbd tools to see if it helps, and also a different version of the ocfs2 tools (I finished the installation on 1.8.2 and later downgraded to 1.6.4), but none of those changes made any difference: ls is still crashing.
My colleague previously did the same installation on Ubuntu using drbd 8.3.11-0ubuntu1 and ocfs2 1.6.3-4ubuntu1, and there it all works fine (our install steps were basically the same apart from the Gentoo vs. Ubuntu specifics).
Could running both kernel and user-space drbd at version 8.3.11 (the same as the successful Ubuntu install) help? And how can I install a lower-than-default version of drbd into the kernel? I am a total Gentoo beginner, so I have no clue what else to do...
Any ideas how to investigate this further or how to fix it?
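A minimal sketch of the layout described above, for anyone trying to reproduce it. Assumptions: `repro_symlinks` and the `target_a`/`target_b`/`links` names are hypothetical, and the argument should be a scratch directory on the OCFS2 mount. It uses `ls -l` rather than plain `ls` because `ls -l` has to read every link target via readlink(2), the same path as the `sys_readlink` frame in the call trace. On an unaffected kernel it should simply list the two links.

```shell
#!/bin/sh
# Sketch of the layout from the first post: a directory whose entries are
# symlinks to other directories, then listed with ls.
set -eu

# repro_symlinks DIR  (hypothetical helper name)
# DIR should be a scratch directory on the OCFS2 mount.
repro_symlinks() {
    base=$1
    # Hypothetical directory names, chosen only for this example.
    mkdir -p "$base/target_a" "$base/target_b" "$base/links"
    # Short relative targets; on OCFS2 these are likely stored inline as
    # "fast" symlinks (the ocfs2_fast_symlink_readpage path in the trace).
    # -n keeps reruns from descending into an existing link to a directory.
    ln -sfn ../target_a "$base/links/to_a"
    ln -sfn ../target_b "$base/links/to_b"
    # ls -l prints each link's target, so it calls readlink(2) on every link.
    ls -l "$base/links"
}
```

For example: `repro_symlinks /mnt/ocfs2/symlink-test` (the mount point is hypothetical).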
syn0ptik Apprentice


Joined: 09 Jan 2013 Posts: 267
Posted: Fri Jan 11, 2013 5:01 am Post subject:
Try strace -f -o /tmp/out your_app
There are mistakes either in libc or in the kernel modules, because this happened:
Code: | kernel: RIP [<ffffffff812e6786>] strnlen+0x14/0x1e |
Or switch those modules for ocfs.
petero n00b

Joined: 10 Jan 2013 Posts: 5
Posted: Fri Jan 11, 2013 4:52 pm Post subject:
syn0ptik wrote: | try strace -f -o /tmp/out your_app
there mistakes in libc because
Code: | kernel: RIP [<ffffffff812e6786>] strnlen+0x14/0x1e |
happened or kernel modules.
or switch those modules for ocfs |
Hey, if you mean I should run
strace -f -o /tmp/out ls
then that's what I've just tried, and ls doesn't crash that way; instead it correctly prints the contents of the folder to the console. The logged /tmp/out is quite long. Should I post it here (even though ls didn't crash)?
What do you mean by "switch those modules for ocfs"?
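Since the call trace above ends in `sys_readlink`, one way to cut a long strace log down to the relevant part is to keep only the `readlink`/`readlinkat` calls and any `SIGSEGV` notice. A sketch, assuming the log was written by `strace -f -o /tmp/out ls` as above; `scan_trace` is a hypothetical helper name, not part of strace.

```shell
#!/bin/sh
set -eu

# scan_trace FILE  (hypothetical helper name)
# Print only readlink/readlinkat calls and segfault notices from an strace log.
scan_trace() {
    # Fail clearly if the log is missing, instead of silently printing nothing.
    [ -r "$1" ] || { echo "scan_trace: cannot read $1" >&2; return 2; }
    # grep exits 1 when nothing matches; treat that as "no matches", not an error.
    grep -E 'readlink(at)?\(|SIGSEGV' "$1" || true
}
```

Usage: `scan_trace /tmp/out`. Keep in mind that here ls did not crash under strace, so the log may only show successful readlink calls.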
petero n00b

Joined: 10 Jan 2013 Posts: 5
Posted: Sat Jan 19, 2013 11:43 pm Post subject:
So I updated world and installed a new kernel (3.7.3), but the problem remains. What else can I do? Should I report this as a bug, or...?
syn0ptik Apprentice


Joined: 09 Jan 2013 Posts: 267
Posted: Sun Jan 20, 2013 12:25 am Post subject:
No, it's just for tracing. It omits a couple of things when it runs.
Which command crashed?
randalla Tux's lil' helper


Joined: 14 Oct 2008 Posts: 79 Location: Seattle, WA
Posted: Sun Jul 14, 2013 7:34 am Post subject:
Sorry to bring this old thread back up, but I ran into the same thing today. I also found a fix for it today:
http://comments.gmane.org/gmane.comp.file-systems.ocfs2.devel/8008
After applying the patch discussed there, I have not had any issues with symlinks on the ocfs2 partition.
Adam.
petero n00b

Joined: 10 Jan 2013 Posts: 5
Posted: Mon Aug 05, 2013 5:45 pm Post subject:
Thx, randalla. I will give it a try.
666threesixes666 Veteran


Joined: 31 May 2011 Posts: 1248 Location: 42.68n 85.41w
Posted: Mon Aug 05, 2013 11:18 pm Post subject:
Upstream says:
"Dok: issue is in ocfs2 not DRBD
Dok: the real question is 'do you really need a clustered filesystem?'
666threesixes666: what FS do you recommend for drbd? did you test jfs for it?
Dok: I have
Dok: jfs takes some voodoo to get working in rhel, but it works
Dok: DRBD is just a block device
Dok: the filesystem, as with anything else, really depends upon your expected use case
Dok: I like ext4 myself
Dok: it seems to be the most stable"
You really don't want to move backwards in versions; the latest stable upstream release is 8.4.3.
(I seriously advise against this.)
(As root:)
Code: |
echo ">=sys-cluster/drbd-8.3.12" >> /etc/portage/package.mask
emerge -av drbd
|
and that will put you on 8.3.11-r1.
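Note that `>>` appends on every run, so repeating the command leaves duplicate entries in the mask file. A hedged sketch of an idempotent variant; `add_mask` is a hypothetical helper, and the file path is passed in because /etc/portage/package.mask can also be a directory (in that case, pass a file inside it).

```shell
#!/bin/sh
set -eu

# add_mask ATOM FILE  (hypothetical helper name)
# Append ATOM to FILE only if an identical line is not already present.
add_mask() {
    atom=$1
    file=$2
    # -x: whole-line match; -F: fixed string, so ">=" and "." are literal.
    # A missing FILE makes grep fail, so the atom is appended and FILE created.
    if ! grep -qxF -- "$atom" "$file" 2>/dev/null; then
        printf '%s\n' "$atom" >> "$file"
    fi
}
```

As root: `add_mask '>=sys-cluster/drbd-8.3.12' /etc/portage/package.mask`, then `emerge -av drbd` as above.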
petero n00b

Joined: 10 Jan 2013 Posts: 5
Posted: Wed Aug 07, 2013 5:18 pm Post subject:
OK, I can confirm that the fix posted by randalla works for me as well.
On kernel 3.9.6 the patch is already included (and everything works OK there out of the box).
So it seems that if you use a reasonably new kernel, you shouldn't run into this problem.
On the other node I have kernel 3.8.2, and there I needed to manually edit fs/ocfs2/symlink.c.
After that (once the new kernel was built and installed), everything works fine on both nodes.
Thanks!
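Going only by the two data points in this post (3.9.6 already has the fix, 3.8.2 needed a manual edit of fs/ocfs2/symlink.c), a rough per-node check might compare the running kernel against 3.9.6. That threshold comes from this thread, not from the kernel changelog, so the fix may have landed earlier (for example in a 3.9.x release before 3.9.6, or in a stable update of an older series); treat "yes" as "check whether the patch is present". `needs_symlink_fix` is hypothetical and relies on GNU `sort -V`.

```shell
#!/bin/sh
set -eu

# needs_symlink_fix VERSION  (hypothetical helper name)
# Prints "yes" if VERSION sorts before 3.9.6, the first kernel reported in
# this thread to include the ocfs2 symlink fix; otherwise prints "no".
needs_symlink_fix() {
    # Strip local suffixes such as "-gentoo" before comparing.
    ver=${1%%-*}
    threshold=3.9.6
    # sort -V orders version strings; if ver sorts first and is not equal
    # to the threshold, it is older.
    lowest=$(printf '%s\n%s\n' "$ver" "$threshold" | sort -V | head -n 1)
    if [ "$ver" != "$threshold" ] && [ "$lowest" = "$ver" ]; then
        echo yes
    else
        echo no
    fi
}
```

Usage on each node: `needs_symlink_fix "$(uname -r)"`.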