Gentoo Forums
If EXT4, don't use 3.5.7 / 3.6.2 kernels [OBSOLETE]
aCOSwt
Bodhisattva


Joined: 19 Oct 2007
Posts: 2537
Location: Hilbert space

Posted: Wed Oct 24, 2012 4:57 pm    Post subject: If EXT4, don't use 3.5.7 / 3.6.2 kernels [OBSOLETE]

ext4 data corruption regression
Last edited by aCOSwt on Wed Oct 31, 2012 8:51 am; edited 1 time in total
gerard27
Advocate


Joined: 04 Jan 2004
Posts: 2377
Location: Netherlands

Posted: Wed Oct 24, 2012 5:47 pm

Thanks aCOSwt.
I installed 3.5.7 two days ago.
No problems so far but to be on the safe side I switched back to 3.4.9.
Gerard.
_________________
To install Gentoo I use sysrescuecd. It is based on Gentoo and has firefox to browse the Gentoo docs and mc to browse (and edit) files.
The same disk can be used for 32 and 64 bit installs.
You can follow the Handbook verbatim.
http://www.sysresccd.org/Download
bandreabis
Advocate


Joined: 18 Feb 2005
Posts: 2495
Location: Lodi, Italy

Posted: Thu Oct 25, 2012 6:17 am

Affected kernels have been Hard Masked!
At least yesterday they were.
aCOSwt
Bodhisattva


Joined: 19 Oct 2007
Posts: 2537
Location: Hilbert space

Posted: Thu Oct 25, 2012 6:41 am

It could be the fault of one specific, poorly tested mount option:
Quote:
> the full set of options for all my ext4 filesystems are:
>
> rw,nosuid,nodev,relatime,journal_checksum,journal_async_commit,nobarrier,quota,
> usrquota,grpquota,commit=30,stripe=16,data=ordered,usrquota,grpquota

ok journal_async_commit is off the reservation a bit; that's really not
tested, and Jan had serious reservations about its safety.

* Can you reproduce this w/o journal_async_commit?

c00l.wave
Apprentice


Joined: 24 Aug 2003
Posts: 268

Posted: Sat Oct 27, 2012 12:01 pm

Note that there's a null-pointer dereference occurring when large files are deleted from ext4 filesystems in 3.4.9, which was my main reason to upgrade to and stay at 3.5.7 on the systems I maintain (I actually hit the null-pointer bug when moving backups of many tens of gigabytes across disks). You are much more likely to hit that bug than the journal bug that set off the masking panic. From what I've read, the current bug is considered to occur only under very specific circumstances that require having created and mounted the filesystem with uncommon options.

The null-pointer bug seems to have been missed by Gentoo devs but now has a bug report as well (at least it reads like the same one I encountered).

So I would add "don't use 3.4.9 either" or "but run 3.5.7/3.6.2 anyway if you run default filesystems" (without warranty, you'd better have backups either way).
_________________
nohup nice -n -20 cp /dev/urandom /dev/null &
platojones
Veteran


Joined: 23 Oct 2002
Posts: 1602
Location: Just over the horizon

Posted: Sat Oct 27, 2012 12:51 pm

Mask has been lifted for 3.6.2.
ppurka
Advocate


Joined: 26 Dec 2004
Posts: 3256

Posted: Sat Oct 27, 2012 4:41 pm

platojones wrote:
Mask has been lifted for 3.6.2.
Not surprised. It was an esoteric bug reproducible only on an esoteric configuration.
_________________
emerge --quiet redefined | E17 vids: I, II | Now using kde5 | e is unstable :-/
platojones
Veteran


Joined: 23 Oct 2002
Posts: 1602
Location: Just over the horizon

Posted: Sat Oct 27, 2012 11:37 pm

ppurka wrote:
platojones wrote:
Mask has been lifted for 3.6.2.
Not surprised. It was an esoteric bug reproducible only on an esoteric configuration.


I'm not sure it's been reproduced at all. So far only the original reporter on the thread and supposedly one other person (a second-hand report) have seen it. Ts'o has not yet been able to reproduce it.
Tony0945
Watchman


Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Fri Nov 02, 2012 8:15 am

As a user, I'm thoroughly confused. I run a stable system as much as possible. I was on 3.4.9 when a routine update installed 3.5.7. I rebuilt the kernel and removed 3.4.9. Then 3.5.7 was masked, so I re-emerged 3.4.9, rebuilt the kernel, and removed 3.5.7. Now emerge -auvND world wants to re-install 3.5.7.

I don't understand the mount option problem. My /etc/fstab has the following line:
Code:
/dev/sda2               /               ext4            noatime         0 1


What is the latest stable safe kernel to run? Should I mask both 3.4.9 and 3.5.7?
What mount options are safe to use? I don't remember the full command line I used when I created the file system years ago. How can I display this?
Should I tar off the system and reformat the drive with ext3? Random data loss is a scary thing. I applaud wholeheartedly those individuals who take the risk to test these kernels, but I don't want risk on my personal system.
c00l.wave
Apprentice


Joined: 24 Aug 2003
Posts: 268

Posted: Fri Nov 02, 2012 8:46 am

I don't think the 3.4.9 bug causes random data loss - loss should happen only to files that were being written or deleted at that time. After the null-pointer dereference occurred, I found that some of the backup files I had copied were 0 bytes in size on either the target or the source, seemingly at random, and one file was cut off. After upgrading to 3.5.7 I compared file sizes and copied the larger file to the destination drive, but I haven't checked whether the data is still ok (those were only old backups moved to make space for newer ones). I'm not a kernel developer, but the effect of the 3.4.9 bug does not appear to be worse than simply cutting power while writing to disk - the journal will revert any pending transactions and fsck will check for structural consistency.

If you don't remember having set any fancy options for your ext4 partitions, I wouldn't worry about the bug in 3.5.7. However, it would be much more severe if it struck. It's your own choice, but I have stayed with 3.5.7 so far.

To be completely safe, you could also choose a kernel older than 3.4. I wouldn't want to "downgrade" to ext3, though.

(Your "noatime" mount option is nothing special, it just disables the usually unnecessary "access time" logging.)
_________________
nohup nice -n -20 cp /dev/urandom /dev/null &
aCOSwt
Bodhisattva


Joined: 19 Oct 2007
Posts: 2537
Location: Hilbert space

Posted: Fri Nov 02, 2012 8:51 am

Tony0945 wrote:
As a user, I'm thoroughly confused. I run a stable system as much as possible. I was on 3.4.9 when a routine update installed 3.5.7. I rebuilt the kernel and removed 3.4.9. Then 3.5.7 was masked, so I re-emerged 3.4.9, rebuilt the kernel, and removed 3.5.7. Now emerge -auvND world wants to re-install 3.5.7.

I don't understand the mount option problem. My /etc/fstab has the following line:
Code:
/dev/sda2               /               ext4            noatime         0 1


What is the latest stable safe kernel to run? Should I mask both 3.4.9 and 3.5.7?
What mount options are safe to use? I don't remember the full command line I used when I created the file system years ago. How can I display this?
Should I tar off the system and reformat the drive with ext3? Random data loss is a scary thing. I applaud wholeheartedly those individuals who take the risk to test these kernels, but I don't want risk on my personal system.


You observed the 3.5.7 -> 3.4.9 -> 3.5.7 flip-flop because:
1/ 3.5.7 was flagged stable.
2/ As a precaution after the problem discussed in this thread was reported, 3.5.7 was reflagged ~arch => 3.4.9 became the latest stable.
3/ The problem discussed in this thread is now believed to be marginal => 3.5.7 is stable again.

The latest stable Gentoo kernel for x86_64 today is 3.5.7.

You do not have to worry about the mount options which probably triggered this bug as long as you use default mount options.
Safe mount options are default mount options, that is why... they are default... :twisted:

For example, this is what I have for one ext4 filesystem on my system.
Code:
LABEL=M_1_G64_VAR       /var                            ext4    defaults,noatime,nodiratime             0 2

The user having the problem was *not* using default mount options.

(BTW, there is no problem with noatime, nodiratime either, even if they are not default)
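
On the "How can I display this?" question, here is a minimal sketch, assuming a Linux system where the kernel lists the active mounts and their options in /proc/mounts. It reports every ext4 filesystem that is mounted with one of the non-default options named in the quoted report (journal_checksum, journal_async_commit, nobarrier). The function name and the option set are illustrative choices, not something taken from the bug report. Flagging all three options is an assumption (the quoted reply only singles out journal_async_commit), and the exact spelling the kernel prints for an option may differ between kernel versions.

```python
# Non-default options named in the quoted report. Assumption: all three are
# treated as worth a second look; the thread only singles out
# journal_async_commit.
UNCOMMON_EXT4_OPTIONS = {"journal_async_commit", "journal_checksum", "nobarrier"}


def uncommon_ext4_options(mounts_text):
    """Map each ext4 mount point to the uncommon options it is mounted with."""
    findings = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = set(fields[3].split(","))
        flagged = sorted(options & UNCOMMON_EXT4_OPTIONS)
        if flagged:
            findings[fields[1]] = flagged
    return findings


def main(path="/proc/mounts"):
    """Print a short report for the mounts listed in `path`."""
    try:
        with open(path) as mounts:
            report = uncommon_ext4_options(mounts.read())
    except OSError as error:
        print(f"Could not read {path}: {error}")
        return
    if not report:
        print("No ext4 filesystem is mounted with the options from the report.")
    for mount_point, flagged in report.items():
        print(f"{mount_point}: {', '.join(flagged)}")


if __name__ == "__main__":
    main()
```

A root filesystem mounted with only noatime, as in the fstab line quoted above, is not flagged by this check. Options with a value, such as commit=30, are ignored because only exact names from the set are matched.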
Tony0945
Watchman


Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Fri Nov 02, 2012 8:35 pm

Thanks for the prompt response.
All times are GMT