Gentoo Forums :: View topic - TIP: Compressing portage using squashfs: initscript method

jsosic
Guru

Joined: 02 Aug 2004
Posts: 510
Location: Split (Croatia)

Posted: Mon Mar 12, 2007 6:13 pm Post subject:

Squashfs not only saves space, but makes portage almost twice as fast!!!

I ran two tests: first update-eix, then emerge -upDv world. I rebooted the computer and ran the tests one after the other. Then I set up squashfs+unionfs, rebooted again, and repeated the tests. After seeing the results, I just removed the classic portage tree. I'm staying on squash!!!

Code:

update-eix 1.66s user 2.31s system 5% cpu *1:06.70 total*
update-eix 1.32s user 1.60s system 82% cpu * 3.52 total*

Code:

emerge -upDv world 9.35s user 1.08s system 31% cpu *32.987 total*
emerge -upDv world 9.20s user 1.52s system 57% cpu *18.761 total*

CPU usage is sky-high, but you can always use nice! And the disk hardly thrashes its head around with this method, so it's much "ionicer" than the original portage. I don't know why this method hasn't become standard for the portage tree?!?!
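To put numbers on the timings above, here is a small sketch that divides the wall-clock totals (before by after). The helper name speedup is made up for this example; the figures are copied from the two Code blocks, with 1:06.70 written as 66.70 seconds.

```shell
#!/bin/sh
# speedup BEFORE AFTER -> prints BEFORE/AFTER rounded to one decimal place.
# (Hypothetical helper, only for illustrating the arithmetic.)
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

# Wall-clock totals in seconds, taken from the timings quoted above.
echo "update-eix:         $(speedup 66.70 3.52)x"
echo "emerge -upDv world: $(speedup 32.987 18.761)x"
```

So "almost twice" matches the emerge run, while the cold-cache update-eix run improved by a much larger factor.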

AWESOME
:twisted:

_________________
I avenge with darkness, the blood is the life
The Order of the Dragon, I feed on human life

stdPikachu
Apprentice

Joined: 10 Mar 2004
Posts: 254
Location: UK

Posted: Thu Jun 14, 2007 9:34 am Post subject:

mv wrote:

NaiL wrote:

Why not do the same with:
/var/cache/edb
/var/db/pkg

These two are not worth the trouble, because they do not take too much space.

Code:

prospero ~ # du -chs /var/cache/edb/ /var/db/pkg/
111M /var/cache/edb/
172M /var/db/pkg/
282M total

That's too much space in my book. The average file size is utterly tiny as well, no wonder portage thrashes the disc so much.

Are there any plans afoot to have inline compression built into portage? IMHO using the squashfs/unionfs kludge is annoying since you don't get user transparency. Surely there's a call for a PortageFS to be enabled by default? Portage may have some cool features, but its overheads are *massive* and anything that brings those down would be a good thing in my book. Using the best part of 1GB of space for uncompressed text just for package management seems ludicrous to me.

Failing that, a utility that automatically creates and mounts squashfs/similar images after every emerge --sync might be useful.
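A minimal sketch of such a utility, assuming the squash_portage initscript discussed in this thread is installed and that restarting it recompresses and remounts the tree. The function names are made up for this example, and by default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints the commands instead of
# running them; set DRY_RUN=0 to execute them for real.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sync the tree, then restart the squash initscript so that the tree is
# recompressed and remounted (assumption: the initscript does this on restart).
sync_and_resquash() {
    run emerge --sync &&
    run /etc/init.d/squash_portage restart
}

sync_and_resquash
```

Running the restart only after a successful sync avoids recompressing a half-updated tree.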

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Thu Jun 14, 2007 10:57 pm Post subject:

stdPikachu wrote:

mv wrote:

NaiL wrote:

Why not do the same with:
/var/cache/edb
/var/db/pkg

These two are not worth the trouble, because they do not take too much space.

Code:

prospero ~ # du -chs /var/cache/edb/ /var/db/pkg/
111M /var/cache/edb/
172M /var/db/pkg/
282M total

But it's not the true usage if you use reiserfs (the main bulk is then taken by the environment.bz2 files, which won't compress much). Moreover, /var/cache/edb will be much smaller if you use the metadata_overlay database (there is certainly a thread or a wiki page on how to do that): mine is just 5M.

Nobody prevents you from using two further symlinks to squash_dir to do the same with the mentioned directories.
However, you should of course be aware that /var/db/pkg changes much more often than /usr/portage and that portage might fail to work if this directory cannot be mounted writable (e.g. you might have trouble emerging aufs if aufs is not running). Of course, you can still unsquashfs the directory in such an emergency...

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Thu Jun 14, 2007 11:30 pm Post subject:

mv wrote:

But it's not the true usage if you use reiserfs

I tested now on my system with reiserfs: Uncompressed, /var/db used 88MB, compressed it uses 50MB.
You must judge for yourself whether this is worth the possible problems (if you cannot emerge aufs due to missing write permissions). Of course, if you use a braindead filesystem like ext3 for /var/db you should really think about changing it...

eatnumber1
n00b

Joined: 13 Jan 2007
Posts: 55
Location: New York

Posted: Mon Jun 18, 2007 5:13 am Post subject:

I've been using mv's scripts for quite some time. Thx to you!

Anyway, I'm on a laptop and was working on tmpfs mounting stuff in /var to reduce disk spinups, and I got to directories like /var/run which should be able to be tmpfs mounted, but can't because programs expect directories to be in there. Well, I figured, I should just use squashfs and tmpfs mount the rw branch! After much work, I eventually got /var/log, /var/run, and /var/db/pkg mounted in this way (I also tmpfs mounted /var/lock).

The challenge here was that I had to get the initscript to start before bootmisc... Apparently portage has something to prevent this. Fortunately, it also has a way to override it. Just put the following in /etc/runlevels/boot/.critical:

Code:

checkroot
modules
checkfs
localmount
clock
/* Your squashfs scripts should be here */
bootmisc

I also had to change the initscript's depend function to

Code:

depend () {
    need localmount
    before bootmisc
}

So now I have two squash_ initscripts. One named squash_dir and one named squash_early, and all the actual scripts that I start are symlinks of one of the two.

On a different topic, I don't remember where the images were originally stored or where the branches were originally mounted, but I do remember changing it to something I thought much better, so I thought I'd share it. I have a /var/squashfs containing the following directories: images, mnt, and tmp. Inside images is another directory called old. So images holds the current squashfs images, images/old holds backups of the most recent old images, and mnt and tmp contain one directory for each mount I have (linux for my kernel sources, etc.), which are my ro and rw branches.

Last thing I've noticed is that when stopping and starting the initscript, it is not correctly releasing the loop device and when I try to manually do it, it gives me an error.

P.S. Can someone tell me the latest point at which /var/db/pkg should be started?

P.P.S. One suggestion: To change the initscript to use a single config file similar to how gentoo does /etc/init.d/net.lo and /etc/conf.d/net. That way it would make it easier to manage multiple squashfs locations.

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Mon Jun 18, 2007 8:21 am Post subject:

eatnumber1 wrote:

/var/log, /var/run

I am not sure whether it is a really good idea to rely on unionfs/aufs for such critical parts of the system as long as these modules are not part of the kernel - if something goes wrong with a new kernel, you won't be able to boot!
Maybe a better way for these directories is to copy them to a tmpfs and then use mount --bind (and for shutdown first unmount and then copy [the changed parts] back). Moreover, you will not waste a loop device this way!
Concerning /var/log, I have even more doubts, because if your system crashes you would probably like to be able to read its last words...
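A rough sketch of the copy-and-bind idea, untested on a real boot sequence. RAMBASE and the function names are hypothetical, cp -u is the GNU coreutils "copy only if newer" option, and by default the commands are only printed.

```shell
#!/bin/sh
# Sketch only; RAMBASE is a hypothetical mount point for the tmpfs.
RAMBASE=${RAMBASE:-/mnt/ramdirs}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands instead of running them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Boot: copy DIR into the tmpfs and bind-mount the copy over DIR.
ram_start() {
    run mkdir -p "$RAMBASE$1" &&
    run cp -a "$1/." "$RAMBASE$1/" &&
    run mount --bind "$RAMBASE$1" "$1"
}

# Shutdown: unmount first, then copy the changed files back to disk.
ram_stop() {
    run umount "$1" &&
    run cp -a -u "$RAMBASE$1/." "$1/"
}

run mount -t tmpfs tmpfs "$RAMBASE"
ram_start /var/run
ram_stop /var/run
```

Note that copying back with cp does not remove files that were deleted while the directory lived in RAM; a real script would have to handle that separately.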

Quote:

and /var/db/pkg mounted in this way

This is only needed for emerge commands, so you can treat it in the same way as /usr/portage (in particular, no "early" mounting is needed).

Quote:

On a different topic, I don't remember where the images were originally stored or where the branches were originally mounted, but I do remember changing it to something I thought much better, so I thought I'd share it. I have a /var/squashfs containing the following directories: images, mnt, and tmp. Inside images is another directory called old. So images holds the current squashfs images, images/old holds backups of the most recent old images, and mnt and tmp contain one directory for each mount I have (linux for my kernel sources, etc.), which are my ro and rw branches.

This has the disadvantage that all your squashfs images must remain on the same partition and that this partition is not necessarily the same as that of the original directory. I usually use the following configuration in /etc/conf.d/squash*:

Quote:

# The directory you want to keep compressed:
DIRECTORY='/gentoo32/usr/share/games'
# Uncomment if you want backups:
#FILE_SQFS_OLD="${DIRECTORY}.sqfs.bak"
DIR_CHANGE="${DIRECTORY}.changes"
DIR_SQUASH="${DIRECTORY}.readonly"
DIR_TMP="/gentoo32/tmp"

i.e. all data which I might really need to access is in the same (parent) directory as the directory I want to squash.

Quote:

Last thing i've noticed is that when stopping and starting the initscript, it is not correctly releasing the loop device and when I try to manually do it, it gives me an error.

I had this problem 3 times after a long uptime and after often starting/stopping/restarting several squash* scripts in various orders, but I was never able to reproduce it later on. I suspect this is a bug in mount itself. (But maybe you have run into a different problem?)

The usage of loop devices is the most serious drawback of the whole approach anyway, because it dramatically limits the number of directories which you can manage in this way. For example, I have a 64bit and a 32bit installation, and so it is reasonable to use it currently for /gentoo{32,64}/{var/db,usr/{share/games,src/kernel}}, my joint portage tree (without distfiles) and a joint directory for $distfiles/{svn-src,cvs-src}, which means that 8 loop devices are in permanent usage on my system...

Quote:

P.P.S. One suggestion: To change the initscript to use a single config file similar to how gentoo does /etc/init.d/net.lo and /etc/conf.d/net. That way it would make it easier to manage multiple squashfs locations.

But I would like to keep one init-script for each squashfs-mounted directory, so that you can easily force compression of a certain directory with

Code:

/etc/init.d/squash... restart

(e.g. for backup purposes which is particularly useful for the kernel). As I understand, you do not mind having separate /etc/init.d/squash* files, but you want only one /etc/init.d/squash_dir file. I am not convinced that this is a clearer configuration, because then in this file each variable would need an additional index to determine the name. Do you know which mechanism is used in /etc/init.d/net.lo to read /etc/conf.d/net and to determine "lo"? When I glanced through it, it seems to make use of undocumented features (e.g. ${svclib}) so I am afraid such an initscript won't work after the next baselayout update.

eatnumber1
n00b

Joined: 13 Jan 2007
Posts: 55
Location: New York

Posted: Mon Jun 18, 2007 1:26 pm Post subject:

Quote:

I use kamikaze sources, which has unionfs in the kernel... so even if I mess something up, I'll still be able to mount it. Also, the directories in /var/run are not so critical as to make the system completely unbootable; it just means that many programs cannot start. That is still very fixable. Lastly, I could always just revert to my old kernel, fix whatever I did wrong and boot into the new one again =).

Quote:

Moreover, you will not waste a loop device this way!

When I noticed the fact that it was unable to release the loop devices after unmounting, I increased the number of loop devices to 64... so I don't think wasting loop devices should be worried about.

Quote:

Concerning /var/log, I have even more doubts, because if your system crashes you would probably like to be able to read its last words...

This is true, and it is the one big drawback of having /var/log mounted in this way. I think the benefit of not having the disk spin up *almost* randomly to write the logs outweighs the potential loss of data. Also, if you crashed due to a reproducible error, you could always just disable the init script and reproduce the error; then you'd be able to read the logs.

Quote:

This has the disadvantage that all your squashfs images must remain on the same partition and that this partition is not necessarily the same as that of the original directory.

I see the problem of having all of them on one partition, but I prefer to have them in one easily manageable, central location. I don't see the fact that the partition is not necessarily the same as that of the original directory as a disadvantage... am I missing something significant?

Quote:

Last thing i've noticed is that when stopping and starting the initscript, it is not correctly releasing the loop device and when I try to manually do it, it gives me an error.

Upon (re)investigating this problem, it turns out that a loop device is only locked indefinitely if a new squashfs image is created. As an example, if I recently emerge --sync'ed, and I then re-squash the tree, I then notice that losetup -a has two /usr/portage devices, but mount only reports one as in use.

Code:

kiki init.d # losetup -a
*snip*
/dev/loop/5: [0803]:2409097 (/var/squashfs/images/portage.sqfs)
/dev/loop/6: [0803]:1440 (/var/squashfs/images/portage.sqfs)

Code:

kiki init.d # mount
*snip*
/var/squashfs/images/portage.sqfs on /var/squashfs/mnt/portage type squashfs (ro,loop=/dev/loop6)
unionfs on /usr/portage type unionfs (rw,dirs=/var/squashfs/tmp/portage=rw:/var/squashfs/mnt/portage=ro)

So I then do losetup -d /dev/loop/5 to try to get rid of the unused one, and I get this:

Code:

kiki init.d # losetup -d /dev/loop/5
ioctl: LOOP_CLR_FD: Device or resource busy

Also, fuser /dev/loop/5 and lsof | grep /dev/loop/5 produce no output (nothing is using it).
Is this the problem you were having?
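For comparing the two listings mechanically, here is a sketch that prints the numbers of loop devices which losetup reports as attached but which do not appear in a loop= mount option. The function name orphan_loops is made up, and the parsing assumes output in exactly the formats pasted above.

```shell
#!/bin/sh
# orphan_loops LOSETUP_OUTPUT MOUNT_OUTPUT
# Prints the number of every loop device that is attached but not mounted.
# Accepts both /dev/loop/5 and /dev/loop5 spellings.
orphan_loops() {
    attached=$(printf '%s\n' "$1" |
        sed -n 's|^/dev/loop/\{0,1\}\([0-9][0-9]*\):.*|\1|p')
    mounted=$(printf '%s\n' "$2" |
        sed -n 's|.*loop=/dev/loop/\{0,1\}\([0-9][0-9]*\).*|\1|p')
    for n in $attached; do
        case " $(echo $mounted) " in
            *" $n "*) ;;          # device is in use by a mount
            *) echo "$n" ;;       # attached, but no mount refers to it
        esac
    done
}

# Sample data copied from the listings above.
losetup_out='/dev/loop/5: [0803]:2409097 (/var/squashfs/images/portage.sqfs)
/dev/loop/6: [0803]:1440 (/var/squashfs/images/portage.sqfs)'
mount_out='/var/squashfs/images/portage.sqfs on /var/squashfs/mnt/portage type squashfs (ro,loop=/dev/loop6)
unionfs on /usr/portage type unionfs (rw,dirs=/var/squashfs/tmp/portage=rw:/var/squashfs/mnt/portage=ro)'

orphan_loops "$losetup_out" "$mount_out"
```

On a live system the same function could be fed with "$(losetup -a)" and "$(mount)", provided both tools print in the formats shown here.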

Quote:

The usage of loop devices is the most serious drawback of the whole approach anyway, because it limits the number of directories which you can manage in this way dramatically.

Like I began to say earlier, I have max_loop=64 on my kernel line in grub, which gives me 64 loop devices to play with... so I don't think I'll be running out any time soon, and if I do I can just increase it again.
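As a small illustration, the sketch below pulls the max_loop value out of a kernel command line. The function name and the example kernel line (image name, root device) are placeholders, not taken from anyone's actual grub.conf. As far as I know, the plain max_loop= kernel parameter is only honoured when the loop driver is built into the kernel; with a modular driver the value would be passed as a module option instead.

```shell
#!/bin/sh
# max_loop_from_cmdline "KERNEL COMMAND LINE"
# Prints the value of max_loop= if the option is present, else returns 1.
max_loop_from_cmdline() {
    for word in $1; do
        case $word in
            max_loop=*) echo "${word#max_loop=}"; return 0 ;;
        esac
    done
    return 1
}

# Example kernel line; image name and root device are placeholders.
max_loop_from_cmdline "/boot/vmlinuz root=/dev/sda3 max_loop=64"

# On a running system the current command line can be read from /proc/cmdline.
if [ -r /proc/cmdline ]; then
    max_loop_from_cmdline "$(cat /proc/cmdline)" ||
        echo "max_loop is not set on the running kernel's command line"
fi
```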

Quote:

But I would like to keep one init-script for each squashfs-mounted directory, so that you can easily force compression of a certain directory with

Code:

/etc/init.d/squash... restart

I was suggesting one squash_dir script in /etc/init.d, with the initscripts you actually use symlinked to it (which you already do), and one config file in /etc/conf.d for all squash_* scripts (like /etc/conf.d/squash_unionfs or something), much like how /etc/conf.d/net works.

Quote:

Do you know which mechanism is used in /etc/init.d/net.lo to read /etc/conf.d/net and to determine "lo"? When I glanced through it, it seems to make use of undocumented features (e.g. ${svclib}) so I am afraid such an initscript won't work after the next baselayout update.

I just had a look at the net.lo script and it does seem to use those undocumented features

Code:

*snip*
local iface="${SVCNAME#*.}"
*snip*

So if you are afraid of using these which probably won't go away, but may, you could do some sed or awk magic in the file, parsing one line at a time and use a configuration like

Code:

linux directory /usr/src/linux
linux order unionfs

This way it is easier to manage config files for multiple squashfs images, and most importantly it provides the potential for global variables such as

Code:

all order unionfs
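A sketch of how such a line-based file could be read with awk, including the "all" fallback. The function name get_setting is made up, the "portage directory" line is a hypothetical addition to the sample, and only single-word values are handled.

```shell
#!/bin/sh
# get_setting FILE SERVICE KEY
# Prints the value of KEY for SERVICE; falls back to a matching "all" line.
# Returns 1 if neither exists. Values are assumed to be a single word.
get_setting() {
    awk -v svc="$2" -v key="$3" '
        $2 == key && $1 == "all" { fallback = $3 }
        $2 == key && $1 == svc   { value = $3; found = 1 }
        END {
            if (found) print value
            else if (fallback != "") print fallback
            else exit 1
        }
    ' "$1"
}

# Sample config: the first and third lines are from the post above,
# the "portage" line is a hypothetical addition.
conf=$(mktemp)
cat > "$conf" <<'EOF'
linux directory /usr/src/linux
portage directory /usr/portage
all order unionfs
EOF

get_setting "$conf" linux directory   # service-specific value
get_setting "$conf" portage order     # falls back to the "all" line
rm -f "$conf"
```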

P.S. One more feature I'd like to request: the ability to add stuff to the depend() function w/o having to change the initscript. The net.lo script provides this functionality (although it does depend on undocumented features). See the net.lo script for an example.

P.P.S. My apologies for the long post.

stdPikachu
Apprentice

Joined: 10 Mar 2004
Posts: 254
Location: UK

Posted: Mon Jun 18, 2007 3:51 pm Post subject:

mv wrote:

Hmm. I have had problems with Reiser before, and am very much an ext3/JFS man. I don't think that the package database should depend on using a particular filesystem for good performance.

Thanks for pointing out portage's capability to use various DB backends though; I've just switched my main machine to use sqlite (one of my favourite little apps, I wish more progs supported it!) and now have a dependency cache of 23MB (uncompressed) in a single file, which has reduced disc thrashing significantly, plus the ability to perform SQL dep queries.

Are there any plans to do similar things for the package database (/var/db/pkg) too?

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Mon Jun 18, 2007 6:13 pm Post subject:

eatnumber1 wrote:

I increased the number of loop devices to 64...

I was not aware that this is possible, because the kernel did not seem to have a configuration option for this.

Quote:

if I recently emerge --sync'ed, and I then re-squash the tree, I then notice that losetup -a has two /usr/portage devices

How do you re-squash the tree? With /etc/init.d/squash_portage restart? I cannot reproduce this.
Maybe for some reason some part of /usr/portage is in use so that the umount fails and so the loop device is not freed. However, the script will even try a "lazy" umount in such a case, so something really severe must prevent the umount (but perhaps "lazy" does not work well with loop devices - maybe this is the "bug" in umount which I meant).

Quote:

I have max_loop=64 on my kernel line [...] and if I do I can just increase it again.

What is the maximal possible number? Certainly not more than 256, because each loop has its own minor device number...

Quote:

I just had a look at the net.lo script and it does seem to use those undocumented features

Code:

*snip*
local iface="${SVCNAME#*.}"
*snip*

Actually, it must use much more, because this won't load /etc/conf.d/net automatically...

Quote:

do some sed or awk magic in the file, parsing one line at a time and use a configuration like[...]

No, that is not a good idea: the power of gentoo configuration is that it uses bash, and so you can easily write all sorts of magic within your config (in particular, testing for all sorts of conditions using external programs, sourcing other files etc).

Quote:

most importantly provides the potential for global variables such as

Code:

all order unionfs

You want to set global variables? Just source some "squash_defaults" file from within your config file which sets these variables. You want only one file? Then link all /etc/conf.d/squash_* to the same file and use the mentioned undocumented magic by yourself. Here is an example (assuming that you have links from /etc/conf.d/squash_portage and /etc/conf.d/squash_db to this file):

Code:

db_vars () {
    DIRECTORY='/var/db'
    FILE_SQFS_OLD="${DIRECTORY}.sqfs.bak"
}
portage_vars () {
    DIRECTORY='/usr/portage'
}

"${SVCNAME#squash_}"_vars

DIR_CHANGE="${DIRECTORY}.changes"
DIR_SQUASH="${DIRECTORY}.readonly"
DIR_TMP='/tmp'
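To see what the "${SVCNAME#squash_}"_vars line does, the following sketch wraps the same config in a function and sets SVCNAME by hand. The wrapper name load_squash_config is made up for this demonstration; in a real initscript run the init system provides SVCNAME and the config file is sourced directly.

```shell
#!/bin/sh
# Stand-in for the shared conf.d file shown above.
load_squash_config() {
    db_vars () {
        DIRECTORY='/var/db'
        FILE_SQFS_OLD="${DIRECTORY}.sqfs.bak"
    }
    portage_vars () {
        DIRECTORY='/usr/portage'
    }

    # squash_db -> db_vars, squash_portage -> portage_vars
    "${SVCNAME#squash_}"_vars

    DIR_CHANGE="${DIRECTORY}.changes"
    DIR_SQUASH="${DIRECTORY}.readonly"
    DIR_TMP='/tmp'
}

# Simulate two services that are linked to the same config file.
for SVCNAME in squash_db squash_portage; do
    unset FILE_SQFS_OLD
    load_squash_config
    echo "$SVCNAME: DIRECTORY=$DIRECTORY DIR_CHANGE=$DIR_CHANGE backup=${FILE_SQFS_OLD:-none}"
done
```

The prefix removal turns the service name into the name of the function to call, so adding another directory only needs one more *_vars function and one more symlink.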

Quote:

P.S. One more feature i'd like to request: The ability to add stuff to the depend() function w/o having to change the initscript.

Untested:

Code:

#!/sbin/runscript
source /etc/init.d/squash_dir
depend () {
    # new dependencies
    before localmount
}

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Mon Jun 18, 2007 10:21 pm Post subject:

stdPikachu wrote:

Hmm. I have had problems with Reiser before, and am very much an ext3/JFS man. I don't think that the package database should depend on using a particular filesystem for good performance.

I am a recent convert to ext3 after 6 years exclusively on reiser. I saw this: http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc but didn't quite believe it til I was working on update which I run via symlink. I was getting all kinds of random garbage in the script and weird bash errors. Finally I switched to ext3, and it's never happened again. I still use reiser for /usr/portage as it's all signed and recoverable via sync (and fetch for distfiles) but I wouldn't recommend it for anything I wanted to keep. ext2 is fine for tmp imo.

Quote:

Thanks for pointing portage's capability to use various DB backends though; I've just switched my main machine to use sqlite (one of my favourite little apps, I wish more progs supported it!) and now have a dependency cache of 23MB (uncompressed) in a single file which has reduced disc thrashing significantly, plus the ability to perform SQL dep queries

Are there any plans to do similar things for the package database (/var/db/pkg) too?

Hmm, sounds really interesting.. /me logs onto irc.freenode.org to find out more..

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Tue Jun 19, 2007 11:53 am Post subject:

This is really off-topic for this thread by now, but let me reply anyway:

steveL wrote:

I saw this: http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc

This article suggests that the ext3 journal policy could save you from power failures on bad hardware, but this is simply wrong: the problem is that in case of a power failure, not only the sector you wanted to write gets wrong data but perhaps even the next sector(s) or, if the sector register loses power first, even a completely wrong sector gets updated. There is no file system which can decrease this risk. In fact, the risk is even higher with ext3, because it simply writes more data (because of the journal redundancy).

Quote:

I was getting all kinds of random garbage in the script and weird bash errors. Finally I switched to ext3, and it's never happened again.

Such bad and unlikely errors can happen everywhere. If they happen while you used ext3, they would probably never happen again after you switched to reiserfs. There are many people who use reiserfs (or ext3 or whatever) for many years and never had any problem.

Quote:

Are there any plans to do similar things for the package database (/var/db/pkg) too?

I hope not, because this is something you really want to have in plain text. However, internally portage uses another database in /var/cache/edb for those parts of /var/db/pkg which it needs often (that's why /var/cache/edb is 5M on my machine, although I use the metadata backend).

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Wed Jun 20, 2007 12:09 am Post subject:

mv wrote:

Er that didn't make much sense to me, it must be a linguas issue, sorry. I don't care if it's writing a bit more if its emphasis is on keeping my data safe first, and performance second. Each to their own, although it seems pretty clear:

Quote:

Why doesn't ext3 get hit by this? Well, because ext3 does physical block journalling. This means that we write the entire physical block to the journal and, only after the updates to the journal are committed, do we write the data to the final location on disk. So if you yank out the power cord, and inode tables get trashed, they will get restored when the journal gets replayed.

It's not even about the power off issue to me; portage (on reiser) came up with garbage in its files; a --sync soon sorted that, but it's happened to others on IRC as well.

mv wrote:

Quote:

I was getting all kinds of random garbage in the script and weird bash errors. Finally I switched to ext3, and it's never happened again.

Yes I know, like I said, I used reiser exclusively for 6 years. But there is no way those errors were acceptable to me, and it does make me wonder about glitches I used to blame on buggy software in Mandrake. And it doesn't happen with ext3 at all, so my experience means I cannot recommend reiser in good faith. YMMV as ever.

mv wrote:

Quote:

Are there any plans to do similar things for the package database (/var/db/pkg) too?

Well I like plain-text too. But I really recommend using -metadata-transfer if you want to speed up sync. Kuroo can't deal with it last time I looked, which was why we wrote update in the first place. As for a db backend it's been talked about for ages, and pkgcore has support for various backends, although I am sure they could do with help on them. It's in the tree now as well (autounmask really helps :)

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Wed Jun 20, 2007 2:27 pm Post subject:

steveL wrote:

I don't care if it's writing a bit more if its emphasis is on keeping my data safe first, and performance second.

My point is that especially in case of power failures/hardware problems/... there is nothing which can guarantee that your data is safe. It is simply a matter of "luck" whether writing more redundancy gives you more security or more risk, because each write operation (even into a journal) is a risk. Reiser4 is of course better in this respect, since with its "dancing trees" concept it has the full security without doubling the write operations. But unfortunately, one cannot expect to see Reiser4 ever in the official kernel (and moreover, AFAIK Reiser4 has serious fragmentation problems). In fact, all filesystems in the linux kernel have such serious drawbacks that none of them deserves to be used. :wink:

Quote:

It's not even about the power off issue to me; portage (on reiser) came up with garbage in its files

I understood this, but something must have been the cause of this, although it is practically impossible to find out the reason now: Either some hardware failure/power failure or some software failure must have been the cause. (Many years ago there was a bug in reiserfs which could cause such an effect, but this bug was fixed also many years ago, so practically I think one can exclude software failure meanwhile.)

Quote:

But there is no way those errors were acceptable to me

Probably the people who swear by reiserfs/ext3 are all those who had such unacceptable errors (actually usually due to hardware) with ext3/reiserfs. I am no exception in this respect.

Moreover, several years ago I followed some of the discussions about the (non-)inclusion of the well-functioning e2comp code (compression for ext2), and that was reason enough for me not to use code from the ext2/3 maintainers anymore. This "it is new and works well, but I haven't written it, therefore it is bad by definition, and we must invent some reason to abolish it" attitude is unbearable to me.

Quote:

Well I like plain-text too. But I really recommend using -metadata-transfer if you want to speed up sync.

Why do you say "But"? I also suggested this (although I didn't have a link at hand...). One has to distinguish the two types of databases which portage needs: One is the database of all packages in the portage tree (/usr/portage/metadata and /var/cache/edb/dep). The other is the database of all installed packages and their files (/var/db/pkg). metadata-transfer (or the db backends) only affect the former. But I was talking about the latter which is more delicate, because any problem in the corresponding database would wreck your whole gentoo installation. Since all portage substitutes should update this database correctly when installing packages and, moreover, already many scripts rely on this database, I would not like to see dramatic changes in its format.

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Wed Jun 20, 2007 6:44 pm Post subject:

mv wrote:

My point is that especially in case of power failures/hardware problems/... there is nothing which can guarantee that your data is safe.

That's what journalling is for; the point is that the syscall shouldn't return until the write to the journal at least is completed. If there's a power-failure, the write isn't marked as committed (and yes this is an awful lot like database transactions.) So yeah, maybe there's scope for redesign in that, but reiser isn't doing it atm. Reiser3 seems to have died, and 4 is still way too experimental to recommend.
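The ordering described here (full copy to the journal, then a commit record, then the write to the final location) can be sketched as a toy model. This is not how ext3 or reiserfs are implemented; the journal directory J and the function names are invented for the illustration, and it says nothing about what a failing disk does to neighbouring sectors.

```shell
#!/bin/sh
# Toy model only: one "block" per file, one journal slot.
J=${J:-./journal}          # hypothetical journal directory

# journal_begin TARGET DATA
journal_begin() {
    mkdir -p "$J"
    printf '%s' "$2" > "$J/block"     # 1. full copy of the new block
    printf '%s' "$1" > "$J/target"
    sync
    : > "$J/committed"                # 2. commit record, written last
    sync
}

# journal_apply: finish a write, or replay it after a crash.
journal_apply() {
    if [ ! -e "$J/committed" ]; then
        rm -rf "$J"                   # uncommitted entry: discard it
        return 0
    fi
    cp "$J/block" "$(cat "$J/target")"   # 3. write to the final location
    sync
    rm -rf "$J"
}

demo=$(mktemp -d)
J="$demo/journal"
printf 'old' > "$demo/file"
journal_begin "$demo/file" 'new'
# A crash at this point leaves the file unchanged but the journal committed,
# so replaying the journal completes the write:
journal_apply
cat "$demo/file"; echo
rm -rf "$demo"
```

If the crash happens before the commit record exists, the replay discards the entry and the target keeps its old contents, which is the "not marked as committed" case.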

Quote:

It is simply a matter of "luck" whether writing more redundancy gives you more security or more risk, because each writing operation (even into a journal) is a risk.

See above. You appear to be saying that the risk is proportional to the amount of data written, which may be true, but it has no bearing on this discussion afaict.

Quote:

Reiser4 is of course better in this respect, since by its "dancing trees" concept it has the full security without doubling the writing operations. But unfortunately, one cannot expect to see Reiser4 ever in the official kernel (and moreover, AFAIK Reiser4 has serious fragmentation problems). In fact, all filesystems in the linux kernel have so serious drawbacks that none of them deserves be used. :wink:

Yes dear

Quote:

It's not even about the power off issue to me; portage (on reiser) came up with garbage in its files

Er on what do you base that assertion? Are you saying it's not a software failure? Odd then that it never happens with ext3, and that others report similar issues when they notice them. Even weirder is your assertion that since one bug was fixed in Reiser many years ago, there are no others.

Quote:

But there is no way those errors were acceptable to me

Probably those people who swear on reiserfs/ext3 are all those who had such inacceptable (but actually usually due to hardware) errors with ext3/reiserfs. I am also not exception in this respect

Please get this straight: it is exactly the same hardware. And please don't accuse me of NIH - I never wrote any of the filesystems I have used in the past 25 years. So yeah, when I see a filesystem suddenly produce random garbage I tend not to want to use it. Do you blame me? And there was no reason for me to notice it before; it's clearly something that shows up with symlinks (and there may well be other bugs.)
And yes, I push the machine just as hard now as I did before, so overheating etc are not the cause. The only problem since switching I have had, was with the portage tree on reiser. I'm happy to keep that, as I described before.

Quote:

Well I like plain-text too. But I really recommend using -metadata-transfer if you want to speed up sync.

Sure, that's why I like plain-text :-)

Yes, missed your earlier post about metadata-transfer, my bad.

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Wed Jun 20, 2007 8:51 pm Post subject:

steveL wrote:

That's what journalling is for

No, this is a common misunderstanding. No journalling can save you from problems during power failure. It can only reduce the risk a little bit so that chances are good that you still have a consistent filesystem if the failure was not too bad.

Quote:

the point is that the syscall shouldn't return until the write to the journal at least is completed. If there's a power-failure, the write isn't marked as committed (and yes this is an awful lot like database transactions.) So yeah, maybe there's scope for redesign in that, but reiser isn't doing it atm.

reiser is doing this. The argument in the URL you posted is that if power loss occurs while replaying the journal and the disk for this reason destroys more data than it was supposed to write (namely a whole sector), reiser cannot restore this whole sector, because the journal stores only the modified data and not the whole original sector. But of course, nobody can guarantee that the dying disk destroys only one sector, i.e. ext3 is not really much safer than reiser in this situation (and as mentioned, since reiser writes less data, there are other situations in which reiser is safer, so there is not much difference from the security viewpoint).

What you actually want (that no syscall should return until it can guarantee a successful operation) is an atomic commit. The only filesystem which supports this conceptually is Reiser4. That's why I was not really joking when I said that each filesystem in the kernel has serious drawbacks: Reiser4 is the first one which is conceptually good (but has other drawbacks).

Quote:

Reiser3 seems to have died, and 4 is still way too experimental to recommend.

Unfortunately.

Quote:

Er on what do you base that assertion? Are you saying it's not a software failure? Odd then that it never happens with ext3, and that others report similar issues when they notice them.

If you look for a while in various forums you will find such reports for each filesystem. And if you then follow the discussions, it usually turns out to be a hardware issue which happened once (or in some cases even regularly) for whatever reason.

Quote:

Please get this straight: it is exactly the same hardware. And please don't accuse me of NIH

I am sure that this happened to you as you described. But it happened to you once in 6 years. Once within 6 years, for whatever reason, some block was written incorrectly (or to the wrong place) in such a way that your system lost consistency. It is not so unlikely that this happens to one out of several thousand persons after a long while, and it could happen with every filesystem. And it is not so surprising that it does not happen immediately again to the same person. I am rather sure that if you had reformatted with a fresh reiserfs instead of ext3, you would not have the problem either.

Quote:

Even weirder is your assertion that since one bug was fixed in Reiser many years ago, there are no others.

I have only heard about this one bug in reiserfs which could cause such a bad data loss (and actually many people who speak about bad experience with reiserfs were a victim of this bug). Of course, this does not mean that there cannot be another bug, but at least in the discussions which I have observed so far, it seems that nobody has experienced anything which was statistically so extraordinary or reproducible. So it seems likely that there is not another such serious bug (of course, there were some smaller bugs which had been unobserved for quite a while, but such "cosmetic" bugs are a different issue).

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Wed Jun 20, 2007 10:53 pm Post subject:

mv wrote:

No journalling can save you from problems during power failure. It can only reduce the risk a little bit so that chances are good that you still have a consistent filesystem if the failure was not too bad.

Er sorry, I disagree. Journalling can and should provide this. My aside about the design was on whether this has to be done with the dual write you mentioned, or whether the logical approach can have merit.

Quote:

Hmm.. I need to think about that for a while, tbh. One thing that springs to mind is if the sector can never be restored by reiser (since it only exists in one place) and might be recoverable by ext3, I am already happier with ext3. But like I say, I am not concerned about the power loss situation, since I have never had a problem with them since I started using journalling.

Quote:

What you actually want (that no syscall should return until it can guarantee a successful operation) is an atomic commit. The only filesystem which supports this conceptually is Reiser4. That's why I was not really joking when I said that each filesystem in the kernel has serious drawbacks: Reiser4 is the first one which is conceptually good (but has other drawbacks).

Yeah I know; I am glad you know the terminology as well, it makes the discussion easier

Quote:

Er on what do you base that assertion? Are you saying it's not a software failure? Odd then that it never happens with ext3, and that others report similar issues when they notice them.

Yes but the assertion you made was:

Quote:

this bug was fixed also many years ago, so practically I think one can exclude software failure meanwhile

And you have just acknowledged that development of reiser3 is effectively frozen; so how can you be so sure there are zero bugs in it?

Quote:

Please get this straight: it is exactly the same hardware. And please don't accuse me of NIH

No, the point is I only noticed it when I started developing update, ie the only time one of my code files has been a symlink (since I run it from /sbin/update.) And it didn't happen once, it happened again and again. I am more than willing to accept that my CPU might overheat once in a while, but as I said it never happens with ext3, no matter how long I have been working, and I have more faith in ext3 as a well-maintained fs, which has taken years to get the acceptance of ext2 users. Maybe that's because it is more reliable on flaky hardware? The point is, it is more reliable, and gives the power-off benefit we discussed.
And come on, guys, how many of us are running on machines we built ourselves; can you really be so certain the hardware isn't flaky? And what about BIOS problems we see all the time, do you really think OEMs are immune?
Having said that, it'd be great to debug reiser3 at some point, if I get time and others are motivated.

Quote:

Even weirder is your assertion that since one bug was fixed in Reiser many years ago, there are no others.

Possibly, but it's equally likely no-one ever noticed, just like I never did. After all, people who run reiser tend to be those who have gone with it against advice, so they are more likely to be, say, running unstable software. I just used to be so glad I wasn't on Windoze, I didn't care if there were glitches

Thanks for the discussion so far. I've always liked the forums more than the dev m-l, and this conversation illustrates why.

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Thu Jun 21, 2007 7:51 am Post subject:

steveL wrote:

mv wrote:

No journalling can save you from problems during power failure.

Er sorry, I disagree. Journalling can and should provide this.

One of the typical effects of power failure is that RAM loses consistency first. If your machine writes some data to the journal while RAM loses consistency (but still writes the flag that the journal was updated correctly), you will have rubbish once the journal is replayed. (A checksum over the journal might help in such a case, but AFAIK no filesystem does this.) But things can be even worse, since the sector number (i.e. where the data should be written) might have lost consistency, so it might happen that the filesystem actually writes rubbish somewhere on the disk.
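To make the checksum remark concrete, here is a minimal sketch of a checksummed journal record. The record layout (sector number, length, payload, CRC32) and the names make_record and read_record are made up for illustration and do not correspond to any real filesystem's on-disk format.

```python
import struct
import zlib

HEADER = "<QI"  # hypothetical layout: 8-byte sector number, 4-byte payload length
HEADER_SIZE = struct.calcsize(HEADER)


def make_record(sector: int, payload: bytes) -> bytes:
    """Pack sector number, length and payload, then append a CRC32 over all of it."""
    body = struct.pack(HEADER, sector, len(payload)) + payload
    return body + struct.pack("<I", zlib.crc32(body))


def read_record(record: bytes):
    """Return (sector, payload) if the checksum matches, otherwise None."""
    if len(record) < HEADER_SIZE + 4:
        return None  # too short to be a complete record
    body = record[:-4]
    (stored,) = struct.unpack("<I", record[-4:])
    if zlib.crc32(body) != stored:
        return None  # damaged record: skip it instead of replaying rubbish
    sector, length = struct.unpack(HEADER, body[:HEADER_SIZE])
    return sector, body[HEADER_SIZE:HEADER_SIZE + length]
```

Because the checksum also covers the sector number, a record whose target was damaged after the checksum was computed would be rejected on replay. It cannot help if the data was already rubbish in RAM before the checksum was calculated.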

Quote:

One thing that springs to mind is if the sector can never be restored by reiser (since it only exists in one place) and might be recoverable by ext3, I am already happier with ext3.

Yes, in this situation you have more luck with ext3. On the other hand, if the power loss happens during writing the journal (as in the above example) you have more luck with reiserfs, because ext3 will write the full block of rubbish data from the journal when replaying it while reiserfs will only write the wrong changes and not destroy the existing data in this block. It all depends on where the power loss happens and what it does...
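The two situations can be compared with a toy model. This is only a sketch of the argument above, assuming one 16-byte "block", a full-block journal entry on one side and a change-only entry on the other; it does not model how ext3 or reiserfs actually lay out their journals.

```python
def replay_full_block(disk_block: bytes, journal_block: bytes) -> bytes:
    """Full-block journal entry: replay overwrites the entire block."""
    return journal_block


def replay_change(disk_block: bytes, offset: int, data: bytes) -> bytes:
    """Change-only journal entry: replay overwrites just the modified range."""
    return disk_block[:offset] + data + disk_block[offset + len(data):]


def surviving(intended: bytes, actual: bytes) -> int:
    """Count the bytes that ended up with their intended value."""
    return sum(a == b for a, b in zip(intended, actual))


original = b"A" * 16
intended = b"AAAABBBBAAAAAAAA"  # bytes 4..7 were supposed to change

# Case 1: the journal entry itself is rubbish, the disk block is intact.
case1_full = replay_full_block(original, b"?" * 16)
case1_change = replay_change(original, 4, b"????")

# Case 2: the journal entry is good, but the disk block was destroyed.
destroyed = b"\x00" * 16
case2_full = replay_full_block(destroyed, intended)
case2_change = replay_change(destroyed, 4, b"BBBB")
```

In case 1 the change-only replay leaves the 12 untouched bytes intact while the full-block replay leaves none; in case 2 the full-block replay restores all 16 bytes while the change-only replay recovers only the 4 modified ones.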

Quote:

But like I say, I am not concerned about the power loss situation, since I have never had a problem with them since I started using journalling.

Are you sure that you never had any power loss (by which I also mean things like switching off the running machine or similar situations)? If you had even only one, how can you be sure that your problems do not originally come from this one? (The bad thing about filesystem inconsistencies is that they can be like a growing illness: they usually do not affect something visible immediately.)

Quote:

No, the point is I only noticed it when I started developing update, ie the only time one of my code files has been a symlink (since I run it from /sbin/update.)

My whole Gentoo configuration is made of symlinks (on reiserfs to reiserfs), and I never had any problems. And I would guess that the part of the code which treats symlinks is the same for all filesystems (i.e. I would expect that only the part which reads/writes the symlink itself is filesystem specific), so I doubt very much that the symlink itself was the cause of your problems. I would expect that it is only by coincidence that your problems occurred at this moment.

Quote:

And it didn't happen once, it happened again and again.

Did it happen again on a freshly formatted reiserfs? Or "only" on a perhaps already corrupted filesystem? (In the latter case, I should ask you about the reiserfsck version you used - there might be a bug in it so that it did not find this corruption.)

Quote:

After all, people who run reiser tend to be those who have gone with it against advice, so they are more likely to be, say, running unstable software.

reiserfs was the default filesystem for SuSE for many years (which at least in Germany was the most popular distribution at that time), so many people were even advised to run it.

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Fri Jun 22, 2007 11:26 am Post subject: A python or aufs bug

Putting /var/db into a squash+aufs mount revealed a serious bug in either Python or aufs.
The Python code

Code:

#!/usr/bin/python
import os

# Rename a directory that so far exists only in the read-only squashfs branch
os.rename("/var/db/pkg/app-emacs", "/var/db/pkg/app-emacs.bak")

leads to the strange error message

Code:

OSError: [Errno 18] Invalid cross-device link

if /var/db is mounted using squashfs+aufs.
This means that the automatic update of portage (renaming of installed packages) will fail with this error.

It is really a Python problem: the corresponding mv command works without any errors. Moreover, after that "manual" mv command (and renaming the directory back), the Python script executes without any error. The reason is obviously that /var/db/pkg/app-emacs then exists in the overlaid (writable) directory branch and not only in the sqfs branch.

Does somebody have an idea for a workaround (so that one can keep /var/db aufs-mounted but use portage anyway)?
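One possible workaround on the Python side, as a sketch only: catch the EXDEV error (errno 18) and fall back to copy-and-delete, which is presumably what the mv command does in this situation. The helper name rename_with_fallback is made up for this example; it is not portage code.

```python
import errno
import os
import shutil


def rename_with_fallback(src: str, dst: str) -> None:
    """Rename src to dst; copy and delete instead if the rename reports EXDEV."""
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # any other error is a real problem, pass it on
        # shutil.move retries the rename and, when that fails again,
        # copies the tree to dst and removes src afterwards
        shutil.move(src, dst)
```

Calling shutil.move directly would have the same effect; the explicit errno check only makes visible which error is being worked around. Note that the fallback is no longer atomic, so an interruption in the middle can leave a partial copy behind.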

Diredicker
n00b
n00b

Joined: 26 Apr 2006
Posts: 31
Location: NL

Posted: Tue Jul 24, 2007 12:41 pm Post subject:

Is this compress function also possible with the portage replacement paludis?

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Mon Aug 06, 2007 8:21 pm Post subject: Re: A python or aufs bug

mv wrote:

Putting /var/db into a squash+aufs mount revealed a serious bug of either python or aufs

Just for the record: this was fixed with portage-2.1.3_rc6. Thanks to Zac Medico for fixing the problem with the os.rename usage.

Diredicker wrote:

Is this compress function also possible with the portage replacement paludis?

I am not sure what this question means. The squashfs+unionfs/aufs approach is not related to portage at all. It should work with any program/directory you want (perhaps with some unexpected restrictions, as demonstrated by the above-mentioned bug).

Corona688
Veteran
Veteran

Joined: 10 Jan 2004
Posts: 1204

Posted: Tue Aug 07, 2007 6:52 pm Post subject:

steveL wrote:

People don't always have a choice, though. If you want to install Gentoo on a 2GB hard drive, you'll want to install with ReiserFS; no other filesystem I know of can cram all of portage and a couple of kernel source trees into that amount of space.

Closer to topic, I remember doing something like this to save space on a minimal (and I mean *minimal*) Gentoo system: a Pentium-133 with 1GB of hard drive space... except I didn't cram my portage tree, I crammed /usr/share/docs/, /usr/share/man, etc. Here's a tip for that: uncompressed files cram better than compressed ones, so decompress the manpages before making a cramfs out of them. (Be sure to fix symlinks after.) Better performance too, since your computer doesn't have to decompress any particular manpage twice.
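A rough sketch of that decompress-and-fix-symlinks step, assuming the manpages are gzip-compressed (.gz) and that it is run on a copy of the tree rather than the live directory. The function name decompress_tree is made up for this example; bzip2-compressed pages would need the bz2 module instead.

```python
import gzip
import os
import shutil


def decompress_tree(root: str) -> None:
    """Gunzip every *.gz file under root and retarget symlinks that pointed at them."""
    links = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append(path)  # handled after all files are decompressed
            elif name.endswith(".gz"):
                with gzip.open(path, "rb") as src, open(path[:-3], "wb") as dst:
                    shutil.copyfileobj(src, dst)
                os.remove(path)
    for link in links:
        target = os.readlink(link)
        if not target.endswith(".gz"):
            continue  # this link never pointed at a compressed page
        os.remove(link)
        new_name = link[:-3] if link.endswith(".gz") else link
        os.symlink(target[:-3], new_name)
```

After this, the tree can be packed as usual, and the filesystem's own compression works on the plain text.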
_________________
Petition for Better 64-bit ATI Drivers - Sign Here
http://www.petitiononline.com/atipet/petition.html

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Tue Aug 07, 2007 11:20 pm Post subject:

Corona688 wrote:

Er if I only had a 2GB hdd on the box, I wouldn't even put portage on it, I'd mount it over NFS and most likely use a binhost since I am guessing it would be an old, slow machine with not much RAM.
Besides, I still use reiser for my portage tree, since all the files get checked against sums and are easily restored with a --sync. I just wouldn't trust my root and /home to it any more. But as ever, it's down to what you choose; it's your box.

mv
Watchman

Joined: 20 Apr 2005
Posts: 6780

Posted: Sun Aug 19, 2007 11:12 am Post subject:

An update of the initscripts (including squashfs + aufs/unionfs/funionfs mounting/saving) is available here.

The main advantage of the new version is that you can use a temporary directory for changes, e.g. if you need a squashfs-mounted directory to be temporarily writable but never want to save the changes permanently. A sample application: you install texlive with the texlive ebuild, making use of the cdinstall useflag, and you want to mount /usr/share/texmf-dist and /usr/share/texmf-doc from a squashfs. In some cases you might want to compile (temporarily) some doc or example from these directories without copying the whole directory.

Note that some environment variables have changed in the new release, most notably: DIR_TMP was removed. Instead, a general mechanism was introduced for specifying masks for temporary files/directories. In particular, it is now easy to create the temporary squashfs-file on the same partition as the original squashfs-file so that moving is very fast (and it is no longer reasonable to use a ramdisk unless you have an extraordinary amount of RAM).

Another minor point is that the bug about the mysterious loss of loop devices discussed earlier in this thread is now probably avoided by no longer attempting lazy unmounts if the normal unmount fails. The disadvantage of this solution is that the script now fails to stop if the directory is still in use. You can restore the previous behavior by setting a variable.
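Why the same partition matters can be sketched like this: if the temporary file lives in the same directory as the final image, swapping it in is a plain rename and no data has to be copied. This is only an illustration of the idea, not how the initscript does it; replace_image and the build callback are made-up names, and in practice the new image would be written by mksquashfs rather than by a Python callback.

```python
import os
import tempfile


def replace_image(image_path: str, build) -> None:
    """Build a new image next to image_path, then swap it in with a rename."""
    directory = os.path.dirname(os.path.abspath(image_path))
    # The temporary file is created in the same directory, hence on the
    # same filesystem as the image it will replace.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            build(tmp)  # hypothetical callback that writes the new image
        os.replace(tmp_path, image_path)  # a rename: no data is copied
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # do not leave a half-written file behind
        raise
```

With the temporary file on a ramdisk or another partition, the last step would turn into a full copy of the image instead.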

synss
Apprentice

Joined: 08 Mar 2006
Posts: 282
Location: Dijon > Berlin > Tokyo > Nürnberg > München

Posted: Mon Aug 27, 2007 12:09 pm Post subject:

mv wrote:

An Update of the initscripts (including squashfs + aufs/unionfs/funionfs mounting/saving) is available here.

I am just back and I use your script now, and it is very nice!!

_________________
Compress portage tree
Elog viewer
Autodetect swap

synss
Apprentice

Joined: 08 Mar 2006
Posts: 282
Location: Dijon > Berlin > Tokyo > Nürnberg > München

Posted: Thu Sep 13, 2007 7:16 am Post subject:

baselayout-2 is nearing completion and will only use POSIX shell initscripts, i.e. no bashisms. I had a look at your (mv's) script but it is a bit over my head to patch it so that it does not fail on dash... If anyone is able and has time, that would be a nice thing to do.

_________________
Compress portage tree
Elog viewer
Autodetect swap
