any lessfs users out there?

Moriah · Posted: Thu Oct 29, 2009 2:23 am Post subject: any lessfs users out there?

Has anybody done anything with lessfs under gentoo?

see http://www.lessfs.com
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.

jormartr · Apprentice Joined: 02 Jan 2008 Posts: 174

Seems a really good project.

I am trying to try it right now, but it depends on fuse 2.8, but that version is not even on portage tree.

I tried with two or three older versions of lessfs, but still these depend on that fuse version, I'll add this project to my rss reader waiting for news.

Thanks for sharing it

kernelOfTruth · Posted: Thu Oct 29, 2009 11:03 am Post subject:

hey thanks for sharing Moriah !

seems I've been looking for this kind of "filesystem" for some time

let's see how this develops

lessfs + reiserfs seems to be a winning (and rock-stable) combination when lessfs gets out of beta-state

_________________
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa

Hardcore Gentoo Linux user since 2004

Moriah · Posted: Thu Oct 29, 2009 8:22 pm Post subject:

I am interested in using it for a backup server.

I am not configuring any new systems with reiserfs, both because xfs tested faster in my own benchmarks that test the things I am concerned about timing-wise, and also because of support considerations -- remember where Mr. Reiser is now residing.

_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.

Moriah · Posted: Thu Oct 29, 2009 8:46 pm Post subject:

I have one serious concern about lessfs: the absence of any double checking for hash colissions. I know it would affect speed, but there ought to be an option to enable it for the paranoid among us. Its not the probability of a collision that bothers me; its the possibility. :evil:

What a hash based dedup fs does is form a hash of whatever object dedup is being performed on -- file, block, etc. and then search an index to see if that hash already exists in the fs. If not, then a new block, file, whatever, is allocated and its hash is inserted into the index. If the hash matches, then there is a high probability that the new object has a duplicate already in the system. Thats "probability", *NOT* certainty! To be CERTAIN, you must compare the 2 objects, not just their hashes.

I have been working on a file-based dedup system as a background task for several years, but a block-based system might offer some advantages, if the block size were small enough, and if the overlieing file system never put more than a single file in a single block. This could be a problem with reiserfs.

Since my application is backup, file deletion only occurs in bulk as an essentially atomic operation. Since I want to be able to play some forensic tricks with the backup sets, I want to be able to get a list of all files on the backup fs that are identical in their contents. I also would like to be able to get a list of files that had only been appended to, not modified in any other way, such as log files, mail files, etc.

I had considered a dedup extension to LVM, but the block size would need to be coordinated with the block size of the filesystem, and this would typically impose unacceptable constraints on LVM.
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.

Cr0t · l33t Joined: 27 Apr 2002 Posts: 944 Location: USA

Does any good documentation exist on how lessfs works? How to set it up? I know how/what dedup is, because I work with it daily, but lessfs' docs are just horrible.
_________________
cya

©®0t

dreadlorde · Posted: Sun Mar 28, 2010 11:11 pm Post subject:

Interesting. Thanks for the link.
_________________
Ludwig von Mises Institute

Quote:

I am not to be a shepherd, I am not to be a grave-digger. No longer will I speak to the people; for the last time I have spoken to the dead.

Moriah · Posted: Mon Mar 29, 2010 2:55 am Post subject:

I would love to try it, but I need something stable that has an ebuild for gentoo before I can use it for anything real. I am interested in using it with my backup server, and also as a basis for vmware virtual disks.

And as I stated above, I am *VERY* concerned about the *POSSIBILITY* of collisions. I have learned from building reliable systems -- aircraft, space flight, military,and medical -- that its not the probability of an error;it is the possibility of an error. If that possibility can be addressed, then it should be. I think lessfs plays a bit too foot-loose and fancy-free with the probability of a colision, and forgets about the very real possibility of one. In some applications, just blindly trusting the hash might be ok, but in others, it is necessary to verify an exact match by bit for bit comparison. Of course you only do this if the hash matches first, which most of the time it will not, but you can't just ignore the possibility of a non-matching hash collision!

_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.

devsk · Posted: Mon Mar 29, 2010 8:45 pm Post subject:

Is lessfs doing block level dedup?

larger block size will probably make the hashes more unique but theoretical possibility of collision can't be ruled out without ACTUAL bit-wise comparison. Does anyone have a number for such occurrence? like once in a trillion?

I think we do live with probabilities in other areas also. Like bit errors in large hard drives. If you are really unlucky, the bits may flip in such a way that the checksums (both in hardware and software in ZFS or BTRFS for example) will stay the same, but your file is essentially screwed. So, essentially digital information is never "SAFE". You can only make it safer than the last best unsafe option by creating more copies, killing any advantages gained from dedup!

People have been talking about cosmic rays/EMI flipping bits in the Prius's RAM chips which causes unwanted acceleration.... :-)

Its all part and parcel of modern life!

But yeah, an IO intensive and slow "compare_on_collision" option should be provided for people me and you (moriah)!

Moriah · Posted: Mon Mar 29, 2010 9:08 pm Post subject:

Yes, they are doing block level dedup, using 2 different hash algorithms, although I do not remember the details. I seem to recall they are forming a 192 bit hash, but with a 512 byte sector ( == 4096 bit), there is a real possibility of collision here, and all probabalistic arguments aside, That's not good enough to satisfy me or the NSA/FIPS/NIST. It might be fine for storing scenes for a video game, but not for important data. It would never pass muster for DO178-B Level A flight controls, for instance.
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.

devsk · Posted: Mon Mar 29, 2010 9:18 pm Post subject:

Moriah: Have you seen http://www.opendedup.org/ ? That provides access to the dedup'ed data over network, something perfect for backups!

It also provides inline or offline dedup. And also something called re-dup... :-)

Talk about marketing... ;-)

Haven't found much about the integrity guarantees! Please update here if you do faster than me.

Moriah · Posted: Mon Mar 29, 2010 10:10 pm Post subject:

I have been developing a practical backup server for a number of years now. There is a lot more to backup than reducing the amount of storage required, although that is certainly important too. You need encryption to protect data in transit over a network, whether internal or external. You need encrypted storage to protect off-line backup volumes from theft. You need physical security of the backup server as well as the off-site off-line storage facility. Moving all the data over a network is quite impractical for all bu the fastest connections, and even then, the expense per gigabyte of network traffic dwarfs the expense of a modern disk drive. You need redundancy of the on-line storage devices and of the off-line volumes; you need multiple sites for off-line volumes.

And all of this needs to be tied together by an operational process that includes software, hardware, facilities, and people. Because of the encryption of off-line storage volumes, you need proper key handling protocols. There is a lot of details that need to be documented, or else you do not have a secure system.

Dedup is important, and efficient network utilization is too, but, as a favorite quote I saw about 15 years ago said, "Never underestimate the bandwidth of a 747 full of DVDs." :wink:

_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.

Moriah · Posted: Mon Mar 29, 2010 10:34 pm Post subject:

Just took a look at opendedup.org. The docs are pretty good, at least what I glanced at.

My problem is first that it is apparently written in Java, which makes it a research project, not a production filesystem, mainly because of the overhead of the JVM environment. Hard compiled code that goes directly down to machine language will always be faster.

Second, it is a total memory hog: the docs say, "uses about 8 GB of memory for every TB used at 4k blocks". That's way too much for anything other than a large server environment.

Third, it is not very fast:

4k chunks
85 MB/s Write
50 MB/s Read
140 MB/s Re-Write
1 TB of Data
10 GB of RAM

I routinely run 1.5 tb, and will soon be going to 2 tb RAID-1 mirrors. I can certainly see many small businesses needing to go to RAID-10 and 4tb or 6 tb, but I don't see too many small businesses today willing to pony up for a motherboard that will take over 8 gb of ram.

_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.

dreadlorde · Posted: Tue Mar 30, 2010 1:04 am Post subject:

You should look at venti[1][2].

[1]http://plan9.bell-labs.com/sys/doc/venti/venti.html
[2]http://en.wikipedia.org/wiki/Venti
_________________
Ludwig von Mises Institute

Quote:

I am not to be a shepherd, I am not to be a grave-digger. No longer will I speak to the people; for the last time I have spoken to the dead.

Moriah · Posted: Wed Mar 31, 2010 1:27 pm Post subject:

OK, I have about convinced myself that it is acceptable and reasonable to ignore collisions. The reasoning goes like this:

We are going to compare the probability of an unchecked and undetected hash collision in a dedup file system such as lessfs with the probability of an undetected CRC error associated with a disk I/O operation.

If a crc32 fails to detect a disk error, then it means that the data in the disk block that was just read is not the same as the data that was written to that block, but the CRC still checks out good. What is the probability of this happening?

If a hash collision in a dedup filesystem causes the wrong data to be stored for a block, meaning the block will be read as the data that first generated that same hash, and not the data that caused the collision, then we have a collision error. What is the probability of this happening?

We may view the CRC as a type of hash for the purposes of this discussion, as it is a many-to-one mapping, as is a hash; the difference is primarily the number of bits involved. Thus we consider the crc32 as a 32 bit hash, whereas the Tiger hash used in lessfs is a 192 bit hash.

If we consider a disk block of d bits in length, and a hash of h bits in length, then we desire to find the probability of a hash collision where two distinct data blocks generate the same hash code. We assume that each hash code is equally likely, as is each data block.

There are 2^d different data blocks and 2^h different hash blocks. This means, assuming uniform distribution, that there are 2^d/2^h = 2^(d-h) data blocks that produce the same hash code, or that all collide with each other.

Since the number of collisions with a given block is 2^(d-h), and the total number of blocks is 2^d, then the probability of a collision is:

2^(d-h)/2^d = 2^(d-h-d) = 2^-h = 1/2^h

So the probability of a collision is independent of the block size. It depends only on the number of bits in the hash.

Therefore, the chance of an undetectable read error occurring because of a crc collision is 1/2^32, or less than 4e-9, whereas the probability of a hash collision in the dedup algorithm causing an error is 1/2^192, which is astronomically smaller (by a factor of 1/2^160) than the crc allowing an error to slip through.

Conclusion: Given a suitably long enough hash code, we do not need to worry about hash collisions causing errors in our data. Therefore, lessfs (and other dedup file systems) are justified in not performing a bit-for-bit compare when a hash matches a previously stored value. That is, the chance of an undetected crc disk read error occurring is much greater than the chance of a hash collision in the dedup algorithm. The dedup algorithm is many orders of magnitude more reliable than the disk drives it is running on.
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.

devsk · Posted: Sat Apr 03, 2010 3:55 pm Post subject:

Has anyone tried this? Any results to share? How much savings are we talking about? This can potentially bring down the backup GB cost.

Anybody got an ebuild?

Moriah · Posted: Sat Apr 03, 2010 4:02 pm Post subject:

Unfortunately, its not yet available as a gentoo ebuild, but the general deduplication strategy has been used in commercial backup products for some time now -- several years anyway. The strategy used in lessfs, server side block level deduplication, is an established technique.

I have been using server-side file level deduplication on my backup server for a number of years now, and the results have been quite good. I have been getting 20 to 30 times more data stored on the backup drives than if I used no dedup at all. Block level should do even better, although how much better remains to be seen.

I am thinking about fetching the tarball for the lessfs sources and playing with it just to see how it behaves.
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.

devsk · Posted: Sat Apr 03, 2010 4:20 pm Post subject:

devsk · Posted: Sat Apr 03, 2010 4:35 pm Post subject:

devsk · Posted: Sat Apr 03, 2010 4:45 pm Post subject:

Bug with ebuild posted at BGO: https://bugs.gentoo.org/show_bug.cgi?id=312997

Moriah · Posted: Sat Apr 03, 2010 5:17 pm Post subject:

Great! Thanks!

Unfortunately, due to starting a new assignment out of town this coming week, I will not get a chance to try it until one evening this coming week. I need to pack and get ready to go this weekend.

I will probably install this to a Spare SATA drive -- maybe even an SSD -- on my travelling laptop. I can make some backup runs to it and compare the storage used to the same backup runs made to my file level dedup strategy.

I also want to play with using it with vmware for virtual disks, but that will come after the backup trials.
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.

devsk · Posted: Sat Apr 03, 2010 5:52 pm Post subject:

Good thing about this is that its built on an existing FS, so you can create a /data (or bind mount another ext[34] FS on /data) and try it right away.

One thing I already hated is that deleting from FS doesn't bring down you block database size. Add a 10GB back, remove it and your usage remains the same. Although it will be reused, you don't know how much actual space you have in the lessfs.

Moriah · Posted: Sat Apr 03, 2010 6:22 pm Post subject:

I guess that's because lessfs doesn't even know.

It depends on what you choose to write to it as to how much in can pack it in.

Of course, it would be nice to know how much *RAW* unused space was left in the underlying filesystem, but then backup stores are usually write-once anyway, so it probably wasn't a concern originally. Perhaps you could dig into it and see how lessfs could tell you how much space it had allocated from the underlying filesystem, and how much space it had held in reserve as a result of deletions. Remember, delete only frees up a block if the delete was the *LAST* remaining reference to that particular chunk of data. You can delete as lot of stuff and not actually free up any space at all.
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.

devsk · Posted: Sat Apr 03, 2010 6:42 pm Post subject:

ahh...compression can't be disabled. With compression, not many duplicate blocks are found!

I have no idea what kind of data will I save to this FS. I am seeing very few duplicate blocks even with 16KB block size. In my limited testing, it doesn't save anything!

May be I am missing a setting or two.

devsk · Posted: Sat Apr 03, 2010 7:02 pm Post subject:

Here is an example of what I am talking about. My root filesystem:

when backed with tar and pigz, its size is 5.8GiB and backup takes 2m30sec.
when backed with tar onto lessfs, it occupies 7.3GiB and backup takes 6m21sec.

This basically means that there are not many blocks which are duplicate of each other after LZO compression. And of course, LZO compression is worse than zlib. I lose speed with lessfs because of hash calculations and searches, I gain speed with pigz because of parallel compression. And I gain better compression because of zlib.

So, what the heck are we talking about here? This is a lose lose situation.