Gentoo Forums :: Kernel & Hardware
Lzma - wow!
devsk
Advocate


Joined: 24 Oct 2003
Posts: 3003
Location: Bay Area, CA

PostPosted: Wed Jan 13, 2010 1:48 am    Post subject: Lzma - wow!

So, I was comparing various compression techniques and found it amazing that the real entropy of a 210MB+ file was close to just 800KB. Look at this:

The file is a text file created by cat'ing /var/log/messages over and over.

Code:
# lt /var/tmp/mytestfile*
-rw-r--r-- 1 root root 217453638 2010-01-12 00:48 /var/tmp/mytestfile2
-rw-r--r-- 1 root root  20901610 2010-01-12 00:49 /var/tmp/mytestfile2.gz
-rw-r--r-- 1 root root  20831025 2010-01-12 00:49 /var/tmp/mytestfile2.pigz
-rw-r--r-- 1 root root  15097046 2010-01-12 17:24 /var/tmp/mytestfile2.bz2
-rw-r--r-- 1 root root    816164 2010-01-12 17:27 /var/tmp/mytestfile2.lzma
Here are the times:
Code:

# time lzma -z -M max -T 8 -c -9 /var/tmp/mytestfile2 > /var/tmp/mytestfile2.lzma
real    0m41.331s

# time bzip2 -c /var/tmp/mytestfile2 > /var/tmp/mytestfile2.bz2
real    0m27.679s

# time gzip -c /var/tmp/mytestfile2 > /var/tmp/mytestfile2.gz
real    0m2.951s

# time pigz -c /var/tmp/mytestfile2 > /var/tmp/mytestfile2.pigz
real    0m0.737s


pigz wins on time but lzma massacres the competition on size. Less than 1MB for LZMA while the rest are close to 20MB. I could reduce the time by using a lower level for LZMA, but then it's only as good as bzip2.
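For anyone wanting to reproduce the shape of this comparison without a 217MB log, here is a minimal sketch. The input is synthetic (made-up log lines, an assumption of this sketch, not devsk's actual file), and any tool that is not installed is simply skipped:

```shell
#!/bin/sh
# Sketch of the comparison above at small scale: one redundant text
# file, each compressor at its highest level, sizes side by side.
f=$(mktemp)
for i in $(seq 1 500); do
    echo "Jan 12 00:48:01 host kernel: sample log line $((i % 7))"
done > "$f"
orig=$(wc -c < "$f")
for tool in gzip bzip2 xz; do
    command -v "$tool" >/dev/null 2>&1 || continue  # skip missing tools
    sz=$("$tool" -c -9 "$f" | wc -c)
    printf '%-6s %8d -> %6d bytes\n' "$tool" "$orig" "$sz"
done
rm -f "$f"
```

Prefixing each compressor invocation with `time` gives the wall-clock side of the comparison as well.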

Look at the decompression time:
Code:

# time lzma -d -c /var/tmp/mytestfile2.lzma > /var/tmp/mytestfile3
real    0m0.186s

# time bzip2 -d -c /var/tmp/mytestfile2.bz2 > /var/tmp/mytestfile3
real    0m2.893s

# time gzip -d -c /var/tmp/mytestfile2.gz > /var/tmp/mytestfile3
real    0m0.868s
That's a massacre by lzma. Decompression is REALLY fast.

So, LZMA seems like the best choice for situations where you need to compress once and use forever. A livecd is the ideal candidate for this. I don't mind if it takes 5 minutes to create. As long as it decompresses fast and takes a fraction of the space, I am fine with it.

Now, I am patiently waiting for the squashfs with LZMA support to land in the kernel.

Notes:

1. The real entropy of the file is bounded by the original 8MB /var/log/messages which I cat'ed over and over to create this 217MB file. LZMA came close to noticing that.
2. All tests were done in RAM to avoid I/O delays (note the folder /var/tmp). This is a pure test of the compression algo.
ppurka
Advocate


Joined: 26 Dec 2004
Posts: 3256

PostPosted: Wed Jan 13, 2010 2:40 am

Very interesting set of tests. Does this have any implications for using lzma to compress the kernel?

Personally, I have the compression in the kernel set to gzip because "apparently" gzip takes less time to decompress. Now, this "apparently" is in question :?

Also, does the compression of the 8MB original file by lzma also lead to the same final size of ~800kB? This would mean that lzma is really close to the actual entropy of the source :)
_________________
emerge --quiet redefined | E17 vids: I, II | Now using kde5 | e is unstable :-/
Mike Hunt
Watchman


Joined: 19 Jul 2009
Posts: 5287

PostPosted: Wed Jan 13, 2010 2:53 am

I needed to cat /var/log/messages over 12 million times, and boy are my fingers sore!

Code:
 # xz -z -M max -T 8 -c -9 /var/tmp/mytestfile2 > /var/tmp/mytestfile2.xz

 # ls -l /var/tmp/mytestfile2*
-rw-r--r-- 1 root root 217946214 Jan 12 21:25 /var/tmp/mytestfile2
-rw-r--r-- 1 root root     31852 Jan 12 21:28 /var/tmp/mytestfile2.xz


Compressing it with xz was about 9 times faster than bzip2; decompression of xz was almost instantaneous.

app-arch/xz-utils is keyword masked and is blocked by app-arch/lzma-utils because of stable app-portage/eix.
Everything else is fine though, because of:
Code:
DEPEND="${DEPEND}
        || ( app-arch/xz-utils app-arch/lzma-utils )"
cach0rr0
Bodhisattva


Joined: 13 Nov 2008
Posts: 4123
Location: Houston, Republic of Texas

PostPosted: Wed Jan 13, 2010 3:19 am

Meaningless as my test is, since this isn't real data but rather hugely redundant filler:

Code:

$ dd if=/dev/zero of=/home/meat/zeros
2421999+0 records in                               
2421999+0 records out                               
1240063488 bytes (1.2 GB) copied, 22.5843 s, 54.9 MB/s


I wasn't interested in time so much as overall compression

lzma -z -M max -T 8 -c -9 zeros > zeros.lzma
bzip2 zeros

yielded:

Code:

-rw-r--r-- 1 meat meat  909 Jan 12 21:04 zeros.bz2
-rw-r--r-- 1 meat meat 171K Jan 12 21:07 zeros.lzma


I don't know how useful this is in determining the extent to which lzma compresses massively redundant data vs. bzip2, but I thought it worth sharing.
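The same shape of test can be run at a smaller scale, assuming gzip and bzip2 are installed (10MB of zeros here instead of 1.2GB). Pure runs of a single byte are the best case for bzip2's run-length front end, which is why bz2 can come out ahead of lzma on this degenerate input:

```shell
#!/bin/sh
# 10MB of zeros, compressed with gzip and bzip2 at -9.
z=$(mktemp)
dd if=/dev/zero of="$z" bs=1M count=10 2>/dev/null
gzsize=$(gzip -c -9 "$z" | wc -c)
bzsize=$(bzip2 -c -9 "$z" | wc -c)
echo "gzip: $gzsize bytes, bzip2: $bzsize bytes"
rm -f "$z"
```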
LesCoke
n00b


Joined: 02 Jun 2008
Posts: 48
Location: Denton, Tx

PostPosted: Wed Jan 13, 2010 3:56 am

Compression algorithms use various techniques to reduce size by replacing redundancy with shorthand. I suspect that a single copy of your original log file will compress to very nearly the same size as the file containing multiple copies.
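That suspicion is easy to check at a small scale with gzip (a sketch with synthetic text, not the actual log): as long as the repeat distance fits inside the compressor's window, ten copies cost barely more than one.

```shell
#!/bin/sh
# Compress one copy of a text chunk, then ten concatenated copies.
# The chunk is ~3.5KB, well inside gzip's 32KB window, so the repeats
# are encoded as back-references and the archives end up close in size.
one=$(mktemp); ten=$(mktemp)
for i in $(seq 1 100); do echo "some representative log text, line $i"; done > "$one"
for i in $(seq 1 10); do cat "$one"; done > "$ten"
s1=$(gzip -c -9 "$one" | wc -c)
s10=$(gzip -c -9 "$ten" | wc -c)
echo "1 copy: $s1 bytes, 10 copies: $s10 bytes"
rm -f "$one" "$ten"
```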

Text compresses very well because each word / phrase can be replaced with a number. Frequent words get smaller numbers than less frequent words.

Files containing long sequences of identical bytes can be compressed to a shorthand form: duplicate value XX, N times, ...

I'd be more interested in the results of compressing a large e-book.

Les
devsk
Advocate


Joined: 24 Oct 2003
Posts: 3003
Location: Bay Area, CA

PostPosted: Wed Jan 13, 2010 4:35 am

ppurka wrote:
Very interesting set of tests. Does this have any implications for using lzma to compress the kernel?

Personally, I have the compression in the kernel set to gzip because "apparently" gzip takes less time to decompress. Now, this "apparently" is in question :?

Also, does the compression of the 8MB original file by lzma also lead to the same final size of ~800kB? This would mean that lzma is really close to the actual entropy of the source :)
Yeah. I had an intermediate file of size 36MB, which I had cat'ed six times. That 36MB file compressed to 790KB... :-) So, LZMA is pushing it almost to the limit of the source entropy with -9.
ppurka
Advocate


Joined: 26 Dec 2004
Posts: 3256

PostPosted: Wed Jan 13, 2010 5:19 am

Code:
/var/tmp/portage> time lzma -z -c -9 a > a.lzma
lzma -z -c -9 a > a.lzma  422.27s user 1.12s system 98% cpu 7:11.23 total
/var/tmp/portage> time gzip -c -9 a > a.gz; time bzip2  -c -9 a > a.bz2   
gzip -c -9 a > a.gz  24.46s user 0.13s system 98% cpu 24.896 total
bzip2 -c -9 a > a.bz2  54.13s user 0.16s system 98% cpu 55.103 total
/var/tmp/portage> ll
total 253M
-rw-r--r-- 1 root root 215M Jan 13 00:00 a
-rw-r--r-- 1 root root  16M Jan 13 00:14 a.bz2
-rw-r--r-- 1 root root  22M Jan 13 00:13 a.gz
-rw-r--r-- 1 root root 253K Jan 13 00:12 a.lzma
/var/tmp/portage> cp /var/log/emerge.log a-orig
/var/tmp/portage> time lzma -z -c -9 a-orig > a-orig.lzma; time gzip -c -9 a-orig > a-orig.gz; time bzip2 -c -9 a-orig > a-orig.bz2
lzma -z -c -9 a-orig > a-orig.lzma  6.53s user 0.03s system 98% cpu 6.657 total
gzip -c -9 a-orig > a-orig.gz  0.38s user 0.00s system 98% cpu 0.393 total
bzip2 -c -9 a-orig > a-orig.bz2  0.87s user 0.01s system 98% cpu 0.883 total
/var/tmp/portage> ll
total 257M
-rw-r--r-- 1 root root 215M Jan 13 00:00 a
-rw-r----- 1 root root 3.4M Jan 13 00:15 a-orig
-rw-r--r-- 1 root root 260K Jan 13 00:16 a-orig.bz2
-rw-r--r-- 1 root root 349K Jan 13 00:16 a-orig.gz
-rw-r--r-- 1 root root 222K Jan 13 00:16 a-orig.lzma
-rw-r--r-- 1 root root  16M Jan 13 00:14 a.bz2
-rw-r--r-- 1 root root  22M Jan 13 00:13 a.gz
-rw-r--r-- 1 root root 253K Jan 13 00:12 a.lzma
This is the real comparison (with -9 for both gzip and bzip2) :)
By the way, you guys have a different version of lzma. My version (lzma-utils is installed) doesn't support -T or -M. Secondly, you have a really good system there, devsk! My lzma took over 7 minutes!
Mike Hunt wrote:
I needed to cat /var/log/messages over 12 million times, and boy are my fingers sore!
For loops to the rescue for me :)
Code:
for ((i=0;i<=60;i++)); do cat /var/log/emerge.log >> /var/tmp/portage/a; done
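A variant of the loop above that avoids running cat thousands (or millions) of times: doubling the file on each pass grows it exponentially, so a ~200MB file is reached in well under 30 iterations. The 1MB target here is just to keep the demo small:

```shell
#!/bin/sh
# Grow a file by repeatedly concatenating it with itself.
f=$(mktemp)
echo "seed line of log text" > "$f"
while [ "$(wc -c < "$f")" -lt 1000000 ]; do
    cat "$f" "$f" > "$f.tmp" && mv "$f.tmp" "$f"  # double the file each pass
done
size=$(wc -c < "$f")
echo "grew to $size bytes"
rm -f "$f"
```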

_________________
emerge --quiet redefined | E17 vids: I, II | Now using kde5 | e is unstable :-/
Mike Hunt
Watchman


Joined: 19 Jul 2009
Posts: 5287

PostPosted: Wed Jan 13, 2010 5:46 am

Actually, I used a loop like this:
Code:
for i in $(seq 1 1000000); do cat /var/log/messages >> /var/tmp/mytestfile2; done

Otherwise, I would probably be cat'ing for a couple of months! :lol:
devsk
Advocate


Joined: 24 Oct 2003
Posts: 3003
Location: Bay Area, CA

PostPosted: Wed Jan 13, 2010 5:49 am

My main box has an i7 920 OCed to 4.4GHz... :-) So, it just tears through stuff!

I have xz-utils. The -T option currently doesn't do anything. Once that gets implemented, I think lzma compression will just fly. Think 8 threads with HT on... yeah baby! Parallel mksquashfs creates a 600MB livecd in under 40 seconds.
devsk
Advocate


Joined: 24 Oct 2003
Posts: 3003
Location: Bay Area, CA

PostPosted: Wed Jan 13, 2010 8:13 am

I found this little utility called freearc. I am not sure why it is not in portage. It is extremely fast and extremely efficient at compressing. Get a load of this:

Code:
$ time ./arc create -mt8 -m9 /var/tmp/mytestfile2.arclzma /var/tmp/mytestfile2
FreeArc 0.60 creating archive: /var/tmp/mytestfile2.arclzma                   
Compressed 1 file, 217,453,638 => 703,401 bytes. Ratio 0.3%                   
Compression time: cpu 9.43 secs, real 5.66 secs. Speed 38,409 kB/s           
All OK                                                                       
real    0m5.710s

$ cd /
$ \rm  /var/tmp/mytestfile2
$ time unarc x /var/tmp/mytestfile2.arclzma
FreeArc 0.60 unpacker. Extracting archive: mytestfile2.arclzma
Extracting var/tmp/mytestfile2 (217453638 bytes)
All OK
real    0m0.692s
$ md5sum /var/tmp/mytestfile2.org /var/tmp/mytestfile2
24696247c934b7d581c156f001f362b6  /var/tmp/mytestfile2.org
24696247c934b7d581c156f001f362b6  /var/tmp/mytestfile2
So, not only did this program create a file that is just 703KB (12% smaller than xz-utils), it did it in 5.71 seconds compared to 40+ seconds for xz-utils. It decompresses more slowly than xz-utils, but it is still sub-second, so no big deal. Besides, it's faster than gzip at decompression.

Now that's what I call compression!
mv
Watchman


Joined: 20 Apr 2005
Posts: 6780

PostPosted: Wed Jan 13, 2010 8:54 am    Post subject: Re: Lzma - wow!

devsk wrote:
The file is a text file created by cat'ing /var/log/messages over and over.

Such tests are bogus: they essentially measure only the size of the compressor's dictionary. If one copy of the file is longer than the dictionary (which is probably the case for most of the compressors you used, except perhaps lzma-utils/xz-utils with -9), they are of course worse by some factor, since a compressor with a sufficiently large dictionary essentially stores only "now repeat the last thing x times". If the original file (which is repeated) gets larger, then lzma-utils/xz-utils will also "suddenly" produce results that are larger by some factor.
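This dictionary effect can be illustrated directly, assuming gzip and xz are both installed: repeat an incompressible ~64KB unit, which is larger than gzip's 32KB window but far smaller than the dictionary xz uses at -9, and only xz "sees" the repetition.

```shell
#!/bin/sh
# Eight back-to-back copies of a random 64KB block. gzip's window
# cannot reach back one full copy, so it stores roughly 512KB; xz's
# much larger dictionary encodes copies 2..8 as references to the first.
unit=$(mktemp); big=$(mktemp)
head -c 65536 /dev/urandom > "$unit"
for i in $(seq 1 8); do cat "$unit"; done > "$big"
gzsize=$(gzip -c -9 "$big" | wc -c)
xzsize=$(xz -c -9 "$big" | wc -c)
echo "gzip: $gzsize bytes, xz: $xzsize bytes"
rm -f "$unit" "$big"
```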
d2_racing
Bodhisattva


Joined: 25 Apr 2005
Posts: 13047
Location: Ste-Foy,Canada

PostPosted: Wed Jan 13, 2010 12:54 pm

ppurka wrote:
Very interesting set of tests. Does this have any implications for using lzma to compress the kernel?


In fact, I know that we can use lzma, for the record, but I never tested it.

Has anyone actually tested that?
mv
Watchman


Joined: 20 Apr 2005
Posts: 6780

PostPosted: Wed Jan 13, 2010 1:46 pm

d2_racing wrote:
ppurka wrote:
Very interesting set of tests. Does this have any implications for using lzma to compress the kernel?


In fact, I know that we can use lzma, for the record, but I never tested it.

On x86 systems with 512MB RAM, it usually gives an out-of-memory error when booting with grub. On amd64 it works fine. The size difference is, as is to be expected, a matter of some percent; I do not remember the exact figure at the moment.
d2_racing
Bodhisattva


Joined: 25 Apr 2005
Posts: 13047
Location: Ste-Foy,Canada

PostPosted: Wed Jan 13, 2010 5:00 pm

Out of memory, well that's weird :P
mikegpitt
Advocate


Joined: 22 May 2004
Posts: 3224

PostPosted: Wed Jan 13, 2010 6:20 pm

LZMA compression time is looong. I was playing around with it on a livecd I was building, and compression took about 1.5+ hours, up from around 15-20 mins for bzip2. Annoying if you are editing and need to rebuild something many times.
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9890
Location: almost Mile High in the USA

PostPosted: Wed Jan 13, 2010 6:24 pm

Gzip still has its uses but lzma is pretty nice...

Code:

doujima:/tmp# time lzma < vmlinux > vmlinux.lzma

real    0m6.763s
user    0m6.596s
sys     0m0.120s
doujima:/tmp# time bzip2 < vmlinux > vmlinux.bz

real    0m4.372s
user    0m4.280s
sys     0m0.043s
doujima:/tmp# time gzip -9 < vmlinux > vmlinux.gz

real    0m0.857s
user    0m0.827s
sys     0m0.030s
doujima:/tmp# ls -l vmlinux*
-rwxr-xr-x 1 root root 3494717 Jan 13 11:19 vmlinux*
-rw-r--r-- 1 root root 1653809 Jan 13 11:20 vmlinux.bz
-rw-r--r-- 1 root root 1685150 Jan 13 11:20 vmlinux.gz
-rw-r--r-- 1 root root 1399599 Jan 13 11:19 vmlinux.lzma

_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
Back to top
View user's profile Send private message