Gentoo Forums
Call for testers: a tool to make use of multicore CPU easier

ExZombie
Apprentice

Joined: 29 May 2004
Posts: 170

Posted: Fri Jan 29, 2010 5:31 pm    Post subject: Call for testers: a tool to make use of multicore CPU easier

UPDATE: I'm proud to say that app-shells/prll is available from portage. Many thanks to jlec for adopting it. This thread serves to help development of prll. I will announce coming releases and I ask for your help in testing them. All comments are greatly appreciated. If you wish to participate, skip to the end of the thread and see what's up.


Hello!

I've written a small tool that makes running parallel jobs in a shell a breeze, and I'm looking for testing and suggestions. It's been cooking a while now and works fine for me. I want to know whether it works fine for you as well.

It's called prll, pronounced "parallel". I used to call it mapp because of the way it works (it maps a function to an array), but that name has been claimed many times over. Basically, I got fed up with the fact that bash and zsh don't include a simple tool to run N processes at a time. If I have a quad core, I want to be able to resize 100 photos four at a time. A simple task, but to do it efficiently, you have to jump through hoops, because the shell won't do it for you. So goodbye one-liners. I love one-liners :D . And that's what prll is for.
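For comparison, here is a minimal sketch of the kind of hoop-jumping meant here, written for plain POSIX sh without prll. The names `run_batched` and `resize_one` are made up for illustration, and the job is only a stand-in. It starts jobs in batches of N and waits for each whole batch, which is simpler but less efficient than keeping N jobs running at all times.

```shell
# run_batched MAX CMD ARG...: run CMD once per ARG, at most MAX at a time.
# Hypothetical helper; variables are global because POSIX sh has no "local".
run_batched() {
    max=$1
    cmd=$2
    shift 2
    running=0
    for arg in "$@"; do
        "$cmd" "$arg" &                  # start one job in the background
        running=$((running + 1))
        if [ "$running" -ge "$max" ]; then
            wait                         # block until the whole batch is done
            running=0
        fi
    done
    wait                                 # wait for the last, partial batch
}

# Hypothetical stand-in for a real job such as resizing one photo.
resize_one() { echo "resized $1"; }

run_batched 4 resize_one a.jpg b.jpg c.jpg d.jpg e.jpg
```

The order of the output lines is not guaranteed, since jobs within a batch run concurrently.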

Details and the code can be found on the homepage. There is also a longer discussion of alternative ways to do it and their pros and cons. There is an older version in sunrise because the newer hasn't been approved yet, but for testing it, you don't really need to emerge it.

If you check the "Known Issues" section in the README, you will see that the only major feature missing (that I could come up with) is the ability to take arguments on stdin. I hope that some of you will have a different usage pattern than I do and will have other ideas on how to extend prll. I lost a lot of time trying to fight the shell in order to keep a multicore CPU busy, and I hope prll will spare others the grief.

I'd also like to make a more specific request. If you check how Ctrl+C handling is implemented, you can see that there is a possibility for a race condition. I'm not sure what happens if the jobserver receives the signal, but the shell doesn't. I would be grateful if someone can come up with a good test for that. And more general tests as well - I would like to include a 'make test' check.

Thanks


Last edited by ExZombie on Sun Apr 18, 2010 8:21 pm; edited 1 time in total

stobbsm
Guru

Joined: 23 May 2004
Posts: 452

Posted: Fri Jan 29, 2010 6:19 pm

I'll start trying it now with a few benchmarks.

I'll report back with how well it handles for me.
_________________
Sysadmin of Ubuntu systems and servers....
Although my own server is gentoo....

cruzki123
Apprentice

Joined: 16 May 2008
Posts: 263

Posted: Fri Jan 29, 2010 6:45 pm

Hmm, I have been looking for something like that for some time. I will test it.

Thanks

krinn
Watchman

Joined: 02 May 2003
Posts: 7470

Posted: Fri Jan 29, 2010 8:07 pm

Simply amazing!
Converting 12 ogg files to mp3:
Code:

time ogg2mp3 --quality=9 *.ogg
real   0m29.078s
user   0m39.508s
sys   0m1.331s
time prll -s 'ogg2mp3 --quality=9 "$1"' *.ogg
real   0m9.230s
user   0m51.224s
sys   0m1.330s


Note: I needed to put double quotes around $1, otherwise the run failed on filenames with spaces.
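A small sketch of why the quotes matter, using a made-up helper `count_args` that only reports how many arguments it received. It assumes a POSIX-style shell with the default IFS (zsh behaves differently for unquoted variables).

```shell
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

file='my song.ogg'
count_args $file     # unquoted: split on the space into two words, prints 2
count_args "$file"   # quoted: passed as a single argument, prints 1
```

The same splitting happens to an unquoted $1 inside the command string given to prll, so a filename with a space reaches the converter as two separate, nonexistent files.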

Again, with 74 ogg files:
Code:

without
real   2m56.154s
user   3m59.711s
sys   0m9.360s
with
real   0m49.728s
user   5m48.024s
sys   0m9.109s

A good point for lame or the scheduler: without prll, all my cores were working, but sadly none was busy at more than 12% (a real bad point for my scheduler this time). With prll, all cores are busy at 100%.

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10654
Location: Somewhere over Atlanta, Georgia

Posted: Fri Jan 29, 2010 8:47 pm

Hmm. xargs already does this. Check out the --max-procs (-P) option. For example:
Code:
find . -name '*.ogg' | xargs -n1 -P4 ogg2mp3 --quality=9
will run ogg2mp3 on 4 cores at once. :wink:
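A hedged sketch of the same idea that also copes with spaces in filenames, by passing NUL-separated names from find to xargs. It assumes a find and xargs that support -print0, -0 and -P (GNU and BSD versions do; none of these are required by POSIX). The helper name `parallel_each` is made up, and `basename` only stands in for a real converter such as ogg2mp3.

```shell
# parallel_each NJOBS DIR PATTERN CMD...: run CMD once per matching file,
# up to NJOBS at a time. Hypothetical helper for illustration only.
parallel_each() {
    njobs=$1
    dir=$2
    pattern=$3
    shift 3
    # -print0 / -0 pass names NUL-separated, so spaces and newlines survive
    find "$dir" -type f -name "$pattern" -print0 |
        xargs -0 -n1 -P"$njobs" "$@"
}

# Demo on a scratch directory; basename stands in for the real job.
demo_dir=$(mktemp -d)
touch "$demo_dir/a b.ogg" "$demo_dir/c.ogg" "$demo_dir/notes.txt"
parallel_each 4 "$demo_dir" '*.ogg' basename
```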

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.

krinn
Watchman

Joined: 02 May 2003
Posts: 7470

Posted: Fri Jan 29, 2010 9:31 pm

Don't ask me why, but it's worse with xargs: my cores work even less with it (around 2-4% only!). I used the command line you gave, except that I changed it to -P8 (not real cores, but they all work with prll, so I thought it was fair to set it to 8).
Also, not critical, but it bugs me: xargs couldn't be stopped with Ctrl+C, whereas prll stops all jobs in one go.
Code:

time find . 'name *.ogg' | xargs -n1 -P8 ogg2mp3 --quality=9
real   3m27.315s
user   5m58.261s
sys   0m12.793s

But really, something went wrong: when I use prll, I see up to 8 lame processes running in htop. That wasn't the case with xargs, where I only saw 1 lame and sometimes 2 ogg123 running, but really only 1 lame... I reran the prll test right afterwards to be sure nothing else was affecting my computer (lol, I'm working while playing with this).
Code:
real   0m48.368s
user   5m49.338s
sys   0m8.930s

ExZombie
Apprentice

Joined: 29 May 2004
Posts: 170

Posted: Sat Jan 30, 2010 1:06 pm

john_r_graham wrote:
Hmm. xargs already does this. Check out the --max-procs (-P) option. For example:
Code:
find . -name '*.ogg' | xargs -n1 -P4 ogg2mp3 --quality=9
will run ogg2mp3 on 4 cores at once. :wink:

- John


Indeed, xargs can do parallel execution. But it is cumbersome to use complex commands with it, whereas with prll, you give it a function which can be anything. To do the same with xargs, you have to write a script or wrap your commands in a 'bash -c' or something, and that makes interactive use unnecessarily difficult.
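To illustrate the wrapping described here, this is a rough sketch of running a small pipeline per argument through xargs. It assumes an xargs with -0 and -P (GNU or BSD), and the helper name `upper_each` is made up. Each argument becomes $1 of an inline `sh -c` script; the trailing `_` only fills $0.

```shell
# upper_each ARG...: print each argument in upper case, two jobs at a time.
# Hypothetical helper showing the 'sh -c' wrapper that xargs needs for a pipeline.
upper_each() {
    printf '%s\0' "$@" |
        xargs -0 -n1 -P2 sh -c 'printf "%s\n" "$1" | tr "a-z" "A-Z"' _
}

upper_each foo "bar baz"
```

The nested quoting is what makes this awkward to type interactively, which is the point being made about xargs.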

However, mention of xargs reminds me of an example of what kind of testing I would like you guys to do, if you would be so kind :) . If you read this, you will see that xargs has problems with correctness in some cases. I would like to make sure prll doesn't. I don't have unfettered access to a multicore yet, so I can only do real world testing occasionally. On my box, I use 'sleep', but that's not real concurrency. Having a 'make test' target that can be distributed for people to run would be very helpful.

ExZombie
Apprentice

Joined: 29 May 2004
Posts: 170

Posted: Wed Feb 17, 2010 1:15 pm

It's time for a bump. I've made substantial progress and have written a small suite of tests. However, the way the makefile is written is not appealing to me. It works, but I fear that a BSD user will download prll and the whole 'make test' will explode in his face because I'm not sure what is proper/portable makefile syntax :? . I thought about replacing make with a shell script, but then tests can't be run in parallel, and they are not at all fast. Also, I don't really know how to test whether graceful termination works well, i.e. whether it unsets temporary functions and such.

I'd like your opinion before I make a release.
Code:
git://prll.git.sourceforge.net/gitroot/prll/prll


Thank you

stobbsm
Guru

Joined: 23 May 2004
Posts: 452

Posted: Wed Feb 17, 2010 2:32 pm

I've been using it as I build LFS to untar all of the files and download the patches I need. I've gone from 20 minutes of download time to about 5. Love this script.

I have a freebsd vm and openbsd vm that I can try this in. I'll report back later.

albright
Advocate

Joined: 16 Nov 2003
Posts: 2588
Location: Near Toronto

Posted: Wed Feb 17, 2010 4:03 pm

this is great - I convert video to run on my nokia tablet so
this is perfect.
_________________
.... there is nothing - absolutely nothing - half so much worth
doing as simply messing about with Linux ...
(apologies to Kenneth Graeme)

cal22cal
n00b

Joined: 19 Jan 2006
Posts: 36

Posted: Fri Feb 19, 2010 7:11 am

Is it possible to get the return code from the calling function?

Say,

time prll -s 'unrar t -idq "$1"' *rar

Thx for your great tool.

krinn
Watchman

Joined: 02 May 2003
Posts: 7470

Posted: Fri Feb 19, 2010 8:49 am

ExZombie wrote:
It's time for a bump. I've made substantial progress and have written a small suite of tests. However, the way the makefile is written is not appealing to me. It works, but I fear that a BSD user will download prll and the whole 'make test' will explode in his face because I'm not sure what is proper/portable makefile syntax :? . I thought about replacing make with a shell script, but then tests can't be run in parallel, and they are not at all fast. Also, I don't really know how to test whether graceful termination works well, i.e. whether it unsets temporary functions and such.

I'd like your opinion before I make a release.
Code:
git://prll.git.sourceforge.net/gitroot/prll/prll


Thank you


Some notes:
1/ You should add a version to the program; I can't see which version I'm running/testing :/
2/ I'm not familiar with git; the command you gave doesn't work, and prefixing it with "git" doesn't work either.
3/ I downloaded version 0.3.1 from your URL, but make test doesn't work with it (as I don't see any reference to tests in the Makefile, I suppose that version doesn't have the test suite).

ExZombie
Apprentice

Joined: 29 May 2004
Posts: 170

Posted: Fri Feb 19, 2010 7:56 pm

@stobbsm: I would be very grateful for that. A friend has been testing prll on MacOS, and that has resulted in a multitude of fixes which I just committed. Since Macs have a BSDish userland, I expect prll should now work normally, but still, I would appreciate your testing.

@cal22cal: prll by its nature shouldn't and doesn't care about return values. If you wish to take a look at them, I suggest a function in the form of
Code:
do_the_things_you_want; echo "$1" $?;

This way, you will have status printed on standard output, prefixed with the argument the function got.
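A minimal sketch of that pattern outside prll, so it can be tried on its own. The function name `check_one` is made up, and `test -s` (true for a non-empty file) is only a stand-in for a real job such as the unrar test.

```shell
# check_one FILE: run a job on FILE, then print "FILE STATUS".
# Hypothetical helper; 'test -s' stands in for the real command.
check_one() {
    test -s "$1"
    echo "$1 $?"        # $? is still the exit status of the job above
}

# Demo: one non-empty file (status 0) and one empty file (status 1).
scratch=$(mktemp -d)
echo data > "$scratch/good"
: > "$scratch/empty"
for f in "$scratch/good" "$scratch/empty"; do
    check_one "$f"
done
```

Following the suggestion above, the prll form would presumably be something like `prll -s 'unrar t -idq "$1"; echo "$1" $?' *.rar`; that exact line is an untested assumption built from the two commands quoted in this thread.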

@krinn: I'm not sure what you mean about the version. All tarballs are versioned :? . You are correct about version 0.3.1 lacking tests.
As for git, simply run
Code:
git clone git://prll.git.sourceforge.net/gitroot/prll/prll

This will create a directory called 'prll' which holds entire development history, although hidden from plain sight. If you wish to update it later, run 'git pull'.
If you wish to test prll without installing it, you need to source the prll.sh file and set your path to
Code:
PATH="/your/prll/directory:$PATH"

See one of the tests for how they do it.

Although now that I think about it, I guess I should have made a git ebuild.
Thank you for your comments :) .

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10654
Location: Somewhere over Atlanta, Georgia

Posted: Fri Feb 19, 2010 8:43 pm

@ExZombie, it would be really helpful if you wrote an ebuild and, optionally, got it committed to the Sunrise overlay. It would make your project more accessible and get you more testers. :)

- John

ExZombie
Apprentice

Joined: 29 May 2004
Posts: 170

Posted: Fri Feb 19, 2010 9:04 pm

It's already in sunrise, but it's not a live ebuild. I'll include a live ebuild with the next commit to sunrise if the devs agree (I'm not sure what the policy is on live ebuilds). But it's not worth hurrying about this, as getting things into the reviewed tree can take quite a while.

stobbsm
Guru

Joined: 23 May 2004
Posts: 452

Posted: Fri Feb 19, 2010 10:10 pm

Here's a good stress test, that I will try in FreeBSD shortly.
I copied all the .tar.* files from my distfiles to my Home Directory.

I have 897 archives weighing in at 3.1 GB total compressed. Extracted, it's 863 folders and 113 files totaling 15 GB.

Disk throughput is of course a factor, but still, it's rather impressive.

I extracted them all twice, first with
Code:
time for x in ../Distfiles/*.tar.* ; do tar axf "$x" ; done
real   11m18.977s
user   9m2.946s
sys   1m20.274s



Second using prll:
Code:
time prll -s 'tar axf "$1"' ../Distfiles/*.tar.*
real   6m10.951s
user   9m20.515s
sys   1m19.797s


krinn
Watchman

Joined: 02 May 2003
Posts: 7470

Posted: Fri Feb 19, 2010 10:38 pm

ExZombie wrote:

@krinn: I'm not sure what you mean about the version. All tarballs are versioned :?

Something more like:
prll -v, --version, or --help that would output it, or showing it when the usage is displayed...
Tarball versioning is great for grabbing it, but not great when you don't really remember which one you have.
So:
Code:
prll --version
prll: v0.3.1

stobbsm
Guru

Joined: 23 May 2004
Posts: 452

Posted: Fri Feb 19, 2010 10:42 pm

So far with my FreeBSD testing, it is about what you'd expect. The VM is only using 2 cores instead of 4, so it's not that big of a difference, but it's big enough.

Only 3 files restricted to .tar.bz2.

Code:
time for x in /usr/ports/distfiles/*.tar.bz2 ; do tar jxf "$x" ; done
real     0m3.717s
user    0m1.896s
sys     0m0.814s


Code:
time prll -s 'tar jxf "$1"' /usr/ports/distfiles/*.tar.bz2
real    0m3.080s
user   0m1.952s
sys    0m1.397s


cal22cal
n00b

Joined: 19 Jan 2006
Posts: 36

Posted: Sat Feb 20, 2010 2:22 am

Finally, I got the exit code for unrar by changing the prll.sh code to
Code:
 echo "PRLL: Job number $prll_jarg finished. Exit Code: $?" 1>&2

Thx man. :wink:

tallica
Apprentice

Joined: 27 Jul 2007
Posts: 152
Location: Lublin, POL

Posted: Sat Feb 20, 2010 11:08 am

It's great! FLAC compression using a Phenom II X4 at 3.6 GHz:

Code:
time flac -8V *.wav

real   1m20.008s
user   1m15.974s
sys   0m1.018s


Code:
time prll -s 'flac -8V "$1"' *.wav

real   0m23.931s
user   1m11.994s
sys   0m1.154s

_________________
Gentoo ~AMD64 | Audacious

ExZombie
Apprentice

Joined: 29 May 2004
Posts: 170

Posted: Sat Feb 20, 2010 11:23 am

cal22cal, krinn: your suggestions have been implemented, thanks.

stobbsm: Thanks for BSD testing. Would you mind testing it on OpenBSD as well? A 'make test' run would be sufficient. I'll release version 0.4 as soon as OpenBSD is confirmed.

I also made a live ebuild. eclasses sure make this easy :) . Grab it here.
Make sure you accept the ** keyword. If you wish to run 'make test', emerge with FEATURES="test".

cal22cal
n00b

Joined: 19 Jan 2006
Posts: 36

Posted: Sat Feb 20, 2010 4:12 pm

ExZombie,
I am still using version prll-0.3.1 and have done the following:
1. Created a RAM disk.
2. cp -R /usr/src/linux-2.6.32.7 to_the_RAM_disk
3. cd to_the_RAM_disk/linux-2.6.32.7
4. time prll -s 'md5deep -lr "$1"' * > ../pb.md5
5. time md5deep -lrX ../pb.md5 *
6. This shows some files whose checksums are missing.

It seems to show something like the xargs problem, though.

ExZombie
Apprentice

Joined: 29 May 2004
Posts: 170

Posted: Sat Feb 20, 2010 7:39 pm

Oh god... I feel ashamed now. I knew about the xargs bug, and I forgot to test prll against it :x . Thanks for pointing it out.

It seems my gut feeling works very well, though. It felt like a buffering problem. Lo and behold: it is! I'll document it in the README, and explain here what I did.

First, let's make a control file. xargs works damn well for sequential execution, so let's do it.
Code:
find linux-2.6.28/ -type f -print0 | xargs -0 md5sum  > ~/xsums

The xsums file has 25254 lines, which is correct.
Next, try a parallel version of the same thing:
Code:
find linux-2.6.28/ -type f -print0 | xargs -0 -n1 -P4 md5sum > ~/xpsums

The xpsums file is quite corrupt. It has 24652 lines, some of them corrupt as shown by 'md5sum -c xpsums'.
Next, prll:
Code:
find linux-2.6.28/ -type f -print0 | prll -s 'md5sum "$1"' -0 > ~/sums 2>/dev/null

This file has 25195 lines. This isn't as bad as xargs. Also, none of the lines are corrupt, it's just that some are missing. Still, it's bad. The funny thing is, if you check prll's stderr (which the above command doesn't), you can see that all jobs execute successfully. It's just that the output on stdout doesn't make it through.

The solution proved very simple:
Code:
find linux-2.6.28/ -type f -print0 | prll -s 'md5sum "$1"' -0 2>/dev/null | bfr -b 2k > ~/sums2

Inserting a small buffer just after prll does the trick. The bfr utility is in portage.
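As a different workaround than the bfr buffer used here, each job could write to its own file, with the results concatenated once all jobs are done, so that no two jobs ever share an output stream. This is only a sketch of that alternative, not something prll or xargs does for you. The helper name `collect_outputs` is made up, `cksum` stands in for md5sum, and for brevity it starts all jobs at once instead of limiting them to N.

```shell
# collect_outputs FILE...: checksum every FILE in the background, one output
# file per job, then print all results. Hypothetical helper for illustration.
collect_outputs() {
    outdir=$(mktemp -d)
    n=0
    for arg in "$@"; do
        n=$((n + 1))
        cksum "$arg" > "$outdir/$n.out" &   # private output file per job
    done
    wait                                     # all jobs finished
    cat "$outdir"/*.out                      # glob order, not argument order
    rm -rf "$outdir"
}

# Demo on three scratch files.
scratch=$(mktemp -d)
for name in one two "three four"; do
    echo "$name" > "$scratch/$name"
done
collect_outputs "$scratch"/*
```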
To see if it works with xargs as well:
Code:
find linux-2.6.28/ -type f -print0 | xargs -0 -n1 -P4 md5sum | bfr -b 2k > ~/xpsums2

Yup, it does.

Results:
Code:

$ wc -l *sums*
  25195 sums
  25254 sums2
  24652 xpsums
  25254 xpsums2
  25254 xsums
 125609 total


As I said, I acted on gut feeling. I'd really like to know what's going on behind the scenes. Streams have their own buffering. That pipes would lose data like this is not funny. I guess it's a race that happens when a background job writes to stdout while it's being flushed. Maybe if they run in the background, they don't block on write and output is lost.
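The manual comparison above could be turned into a repeatable check along these lines. This is a sketch under assumptions: it needs an xargs with -0 and -P, the helper name `compare_runs` is made up, and `cksum` replaces md5sum so that it also runs where md5sum is not installed. As in the experiment, the parallel run is redirected straight into a file before sorting, since putting a pipe in between appears to hide the problem.

```shell
# compare_runs DIR: return 0 if a sequential and a 4-way parallel checksum run
# over DIR produce the same set of lines. Hypothetical helper for illustration.
compare_runs() {
    dir=$1
    seq_raw=$(mktemp)
    par_raw=$(mktemp)
    find "$dir" -type f -print0 | xargs -0 cksum > "$seq_raw"
    find "$dir" -type f -print0 | xargs -0 -n1 -P4 cksum > "$par_raw"
    # Line order differs between runs, so compare sorted copies.
    sort "$seq_raw" > "$seq_raw.sorted"
    sort "$par_raw" > "$par_raw.sorted"
    cmp -s "$seq_raw.sorted" "$par_raw.sorted"
    result=$?
    rm -f "$seq_raw" "$par_raw" "$seq_raw.sorted" "$par_raw.sorted"
    return "$result"
}

# Demo on a small scratch tree.
tree=$(mktemp -d)
for name in a b "c d" e; do
    echo "content of $name" > "$tree/$name"
done
if compare_runs "$tree"; then
    echo "outputs match"
else
    echo "outputs differ"
fi
```

A tree as small as the demo one is unlikely to trigger the loss; something the size of a kernel source tree, as used above, is a more realistic input.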


There is something I wish to point out. Using prll to md5sum the kernel tree is a fine test for prll, but is not something prll is good at. The md5sum program is made to get an argument list of files. It then loops over this list, which is much faster than having a shell (or, equivalently, prll) do it. And creating argument lists is exactly what xargs is made for, and it does it very well.
prll, on the other hand, is made for processing arguments individually. It focuses on executing complex commands without putting them into scripts or wrapping them into 'bash -c'. Commands where you have more arguments after $1, commands with multiple subcommands, pipelines and such.
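The difference between building argument lists and processing arguments one by one can be seen with plain xargs and echo standing in for the real command:

```shell
# Batched: xargs collects all arguments into a single echo call.
printf '%s\n' a b c | xargs echo       # prints: a b c

# One at a time: -n1 starts a separate echo for every argument.
printf '%s\n' a b c | xargs -n1 echo   # prints a, b and c on separate lines
```

The first form pays the process start-up cost once, which is why it suits programs like md5sum that accept many files; the second form is the per-argument style that prll targets.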

stobbsm
Guru

Joined: 23 May 2004
Posts: 452

Posted: Sat Feb 20, 2010 7:44 pm

cal22cal wrote:
ExZombie,
I am still using version prll-0.3.1 and have done the following:
1. Created a RAM disk.
2. cp -R /usr/src/linux-2.6.32.7 to_the_RAM_disk
3. cd to_the_RAM_disk/linux-2.6.32.7
4. time prll -s 'md5deep -lr "$1"' * > ../pb.md5
5. time md5deep -lrX ../pb.md5 *
6. This shows some files whose checksums are missing.

It seems to show something like the xargs problem, though.


I did the same kind of test, but on downloaded files using md5sum (I don't have md5deep).

I had no data loss whatsoever. What does md5deep do that md5sum doesn't?

stobbsm
Guru

Joined: 23 May 2004
Posts: 452

Posted: Sun Feb 21, 2010 2:30 am

On OpenBSD I have 1 error so far: running make test from the root directory fails.

It seems that make -C tests doesn't chdir there like it does on FreeBSD and Linux.

When I enter the tests directory it works like a charm!

I'll look into the chdir option for freebsd.

All times are GMT
Page 1 of 3