Gentoo Forums
I HATE Portage
steveL
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Mon Jan 22, 2007 7:17 am    Post subject: Reply with quote

ferringb wrote:
Point to Point, or working from revs means you know the exact state of the starting point, and the target point; thus you can generate just instructions to change the data from start to target.

Rsync is able to generate the starting point on the fly by sending a set of chksum data over; this is why there's a 2.4MB hit for any rsync connection for gentoo-x86, for example: pushing the chksums to the server.
Yeah, should have realised rsync was working on checksums, not mtimes, what with all the problems with network timing. The thing you wrote- why is that not the standard? (It seems to offer a huge bandwidth reduction.)
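For the curious, the shape of rsync's trick is roughly this- a rough sketch only; real rsync uses a cheap rolling checksum plus a strong checksum and matches blocks at every offset, which this does not:
Code:

import hashlib

def block_sums(old, block=700):
    # receiver side: checksum each fixed-size block of the copy it already has
    return dict((hashlib.md5(old[i:i + block]).hexdigest(), i)
                for i in range(0, len(old), block))

def delta(new, sums, block=700):
    # sender side: emit (copy, offset) where the receiver already has the
    # block, raw literal bytes otherwise
    ops, i = [], 0
    while i < len(new):
        h = hashlib.md5(new[i:i + block]).hexdigest()
        if h in sums:
            ops.append(('copy', sums[h]))
            i += block
        else:
            ops.append(('literal', new[i:i + 1]))
            i += 1
    return ops
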
Quote:
Re: pulling manifests- the manifest (under manifest2) holds the chksum data for fetching distfiles, including size reqs- so... useful to have. Offhand... fewer fetches, and it moves the potential for a man-in-the-middle attack to just the sync, rather than every pkg merge.

steveL wrote:
ferringb wrote:
Finally... there are a set of very very very stupid ebuilds that reach outside of their specific pkg dir (thus violating any manifest protection), that (due to their untracked access) are rather 'hard' to download as needed.
Blimey, sounds like they need to be yanked then. Any important ones, in your opinion?

Haven't looked lately. Just do a find "$PORTDIR" -mindepth 3 -maxdepth 3 -name '*.ebuild' -print0 | xargs --null grep -H PORTDIR # and dig through.
Haven't seen relative addressing for it, plus that's harder to look for... so neh.
Heh, running that cmd (slightly modified) produced 6 pkgs of the following types:
einfo/ elog - babytrans-0.92, portage, skype
epatch - gecko-sdk-1.7.8
kdir ${PORTDIR} - baselayout (couldn't find this command, guessing it's a bash function.)
sed - dev-embedded/sdcc
None of these seemed outrageous in what they were doing to my inexperienced eye. (gecko-sdk was using patches from other versions.)
Quote:
steveL wrote:
ferringb wrote:
Additional re: fetching just metadata, something that slipped the mind: the metadata is bound to a specific date/time; you can't use rsync to pull down the data (since it may pull down *newer* data than the metadata is for, thus requiring a recalc per pkg), thus you'd have to pull specific revs (cvs).
Now I am lost- I thought the idea was just to get the latest set.

Sync'ing pulls down the metadata from a specific date/time. The (local) resolver works against that data, calculating its build plan. If the ebuilds you download have different metadata (say, since you synced, a new required DEPEND was added), it would require the resolver to detect the change and go back and recalculate its plan.
Worst case, it's a quad set of recalcs, with reaching out and pulling ebuilds off a server being the slow point per N. Fairly nasty.

So... you have to pull the ebuilds from the same date/time as the metadata you pulled.
I can understand that; thanks for the lucid explanation- it really helps! Would ebuild checksums (as another metadatum) be able to solve this? I appreciate that the resolver would have to work out its plan again, and don't know how serious that is (I am a bit lost with the `quad set of recalcs', I'll happily admit ;) I guess I'm hoping that it'd be infrequent, or that if we had a lightweight sync (only rsyncing metadata plus the extra stuff discussed) then an emerge could feasibly always sync first. Or perhaps that part of the process could be to sync pkgs that are going to be installed. Just throwing out ideas here, sorry if they sound stupid.

TBH the stuff you've written makes me think it might well just be too much hassle.

One thing that springs to mind, tho, is that metadata is always calculated locally. I appreciate that this is by design (ie you can have any backend cache as you noted) but since the vast majority of users use the standard file-based approach, it doesn't seem necessary. Rebuilding the cache seems to be getting slower for some reason.
Hypnos
Advocate


Joined: 18 Jul 2002
Posts: 2889
Location: Omnipresent

PostPosted: Mon Jan 22, 2007 7:25 am    Post subject: Reply with quote

First, it's obvious that crap code in C can certainly be slower than good code in Python.

Second, I personally can't think of a single counterexample where the *same* algorithm, from top to bottom, is faster in Python than C. If you take smart Python code with standard idioms like list comprehensions and re-implement it, along with the CPython library/interpreter routines it uses, in pure C, you'll get a significant performance boost. The only thing I've seen that puts a dent into this gap is Psyco, because it can introduce machine-level optimizations; things get interesting when it's integrated with the interpreter, as in IronPython.

If this is wrong, I'm glad to learn.
_________________
Personal overlay | Simple backup scheme
ferringb
Retired Dev


Joined: 03 Apr 2003
Posts: 357

PostPosted: Mon Jan 22, 2007 7:33 am    Post subject: Reply with quote

steveL wrote:
ferringb wrote:
Point to Point, or working from revs means you know the exact state of the starting point, and the target point; thus you can generate just instructions to change the data from start to target.

Rsync is able to generate the starting point on the fly by sending a set of chksum data over; this is why there's a 2.4MB hit for any rsync connection for gentoo-x86, for example: pushing the chksums to the server.
Yeah, should have realised rsync was working on checksums, not mtimes, what with all the problems with network timing. The thing you wrote- why is that not the standard? (It seems to offer a huge bandwidth reduction.)

Cause it operates on intermediate tarballs, instead of an rsync tree directly. I could mangle the bugger to work on trees directly, but that would require adding a new delta format (not hard with diffball since it's the only delta compressor out there that handles multiple binary formats, but still, bit of work).

steveL wrote:
ferringb wrote:
steveL wrote:
ferringb wrote:
Finally... there are a set of very very very stupid ebuilds that reach outside of their specific pkg dir (thus violating any manifest protection), that (due to their untracked access) are rather 'hard' to download as needed.
Blimey, sounds like they need to be yanked then. Any important ones, in your opinion?

Haven't looked lately. Just do a find "$PORTDIR" -mindepth 3 -maxdepth 3 -name '*.ebuild' -print0 | xargs --null grep -H PORTDIR # and dig through.
Haven't seen relative addressing for it, plus that's harder to look for... so neh.
Heh, running that cmd (slightly modified) produced 6 pkgs of the following types:
einfo/ elog - babytrans-0.92, portage, skype
epatch - gecko-sdk-1.7.8
kdir ${PORTDIR} - baselayout (couldn't find this command, guessing it's a bash function.)
sed - dev-embedded/sdcc
None of these seemed outrageous in what they were doing to my inexperienced eye. (gecko-sdk was using patches from other versions.)

Would have to go looking again; I recall in the past spotting pkgs that reached into other pkgs' directories for files; that's the evilness I'm referencing :)

steveL wrote:
ferringb wrote:
steveL wrote:
ferringb wrote:
Additional re: fetching just metadata, something that slipped the mind: the metadata is bound to a specific date/time; you can't use rsync to pull down the data (since it may pull down *newer* data than the metadata is for, thus requiring a recalc per pkg), thus you'd have to pull specific revs (cvs).
Now I am lost- I thought the idea was just to get the latest set.

Sync'ing pulls down the metadata from a specific date/time. The (local) resolver works against that data, calculating its build plan. If the ebuilds you download have different metadata (say, since you synced, a new required DEPEND was added), it would require the resolver to detect the change and go back and recalculate its plan.
Worst case, it's a quad set of recalcs, with reaching out and pulling ebuilds off a server being the slow point per N. Fairly nasty.

So... you have to pull the ebuilds from the same date/time as the metadata you pulled.

I can understand that; thanks for the lucid explanation- it really helps! Would ebuild checksums (as another metadatum) be able to solve this? I appreciate that the resolver would have to work out its plan again, and don't know how serious that is (I am a bit lost with the `quad set of recalcs', I'll happily admit ;) I guess I'm hoping that it'd be infrequent, or that if we had a lightweight sync (only rsyncing metadata plus the extra stuff discussed) then an emerge could feasibly always sync first. Or perhaps that part of the process could be to sync pkgs that are going to be installed. Just throwing out ideas here, sorry if they sound stupid.

TBH the stuff you've written makes me think it might well just be too much hassle.

Idea-wise it's not bad, but it really requires relying on pulling specific revs from a vcs. Regarding chksums, that's why I'd pulled the manifest- it gives you checksums.

Meanwhile, what I mean by quad is that if a resolver plan involves, say, 700 pkgs, none of which have ebuilds locally, the worst case is quadratically bound (N^2) in terms of forced re-resolutions. Can do tricks to avoid the worst case, but the issue itself is bad enough that it's better to just rely on pulling the exact vcs rev of the ebuild/files, or handing all calculation off to a server that hands you the required files as it goes.
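To make the blow-up concrete, here's a toy model (all names hypothetical, nothing to do with portage's actual resolver) of a resolver restarting every time a freshly fetched ebuild reveals a dep the synced metadata didn't know about:
Code:

def resolve(targets, fetch_metadata):
    # toy model: every restart refetches the whole plan so far, so N pkgs
    # each hiding one new dep costs on the order of N^2 fetches
    fetches = 0
    while True:
        plan, restart = [], False
        for pkg in list(targets):
            meta = fetch_metadata(pkg)            # the slow per-pkg network hit
            fetches += 1
            plan.append(pkg)
            extra = meta.get('new_depend')
            if extra and extra not in targets:    # synced metadata was stale
                targets.append(extra)
                restart = True
                break
        if not restart:
            return plan, fetches
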

steveL wrote:
One thing that springs to mind, tho, is that metadata is always calculated locally. I appreciate that this is by design (ie you can have any backend cache as you noted) but since the vast majority of users use the standard file-based approach, it doesn't seem necessary. Rebuilding the cache seems to be getting slower for some reason.

Not necessarily calculated locally. Rsync'd trees give you (at least gentoo-x86 does) metadata/cache, a pregenerated metadata cache.

Re: rebuilding the cache getting slower- profile it and find out why. Off the top of my head, I'd expect it's due to zac backing out some of the optimizations I introduced rather than fixing an issue with rsync propagation (if the size is the same and the mtime is the same, the rsync setup won't push the updated file; it relies on mtime/size rather than checksums).
ferringb
Retired Dev


Joined: 03 Apr 2003
Posts: 357

PostPosted: Mon Jan 22, 2007 9:32 am    Post subject: Reply with quote

Hypnos wrote:
First, it's obvious that crap code in C can certainly be slower than good code in Python.

If it's so obvious, why did it take two retorts to point out the fallacy in your statement, and to back you off absolute statements like "foo in language $DAR is always faster than in language $BAR"? ;)

Further, why does this particular issue (stupid language wars based on native speed) keep coming up? ;)

Hypnos wrote:
Second, I personally can't think of a single counterexample where the *same* algorithm, from top to bottom, is faster in Python than C.


If it's exactly the same- as in every invocation it triggers uses the same algorithms, etc.- then yep. Thing is, you'll never find that: what you're requesting is in reality an approximation of the algo (exactly the same invocations would mean writing the same bit of code twice anyway).

Say your algo is a piece of code that repeatedly allocates and releases a struct. You can only approximate this in python- python does ref counting, thus can hold onto the obj. Further, depending on the desired obj, it may be using a different heap manager internally, avoiding releasing the memory back and thus dodging the pricey malloc invocation. An actual example of this is allocation of a single-char string; the cpython implementation internally treats single-char strings effectively as singletons. At the surface level it looks the same, but what is actually occurring (python skipping malloc/initialization and instead just incref'ing and returning an existing ptr, vs c having to trigger a malloc call) can differ greatly.
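A quick demo of that CPython implementation detail (not guaranteed by the language spec, so don't write code that depends on it):
Code:

a = 'x'
b = chr(0x78)   # built at runtime, not a literal
print(a is b)   # True under CPython: length-1 strings are shared singletons
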

An alternate example would be summing the total length of a series of strings. What we'll label the algo, in python code (close enough to pseudocode that it'll serve as the algo)-
Code:

def foo(strings):
    i = 0
    for s in strings:
        i += len(s)
    return i

Crazily enough, this is able to edge out the "same top to bottom" c code
Code:

#include <string.h>

unsigned long foo2(char **strings)
{
    unsigned long i = 0;
    char **p;
    /* walk the NULL-terminated array; strlen() rescans each string */
    for (p = strings; *p; p++)
        i += strlen(*p);
    return i;
}

over time and with appropriate threshold for avg string len. Why? Top to bottom, they're the "same algo", after all.

The reason comes down to the fact that the "same algo" snippet factors in neither instantiation nor data structure- python string objects store their length upon instantiation, so the 'algo' approximation for python is in reality just adding ints; for c, it's a linear walk of each string looking for the terminating NUL, adding up that length.

At the surface level it looks the same; the actual execution will always differ, though, which is where the real speed issues can crop up. You wanted an example, there you go.
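If you want to poke at the threshold claim yourself, a rough harness along these lines works (the sizes here are arbitrary, picked to favour long strings):
Code:

import timeit

strings = ['a' * 10000] * 1000   # long strings favour the stored-length side

def foo(strings):
    total = 0
    for s in strings:
        total += len(s)          # O(1): the length is stored on the object
    return total

print(timeit.timeit(lambda: foo(strings), number=100))
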

And just so we're clear: I'm not insane- properly written c/c++ code should be able to edge out python code without too much issue (assuming the limiter is cpu, rather than IO) due to avoiding implementation overhead (namely, native execution versus interpretation). There's a difference between that statement and the original "c implementations are always faster than python implementations" you daftly stated.

For example-
Hypnos wrote:
That said, Python will certainly be slower than pure-C tools, even if things are written in CPython modules.

Which... hey, if you forgot to qualify it, that's fine; I'd lay off. You're getting the third degree, however, because you're bandying around absolutes as if they were holy writ, said absolutes being easy enough to prove to be inaccurate generalizations (already disproved each of those quotes, after all). Specifically, you're getting it due to daft claims like-

Hypnos wrote:
Python + raw filesystem storage has not scaled well with the size of the tree; a pure C tool like Paludis or make (a la BSD) does scale better than a Python tool.

pkgcore code, when operating with the same set of caches, is actually neck and neck with paludis. The differences mostly come down to startup (the python machinery starting up is a bit of a hit, although that's a one-time cost, not a scaling one). The irony is that if app startup overhead is factored out, pkgcore is faster ;).

Meanwhile, continuing with the original point: enable paludis's names cache and it starts edging ahead a bit- not because of the language, but because of a more efficient algo (namely reading a single file instead of doing multiple readdir calls). We can add the same cache, thus negating the advantage, mind you; the point is that for this scenario (typically IO-bound ops) the language isn't the issue, the algo is.

As said: you're getting the third degree for stating an absolute, claiming python is the scaling fault when in reality it's the portage implementation and its internal algos that are at fault, not the language. Brute language speed isn't a solution for dumb-ass algos. You can keep arguing if you like, but the initial (and ensuing) statements were daft generalizations whether you like it or not- the kind folks have a habit of throwing around because it's simpler to say "python's slow" than "portage's implementation is not exactly efficient due to reasons 1, 2, 3, 4, 5, 6...". It might be easier to throw around generalizations of that sort, but it's still plain ignorance for the most part.
[/flame off]
Hypnos wrote:
If you take smart Python code with standard idioms like list comprehensions and re-implement it, along with the CPython library/interpreter routines it uses, in pure C, you'll get a significant performance boost.

First of all, CPython *is* C code. I hope you meant "implement it without the python OO machinery", else this conversation is daft. Speaking from experience, pushing down to cpy. is *not* guaranteed to give you a "significant boost". It all depends upon your bottlenecks (see the previous comment about the language imbuing the code with certain characteristics). Unless you convert completely over to standalone c (ie, no python machinery), you still have to write the code to play nice with the machinery, meaning you still have abstraction-related overhead.

Translating python code down into c yields variable gains, completely dependent upon what your real bottlenecks are. Pushing a dict-lookup snippet of python code down into cpython is only going to spare you the cost of passing the equivalent bytecode through the interpreter; the interpreter winds up running the exact same C code after all. If your bottlenecks are in cpython code already, translating down isn't going to make dick-all difference, since you're targeting micro-optimizations rather than the real issue.

Also, list comps generally suck for most folks' actual usage; generator expressions rule. A list comp means allocating an intermediate list, populating it linearly, getting an iter over it, finishing the iter, deref'ing- another linear walk of the list- then deallocing; a genexp is (essentially) just allocating a frame obj and popping in/out of it. Not saying it always wins out, but a lot of folk use list comps when they're just after a genexp, thus forcing daft extra work.
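Side by side, the daft extra work looks like this (a sketch; the size is arbitrary):
Code:

strings = ['abc'] * 100000

total = sum([len(s) for s in strings])   # builds, walks, then frees a
                                         # 100,000-element intermediate list
total = sum(len(s) for s in strings)     # genexp: one value at a time,
                                         # no intermediate list at all
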

Hypnos wrote:
The only thing I've seen that puts a dent into this gap is Psyco, because it can introduce machine-level optimizations; things get interesting when it's integrated with the interpreter, as in IronPython.

If this is wrong, I'm glad to learn.

Psyco is double-edged however- the inspection it does (continuous live inspection) to determine when to patch in optimized code is enough of a hit on good python code that it's not usually worth it (in my experience). IronPython works differently, so the cost of that inspection is mostly a one-time hit.

If I recall correctly, psyco works by introducing type-specific invocations rather than actual machine-level instructions. IronPython (the CLR in general, iirc) is able to emit machine-level instructions.

The interesting one is pypy: translate a single time, doing the optimization, pushing it through a c/llvm/whatever backend.


Last edited by ferringb on Tue Jan 23, 2007 9:45 am; edited 1 time in total
Hypnos
Advocate


Joined: 18 Jul 2002
Posts: 2889
Location: Omnipresent

PostPosted: Tue Jan 23, 2007 3:04 am    Post subject: Reply with quote

Your point that the language implementation != language, and that each implementation has its strengths and weaknesses, is well taken. Thanks for the detailed and informative post.

I don't want to get into a semantic debate about what an algorithm is :) I would call it moving bits around, and my original point is that Python, being so abstract and dynamic, makes it tough to ensure that your particular code is being implemented efficiently at the level of moving bits around. If a particular implementation is doing something that really speeds up the execution of your code, that can be re-implemented outside of Python.

In the specific example you give, what if you use glib's representation for strings rather than char*, and g_malloc instead of malloc? I believe, like CPython, you would have the penalty of calling library routines, but possibly win with the introspection on the string object and buffered allocation of memory. Again, the code would look rather similar.
_________________
Personal overlay | Simple backup scheme
ferringb
Retired Dev


Joined: 03 Apr 2003
Posts: 357

PostPosted: Tue Jan 23, 2007 9:42 am    Post subject: Reply with quote

Hypnos wrote:
My original point is that Python, being so abstract and dynamic, makes it tough to ensure that your particular code is being implemented efficiently at the level of moving bits around.

Would tend to disagree on that one; remember that python is just a c app, thus memcpy and friends are obviously abused internally. Further, strings are immutable- trying to copy a string actually just refs it; you have to break it apart and recombine it to get a true, separate mem location.

The key thing to remember with python is that native python code effectively deals only in refs; list comps generate new lists, slicing a list is the same thing, but actual copying is fairly limited- most ops are just inspecting data, literally pushed down to primitives which are c-based.

The hit with python is in the individual instruction execution speed- the spot where it really rears its head imo is in doing char-level inspection of strings. The reasoning's pretty simple- it involves a lot of native python instructions to do so, and the abstraction layer can mildly get in the way.

The key to fast python code isn't avoiding abstraction or dynamism; it's shunting the real work (from a total-time-required-to-run standpoint) off to the builtins, which are implemented in c. Weird to view it thus, but remembering you're basically dealing in glue binding together c invocations is a good way to look at it.
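The earlier length-summing example, written both ways (a sketch):
Code:

strings = ['some', 'data'] * 50000

total = 0
for s in strings:          # every iteration is several interpreted bytecodes
    total += len(s)

total = sum(map(len, strings))   # same work; the loop runs inside C builtins
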

Hypnos wrote:
If a particular implementation is doing something that really speeds up the execution of your code, that can be re-implemented outside of Python.

Not always easily, actually- the abstraction python shoves in (accessing an attr on an instance actually triggers instance.__getattribute__(self, attr), for example) allows you to do some fairly fun tricks; namely, proxying, or delayed instantiation of objects/attrs. Why I say it's hard to do that elsewhere is that if the target language doesn't have that abstraction, you either need to add it, or you're screwed for doing some of the stuff python allows. Trying to do delayed instantiation of an object in c *without* changing the target interface, for example (think of it as a seriously bastard form of polymorphism), is pretty much impossible, unless you're throwing around func ptrs everywhere or have built up your own obj. implementation.

Sounds insane, but tricks like that can make a pretty heavy difference in run time- you can maintain the same (sane) api, but delay loading of data via it till it's actually required inside the func- if the func doesn't need it, no data loaded.
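A minimal sketch of that delayed-instantiation trick (illustrative only, not pkgcore's actual code; the class names are made up):
Code:

class LazyProxy(object):
    def __init__(self, factory):
        self._factory = factory
        self._obj = None

    def __getattr__(self, attr):          # only fires when normal lookup misses
        if self._obj is None:
            self._obj = self._factory()   # first real use: build it now
        return getattr(self._obj, attr)

class Metadata(object):
    def __init__(self):
        print('expensive parse happens here')
        self.DESCRIPTION = 'some pkg'

m = LazyProxy(Metadata)    # nothing parsed yet; callers see the same api
print(m.DESCRIPTION)       # parse triggered by the first attribute access
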

Hypnos wrote:
In the specific example you give, what if you use glib's representation for strings rather than char*, and g_alloc instead of the malloc? I believe, like CPython, you would have the penalty of calling library routines, but possibly win with the introspection on the string object and buffered allocation of memory. Again, the code would look rather similar.

Note I said the comment about "threshold/avg length" ;)
Was intended to cover my ass for overhead from cpy func invocation ;)

Meanwhile, using a gstring: yes, the algo snippet (and just the snippet) would have the same runtime characteristics, and ought to be unmatchable by native python code.

Not to hammer on ya too hard, but that was one of the points of the last few posts- data structures can make a massive difference, namely via avoiding daft work when dealing with the obj. Not saying a gstring obj is the best way (if you have to do linear walks of the chars anyway, never relying on the len, fex, it's wasting 4-8 bytes per instance, dependent on arch).

Mind you, that last bit about the 4-8 per instance is being extraordinarily anal about performance/mem, but a case where it *does* rear its head is in dealing with the cache backend- gentoo-x86 has (local check) 23,434 ebuilds, and there are 16 metadata keys that need to be tracked per cache entry (technically there are more keys, but they're no longer used); that puts it at 374,944 strings in memory if you are loading up all metadata for gentoo-x86.

For 32bit, the len (if never used) is a wasted ~1.5mb of mem; 3mb or so for 64bit. Doesn't sound like much, but the raw total size of the cache is around 20mb (literal cat'ing of the data together and then a char count). So... stuff like that matters, unless 15% waste is fine in your books ;)
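The arithmetic behind those figures, for anyone checking:
Code:

ebuilds, keys = 23434, 16
strings = ebuilds * keys                  # 374,944 cached strings
mb = 1024.0 ** 2
print(strings * 4 / mb)                   # ~1.43 MB of stored lengths, 32-bit
print(strings * 8 / mb)                   # ~2.86 MB, 64-bit
print(strings * 8 / (20 * mb) * 100)      # ~14-15% of a ~20 MB cache
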

(technically, the overhead is higher for python/others due to the fact it's not counting the overhead of the ptr to the str and other misc. things, but oh well).
Hypnos
Advocate


Joined: 18 Jul 2002
Posts: 2889
Location: Omnipresent

PostPosted: Wed Jan 24, 2007 5:40 am    Post subject: Reply with quote

Definitely extra cruft or "daft work" becomes a problem when you have a bottleneck -- bandwidth, memory, CPU time. How do you package large numbers of similar, but not identical, data records?

I've seen the following with large (> 1GB) scientific data sets:

* When bandwidth limited, use an inscrutable packing format, compression and error checking.

* When memory limited, use raw, flat representations with accompanying "status" or "logging" info to cut it up one chunk at a time.

* When CPU limited, use record aggregates like structs with machine types (see the sketch just below).
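For instance, a minimal sketch of that CPU-limited case using Python's struct module (the record layout here is hypothetical):
Code:

import struct

# hypothetical record layout: 4-byte id, 8-byte timestamp, 4-byte reading
rec = struct.Struct('<IQf')

packed = rec.pack(42, 1169452800, 3.14)   # 16 raw bytes, no text parsing
print(rec.unpack(packed))                 # (42, 1169452800, 3.1400001...)
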

In the default Portage cache backend, you run into a memory limitation, but also require text file storage, human readability, and reasonable speed. So you come to the project of optimizing Python's string handling?
_________________
Personal overlay | Simple backup scheme
bLUEbYTE84
Guru


Joined: 21 Jul 2006
Posts: 566
Location: universe.tar.gz, src/earth.h, struct homo_sapiens_table

PostPosted: Wed Jan 24, 2007 10:31 pm    Post subject: Reply with quote

Portage hates you too.
_________________
Advanced Signature Camouflage System®(ASCS) v0.1
V-Man
n00b


Joined: 19 Jun 2003
Posts: 38
Location: A chair

PostPosted: Thu Jan 25, 2007 6:12 pm    Post subject: Re: I HATE Portage Reply with quote

Pythonhead wrote:
A part of the problem is that people don't take the time to report the bugs where they should (bugzilla). They'd get fixed a lot quicker if people posted them there instead of (or as well as) in the forums.
Quoted because this can't be overstated. If you fix it, or need it fixed, tell the people that care, so they can help others.
_________________
To err is human; To really foul things up requires a computer.

www.zoto.com
steveL
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Mon Jan 29, 2007 7:22 pm    Post subject: Reply with quote

ferringb wrote:
Cause it operates on intermediate tarballs, instead of an rsync tree directly. I could mangle the bugger to work on trees directly, but that would require adding a new delta format (not hard with diffball since it's the only delta compressor out there that handles multiple binary formats, but still, bit of work).
Well, you clearly work on enough already. It just seems like such a sweet bandwidth saving.
Quote:
Idea-wise it's not bad, but it really requires relying on pulling specific revs from a vcs. Regarding chksums, that's why I'd pulled the manifest- it gives you checksums.

Meanwhile, what I mean by quad is that if a resolver plan involves, say, 700 pkgs, none of which have ebuilds locally, the worst case is quadratically bound (N^2) in terms of forced re-resolutions. Can do tricks to avoid the worst case, but the issue itself is bad enough that it's better to just rely on pulling the exact vcs rev of the ebuild/files, or handing all calculation off to a server that hands you the required files as it goes.
Ok, got you now.
Quote:
steveL wrote:
One thing that springs to mind, tho, is that metadata is always calculated locally. I appreciate that this is by design (ie you can have any backend cache as you noted) but since the vast majority of users use the standard file-based approach, it doesn't seem necessary. Rebuilding the cache seems to be getting slower for some reason.

Not necessarily calculated locally. Rsync'd trees give you (at least gentoo-x86 does) metadata/cache, a pregenerated metadata cache.

Re: rebuilding the cache getting slower- profile it and find out why. Off the top of my head, I'd expect it's due to zac backing out some of the optimizations I introduced rather than fixing an issue with rsync propagation (if the size is the same and the mtime is the same, the rsync setup won't push the updated file; it relies on mtime/size rather than checksums).
Well, I'm a bit out of my depth (again!)- I don't understand what exactly it's doing when rebuilding the cache. I asked on IRC and was told it's working on the edb, which is (now) supposed to be the same format as the metadata cache.

Don't suppose you could recommend what to use to profile python? (I know, `jfgi'.)
ferringb
Retired Dev


Joined: 03 Apr 2003
Posts: 357

PostPosted: Mon Jan 29, 2007 7:51 pm    Post subject: Reply with quote

steveL wrote:
I don't understand what exactly it's doing in rebuilding the cache. I asked on IRC and was told it's working on the edb which is (now) supposed to be the same format as the metadata cache.
Don't suppose you could recommend what to use to profile python? (I know, `jfgi'.)

The rebuilding is actually cache transference- format-wise, what's in /var/cache/edb/dep is a format called flat_hash, which is basically KEY=VAL pairs; what's in $PORTDIR/metadata/cache is flat_list, which is a specific ordering of keys (line 8 is DESCRIPTION, for example).
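Illustratively, the two shapes look roughly like this (a sketch; the flat_list key order beyond DESCRIPTION-on-line-8 is from memory and may well be off):
Code:

FLAT_LIST_KEYS = ['DEPEND', 'RDEPEND', 'SLOT', 'SRC_URI', 'RESTRICT',
                  'HOMEPAGE', 'LICENSE', 'DESCRIPTION']  # order partly guessed

def read_flat_hash(path):           # /var/cache/edb/dep style: KEY=VAL lines
    entry = {}
    for line in open(path):
        key, _, val = line.rstrip('\n').partition('=')
        entry[key] = val
    return entry

def read_flat_list(path):           # metadata/cache style: meaning by line no.
    lines = open(path).read().splitlines()
    return dict(zip(FLAT_LIST_KEYS, lines))
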

Profiling... assuming you're using python2.4,
Code:
python -m profile -o dump.stats /path/to/script args for script
is a decent way to profile.

If using python2.5, use cProfile instead of profile (less overhead, thus a bit more accurate).
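To read the dump afterwards, the stdlib's pstats works:
Code:

import pstats

stats = pstats.Stats('dump.stats')               # the -o output from above
stats.sort_stats('cumulative').print_stats(20)   # top 20 by cumulative time
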
steveL
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Wed Feb 07, 2007 11:40 am    Post subject: Reply with quote

ferringb wrote:
The rebuilding is actually cache transference- format-wise, what's in /var/cache/edb/dep is a format called flat_hash, which is basically KEY=VAL pairs; what's in $PORTDIR/metadata/cache is flat_list, which is a specific ordering of keys (line 8 is DESCRIPTION, for example).
OK. On zmedico's advice I've switched to using portdbapi.auxdbmodule = cache.metadata_overlay.database in /etc/portage/modules and FEATURES="-metadata-transfer" in make.conf (man portage, then /metadata, for anyone else who's interested), which gets rid of that step altogether. Yay!
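For anyone following along, those two settings as file snippets (contents exactly as above; your make.conf path may differ):
Code:

# /etc/portage/modules
portdbapi.auxdbmodule = cache.metadata_overlay.database

# make.conf
FEATURES="-metadata-transfer"
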

Thanks for the info on the profiling; I'll use it if I ever do any python coding.