Gentoo Forums
I HATE Portage
steveL
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Mon Jan 22, 2007 7:17 am    Post subject: Reply with quote

ferringb wrote:
Point to Point, or working from revs means you know the exact state of the starting point, and the target point; thus you can generate just instructions to change the data from start to target.

Rsync is able to generate the starting point on the fly by sending a set of chksum data over; this is why there's a 2.4MB hit for any rsync connection for gentoo-x86, for example: pushing the chksums to the server.
Yeah, should have realised rsync was working on checksums, not mtimes, what with all the problems with network timing. The thing you wrote- why is that not the standard? (It seems to offer a huge bandwidth reduction.)
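For the curious, the shape of rsync's trick is roughly this- a rough sketch only; real rsync uses a cheap rolling checksum plus a strong checksum and matches blocks at every offset, which this does not:
Code:

import hashlib

def block_sums(old, block=700):
    # receiver side: checksum each fixed-size block of the copy it already has
    return dict((hashlib.md5(old[i:i + block]).hexdigest(), i)
                for i in range(0, len(old), block))

def delta(new, sums, block=700):
    # sender side: emit (copy, offset) where the receiver already has the
    # block, raw literal bytes otherwise
    ops, i = [], 0
    while i < len(new):
        h = hashlib.md5(new[i:i + block]).hexdigest()
        if h in sums:
            ops.append(('copy', sums[h]))
            i += block
        else:
            ops.append(('literal', new[i:i + 1]))
            i += 1
    return ops
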
Quote:
Re: pulling manifests- the manifest (under manifest2) holds the chksum data for fetching distfiles, including size reqs- so... useful to have. Offhand... fewer fetches, and it moves the potential for a man-in-the-middle attack to just the sync, rather than every pkg merge.

steveL wrote:
ferringb wrote:
Finally... there are a set of very very very stupid ebuilds that reach outside of their specific pkg dir (thus violating any manifest protection), that (due to their untracked access) are rather 'hard' to download as needed.
Blimey, sounds like they need to be yanked then. Any important ones, in your opinion?

Haven't looked lately. Just do a find "$PORTDIR" -mindepth 3 -maxdepth 3 -name '*.ebuild' -print0 | xargs --null grep -H PORTDIR # and dig through.
Haven't seen relative addressing for it, plus that's harder to look for... so neh.
Heh, running that cmd (slightly modified) produced 6 pkgs of the following types:
einfo/ elog - babytrans-0.92, portage, skype
epatch - gecko-sdk-1.7.8
kdir ${PORTDIR} - baselayout (couldn't find this command, guessing it's a bash function.)
sed - dev-embedded/sdcc
None of these seemed outrageous in what they were doing to my inexperienced eye. (gecko-sdk was using patches from other versions.)
Quote:
steveL wrote:
ferringb wrote:
Additional re: fetching just metadata, something that slipped the mind: the metadata is bound to a specific date/time; you can't use rsync to pull down the data (since it may pull down *newer* data than the metadata is for, thus requiring a recalc per pkg), thus you'd have to pull specific revs (cvs).
Now I am lost- I thought the idea was just to get the latest set.

Sync'ing pulls down the metadata from a specific date/time. The (local) resolver works against that data, calculating its build plan. If the ebuilds you download have different metadata (say, since you synced, a new required DEPEND was added), it would require the resolver to detect the change and go back and recalculate its plan.
Worst case, it's a quad set of recalcs, with reaching out and pulling ebuilds off a server being the slow point per N. Fairly nasty.

So... you have to pull the ebuilds from the same date/time as the metadata you pulled.
I can understand that; thanks for the lucid explanation- it really helps! Would ebuild checksums (as another metadatum) be able to solve this? I appreciate that the resolver would have to work out its plan again, and don't know how serious that is (I am a bit lost with the `quad set of recalcs', I'll happily admit ;) I guess I'm hoping that it'd be infrequent, or that if we had a lightweight sync (only rsyncing metadata plus the extra stuff discussed) then an emerge could feasibly always sync first. Or perhaps that part of the process could be to sync pkgs that are going to be installed. Just throwing out ideas here, sorry if they sound stupid.

TBH the stuff you've written makes me think it might well just be too much hassle.

One thing that springs to mind, tho, is that metadata is always calculated locally. I appreciate that this is by design (ie you can have any backend cache as you noted) but since the vast majority of users use the standard file-based approach, it doesn't seem necessary. Rebuilding the cache seems to be getting slower for some reason.
Hypnos
Advocate


Joined: 18 Jul 2002
Posts: 2889
Location: Omnipresent

PostPosted: Mon Jan 22, 2007 7:25 am    Post subject: Reply with quote

First, it's obvious that crap code in C can certainly be slower than good code in Python.

Second, I personally can't think of a single counterexample where the *same* algorithm, from top to bottom, is faster in Python than C. If you take smart Python code with standard idioms like list comprehensions and re-implement it, along with the CPython library/interpreter routines it uses, in pure C, you'll get a significant performance boost. The only thing I've seen that puts a dent into this gap is Psyco, because it can introduce machine-level optimizations; things get interesting when it's integrated with the interpreter, as in IronPython.

If this is wrong, I'm glad to learn.
_________________
Personal overlay | Simple backup scheme
ferringb
Retired Dev


Joined: 03 Apr 2003
Posts: 357

PostPosted: Mon Jan 22, 2007 7:33 am    Post subject: Reply with quote

steveL wrote:
ferringb wrote:
Point to Point, or working from revs means you know the exact state of the starting point, and the target point; thus you can generate just instructions to change the data from start to target.

Rsync is able to generate the starting point on the fly by sending a set of chksum data over; this is why there's a 2.4MB hit for any rsync connection for gentoo-x86, for example: pushing the chksums to the server.
Yeah, should have realised rsync was working on checksums, not mtimes, what with all the problems with network timing. The thing you wrote- why is that not the standard? (It seems to offer a huge bandwidth reduction.)

Cause it operates on intermediate tarballs, instead of an rsync tree directly. I could mangle the bugger to work on trees directly, but that would require adding a new delta format (not hard with diffball since it's the only delta compressor out there that handles multiple binary formats, but still, bit of work).

steveL wrote:
ferringb wrote:
steveL wrote:
ferringb wrote:
Finally... there are a set of very very very stupid ebuilds that reach outside of their specific pkg dir (thus violating any manifest protection), that (due to their untracked access) are rather 'hard' to download as needed.
Blimey, sounds like they need to be yanked then. Any important ones, in your opinion?

Haven't looked lately. Just do a find "$PORTDIR" -mindepth 3 -maxdepth 3 -name '*.ebuild' -print0 | xargs --null grep -H PORTDIR # and dig through.
Haven't seen relative addressing for it, plus that's harder to look for... so neh.
Heh, running that cmd (slightly modified) produced 6 pkgs of the following types:
einfo/ elog - babytrans-0.92, portage, skype
epatch - gecko-sdk-1.7.8
kdir ${PORTDIR} - baselayout (couldn't find this command, guessing it's a bash function.)
sed - dev-embedded/sdcc
None of these seemed outrageous in what they were doing to my inexperienced eye. (gecko-sdk was using patches from other versions.)

Would have to go looking again; I recall in the past spotting pkgs that reached into other pkgs' directories for files; that's the evilness I'm referencing :)

steveL wrote:
ferringb wrote:
steveL wrote:
ferringb wrote:
Additional re: fetching just metadata, something that slipped the mind: the metadata is bound to a specific date/time; you can't use rsync to pull down the data (since it may pull down *newer* data than the metadata is for, thus requiring a recalc per pkg), thus you'd have to pull specific revs (cvs).
Now I am lost- I thought the idea was just to get the latest set.

Sync'ing pulls down the metadata from a specific date/time. The (local) resolver works against that data, calculating its build plan. If the ebuilds you download have different metadata (say, since you synced, a new required DEPEND was added), it would require the resolver to detect the change and go back and recalculate its plan.
Worst case, it's a quad set of recalcs, with reaching out and pulling ebuilds off a server being the slow point per N. Fairly nasty.

So... you have to pull the ebuilds from the same date/time as the metadata you pulled.

I can understand that; thanks for the lucid explanation- it really helps! Would ebuild checksums (as another metadatum) be able to solve this? I appreciate that the resolver would have to work out its plan again, and don't know how serious that is (I am a bit lost with the `quad set of recalcs', I'll happily admit ;) I guess I'm hoping that it'd be infrequent, or that if we had a lightweight sync (only rsyncing metadata plus the extra stuff discussed) then an emerge could feasibly always sync first. Or perhaps that part of the process could be to sync pkgs that are going to be installed. Just throwing out ideas here, sorry if they sound stupid.

TBH the stuff you've written makes me think it might well just be too much hassle.

Idea-wise it's not bad, but it really requires relying on pulling specific revs from a vcs. Regarding chksums, that's why I'd pulled the manifest- it gives you checksums.

Meanwhile, what I mean by quad is that if a resolver plan involves, say, 700 pkgs, none of which have ebuilds locally, the worst case is quadratically bound (N^2) in terms of forced re-resolutions. Can do tricks to avoid the worst case, but the issue itself is bad enough that it's better to just rely on pulling the exact vcs rev of the ebuild/files, or handing all calculation off to a server that hands you the required files as it goes.
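To make the blow-up concrete, here's a toy model (all names hypothetical, nothing to do with portage's actual resolver) of a resolver restarting every time a freshly fetched ebuild reveals a dep the synced metadata didn't know about:
Code:

def resolve(targets, fetch_metadata):
    # toy model: every restart refetches the whole plan so far, so N pkgs
    # each hiding one new dep costs on the order of N^2 fetches
    fetches = 0
    while True:
        plan, restart = [], False
        for pkg in list(targets):
            meta = fetch_metadata(pkg)            # the slow per-pkg network hit
            fetches += 1
            plan.append(pkg)
            extra = meta.get('new_depend')
            if extra and extra not in targets:    # synced metadata was stale
                targets.append(extra)
                restart = True
                break
        if not restart:
            return plan, fetches
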

steveL wrote:
One thing that springs to mind, tho, is that metadata is always calculated locally. I appreciate that this is by design (ie you can have any backend cache as you noted) but since the vast majority of users use the standard file-based approach, it doesn't seem necessary. Rebuilding the cache seems to be getting slower for some reason.

Not necessarily calculated locally. Rsync'd trees give you (at least gentoo-x86 does) metadata/cache, a pregenerated metadata cache.

Re: rebuilding the cache getting slower- profile it and find out why. Off the top of my head, I'd expect it's due to zac backing out some of the optimizations I introduced rather than fixing an issue with rsync propagation (if the size is the same and the mtime is the same, the rsync setup won't push the updated file; it relies on mtime/size rather than checksums).
ferringb
Retired Dev


Joined: 03 Apr 2003
Posts: 357

PostPosted: Mon Jan 22, 2007 9:32 am    Post subject: Reply with quote

Hypnos wrote:
First, it's obvious that crap code in C can certainly be slower than good code in Python.

If it's so obvious, why did it take two retorts to point out the fallacy in your statement, and to back you off absolute statements like "foo in language $DAR is always faster than in language $BAR"? ;)

Further, why does this particular issue (stupid language wars based on native speed) keep coming up? ;)

Hypnos wrote:
Second, I personally can't think of a single counterexample where the *same* algorithm, from top to bottom, is faster in Python than C.


If it's exactly the same- as in every invocation it triggers uses the same algorithms, etc.- then yep. Thing is, you'll never find that: what you're requesting is in reality an approximation of the algo (exactly the same invocations would mean writing the same bit of code twice anyway).

Say your algo is a piece of code that repeatedly allocates and releases a struct. You can only approximate this in python- python does ref counting, thus can hold onto the obj. Further, depending on the desired obj, it may be using a different heap manager internally, avoiding releasing the memory back and thus dodging the pricey malloc invocation. An actual example of this is allocation of a single-char string; the cpython implementation internally treats single-char strings effectively as singletons. At the surface level it looks the same, but what is actually occurring (python skipping malloc/initialization and instead just incref'ing and returning an existing ptr, vs c having to trigger a malloc call) can differ greatly.
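A quick demo of that CPython implementation detail (not guaranteed by the language spec, so don't write code that depends on it):
Code:

a = 'x'
b = chr(0x78)   # built at runtime, not a literal
print(a is b)   # True under CPython: length-1 strings are shared singletons
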

An alternate example would be summing the total length of a series of strings. What we'll label the algo, in python code (close enough to pseudocode that it'll serve as the algo)-
Code:

def foo(strings):
    i = 0
    for s in strings:
        i += len(s)
    return i

Crazily enough, this is able to edge out the "same top to bottom" c code
Code:

#include <string.h>

unsigned long foo2(char **strings)
{
    unsigned long i = 0;
    char **p;
    /* walk the NULL-terminated array; strlen() rescans each string */
    for (p = strings; *p; p++)
        i += strlen(*p);
    return i;
}

over time and with appropriate threshold for avg string len. Why? Top to bottom, they're the "same algo", after all.

The reason comes down to the fact that the "same algo" snippet factors in neither instantiation nor data structure- python string objects store their length upon instantiation, so the 'algo' approximation for python is in reality just adding ints; for c, it's a linear walk of each string looking for the terminating NUL, adding up that length.

At the surface level it looks the same; the actual execution will always differ, though, which is where the real speed issues can crop up. You wanted an example, there you go.
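If you want to poke at the threshold claim yourself, a rough harness along these lines works (the sizes here are arbitrary, picked to favour long strings):
Code:

import timeit

strings = ['a' * 10000] * 1000   # long strings favour the stored-length side

def foo(strings):
    total = 0
    for s in strings:
        total += len(s)          # O(1): the length is stored on the object
    return total

print(timeit.timeit(lambda: foo(strings), number=100))
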

And just so we're clear: I'm not insane- properly written c/c++ code should be able to edge out python code without too much issue (assuming the limiter is cpu, rather than IO) due to avoiding implementation overhead (namely, native execution versus interpretation). There's a difference between that statement and the original "c implementations are always faster than python implementations" you daftly stated.

For example-
Hypnos wrote:
That said, Python will certainly be slower than pure-C tools, even if things are written in CPython modules.

Which... hey, if you forgot to qualify it, that's fine; I'd lay off. You're getting the third degree, however, because you're bandying around absolutes as if they were holy writ, said absolutes being easy enough to prove to be inaccurate generalizations (already disproved each of those quotes, after all). Specifically, you're getting it due to daft claims like-

Hypnos wrote:
Python + raw filesystem storage has not scaled well with the size of the tree; a pure C tool like Paludis or make (a la BSD) does scale better than a Python tool.

pkgcore code, when operating with the same set of caches, is actually neck and neck with paludis. The differences mostly come down to startup (the python machinery starting up is a bit of a hit, although that's a one-time cost, not a scaling one). The irony is that if app startup overhead is factored out, pkgcore is faster ;).

Meanwhile, continuing with the original point: enable paludis's names cache and it starts edging ahead a bit- not because of the language, but because of a more efficient algo (namely reading a single file instead of doing multiple readdir calls). We can add the same cache, thus negating the advantage, mind you; the point is that for this scenario (typically IO-bound ops) the language isn't the issue, the algo is.

As said: you're getting the third degree for stating an absolute, claiming python is the scaling fault when in reality it's the portage implementation and its internal algos that are at fault, not the language. Brute language speed isn't a solution for dumb-ass algos. You can keep arguing if you like, but the initial (and ensuing) statements were daft generalizations whether you like it or not- the kind folks have a habit of throwing around because it's simpler to say "python's slow" than "portage's implementation is not exactly efficient due to reasons 1, 2, 3, 4, 5, 6...". It might be easier to throw around generalizations of that sort, but it's still plain ignorance for the most part.
[/flame off]
Hypnos wrote:
If you take smart Python code with standard idioms like list comprehensions and re-implement it, along with the CPython library/interpreter routines it uses, in pure C, you'll get a significant performance boost.

First of all, CPython *is* C code. I hope you meant "implement it without the python OO machinery", else this conversation is daft. Speaking from experience, pushing down to cpy. is *not* guaranteed to give you a "significant boost". It all depends upon your bottlenecks (see the previous comment about the language imbuing the code with certain characteristics). Unless you convert completely over to standalone c (ie, no python machinery), you still have to write the code to play nice with the machinery, meaning you still have abstraction-related overhead.

Translating python code down into c yields variable gains, completely dependent upon what your real bottlenecks are. Pushing a dict-lookup snippet of python code down into cpython is only going to spare you the cost of passing the equivalent bytecode through the interpreter; the interpreter winds up running the exact same C code after all. If your bottlenecks are in cpython code already, translating down isn't going to make dick-all difference, since you're targeting micro-optimizations rather than the real issue.

Also, list comps generally suck for most folks' actual usage; generator expressions rule. A list comp means allocating an intermediate list, populating it linearly, getting an iter over it, finishing the iter, deref'ing- another linear walk of the list- then deallocing; a genexp is (essentially) just allocating a frame obj and popping in/out of it. Not saying it always wins out, but a lot of folk use list comps when they're just after a genexp, thus forcing daft extra work.
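Side by side, the daft extra work looks like this (a sketch; the size is arbitrary):
Code:

strings = ['abc'] * 100000

total = sum([len(s) for s in strings])   # builds, walks, then frees a
                                         # 100,000-element intermediate list
total = sum(len(s) for s in strings)     # genexp: one value at a time,
                                         # no intermediate list at all
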

Hypnos wrote:
The only thing I've seen that puts a dent into this gap is Psyco, because it can introduce machine-level optimizations; things get interesting when it's integrated with the interpreter, as in IronPython.

If this is wrong, I'm glad to learn.

Psyco is double-edged however- the inspection it does (continuous live inspection) to determine when to patch in optimized code is enough of a hit on good python code that it's not usually worth it (in my experience). IronPython works differently, so the cost of that inspection is mostly a one-time hit.

If I recall correctly, psyco works by introducing type-specific invocations rather than actual machine-level instructions. IronPython (the CLR in general, iirc) is able to emit machine-level instructions.

The interesting one is pypy: translate a single time, doing the optimization, pushing it through a c/llvm/whatever backend.


Last edited by ferringb on Tue Jan 23, 2007 9:45 am; edited 1 time in total
Hypnos
Advocate


Joined: 18 Jul 2002
Posts: 2889
Location: Omnipresent

PostPosted: Tue Jan 23, 2007 3:04 am    Post subject: Reply with quote

Your point that the language implementation != language, and that each implementation has its strengths and weaknesses, is well taken. Thanks for the detailed and informative post.

I don't want to get into a semantic debate about what an algorithm is :) I would call it moving bits around, and my original point is that Python, being so abstract and dynamic, makes it tough to ensure that your particular code is being implemented efficiently at the level of moving bits around. If a particular implementation is doing something that really speeds up the execution of your code, that can be re-implemented outside of Python.

In the specific example you give, what if you use glib's representation for strings rather than char*, and g_malloc instead of malloc? I believe, like CPython, you would have the penalty of calling library routines, but possibly win with the introspection on the string object and buffered allocation of memory. Again, the code would look rather similar.
_________________
Personal overlay | Simple backup scheme
ferringb
Retired Dev


Joined: 03 Apr 2003
Posts: 357

PostPosted: Tue Jan 23, 2007 9:42 am    Post subject: Reply with quote

Hypnos wrote:
My original point is that Python, being so abstract and dynamic, makes it tough to ensure that your particular code is being implemented efficiently at the level of moving bits around.

Would tend to disagree on that one; remember that python is just a c app, thus memcpy and friends are obviously abused internally. Further, strings are immutable- trying to copy a string actually just refs it; you have to break it apart and recombine it to get a true, separate mem location.

The key thing to remember with python is that native python code effectively deals only in refs; list comps generate new lists, slicing a list is the same thing, but actual copying is fairly limited- most ops are just inspecting data, literally pushed down to primitives which are c-based.

The hit with python is in the individual instruction execution speed- the spot where it really rears its head imo is in doing char-level inspection of strings. The reasoning's pretty simple- it involves a lot of native python instructions to do so, and the abstraction layer can mildly get in the way.

The key to fast python code isn't avoiding abstraction or dynamism; it's shunting the real work (from a total-time-required-to-run standpoint) off to the builtins, which are implemented in c. Weird to view it thus, but remembering you're basically dealing in glue binding together c invocations is a good way to look at it.
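The earlier length-summing example, written both ways (a sketch):
Code:

strings = ['some', 'data'] * 50000

total = 0
for s in strings:          # every iteration is several interpreted bytecodes
    total += len(s)

total = sum(map(len, strings))   # same work; the loop runs inside C builtins
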

Hypnos wrote:
If a particular implementation is doing something that really speeds up the execution of your code, that can be re-implemented outside of Python.

Not always easily, actually- the abstraction python shoves in (accessing an attr on an instance actually triggers instance.__getattribute__(self, attr), for example) allows you to do some fairly fun tricks; namely, proxying, or delayed instantiation of objects/attrs. Why I say it's hard to do that elsewhere is that if the target language doesn't have that abstraction, you either need to add it, or you're screwed for doing some of the stuff python allows. Trying to do delayed instantiation of an object in c *without* changing the target interface, for example (think of it as a seriously bastard form of polymorphism), is pretty much impossible, unless you're throwing around func ptrs everywhere or have built up your own obj. implementation.

Sounds insane, but tricks like that can make a pretty heavy difference in run time- you can maintain the same (sane) api, but delay loading of data via it till it's actually required inside the func- if the func doesn't need it, no data loaded.
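A minimal sketch of that delayed-instantiation trick (illustrative only, not pkgcore's actual code; the class names are made up):
Code:

class LazyProxy(object):
    def __init__(self, factory):
        self._factory = factory
        self._obj = None

    def __getattr__(self, attr):          # only fires when normal lookup misses
        if self._obj is None:
            self._obj = self._factory()   # first real use: build it now
        return getattr(self._obj, attr)

class Metadata(object):
    def __init__(self):
        print('expensive parse happens here')
        self.DESCRIPTION = 'some pkg'

m = LazyProxy(Metadata)    # nothing parsed yet; callers see the same api
print(m.DESCRIPTION)       # parse triggered by the first attribute access
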

Hypnos wrote:
In the specific example you give, what if you use glib's representation for strings rather than char*, and g_alloc instead of the malloc? I believe, like CPython, you would have the penalty of calling library routines, but possibly win with the introspection on the string object and buffered allocation of memory. Again, the code would look rather similar.

Note I said the comment about "threshold/avg length" ;)
Was intended to cover my ass for overhead from cpy func invocation ;)

Meanwhile, using a gstring: yes, the algo snippet (and just the snippet) would have the same runtime characteristics, and ought to be unmatchable by native python code.

Not to hammer on ya too hard, but that was one of the points of the last few posts- data structures can make a massive difference, namely via avoiding daft work when dealing with the obj. Not saying a gstring obj is the best way (if you have to do linear walks of the chars anyway, never relying on the len, fex, it's wasting 4-8 bytes per instance, dependent on arch).

Mind you, that last bit about the 4-8 per instance is being extraordinarily anal about performance/mem, but a case where it *does* rear its head is in dealing with the cache backend- gentoo-x86 has (local check) 23,434 ebuilds, and there are 16 metadata keys that need to be tracked per cache entry (technically there are more keys, but they're no longer used); that puts it at 374,944 strings in memory if you are loading up all metadata for gentoo-x86.

For 32bit, the len (if never used) is a wasted ~1.5mb of mem; 3mb or so for 64bit. Doesn't sound like much, but the raw total size of the cache is around 20mb (literal cat'ing of the data together and then a char count). So... stuff like that matters, unless 15% waste is fine in your books ;)
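The arithmetic behind those figures, for anyone checking:
Code:

ebuilds, keys = 23434, 16
strings = ebuilds * keys                  # 374,944 cached strings
mb = 1024.0 ** 2
print(strings * 4 / mb)                   # ~1.43 MB of stored lengths, 32-bit
print(strings * 8 / mb)                   # ~2.86 MB, 64-bit
print(strings * 8 / (20 * mb) * 100)      # ~14-15% of a ~20 MB cache
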

(technically, the overhead is higher for python/others due to the fact it's not counting the overhead of the ptr to the str and other misc. things, but oh well).
Hypnos
Advocate


Joined: 18 Jul 2002
Posts: 2889
Location: Omnipresent

PostPosted: Wed Jan 24, 2007 5:40 am    Post subject: Reply with quote

Definitely extra cruft or "daft work" becomes a problem when you have a bottleneck -- bandwidth, memory, CPU time. How do you package large numbers of similar, but not identical, data records?

I've seen the following with large (> 1GB) scientific data sets:

* When bandwidth limited, use an inscrutable packing format, compression and error checking.

* When memory limited, use raw, flat representations with accompanying "status" or "logging" info to cut it up one chunk at a time.

* When CPU limited, use record aggregates like structs with machine types (see the sketch just below).
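For instance, a minimal sketch of that CPU-limited case using Python's struct module (the record layout here is hypothetical):
Code:

import struct

# hypothetical record layout: 4-byte id, 8-byte timestamp, 4-byte reading
rec = struct.Struct('<IQf')

packed = rec.pack(42, 1169452800, 3.14)   # 16 raw bytes, no text parsing
print(rec.unpack(packed))                 # (42, 1169452800, 3.1400001...)
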

In the default Portage cache backend, you run into a memory limitation, but also require text file storage, human readability, and reasonable speed. So you come to the project of optimizing Python's string handling?
_________________
Personal overlay | Simple backup scheme
bLUEbYTE84
Guru


Joined: 21 Jul 2006
Posts: 566
Location: universe.tar.gz, src/earth.h, struct homo_sapiens_table

PostPosted: Wed Jan 24, 2007 10:31 pm    Post subject: Reply with quote

Portage hates you too.
_________________
Advanced Signature Camouflage System®(ASCS) v0.1
V-Man
n00b


Joined: 19 Jun 2003
Posts: 38
Location: A chair

PostPosted: Thu Jan 25, 2007 6:12 pm    Post subject: Re: I HATE Portage Reply with quote

Pythonhead wrote:
A part of the problem is that people don't take the time to report the bugs where they should (bugzilla). They'd get fixed a lot quicker if people posted them there instead of (or as well as) in the forums.
Quoted because this can't be overstated. If you fix it, or need it fixed, tell the people that care, so they can help others.
_________________
To err is human; To really foul things up requires a computer.

www.zoto.com
steveL
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Mon Jan 29, 2007 7:22 pm    Post subject: Reply with quote

ferringb wrote:
Cause it operates on intermediate tarballs, instead of an rsync tree directly. I could mangle the bugger to work on trees directly, but that would require adding a new delta format (not hard with diffball since it's the only delta compressor out there that handles multiple binary formats, but still, bit of work).
Well, you clearly work on enough already. It just seems like such a sweet bandwidth saving.
Quote:
Idea-wise it's not bad, but it really requires relying on pulling specific revs from a vcs. Regarding chksums, that's why I'd pulled the manifest- it gives you checksums.

Meanwhile, what I mean by quad is that if a resolver plan involves, say, 700 pkgs, none of which have ebuilds locally, the worst case is quadratically bound (N^2) in terms of forced re-resolutions. Can do tricks to avoid the worst case, but the issue itself is bad enough that it's better to just rely on pulling the exact vcs rev of the ebuild/files, or handing all calculation off to a server that hands you the required files as it goes.
Ok, got you now.
Quote:
steveL wrote:
One thing that springs to mind, tho, is that metadata is always calculated locally. I appreciate that this is by design (ie you can have any backend cache as you noted) but since the vast majority of users use the standard file-based approach, it doesn't seem necessary. Rebuilding the cache seems to be getting slower for some reason.

Not necessarily calculated locally. Rsync'd trees give you (at least gentoo-x86 does) metadata/cache, a pregenerated metadata cache.

Re: rebuilding the cache getting slower- profile it and find out why. Off the top of my head, I'd expect it's due to zac backing out some of the optimizations I introduced rather than fixing an issue with rsync propagation (if the size is the same and the mtime is the same, the rsync setup won't push the updated file; it relies on mtime/size rather than checksums).
Well, I'm a bit out of my depth (again!)- I don't understand what exactly it's doing when rebuilding the cache. I asked on IRC and was told it's working on the edb, which is (now) supposed to be the same format as the metadata cache.

Don't suppose you could recommend what to use to profile python? (I know, `jfgi'.)
ferringb
Retired Dev


Joined: 03 Apr 2003
Posts: 357

PostPosted: Mon Jan 29, 2007 7:51 pm    Post subject: Reply with quote

steveL wrote:
I don't understand what exactly it's doing in rebuilding the cache. I asked on IRC and was told it's working on the edb which is (now) supposed to be the same format as the metadata cache.
Don't suppose you could recommend what to use to profile python? (I know, `jfgi'.)

The rebuilding is actually cache transference- format-wise, what's in /var/cache/edb/dep is a format called flat_hash, which is basically KEY=VAL pairs; what's in $PORTDIR/metadata/cache is flat_list, which is a specific ordering of keys (line 8 is DESCRIPTION, for example).
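Illustratively, the two shapes look roughly like this (a sketch; the flat_list key order beyond DESCRIPTION-on-line-8 is from memory and may well be off):
Code:

FLAT_LIST_KEYS = ['DEPEND', 'RDEPEND', 'SLOT', 'SRC_URI', 'RESTRICT',
                  'HOMEPAGE', 'LICENSE', 'DESCRIPTION']  # order partly guessed

def read_flat_hash(path):           # /var/cache/edb/dep style: KEY=VAL lines
    entry = {}
    for line in open(path):
        key, _, val = line.rstrip('\n').partition('=')
        entry[key] = val
    return entry

def read_flat_list(path):           # metadata/cache style: meaning by line no.
    lines = open(path).read().splitlines()
    return dict(zip(FLAT_LIST_KEYS, lines))
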

Profiling... assuming you're using python2.4,
Code:
python -m profile -o dump.stats /path/to/script args for script
is a decent way to profile.

If using python2.5, use cProfile instead of profile (less overhead, thus a bit more accurate).
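To read the dump afterwards, the stdlib's pstats works:
Code:

import pstats

stats = pstats.Stats('dump.stats')               # the -o output from above
stats.sort_stats('cumulative').print_stats(20)   # top 20 by cumulative time
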
steveL
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Wed Feb 07, 2007 11:40 am    Post subject: Reply with quote

ferringb wrote:
The rebuilding is actually cache transference- format-wise, what's in /var/cache/edb/dep is a format called flat_hash, which is basically KEY=VAL pairs; what's in $PORTDIR/metadata/cache is flat_list, which is a specific ordering of keys (line 8 is DESCRIPTION, for example).
OK. On zmedico's advice I've switched to using portdbapi.auxdbmodule = cache.metadata_overlay.database in /etc/portage/modules and FEATURES="-metadata-transfer" in make.conf (man portage, then /metadata, for anyone else who's interested), which gets rid of that step altogether. Yay!
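For anyone following along, those two settings as file snippets (contents exactly as above; your make.conf path may differ):
Code:

# /etc/portage/modules
portdbapi.auxdbmodule = cache.metadata_overlay.database

# make.conf
FEATURES="-metadata-transfer"
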

Thanks for the info on the profiling; I'll use it if I ever do any python coding.