# pex
r
another optimization on top of this I was thinking of here would be to pre-process wheel metadata and construct either a per-file or per-repo index that the resolver could use to avoid transfer/IO costs. this could sit alongside the wheels, as a peer dir or file, in their find-links binary store.
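rough sketch of what a per-repo index next to a find-links store could look like (the `index.json` name, its layout, and the helper names are made up for illustration, not anything pex or pip reads today):

```python
import json
import zipfile
from email.parser import HeaderParser
from pathlib import Path


def extract_wheel_metadata(wheel_path: Path) -> dict:
    """Pull the resolver-relevant fields out of a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as whl:
        # every wheel carries a *.dist-info/METADATA member
        meta_name = next(n for n in whl.namelist() if n.endswith(".dist-info/METADATA"))
        with whl.open(meta_name) as f:
            msg = HeaderParser().parsestr(f.read().decode("utf-8"))
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg.get("Requires-Python"),
        "requires_dist": msg.get_all("Requires-Dist") or [],
    }


def build_find_links_index(find_links_dir: str, out_name: str = "index.json") -> Path:
    """Write a per-repo index alongside the wheels so a resolver could read
    dependency metadata without opening (or fetching) each wheel."""
    root = Path(find_links_dir)
    index = {whl.name: extract_wheel_metadata(whl) for whl in sorted(root.glob("*.whl"))}
    out = root / out_name
    out.write_text(json.dumps(index, indent=2))
    return out


if __name__ == "__main__":
    build_find_links_index("./wheels")
```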
a
this is a very good idea and i was actually thinking of implementing it before for coursier, which is pex for the jvm
by "i was thinking of" i of course mean like "something vaguely similar to this", i do not claim to know anything
r
it’s possible that this already exists by way of “index” vs “find-links” mode, but i haven’t looked closely.
a
ok. i'd be interested in following up on this. it would be really nice to make adding a 3rdparty dep instantaneous too
i also just love those graph algorithms i will not lie
but i don't need to flex where not necessary
highest priority is shipping real code
to solve real problems
this optimization i think is really natural to do after the other stuff, because the one thing the branch i linked doesn't even try to do is make resolves faster; it still waits on that in a single thread etc. if there was a way to cache the work of resolves like you're describing, a user wouldn't necessarily have to wait a long time to add a huge dep like tensorflow. it seems like a really natural step to take afterwards to get to "instant" feedback
to be clear -- a single resolve (as in, when every requirement and constraint is the same) is cached with the branch i linked, which means redeploying is hopefully very fast once that resolve is in the remote cache, but if any 3rdparty requirement changes, at least one machine has to wait however many minutes to redo the resolve from scratch
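rough sketch of that cache-key idea (the key scheme and the `remote_cache` object are hypothetical, not what the linked branch actually does):

```python
import hashlib
import json


def resolve_cache_key(requirements: list[str], constraints: list[str],
                      interpreter_tag: str) -> str:
    """Derive a deterministic key purely from the resolve inputs. Any change to
    a single requirement yields a new key, which is why a changed 3rdparty dep
    forces at least one machine to re-resolve from scratch."""
    payload = json.dumps(
        {
            "requirements": sorted(requirements),
            "constraints": sorted(constraints),
            "interpreter": interpreter_tag,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# usage: look up a finished resolve before doing any work
# key = resolve_cache_key(["tensorflow==2.15.0"], [], "cp311-manylinux_2_28_x86_64")
# cached = remote_cache.get(key)   # remote_cache is assumed to exist elsewhere
# if cached is None:
#     ... run the full resolve, then remote_cache.put(key, result) ...
```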
r
i’m also very very interested in this, this is a huge pain for us
particularly because we only really use the resolver, and pytorch for linux is like, 700mb?