# pex
r
another optimization on top of this I was thinking of here would be to pre-process wheel metadata and construct either a per-file or per-repo index that the resolver could use to avoid transfer/IO costs. this could sit alongside the wheels, as a peer dir or file, in their find-links binary store.
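rough sketch of what a per-repo index next to a find-links store could look like (the `index.json` name, its layout, and the helper names are made up for illustration, not anything pex or pip reads today):

```python
import json
import zipfile
from email.parser import HeaderParser
from pathlib import Path


def extract_wheel_metadata(wheel_path: Path) -> dict:
    """Pull the resolver-relevant fields out of a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as whl:
        # every wheel carries a *.dist-info/METADATA member
        meta_name = next(n for n in whl.namelist() if n.endswith(".dist-info/METADATA"))
        with whl.open(meta_name) as f:
            msg = HeaderParser().parsestr(f.read().decode("utf-8"))
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg.get("Requires-Python"),
        "requires_dist": msg.get_all("Requires-Dist") or [],
    }


def build_find_links_index(find_links_dir: str, out_name: str = "index.json") -> Path:
    """Write a per-repo index alongside the wheels so a resolver could read
    dependency metadata without opening (or fetching) each wheel."""
    root = Path(find_links_dir)
    index = {whl.name: extract_wheel_metadata(whl) for whl in sorted(root.glob("*.whl"))}
    out = root / out_name
    out.write_text(json.dumps(index, indent=2))
    return out


if __name__ == "__main__":
    build_find_links_index("./wheels")
```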
a
this is a very good idea and i was actually thinking of implementing it before for coursier, which is pex for the jvm
by "i was thinking of" i of course mean like "something vaguely similar to this", i do not claim to know anything
r
it’s possible that this already exists by way of “index” vs “find-links” mode, but i haven’t looked closely.
a
ok. i'd be interested in following up on this. it would be really nice to make adding a 3rdparty dep instantaneous too
i also just love those graph algorithms i will not lie
but i don't need to flex where not necessary
highest priority is shipping real code
to solve real problems
this optimization i think is really natural to do after the other stuff, because the one thing the branch i linked doesn't even try to do is make resolves faster; it still waits on that in a single thread etc. if there was a way to cache the work of resolves like you're describing, a user wouldn't necessarily have to wait a long time to add a huge dep like tensorflow. it seems like a really natural step to take afterwards to get to "instant" feedback
to be clear -- a single resolve (as in, when every requirement and constraint is the same) is cached with the branch i linked, which means redeploying is hopefully very fast once that resolve is in the remote cache, but if any 3rdparty requirement changes, at least one machine has to wait however many minutes to redo the resolve from scratch
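rough sketch of that cache-key idea (the key scheme and the `remote_cache` object are hypothetical, not what the linked branch actually does):

```python
import hashlib
import json


def resolve_cache_key(requirements: list[str], constraints: list[str],
                      interpreter_tag: str) -> str:
    """Derive a deterministic key purely from the resolve inputs. Any change to
    a single requirement yields a new key, which is why a changed 3rdparty dep
    forces at least one machine to re-resolve from scratch."""
    payload = json.dumps(
        {
            "requirements": sorted(requirements),
            "constraints": sorted(constraints),
            "interpreter": interpreter_tag,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# usage: look up a finished resolve before doing any work
# key = resolve_cache_key(["tensorflow==2.15.0"], [], "cp311-manylinux_2_28_x86_64")
# cached = remote_cache.get(key)   # remote_cache is assumed to exist elsewhere
# if cached is None:
#     ... run the full resolve, then remote_cache.put(key, result) ...
```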
r
i’m also very very interested in this, this is a huge pain for us
particularly because we only really use the resolver, and pytorch for linux is like, 700mb?