# general
c
I find it a bit strange that while I’m iterating on a test (using
--loop test ...
) it all of a sudden starts resolving constraints.txt again?
⠁ 18.66s Resolving constraints.txt
i.e. I’ve only touched the test code in a
test_foo.py
file, nothing else. Is this expected?
h
No, I don't think that is expected. And that is a painful operation that we want to avoid very much, so figuring this out is high priority. Do you still have
.pants.d/pants.log
? Can you please save that and, if it's not confidential, post it as a Gist/Pastebin and link it here?
c
DM..
😂 1
h
I've had weird cache results like this too where things seemingly have to get rebuilt. If it happens again, I'll try to summarize and pass along anything useful.
👍 1
I'm also always really surprised that resolving constraints takes so long when I have a fully resolved lockfile thanks to
pip-tools
. I know this is mostly a
pip
thing, but I'd really love to know what makes that so inefficient. Poetry seemed to handle that process much better when I tried their tool out.
c
Agree with that assessment. On a hunch, that could be because Pants resolves in a clean env every time, whereas Poetry re-resolves in an already populated env, hence
pip
will only have to apply any deltas to what is already present. So perhaps there is room for improvement in how the constraints are resolved (reusing a previous resolve as a base could speed things up).
h
Part of the slowness issue is that
pip
will still access the network even with a fully resolved lockfile and a fully populated download cache. I suspect that it does this to check for new versions that match requirement ranges, even though there are no ranges involved in this case.
But yes, the bigger part is that Pants re-resolves a clean venv every time instead of applying deltas to an existing venv
This is something we're looking into improving cc @helpful-jackal-12093
But, separately, it's also very bad if we're re-resolving when we shouldn't be, so pinpointing that would be excellent
This has been reported anecdotally often enough that we need to solve it, I will try and bang on it this afternoon. But a consistent way of reproducing the problem would be really helpful!
🔨 1
c
I’ll keep an extra close eye out for what may be triggering it, if I can find a way to reproduce this.
Come to think of it, isn’t there a cache GC process? What if some required input for the constraints is evicted? Is that logged?
👀 1
h
Interesting
b
regarding
pip will still access the network even with a fully resolved lockfile and a fully populated download cache
isn't this what
--no-index --find-links=<cache dir>
is for? We should know whether the full set of 3rd party deps have changed since the last resolve, no?
w
@curved-television-6568: yes, it is possible for something to be garbage collected from the store… but it should be LRU
👍 1
the cache itself isn’t LRU, but all of the content of the cache entry is, and if any of it isn’t present, you miss the cache
@best-florist-45041: yes. but that stretches the definition of a cache… “most relevant/recent cache entry” is a fuzzy match that isn’t usually applied to caches
https://github.com/pantsbuild/pants/issues/14127 is open on this topic, but how a “fuzzy lookup to some recent + similar thing” should work is an open question there
everywhere else, pants is using very precise cache keys, so there is … no fuzz, heh
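To illustrate the point about exact cache keys and evicted content, here is a minimal sketch. It is not Pants' actual implementation; `Store`, `ActionCache`, and their methods are hypothetical names invented for this example. It only models the behaviour described above: the key is an exact hash of the inputs (no fuzzy matching), and a lookup counts as a miss if any blob the entry refers to has been garbage collected from the store.

```python
import hashlib
from typing import Dict, List, Optional, Sequence


class Store:
    """Hypothetical content-addressed blob store whose blobs can be evicted."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def has(self, digest: str) -> bool:
        return digest in self._blobs

    def evict(self, digest: str) -> None:
        # Stands in for garbage collection removing a blob.
        self._blobs.pop(digest, None)


class ActionCache:
    """Hypothetical cache mapping an exact key to a list of output digests."""

    def __init__(self, store: Store) -> None:
        self._store = store
        self._entries: Dict[str, List[str]] = {}

    @staticmethod
    def key(argv: Sequence[str], input_digests: Sequence[str]) -> str:
        # Exact key: any change to the command or its inputs yields a new key.
        h = hashlib.sha256()
        for part in [*argv, *sorted(input_digests)]:
            h.update(part.encode("utf-8"))
            h.update(b"\0")
        return h.hexdigest()

    def record(self, key: str, output_digests: Sequence[str]) -> None:
        self._entries[key] = list(output_digests)

    def lookup(self, key: str) -> Optional[List[str]]:
        outputs = self._entries.get(key)
        if outputs is None:
            return None  # never computed for exactly these inputs
        if not all(self._store.has(d) for d in outputs):
            return None  # entry exists, but some content was evicted: a miss
        return outputs
```

In this model, evicting a single output blob is enough to force the work to be redone, even though the inputs did not change.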
h
AFAICT
pip
itself insists on going to the network in all cases. This is sort-of documented here: https://pip.pypa.io/en/stable/topics/caching/?highlight=cache#http-responses
While this cache attempts to minimize network activity, it does not prevent network access altogether.
I guess you're right,
--no-index --find-links=
should fix that
But either way, the bigger issues are A) resolving unnecessarily for some reason and B) re-resolving from scratch
b
@witty-crayon-22786 True, that's not really a proper cache. I suppose my mental model was in the vein of: 1) Have a full constraints.txt file. 2) Resolve all deps and generate wheels. 3) For each subset of dependencies needed for a particular
goal + targets
, create the venv with
--no-index --find-links=...
with just that subset. This takes seconds instead of minutes. With (1,2) only needing to be rerun whenever the dependency list changes.
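A rough sketch of that three-step mental model, using plain pip rather than Pants. The helper names, paths, and package names are hypothetical; the pip flags (`wheel --requirement --wheel-dir`, `install --no-index --find-links --constraint`) are standard pip options. It assumes the constraints.txt is fully pinned, so it can also be passed to `pip wheel` as a requirements file.

```python
from pathlib import Path
from typing import List, Sequence


def build_wheelhouse_cmd(python: str, constraints: Path, wheelhouse: Path) -> List[str]:
    """Steps 1 and 2: resolve the pinned constraints once and build wheels.

    Only needs to be rerun when the dependency list changes.
    """
    return [
        python, "-m", "pip", "wheel",
        "--requirement", str(constraints),
        "--wheel-dir", str(wheelhouse),
    ]


def install_subset_cmd(
    venv_python: str,
    wheelhouse: Path,
    constraints: Path,
    requirements: Sequence[str],
) -> List[str]:
    """Step 3: install only the needed subset, offline, from the wheelhouse."""
    return [
        venv_python, "-m", "pip", "install",
        "--no-index",                      # do not contact any package index
        "--find-links", str(wheelhouse),   # look for wheels in the local dir only
        "--constraint", str(constraints),  # keep versions pinned
        *requirements,
    ]


if __name__ == "__main__":
    # Hypothetical paths and package names, printed rather than executed.
    constraints = Path("constraints.txt")
    wheelhouse = Path("wheelhouse")
    print(" ".join(build_wheelhouse_cmd("python3", constraints, wheelhouse)))
    print(" ".join(install_subset_cmd(".venv/bin/python", wheelhouse, constraints, ["requests"])))
```

Each command list can be passed to `subprocess.run(cmd, check=True)`; the sketch only builds them, so nothing is installed when it runs.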
w
3) For each subset of dependencies needed for a particular 
goal + targets
, create the venv with 
--no-index --find-links=...
 with just that subset. This takes seconds instead of minutes.
yea, that is precisely what we do.
…except via pants’
--pex-repository
feature, which results in symlinking out of one exploded pex and into another
b
Ah, cool!
w
while there are further performance improvements, https://github.com/pantsbuild/pants/issues/14127 is mostly about your steps 1 and 2
h
Yes, I should clarify that the thing that takes time is 2). 3) is very fast.
Although in cases where there are a great many such subsets, and those times add up to something unacceptable, you can also run against the entire lockfile: https://www.pantsbuild.org/docs/reference-python#section-run-against-entire-lockfile