I am seeing some really unusual behavior when gene...
# general
g
I am seeing some really unusual behavior when generating a pants resolve lockfile. For some reason, even when the user requests a specific lib (e.g. numpy==1.21.5), I am seeing it resolve to 1.17.x, which fails to download/install/compile, and then the whole resolve fails. I can give you the raw command being run. 🧵 pants: 2.21.0, pip: 24.2, pex: 2.20.2
Untitled
I don't know if this is a pip bug or a pex bug. I'm curious if anyone has ever seen this behavior before.
It came up out of nowhere.
c
• Does it fail during lockfile generation, or when you try to use it?
• Is the above log the failure you are seeing when you say it does not resolve?
• Are you on some arch that numpy does not provide a wheel for?
g
It fails during generation. When I use --keep-sandboxes=on_failure, edit the pex command to add --pip-log, and look at the log, it seems like it's trying to compile an older version of numpy, and I have no idea why. The version of numpy we are using provides wheels: https://pypi.org/project/numpy/1.21.5/#files
It's failing on both amd64 and arm64.
c
Hmm. I could imagine that pip was trying to backtrack because some other transitive dep was in conflict with numpy==1.21.5, but it is difficult to reason about the pip internals. Does something like this get you a better error message?
[python.resolves_to_only_binary]
your-resolve = ["numpy"]
g
Ah.. I didn't know about that option. Let me try it.
So I tried it and it confirms that it's trying to pull in an old version of the library, but I can't figure out why.
This is becoming a serious issue for us. We cannot re-generate a lock for 3 out of 9 of our monorepo resolves. I know that if I put these requirements in uv or poetry, they are able to create a lock and are not pulling in the old version of numpy. It seems specific to pants/pex/pip. Is there anyone that can help me better understand how to diagnose if this is a problem with pants, pex or pip?
g
You can strip away one layer at a time, passing the same set of requirements to pex, and if that fails, moving on to passing it to pip. If you can reduce this to an independent repository, this would also be helpful for others to look at the issue.
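A minimal sketch of that layer-by-layer approach, for anyone following along. It only builds the two command lines rather than running them. The helper names, log file names, and output path are hypothetical, and the flags shown (`pex3 lock create` with `--pip-log` and `-o`, and `pip install --dry-run`) are an assumed reduced form. The real pex invocation pants uses carries more options, so copy those from the kept sandbox where possible.

```python
import shlex
from typing import List, Sequence


def pex_lock_command(
    reqs: Sequence[str], pip_log: str = "pip.log", output: str = "lock.json"
) -> List[str]:
    """Layer 2: ask pex for a lock directly, with no pants involved."""
    return ["pex3", "lock", "create", "--pip-log", pip_log, "-o", output, *reqs]


def pip_dry_run_command(reqs: Sequence[str], log: str = "pip-only.log") -> List[str]:
    """Layer 3: ask pip alone to resolve, without installing anything."""
    return [
        "python", "-m", "pip", "install",
        "--dry-run",           # resolve only, install nothing
        "--ignore-installed",  # do not let the current environment influence the result
        "--log", log,
        *reqs,
    ]


if __name__ == "__main__":
    reqs = ["numpy==1.21.5"]  # replace with the full requirement set of the resolve
    # shlex.join quotes specifiers such as 'spacy<4.0.0,>=3.0.0' for the shell.
    print(shlex.join(pex_lock_command(reqs)))
    print(shlex.join(pip_dry_run_command(reqs)))
```

If the pex command fails the same way pants does, pants is off the hook. If the pip command also walks back to old versions, the behavior lives in pip's resolver.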
It's weird that there's even an attempt to backtrack to an incompatible numpy version. I've seen very slow resolves if there's incompatible or very narrow solve sets, but never attempts at a completely incompatible version.
(FWIW: pex reproduces this with the depset from your log. Quite sure you just have a bad dependency somewhere that forces numpy 1.17.3 into consideration. Trying to reduce this down to a failing subset.)
g
It's just weird that when I have numpy==1.21.5 and nothing else that there isn't an error, but it actually resolves to something else.
Like I would expect the resolution to fail, not try to install/compile some other version.
c
Is this something you could extract as a public repro case?
g
I mean the pex command is there, do you need the pants repro or is the direct pex repro enough?
Said differently, from what I can tell it's not an issue with pants itself, but rather with pex/pip.
I'd gladly post a clean pex reproduction.
g
Out of curiosity, is that whole list of deps your actual direct deps?
g
yep
insane, I know.
It's a monorepo with ~100 developers and doing a bunch of crazy things in databricks amongst other things.
g
I'm honestly surprised it ever worked. I've had things fall apart with just two of these LLM frameworks in one repository due to conflicting dependencies.
I'm down to 8 of those dependencies and still reproing at this point, as well.
g
It worked great for 9 months, until all of a sudden.
This seems like a bug in either pex or pip (I can't tell which) because both uv and poetry have no problems resolving it.
I'm currently engineering something to basically load up all python_requirements for a resolve into a uv pyproject.toml, generate a lock (uv lock), then convert it to a pex lockfile. It takes pex about 6 minutes to pass/fail. It takes uv ~30 seconds.
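A rough sketch of the first step of that workaround: rendering a resolve's requirements into a minimal pyproject.toml that `uv lock` can consume. The function name, project name, and placeholder version are hypothetical. Collecting the requirements from python_requirements targets and converting the resulting lock to a pex lockfile are not shown, since the thread does not describe how they are done.

```python
import json
from typing import Iterable


def render_pyproject(name: str, requires_python: str, requirements: Iterable[str]) -> str:
    """Render a minimal [project] table (PEP 621 style) for one resolve."""
    # json.dumps produces double-quoted strings whose escapes are also valid TOML.
    deps = "\n".join(f"    {json.dumps(req)}," for req in requirements)
    return (
        "[project]\n"
        f"name = {json.dumps(name)}\n"
        'version = "0.0.0"\n'  # placeholder; the project itself is never published
        f"requires-python = {json.dumps(requires_python)}\n"
        "dependencies = [\n"
        f"{deps}\n"
        "]\n"
    )


text = render_pyproject(
    "my-resolve",   # hypothetical resolve name
    ">=3.9,<3.11",  # interpreter constraints quoted later in this thread
    ["numpy==1.21.5", "spacy<4.0.0,>=3.0.0", "mlflow<3.0.0,>=2.13.0"],
)
print(text)
```

Writing `text` to pyproject.toml in an empty directory and running `uv lock` there should produce a uv.lock file next to it.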
g
Interesting. I wonder what PDM would do. I know it has been stricter in some situations in the past.
g
I'm not familiar with PDM, what is that?
g
g
oh, just another package manager.
I'm going to try pdm now.
It's official. uv is the fastest python dep resolver on earth. 🙂
pdm is fairly slow in comparison. It's still working on resolving the tree.
c
What are your interpreter constraints?
g
>=3.9,<3.11
I know it's taboo to have it so wide.
@gorgeous-winter-99296 I'm hopeful pdm will reveal the root cause. It's taking a long time to resolve, but it seems likely to turn up useful information.
c
I think this is tangential, but are the various almost-dupe reqs like:
aiohttp<4.0.0,>=3.8.1
aiohttp<4.0.0,>=3.8.4
intentional?
g
numpy==1.21.5
spacy<4.0.0,>=3.0.0
mlflow<3.0.0,>=2.13.0
is the minimal reproduction set.
g
@curved-manchester-66006 in short, it's an outcome of the chaos of our pyproject.toml, but "normal" yes 🙂
We found spacy is the issue, but don't know why...
specifically, don't know why pex/pip isn't able to say, "hey, conflict exists HERE!"
g
The issue is specifically spacy allowing an incompatible thinc version as it has no upper bound. This causes pip to start building ancient versions of pyarrow, scikit-learn, etc, to figure out (I think) if they're compatible since they lack the wheel metadata about dependencies.
It's literally discarding thousands of various configs until it hits a point where it starts building wheels and then that build explodes.
g
Interesting. I wonder what uv and poetry are doing to make this not fail.
@gorgeous-winter-99296 I'm super curious how you found out the minimal repro set.
g
Simple bisection 🙂 Remove half the deps, see if it builds. If it does, restore and remove a quarter.
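For reference, a small sketch of that kind of reduction. It is a chunked variant of the manual bisection described above, not the exact procedure used in the thread: it drops chunks of shrinking size and keeps a removal whenever the failure still reproduces. `still_fails` is a hypothetical callback; in practice it would run the pex or pip command on the candidate list and report whether the bad resolve happens. Here it is faked so the example runs offline.

```python
from typing import Callable, List, Sequence


def minimize(reqs: Sequence[str], still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Shrink reqs to a smaller list for which still_fails() is still True."""
    current = list(reqs)
    chunk = max(len(current) // 2, 1)
    while True:
        i = 0
        removed_any = False
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if candidate and still_fails(candidate):
                current = candidate  # the chunk was irrelevant, drop it for good
                removed_any = True
            else:
                i += chunk  # the chunk is needed, keep it and move on
        if chunk == 1 and not removed_any:
            return current  # no single requirement can be removed any more
        chunk = max(chunk // 2, 1)


# Fake predicate for demonstration only: "fails" when all three culprits are present.
CULPRITS = {"numpy==1.21.5", "spacy<4.0.0,>=3.0.0", "mlflow<3.0.0,>=2.13.0"}


def fake_still_fails(candidate: List[str]) -> bool:
    return CULPRITS.issubset(candidate)


all_reqs = [
    "aiohttp<4.0.0,>=3.8.1", "numpy==1.21.5", "pandas", "spacy<4.0.0,>=3.0.0",
    "requests", "mlflow<3.0.0,>=2.13.0", "boto3", "pyyaml",
]
minimal = minimize(all_reqs, fake_still_fails)
print(minimal)
```

With a real resolve taking minutes per attempt, starting with large chunks matters: most of the list usually disappears in the first few runs.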
❤️ 1
Did PDM work, btw? I know a bit about how that resolver works, and I expect it to succeed now that we know why pip fails.
c
hmm, would be curious if https://github.com/pypa/pip/pull/13017 makes a difference in this case
g
Oh, forgot to write: thinc<8.3 fixes everything. 🙂 Solves the whole reqs file in 2m55 with Pex.
c
Nope, https://github.com/pypa/pip/pull/13017 doesn't make a difference in this case. I've tried to consolidate a report at: https://github.com/pypa/pip/issues/13037
🙌 1
g
Y'all are superheroes ✨ Thanks for the help. I'm now able to generate locks after removing the offending package!
🔒 1
c
@gentle-flower-25372 The pip issue you inspired has had quite the uptake!