# general
b
I've got a question on calling Pants from inside a script that is part of a run_shell_command target. Is this possible without setting PANTS_CONCURRENT=True? If we do set PANTS_CONCURRENT=True, what are the implications?
f
would you be able to share a bit more about what you are trying to achieve? Making a call to Pants from a Pants invocation sounds intriguing 🙂
b
There are limitations on running generate-lockfiles in terms of the environment it runs in. Basically, we have a docker_image that can run the appropriate steps to get generate-lockfiles to run. The problem is, if we are already running in a Docker environment, I want the existing script to test for that condition and, if it is true, skip starting up the Docker container and just run pants generate-lockfiles directly. That means the script would call pants generate-lockfiles internally, causing the Pants invocation of the run_shell_command to call Pants again.
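(A minimal sketch of the kind of wrapper script being described; the /.dockerenv detection heuristic and the image name are assumptions, not the poster's actual setup.)

```bash
#!/usr/bin/env bash
set -euo pipefail

if [ -f /.dockerenv ]; then
  # Already inside a container: call Pants directly. The nested invocation
  # needs PANTS_CONCURRENT=True so it doesn't fight the outer pantsd run.
  PANTS_CONCURRENT=True pants generate-lockfiles
else
  # On a dev machine: run the same goal inside the build image instead.
  # "our-centos7-py38-builder" is a hypothetical image name.
  docker run --rm -v "$PWD:/workspace" -w /workspace \
    our-centos7-py38-builder \
    pants generate-lockfiles
fi
```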
e
There are limitations...
@blue-city-97042 this should not be the case with modern Pants / Pex / Pip. Can you link to a thread or issues that documents the issue you're seeing?
b
To generate lockfiles from a distribution that doesn't have a binary wheel, it builds things from the sdist. In order to do this, you have to have a full complement of build tools available. We don't have the ability to set up each and every Python developer's Mac with all of the C compilers, libraries, etc. needed for building complex libraries like tensorflow and spacy. Our target environment is CentOS 7 / Python 3.8. In this case we rely on Docker images to build those wheels. The generate-lockfiles goal doesn't allow for an environment specifier, which means we have to run the command via an external shell script. But if you happen to be one of the lucky devs using a Linux machine, we are trying to avoid having to spin up a Docker container just to run generate-lockfiles. In other words, we want a single generate-lockfiles goal run that will somehow do the dependency locks for us. We are trying to get the deps built before we get to the lockfile stage, but we are in a similar bind getting those binaries built.
The fact that it has to build the sdist just to glean the dependencies is beyond my understanding.
e
What you say is, in general, untrue. Can you please provide the Pants version and the relevant ICs (interpreter constraints) and requirements?
b
I really hope that I'm wrong 🙂 We are using Mac systems natively and a CentOS 7 / Python 3.8 combo for our Docker env. Here are the constraints and requirements files:

Constraints.txt
```
wheel>0.32.0,<0.33.0
numpy<2
numpy<1.25.0; python >=3.7,<3.9
cymem>=2.0.2,<2.1.0
preshed>=2.0.1,<2.1.0
murmurhash>=0.28.0,<1.1.0
thinc>=7.0.8,<7.1.0
```

Requirements.txt
```
dkpro-cassis==0.2.1
spacy==2.1.9
numpy>=1.11.0,<1.25.0
joblib<1.0.0
scikit-learn==0.22.2.post1
setuptools>=41.0.0
```
Where this fails is when setup.py is being called for scikit-learn. It expects the numpy package to already be installed. Unfortunately numpy is not part of its declared build requirements, so unless you do a custom build with it added as a build dependency, it'll fail.
And thanks for the help. I totally understand we have a witches' brew of old technology with packaging that doesn't make sense.
IMO most of the packages didn't follow best practices when setting up their build requirements, which is why things are broken so badly.
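(A hedged reproduction of the failure mode described above; the venv path is arbitrary and the exact error text is an assumption.)

```bash
# In a clean interpreter with no numpy installed, building this scikit-learn
# sdist fails as soon as pip executes its setup.py, because the file imports
# numpy without declaring it as a build requirement.
python -m venv /tmp/clean
/tmp/clean/bin/pip install --no-binary :all: scikit-learn==0.22.2.post1
# Expected outcome: a ModuleNotFoundError for 'numpy' raised from setup.py.
```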
e
@blue-city-97042 what are your ICs? The `numpy<1.25.0; python >=3.7,<3.9` environment marker implies at least `>=3.7,<3.9`, but it would be good to know what is actually written down in your `pants.toml` config.
b
sorry, that shouldn't read `numpy<1.25.0; python >=3.7,<3.9`, but rather just `numpy<1.25.0`. Our interpreter constraints are set to 3.8.* for the particular resolve we are looking at.
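(For concreteness, a hedged sketch of the kind of pants.toml fragment that implies; the resolve name and lockfile path are made up, and the per-resolve IC option is my assumption about the setup, not something shown in the thread.)

```toml
[python]
enable_resolves = true

[python.resolves]
# Hypothetical resolve name and lockfile path.
legacy-nlp = "3rdparty/python/legacy-nlp.lock"

[python.resolves_to_interpreter_constraints]
# Matches the "3.8.* for this particular resolve" statement above.
legacy-nlp = ["==3.8.*"]
```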
e
Ok, thank you.
And @blue-city-97042 did I miss the Pants version above? I can't find it.
b
oh shoot! it's 2.17.0
e
Thanks.
I have to do some running around, but I'll have this investigated later today and report back.
b
awesome, thanks a ton!
e
Ok, yeah, the issue boils down to these 4 projects:
+ blis (0.2.4): `setup_requires=['numpy>=1.15.0'],`
+ preshed (2.0.1): `setup_requires=['wheel>=0.32.0,<0.33.0'],`
+ spacy (2.1.9):
```
[build-system]
requires = ["setuptools",
            "wheel>0.32.0,<0.33.0",
            "Cython",
            "cymem>=2.0.2,<2.1.0",
            "preshed>=2.0.1,<2.1.0",
            "murmurhash>=0.28.0,<1.1.0",
            "thinc>=7.0.8,<7.1.0",
            ]
build-backend = "setuptools.build_meta"
```
+ thinc (7.0.8): `setup_requires=["numpy>=1.7.0"],`
They all have build requirements that must first be installed to even produce metadata (3 via the old-school, deprecated `setup_requires`, 1 via PEP-518). Of the 4, 3 are problematic (preshed works out fine since wheel is available as a universal wheel). The 3 problematic ones need at least a numpy wheel as a build requirement; so if there is no numpy available to the locking machine as a pre-built wheel, that build requirement forces numpy to be built. I think it's clear, but in case not: the issue here is build requirements, not install (runtime) requirements. You have 3 packages that need a pre-built numpy (and more) just to build themselves.
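(A hedged aside: one way to check ahead of time whether a pre-built wheel exists for the lock target; the platform tag and versions below are assumptions matching the CentOS 7 / Python 3.8 target in this thread.)

```bash
# Ask pip for a numpy wheel only (no sdist fallback) for the target platform.
# If this fails, locking on that machine will be forced to build numpy.
pip download numpy \
  --only-binary :all: \
  --platform manylinux2014_x86_64 \
  --python-version 3.8 \
  --implementation cp \
  --dest /tmp/wheel-check
```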
b
yup, but since there isn't an easy way of specifying what additional build packages should be included, we end up having to do a bespoke build first, then have the Pex machinery generate the lockfile. I'm not sure this could be a feature, nor whether it can even be accomplished. For the time being I'm doing the bespoke-build option for the lockfiles, but it's less than ideal, since it doesn't go through and verify that all platforms will work.
e
Yeah, I don't think there is any feature to be had there. These sorts of packages can only be fixed or worked around. In many situations they can be fixed via a fork + VCS requirement, but when they are build requirements rather than runtime requirements, even that trick can't be used.
I'm sorry there is nothing I can do here. You are boxed in by two things: the insistence on Macs (I'm sure you don't have control over that; it's part of attracting devs these days, I guess) and libs you don't have time to fix or can't practically fix.
b
so far I've had to use a witches' brew of --no-build-isolation, exporting PIP_CONSTRAINT, and using pypa/build
the scripting is quite ugly 😞
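(A hedged sketch of that kind of pre-build script; the pins mirror the files above, but the paths and ordering are assumptions.)

```bash
set -euo pipefail

# Pin everything against the shared constraints file.
export PIP_CONSTRAINT=constraints.txt

# Install the undeclared build dependencies first, since the problem sdists
# import numpy (etc.) at setup.py time without declaring it.
pip install wheel Cython 'numpy>=1.11.0,<1.25.0'

# Build wheels with isolation off, so the numpy installed above is visible
# to each sdist's setup.py / setup_requires. The source path is hypothetical.
python -m build --wheel --no-isolation --outdir dist/ vendored/scikit-learn/
```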
e
Yeah.
b
btw, how does Johnny dep get around some of these issues?
johnnydep
e
I'm missing that joke, but maybe with a well-trained parrot?
At any rate, back to PANTS_CONCURRENT: yeah, you must use that currently, and it is a potentially massive speed hit (run `time pants --no-pantsd --help` to get an idea of the no-op startup time of Pants with no pants daemon).
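(A hedged aside: both commands below come straight from this thread; combining them this way, and the overhead you'll actually see, are assumptions.)

```bash
# Measure the daemonless startup overhead just mentioned...
time pants --no-pantsd --help

# ...then run the nested invocation from the top of the thread, with the
# concurrency flag set only for that one call.
PANTS_CONCURRENT=True pants generate-lockfiles
```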
b
no, it's a dependency walker that does dependency resolution so you can create freezes
e
Aha, never heard of that. Before I dive in: you're saying you can run johnnydep on these same Macs to create a valid Linux lock?
Afaict it just downloads wheels and walks wheel metadata to download more. If there is no wheel, it builds one; at which point you're dead in the same way.
Pex uses `pip download` to create the lock. When it gets an sdist it does not build the wheel; it uses the optional PEP-517 `prepare_metadata_for_build_wheel` API to get the metadata out of the sdist. Only if the build backend doesn't implement that method (setuptools does) does it really build a wheel. But the issue here is that the build requirements themselves need to be built for real. That's what sinks this. I don't see any code in johnnydep that works around that either.
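(A hedged illustration of that metadata-only path with plain pip; the flags exist in modern pip, but whether this exact invocation reproduces the thread's failure is an assumption.)

```bash
# Resolve without installing; pip prepares metadata for each candidate.
# For a backend implementing prepare_metadata_for_build_wheel (setuptools
# does), no wheel is built for the candidate itself -- but the candidate's
# *build requirements* still get installed first, which is exactly where
# these sdists fall over on a machine with no pre-built numpy.
pip install --dry-run --no-binary :all: --report /tmp/report.json spacy==2.1.9
```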
Yeah, and johnnydep still looks a bit fresh: https://github.com/wimglenn/johnnydep/blob/main/johnnydep/lib.py#L414 So by its own admission it is not currently a sound dep resolver.
b
thanks for the heads up. I realized that I had used johnnydep on spacy and didn't try it for scikit-learn.