# general
r
👋 Hello team! We use wheel files to distribute Python code to a Spark cluster. One problem we have run into is that the `requires` used to generate our wheel files have loose constraints (in general, we use `~=` instead of `==`), which means that dependency versions are set when the wheel file is installed, not when it is built, so two Spark clusters created at different points in time could end up with different sets of dependencies. This is not a problem in our `pants` monorepo because `pants` freezes all dependency versions in lockfiles. It is also not a problem for our Docker containers, because there the dependencies are installed in the container at build time (unfortunately, our Spark cloud provider puts a lot of limitations on Docker, so this is not an option for Spark). Is there an option for the `python_distribution` target (or the `setup_py` target inside the `python_distribution` target) to modify `requires` to use `==` constraints for versions in the lockfile? To put this another way, I want a `requirements.txt` file that has loose dependency constraints, but a wheel file with hard-pinned `==` constraints, the specific version used being the one that's set in the lockfile. Would it be possible to write a custom target that does this? If so, where would I start?
r
If it's just about distributing it, I think another option could be to build a PEX and put it on the Spark cluster. https://github.com/da-tubi/pants-pyspark-pex
c
all the data going into the `setup()` call is calculated in this rule: https://github.com/pantsbuild/pants/blob/main/src/python/pants/backend/python/util_rules/package_dists.py#L588-L724 so looking at that may be a good first step at figuring out a way to customize it…
r
Re: @refined-addition-53644 Unfortunately, our Spark cloud provider also prevents us from using a `pex` file 😢
Re: @curved-television-6568 Thanks for the link, I’ll take a look!
c
however it will not override/replace the one pants provides, so may not work well enough… 🤷
r
Can I use a plugin to overwrite an existing rule?
c
not sure if the locked versions are available there, otherwise combining a resolve with the list from that rule may be what you want…
unfortunately no.
so if there is no ready union to hook into, a pants patch is needed.
or clever use of targets/fields.
may be an option depending on what change is required
r
Right, I could create a new target with a new rule that uses this rule and then combines the resolve
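A rough sketch of the "combine the requirement list with the resolve" idea, in plain Python rather than as a Pants rule. Everything here is hypothetical: `pin_requirements`, `normalize`, and the `locked` mapping are illustrative names, and it assumes the locked versions have already been read out of the lockfile into a dict. It only handles simple `name[extras] <specifier>; marker` strings, not direct URL references.

```python
import re

# Hypothetical helper, not part of Pants: rewrite loose requirement
# strings as hard pins using versions taken from a lockfile.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?")


def normalize(name):
    """Normalize a project name so 'PyYAML' and 'pyyaml' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pin_requirements(requires, locked):
    """Return `requires` with every version specifier replaced by `==<locked>`.

    `locked` maps project names to the versions recorded in the lockfile
    (assumed to be extracted beforehand).
    """
    locked_norm = {normalize(k): v for k, v in locked.items()}
    pinned = []
    for req in requires:
        # Keep any environment marker (the part after ';') unchanged.
        spec, sep, marker = req.partition(";")
        match = _NAME_RE.match(spec)
        if match is None:
            raise ValueError(f"cannot parse requirement: {req!r}")
        name, extras = match.group(1), match.group(2) or ""
        version = locked_norm.get(normalize(name))
        if version is None:
            raise KeyError(f"{name} is not in the lockfile")
        result = f"{name}{extras}=={version}"
        if sep:
            result += f"; {marker.strip()}"
        pinned.append(result)
    return pinned
```

For example, `pin_requirements(["requests~=2.28"], {"requests": "2.28.2"})` would give the `requires` list a wheel needs, while `requirements.txt` keeps its loose `~=` constraint.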
c
worth exploring
h
Interesting, we have an option to control how we generate the requirement strings for first-party wheel deps: https://www.pantsbuild.org/docs/reference-setup-py-generation#first_party_dependency_version_scheme
Sounds like we want something similar for third-party ones?
Well, not quite similar, more like the opposite...
But I imagine this would not be too hard to implement
c
there it was. I knew we had an option for that but couldn’t find it.
yea, figured the same.
r
Based on that description, it sounds like `third_party_dependency_version_scheme=exact` is exactly what we want
h
Something like that. In the first-party case there are three values: "use `==`", "use `~=`" and "use no constraint at all". In the third-party case there would also be a fourth: "use whatever is in the `requirements.txt`".
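To make the four cases concrete, here is a minimal sketch of how such a scheme could map to requirement strings. This is not an existing Pants option: `ThirdPartyVersionScheme`, its value names, and `render_requirement` are all made up for illustration, loosely modelled on the first-party option described above.

```python
from enum import Enum


class ThirdPartyVersionScheme(Enum):
    """Hypothetical option values; not an existing Pants option."""

    EXACT = "exact"  # pin to the locked version with ==
    COMPATIBLE = "compatible"  # use ~= with the locked version
    ANY = "any"  # emit the bare project name, no constraint
    AS_DECLARED = "as_declared"  # keep the requirements.txt string


def render_requirement(name, declared, locked_version, scheme):
    """Build the requirement string that would go into the wheel's metadata.

    `declared` is the string as written in requirements.txt;
    `locked_version` is the version recorded in the lockfile.
    """
    if scheme is ThirdPartyVersionScheme.EXACT:
        return f"{name}=={locked_version}"
    if scheme is ThirdPartyVersionScheme.COMPATIBLE:
        # Note: ~= needs a version with at least two segments (e.g. 2.28).
        return f"{name}~={locked_version}"
    if scheme is ThirdPartyVersionScheme.ANY:
        return name
    return declared
```

With `declared="requests~=2.28"` and `locked_version="2.28.2"`, the `EXACT` scheme would produce the hard pin asked for at the top of the thread, and `AS_DECLARED` would reproduce today's behaviour.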