# general
p
Started a question on this yesterday but it was overshadowed by a 2nd question. I'm trying to get my build to work on ARM. One issue I'm having is that we use
opencv-python
but there isn't an ARM build for that on PyPI. You can
apt-get install python3-opencv
but pants won't pick that up. I do understand why: it breaks the hermetic build idea as you're using an outside dependency that could be updated via
apt
at any time. You can't correctly generate a constraints file because whatever tool you're using (e.g.
pip freeze
or
pip-compile
) wouldn't be aware of one of the dependencies and its requirements/version. But getting it to build on ARM looks like it's going to be a big time suck. Is there an "escape hatch" that allows one to use a globally installed Python library? ("no" is a perfectly reasonable answer here I think, was just hoping...)
h
The closest I can think of is a local requirement: https://www.pantsbuild.org/docs/python-third-party-dependencies#version-control-and-local-requirements Not sure if that's already what you're saying though, that it would be hard to convert this apt-built requirement into a local
.whl
file
p
@hundreds-father-404 if you install opencv via
apt
it just installs an
.so
file:
# dpkg -L python3-opencv
/.
/usr
/usr/lib
/usr/lib/python3
/usr/lib/python3/dist-packages
/usr/lib/python3/dist-packages/cv2.cpython-39-aarch64-linux-gnu.so
/usr/share
/usr/share/doc
/usr/share/doc/python3-opencv
/usr/share/doc/python3-opencv/changelog.Debian.gz
/usr/share/doc/python3-opencv/copyright
but I don't get any of the metadata (e.g. its 3rd-party dependencies, etc.) that I'd need to build a wheel. To build it into a wheel I think the quickest path to success would be to check out the OpenCV source and try to
setup.py bdist_wheel
but I suspect that's going to end up taking a bunch of time (e.g. ARM was far less popular when the version of OpenCV we need was released) so I was hoping for a "morally wrong but expedient" way to tell Pants to just use the dependency as installed by
apt
. Does that make sense?
If I do have to build a wheel I know how to get that to install. We already maintain a cloud hosted PyPi-like server for some other dependencies we have to compile.
h
I think that makes sense. I don't know of any way to do that, though - iiuc Pex needs to have the whole wheel to work properly, not just
.so
I think you'd need to build the wheel
f
There's a lot of issues trying to use something like pants to deal with globally installed libs. I've spent a lot of time thinking about it, and I ended up trying to get around it mostly by running build stages in docker containers. Pants wants hermeticity at every step. I'd recommend building the wheel and using local reqs or just pushing it to your devpi server
This isn't expedient for you, but I do expect to spend a good amount of time on this problem in the coming months, as my own org is highly tied to the RPM ecosystem for everything. I imagine I'm going to have to construct an entire subgraph of rules/targets for this model of executing python to work, but I think it will pay off eventually. That said, this may be easier in RPM-based distros since the support for alternate install roots is better. If you have support for alternate install roots, you should be able to snapshot which files installing particular system deps creates and then let pants cache those partial trees for use in subsequent rule executions. I'm not sure how you'd do that in DEB-based systems without a lot of chroot work (and root permissions)
I expect this to be a dicey problem in general though. If anyone has ideas, I'm interested to hear them!
I think Buildah is another potential approach, as it lets you incrementally build up containers over several commands without root privileges, and can export results to arbitrary locations on the filesystem, which might get us a lot closer to pants' notion of hermeticity than docker does. The big downside is that this approach is extremely linux-centric
h
We could come up with a principled way to poke specific holes in the sandbox
It sounds like that can be needed at times
When you apt-get install
python3-opencv
what does that actually do?
How does the system python find and consume that requirement?
f
the system python in this case would be configured to have a certain set of directories installed as part of its site configuration. Doing
apt-get install python3-opencv
will install the required files in one of those configured dirs (probably something like
/usr/lib/python3.x/site-packages
)
p
Hey all, sorry for dropping out: had a meeting. Thanks all for the thoughts. Yes, using globally installed Python things is wrong. Just trying to get something working without spending a ton of time on it but fully understand why I shouldn't be doing what I'm trying to do 🙂 @happy-kitchen-89482
How does the system python find and consume that requirement?
# dpkg -L python3-opencv
/.
/usr
/usr/lib
/usr/lib/python3
/usr/lib/python3/dist-packages
/usr/lib/python3/dist-packages/cv2.cpython-39-aarch64-linux-gnu.so
/usr/share
/usr/share/doc
/usr/share/doc/python3-opencv
/usr/share/doc/python3-opencv/changelog.Debian.gz
/usr/share/doc/python3-opencv/copyright
It puts it in
/usr/lib/python3/dist-packages
which is on
$PYTHONPATH
.
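To double-check which file the interpreter would actually load, here is a minimal sketch that asks the import system directly. The helper name `locate_module` is made up for illustration. It only looks the module up on `sys.path` and does not import it.

```python
import importlib.util
import sys
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Nothing on sys.path provides this module.
        return None
    # For extension modules this is the path of the .so file.
    return spec.origin


if __name__ == "__main__":
    print("sys.path entries:")
    for entry in sys.path:
        print("  ", entry)
    print("cv2 would be loaded from:", locate_module("cv2"))
```

Run with the system `python3`, this should point at the `.so` listed by `dpkg -L` if the apt package is visible to that interpreter.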
f
I don't think it would be too hard to poke a hole in the sandbox here and say "this dep is provided by this system lib", but I do think you'd want to give pants some help to understand which python it should be looking for
Yes, using globally installed Python things is wrong.
It's not wrong, it's just a choice that has its benefits and costs. For a lot of people these days sticking with system-installed packages has far more costs than benefits, so that often gets distilled into the idea that it's wrong. I think it's a reasonable thing for Pants to want to support some day.
Maybe there's a way to express this as a target that can be depended on...
python_system_library(
  name = "opencv-system",
  python_binaries = ["/usr/bin/python3.8"],
  packages = ["cv2"],
  install_command = "sudo apt-get install -y python3-opencv",
)
I just don't know how this would interact with pants' interpreter searching
h
Yeah, if a user needs to do it then it's not "wrong" almost by definition
We try very hard to be the tool that works with you, not against you... But that still makes the tradeoffs explicit
💯 1
I think we can tell pex to not scrub certain things from site-packages
@enough-analyst-54434 thoughts on that?
Some sort of "provided" dependency scope
And then of course ensure that the right python interpreter is used
e
Thoughts on which? From the Pex side there are already ways to use things in the sys path or tack on whole new paths.
To inherit
--inherit-path {false,prefer,fallback}
or
PEX_INHERIT_PATH
and then
PEX_EXTRA_SYS_PATH
to add arbitrary things
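As a rough sketch of how those two runtime knobs could be combined when launching an already-built PEX from Python: the helper name `pex_runtime_env` is hypothetical, and the `:` separator for `PEX_EXTRA_SYS_PATH` is my reading of the Pex docs, so treat both as assumptions.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# The three values accepted by --inherit-path / PEX_INHERIT_PATH.
_INHERIT_CHOICES = ("false", "prefer", "fallback")


def pex_runtime_env(
    extra_sys_path: Iterable[str],
    inherit_path: str = "fallback",
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build an environment dict that loosens a PEX's isolation at runtime."""
    if inherit_path not in _INHERIT_CHOICES:
        raise ValueError(f"inherit_path must be one of {_INHERIT_CHOICES}")
    env = dict(os.environ if base_env is None else base_env)
    # "fallback" keeps the PEX's own dists first; "prefer" puts sys.path first.
    env["PEX_INHERIT_PATH"] = inherit_path
    # Assumption: entries are joined with ":" as on a POSIX PATH.
    env["PEX_EXTRA_SYS_PATH"] = ":".join(extra_sys_path)
    return env
```

It could then be used along the lines of `subprocess.run(["./app.pex"], env=pex_runtime_env(["/usr/lib/python3/dist-packages"]), check=True)`, where `app.pex` is a placeholder name.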
h
Thoughts on how to use this in Pants to poke holes in the sandbox and allow specific requirements to be provided by the system python
p
related: I found a pre-built ARM wheel for version 3.4.13 of opencv but it depends on a version of numpy that is greater than what the old version of TensorFlow can support. It's not great, but I'm pretty sure it'd actually be OK. The
requirements.txt
docs say the only solution is to fork the .whl: https://pip.pypa.io/en/stable/topics/dependency-resolution/#loosen-the-requirements-of-your-dependencies. Does pants give me a way to say "loosen opencv's constraint on numpy"?
e
Not really. The only option I can think of, Pants aside, is to generate a complete list of transitive requirements and ask a resolver to resolve those intransitively. Both Pex and Pip support this. Concocting the full transitive dependency list though will be fiddly partly by-hand work.
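For the Pip flavor of that, a minimal sketch, assuming you have already worked out the complete pin list by hand. The helper names are made up for illustration. `--no-deps` is the Pip flag that stops it from following (and enforcing) each package's declared dependencies.

```python
import sys
from typing import Dict, List


def pinned_requirements(pins: Dict[str, str]) -> List[str]:
    """Turn {project: version} into sorted 'project==version' strings."""
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


def pip_no_deps_command(pins: Dict[str, str], python: str = sys.executable) -> List[str]:
    """Build a pip invocation that installs exactly the given pins.

    Because of --no-deps, pip will not pull in anything missing from
    `pins`, so the list has to cover the full transitive closure.
    """
    return [python, "-m", "pip", "install", "--no-deps", *pinned_requirements(pins)]
```

The resulting list can be handed to `subprocess.run(..., check=True)`. Anything left out of the pin list only shows up as an import error at runtime, which is the fiddly part.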
p
😭
thanks
e
You basically have to:
1. Create your own wheel with a hacked requirement.
2. Set up the resolve to look at your wheels first.
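One way the "hacked requirement" part might look, as a sketch only: unpack the wheel (e.g. with `wheel unpack`), rewrite the `Requires-Dist` header in `*.dist-info/METADATA`, and repack it (e.g. with `wheel pack`, which should regenerate `RECORD`). The function below covers just the text rewrite. Its name and the sample requirement strings are made up for illustration.

```python
import re
from typing import List


def loosen_requires_dist(metadata_text: str, project: str, new_requirement: str) -> str:
    """Replace the Requires-Dist header(s) for `project` with one new line.

    Only the header section (everything before the first blank line) is
    touched; the long description that follows is left alone.
    """
    # Match the project name exactly, so "numpy" does not match "numpy-foo".
    pattern = re.compile(
        r"^Requires-Dist:\s*" + re.escape(project) + r"(?![A-Za-z0-9._-])",
        re.IGNORECASE,
    )
    out: List[str] = []
    in_headers = True
    replaced = False
    for line in metadata_text.splitlines():
        if in_headers and line == "":
            in_headers = False
        if in_headers and pattern.match(line):
            # Collapse several marker-specific lines into a single new one.
            if not replaced:
                out.append(f"Requires-Dist: {new_requirement}")
                replaced = True
            continue
        out.append(line)
    trailing_newline = "\n" if metadata_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline
```

Giving the repacked wheel a distinct local version (something like a `+patched` suffix) would make it easier to tell apart from the upstream wheel on your private index.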
p
yup.
e
@happy-kitchen-89482 Ah. I'll think about that. My opinions on UI / UX are generally not useful. The only technical piece is how to specify / find the interpreter that has the system installed dists. That could be some code that searched and tested interpreters to see which one had access to the needed dist. That I could think more on.
But, more simply, you could just use existing Pants mechanisms to constrain the interpreter search path when you already know the answer / have a uniform machine fleet where Pants will be run.
f
What about a route of doing a local wheel build off a source tarball or python sdist? It would only be painful about once per machine, because the result could be cached. It would take a plugin of some kind, but it might be a more flexible solution to some of these problems, especially if you run into dependency hell like @plain-carpet-73994 is describing.
h
@enough-analyst-54434 Don't we scrub the site-packages though? That's the part I'm not clear on
e
@happy-kitchen-89482 we do, but
--inherit-path
adds everything in the interpreter's natural
sys.path
right back.
h
Right. So I guess I'm wondering about doing that selectively, for just some dists. Which I guess would require symlink farming or something.
Or we could just do it globally, that would solve the user's problem in practice
at the expense of hermeticity, but that's what you're signing up for if you use this
p
What about a route of doing local wheel build off a source tarball or python sdist?
That would be super helpful!! I've already hit another case where I had to manually build a wheel 'cause all I could find was an sdist. And then I had to set up a private pypi repo just for that one dependency, figure out the config, etc. In the end it cost me a day or two where an sdist option would have eliminated all that work.
Or we could just do it globally, that would solve the user's problem in practice
at the expense of hermeticity
Personally, I'd prefer a targeted "just this one dependency" thing to limit the potential impact of loss of hermeticity.
@happy-kitchen-89482 @enough-analyst-54434 thanks again to both of you. You've both been so helpful!
❤️ 1
e
I've re-read all this to better grok what's going on. I got pulled in yesterday and only paged in maybe 20%. So, fwict what you need @plain-carpet-73994 is support for Pip's
--no-binary opencv-python
and plumbing for it in Pants, because today I can turn off wheels globally with Pex just fine (this uses Pip's
--no-binary :all:
under the covers) albeit at a glacial pace since opencv-python and numpy get compiled from sdists:
$ pex --no-wheel opencv-python -o opencv-python.pex
$ pex-tools opencv-python.pex info -i4
{
    "bootstrap_hash": "8fb7e13eebcc79897c76921aa95cab8d49a4f71b",
    "build_properties": {
        "pex_version": "2.1.52"
    },
    "code_hash": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    "distributions": {
        "numpy-1.21.2-cp39-cp39-linux_x86_64.whl": "01f5a73964e180ad641ad496e8d412501eba2ae5",
        "opencv_python-4.5.3.56-cp39-cp39-linux_x86_64.whl": "32593f38258b17ce59ec9395aba04a9991f6a64f"
    },
    "emit_warnings": true,
    "ignore_errors": false,
    "includes_tools": false,
    "inherit_path": "false",
    "interpreter_constraints": [],
    "pex_hash": "a669293b22db7cbd5dd3c11f9d225e859a701658",
    "pex_path": null,
    "requirements": [
        "opencv-python"
    ],
    "strip_pex_env": true,
    "venv": false,
    "venv_bin_path": "false",
    "venv_copies": false,
    "pex_root": "/home/jsirois/.pex"
}
Here's a very old issue tracking plumbing Pex's
--build / --no-build / --wheel / --no-wheel
through in Pants: https://github.com/pantsbuild/pants/issues/5862. That still just gets you universal sdist use though (via
--no-wheel
like I used above). For just selected sdist use, there's https://github.com/pantsbuild/pants/issues/12090.
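Until that plumbing exists, the selective case can be approximated outside Pants by building the wheel with Pip directly and pointing the resolve at the output directory. A minimal sketch, with a made-up helper name, that only assembles the command line:

```python
import sys
from typing import List, Sequence


def pip_wheel_from_sdist(
    requirement: str,
    source_only: Sequence[str],
    wheel_dir: str,
    python: str = sys.executable,
) -> List[str]:
    """Build a `pip wheel` command that compiles selected projects from sdists.

    Projects named in `source_only` are forced to build from source via
    --no-binary; everything else may still use prebuilt wheels.
    """
    return [
        python, "-m", "pip", "wheel",
        "--no-binary", ",".join(source_only),
        "--wheel-dir", wheel_dir,
        requirement,
    ]
```

For example, `pip_wheel_from_sdist("opencv-python", ["opencv-python"], "dist/wheels")` gives a command that leaves numpy free to come from a prebuilt wheel if one exists for the platform. `dist/wheels` is just a placeholder path.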