Running into an interesting lockfile generation issue (Pants #general)

rhythmic-battery-45198

01/23/2023, 11:38 PM

Running into an interesting lockfile generation issue with the `torch` package. 🧵

rhythmic-battery-45198

01/23/2023, 11:40 PM

I was seeing the following error after adding a `torch` dependency:

```
stderr:
Failed to resolve requirements from PEX environment @ /home/ci/.cache/pants/named_caches/pex_root/unzipped_pexes/a5788e2aff6bc00ba7d5959665bfbcc6e8421d1c.
Needed cp39-cp39-manylinux_2_35_x86_64 compatible dependencies for:
 1: nvidia-cuda-runtime-cu11==11.7.99; platform_system == "Linux"
    Required by:
      torch 1.13.1
    But this pex had no ProjectName(raw='nvidia-cuda-runtime-cu11', normalized='nvidia-cuda-runtime-cu11') distributions.
 2: nvidia-cudnn-cu11==8.5.0.96; platform_system == "Linux"
    Required by:
      torch 1.13.1
    But this pex had no ProjectName(raw='nvidia-cudnn-cu11', normalized='nvidia-cudnn-cu11') distributions.
 3: nvidia-cublas-cu11==11.10.3.66; platform_system == "Linux"
    Required by:
      torch 1.13.1
    But this pex had no ProjectName(raw='nvidia-cublas-cu11', normalized='nvidia-cublas-cu11') distributions.
 4: nvidia-cuda-nvrtc-cu11==11.7.99; platform_system == "Linux"
    Required by:
      torch 1.13.1
    But this pex had no ProjectName(raw='nvidia-cuda-nvrtc-cu11', normalized='nvidia-cuda-nvrtc-cu11') distributions.
```

rhythmic-battery-45198

01/23/2023, 11:40 PM

Checked the lockfile. As expected, those nvidia distributions were missing.

```
"requires_dists": [
  "opt-einsum>=3.3; extra == \"opt-einsum\"",
  "typing-extensions"
]
```

rhythmic-battery-45198

01/23/2023, 11:41 PM

Did some poking around with pex and isolated the interesting behavior.

rhythmic-battery-45198

01/23/2023, 11:42 PM

With the `--style universal` and `--target-system linux` options set, the lockfile is generated without the expected nvidia libs. This is the configuration Pants uses when the error is raised.

```
pex3 lock create --style=universal --resolver-version pip-2020-resolver --target-system linux torch==1.13.1
```

rhythmic-battery-45198

01/23/2023, 11:44 PM

If I instead use the default `--style strict` on my Linux x86_64 machine, the nvidia distributions make it into the lockfile:

```
pex3 lock create --style=strict --resolver-version pip-2020-resolver torch==1.13.1
```

```
"requires_dists": [
  "nvidia-cublas-cu11==11.10.3.66; platform_system == \"Linux\"",
  "nvidia-cuda-nvrtc-cu11==11.7.99; platform_system == \"Linux\"",
  "nvidia-cuda-runtime-cu11==11.7.99; platform_system == \"Linux\"",
  "nvidia-cudnn-cu11==8.5.0.96; platform_system == \"Linux\"",
  "opt-einsum>=3.3; extra == \"opt-einsum\"",
  "typing-extensions"
],
```
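To make the difference between the two locks explicit, here is a minimal Python sketch that diffs the `requires_dists` arrays quoted above. It assumes the pex lock JSON nests entries as `locked_resolves[*].locked_requirements[*]` with `project_name` and `requires_dists` keys, and it works on an already-parsed dict; the helper names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def requires_dists_for(lock: dict[str, Any], project_name: str) -> list[str]:
    """Return the requires_dists recorded for one project in a pex-style lock.

    Assumes the layout locked_resolves[*].locked_requirements[*], with
    "project_name" and "requires_dists" keys on each locked requirement.
    """
    for resolve in lock.get("locked_resolves", []):
        for req in resolve.get("locked_requirements", []):
            if req.get("project_name") == project_name:
                return list(req.get("requires_dists", []))
    raise KeyError(f"{project_name} not found in lock")


def missing_dists(universal: list[str], strict: list[str]) -> list[str]:
    """Requirement strings present in the strict lock but absent from the universal one."""
    return sorted(set(strict) - set(universal))


# Trimmed copy of the universal lock entry for torch (hypothetical wrapper keys).
universal_lock = {
    "locked_resolves": [{
        "locked_requirements": [{
            "project_name": "torch",
            "requires_dists": [
                'opt-einsum>=3.3; extra == "opt-einsum"',
                "typing-extensions",
            ],
        }]
    }]
}

# requires_dists from the strict lock quoted above.
strict_dists = [
    'nvidia-cublas-cu11==11.10.3.66; platform_system == "Linux"',
    'nvidia-cuda-nvrtc-cu11==11.7.99; platform_system == "Linux"',
    'nvidia-cuda-runtime-cu11==11.7.99; platform_system == "Linux"',
    'nvidia-cudnn-cu11==8.5.0.96; platform_system == "Linux"',
    'opt-einsum>=3.3; extra == "opt-einsum"',
    "typing-extensions",
]

for dist in missing_dists(requires_dists_for(universal_lock, "torch"), strict_dists):
    print(dist)
```

Note that a Pants-generated lockfile starts with `//` comment lines, so it needs those stripped before `json.loads` can produce the dict this sketch expects.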

rhythmic-battery-45198

01/23/2023, 11:46 PM

The root issue seems to be that different `torch` wheels list different transitive dependencies. It looks like the transitive dependencies are picked up from the first artifact listed in the lockfile, which appears to follow the same order as the file list on PyPI: https://pypi.org/project/torch/1.13.1/#files
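The behavior described above can be modeled in a few lines of Python. This is a simplified sketch of the symptom, not pex's actual implementation: it assumes the dependency list recorded for a version comes from whichever artifact is consulted first, and the artifact list is abbreviated to two torch 1.13.1 wheels with the project names from their Requires-Dist lines (extras included).

```python
from __future__ import annotations

# Hypothetical, abbreviated view of torch 1.13.1 artifacts in PyPI listing order,
# paired with the project names from each wheel's Requires-Dist lines.
ARTIFACTS: list[tuple[str, list[str]]] = [
    ("torch-1.13.1-cp39-cp39-manylinux2014_aarch64.whl",
     ["typing-extensions", "opt-einsum"]),
    ("torch-1.13.1-cp310-cp310-manylinux1_x86_64.whl",
     ["typing-extensions", "nvidia-cuda-runtime-cu11", "nvidia-cudnn-cu11",
      "nvidia-cublas-cu11", "nvidia-cuda-nvrtc-cu11", "opt-einsum"]),
]


def deps_from_first_artifact(artifacts, accept=lambda filename: True):
    """Return (filename, deps) for the first artifact that passes `accept`.

    Models the 'one METADATA per version' assumption: whichever artifact
    is consulted first decides the dependency list for every artifact.
    """
    for filename, deps in artifacts:
        if accept(filename):
            return filename, deps
    raise LookupError("no acceptable artifact")


# Universal-style model: every Linux wheel is acceptable, so the aarch64 wheel wins.
print(deps_from_first_artifact(ARTIFACTS))
# Strict-style model on an x86_64 host: only the matching wheel is considered.
print(deps_from_first_artifact(ARTIFACTS, lambda f: "x86_64" in f))
```

The point of the model is that nothing is "wrong" in the selection step itself; the result only differs because the two wheels disagree about their dependencies.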

rhythmic-battery-45198

01/23/2023, 11:48 PM

With the universal style, https://files.pythonhosted.org/packages/86/08/41315a205bcd103a9698fa8afafbb73a234db8791c[…]b10243a7/torch-1.13.1-cp39-cp39-manylinux2014_aarch64.whl is the first artifact. This wheel lacks the nvidia dependencies in its `METADATA` file:

```
Classifier: Programming Language :: C++
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.7.0
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: typing-extensions
Provides-Extra: opt-einsum
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
```

rhythmic-battery-45198

01/23/2023, 11:49 PM

With the strict style, https://files.pythonhosted.org/packages/81/58/431fd405855553af1a98091848cf97741302416b01[…]09d3c422b3/torch-1.13.1-cp310-cp310-manylinux1_x86_64.whl is the artifact. This wheel does include the nvidia dependencies in its `METADATA` file:

```
Classifier: Programming Language :: C++
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.7.0
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: typing-extensions
Requires-Dist: nvidia-cuda-runtime-cu11 (==11.7.99) ; platform_system == "Linux"
Requires-Dist: nvidia-cudnn-cu11 (==8.5.0.96) ; platform_system == "Linux"
Requires-Dist: nvidia-cublas-cu11 (==11.10.3.66) ; platform_system == "Linux"
Requires-Dist: nvidia-cuda-nvrtc-cu11 (==11.7.99) ; platform_system == "Linux"
Provides-Extra: opt-einsum
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
```
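To confirm the mismatch without eyeballing, the Requires-Dist headers of two METADATA files can be compared directly. METADATA uses RFC 822-style headers, so the standard library's email header parser reads it. This sketch hard-codes trimmed copies of the two excerpts above; in practice you would read the `*.dist-info/METADATA` member out of each downloaded wheel (a wheel is a zip archive).

```python
from __future__ import annotations

from email.parser import HeaderParser

AARCH64_METADATA = """\
Requires-Python: >=3.7.0
Requires-Dist: typing-extensions
Provides-Extra: opt-einsum
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
"""

X86_64_METADATA = """\
Requires-Python: >=3.7.0
Requires-Dist: typing-extensions
Requires-Dist: nvidia-cuda-runtime-cu11 (==11.7.99) ; platform_system == "Linux"
Requires-Dist: nvidia-cudnn-cu11 (==8.5.0.96) ; platform_system == "Linux"
Requires-Dist: nvidia-cublas-cu11 (==11.10.3.66) ; platform_system == "Linux"
Requires-Dist: nvidia-cuda-nvrtc-cu11 (==11.7.99) ; platform_system == "Linux"
Provides-Extra: opt-einsum
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
"""


def requires_dist(metadata_text: str) -> set[str]:
    """Collect every Requires-Dist header value from a METADATA document."""
    headers = HeaderParser().parsestr(metadata_text)
    return set(headers.get_all("Requires-Dist") or [])


# Dependencies declared only by the x86_64 wheel.
only_x86_64 = requires_dist(X86_64_METADATA) - requires_dist(AARCH64_METADATA)
for line in sorted(only_x86_64):
    print(line)
```

Any non-empty difference between wheels of the same version is exactly the situation that breaks the "same deps for every artifact" assumption discussed below.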

rhythmic-battery-45198

01/23/2023, 11:50 PM

I can probably work around this easily by adding explicit dependencies for the `torch` `python_requirement` (https://www.pantsbuild.org/docs/python-third-party-dependencies#requirements-with-undeclared-dependencies).

rhythmic-battery-45198

01/23/2023, 11:52 PM

I am wondering whether this is:
• just a misbehaving package (not sure whether all wheels should list the same dependencies)
• something that I can configure Pants/pex to handle today
• a case not currently handled by Pants/pex
/end

enough-analyst-54434

01/24/2023, 12:26 AM

A core assumption that makes locks feasible at all is the one you've discovered: all distributions for a given version, sdist or wheel, have the same dependencies. Without that assumption, you'd be forced to download every wheel for a given version, which is prohibitive in time and bandwidth. The Discourse thread for this rejected PEP discusses this necessary assumption: https://peps.python.org/pep-0665/

enough-analyst-54434

01/24/2023, 12:27 AM

Really, the assumption has nothing to do with making locks feasible; it has to do with making resolving in general feasible.

enough-analyst-54434

01/24/2023, 12:28 AM

So, yeah, this is really torch doing something "legal" that is, however, hostile to tooling.

enough-analyst-54434

01/24/2023, 12:31 AM

@rhythmic-battery-45198 hopefully you can work around it as you suggested. I really have no idea how to approach a fix for this; it would require making Pip work in a way that it currently does not.

enough-analyst-54434

01/24/2023, 12:36 AM

They already use environment markers, so the state of those wheels is a bit strange: they could have uniform dependency metadata with more use of environment markers. So it's not that the maintainers reject the use of environment markers. It seems more like they are unaware of the pain they are causing, or ... not sure.
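To illustrate the point about environment markers: if every wheel shipped one Requires-Dist list and relied on markers to select platform-specific packages, any wheel's METADATA would be a safe stand-in for all of them. The toy evaluator below handles only `name == "value"` clauses joined by `and`, which is enough for this example; real tools use a full PEP 508 implementation such as `packaging.markers`. The marker `platform_system == "Linux" and platform_machine == "x86_64"` is a hypothetical choice for illustration, not what torch 1.13.1 ships.

```python
from __future__ import annotations

import re

# One uniform dependency list; the nvidia marker here is hypothetical.
UNIFORM_REQUIRES = [
    "typing-extensions",
    'nvidia-cudnn-cu11==8.5.0.96; platform_system == "Linux" and platform_machine == "x86_64"',
]

CLAUSE = re.compile(r'^\s*(\w+)\s*==\s*"([^"]*)"\s*$')


def marker_matches(marker: str, env: dict[str, str]) -> bool:
    """Evaluate a marker made only of `name == "value"` clauses joined by `and`."""
    for clause in marker.split(" and "):
        m = CLAUSE.match(clause)
        if m is None:
            raise ValueError(f"unsupported marker clause: {clause!r}")
        name, value = m.groups()
        if env.get(name) != value:
            return False
    return True


def deps_for(env: dict[str, str]) -> list[str]:
    """Requirement strings (markers stripped) that apply in the given environment."""
    selected = []
    for req in UNIFORM_REQUIRES:
        spec, _, marker = req.partition(";")
        if not marker or marker_matches(marker, env):
            selected.append(spec.strip())
    return selected


print(deps_for({"platform_system": "Linux", "platform_machine": "x86_64"}))
print(deps_for({"platform_system": "Linux", "platform_machine": "aarch64"}))
```

With this shape, a lock tool can read the metadata of any one wheel and still get the right dependencies for every platform, because the platform decision moves from "which wheel you downloaded" to "which marker evaluates true".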

rhythmic-battery-45198

01/24/2023, 12:42 AM

OK, thanks for the background! That all makes sense to me and is what I suspected. The workaround should be pretty painless.

enough-analyst-54434

01/24/2023, 12:49 AM

Thanks for digging into that one. Definitely an interesting case, and a concerning one too: torch is pretty popular, so this is going to be a wider problem.

gentle-painting-24549

01/26/2023, 7:46 PM

I ran into the same issue here and resolved it by manually overriding my lock file’s `torch::requires_dists` array to this:

```
"project_name": "torch",
"requires_dists": [
  "opt-einsum>=3.3; extra == \"opt-einsum\"",
  "typing-extensions",
  "nvidia-cublas-cu11==11.10.3.66; platform_system == \"Linux\"",
  "nvidia-cuda-nvrtc-cu11==11.7.99; platform_system == \"Linux\"",
  "nvidia-cuda-runtime-cu11==11.7.99; platform_system == \"Linux\"",
  "nvidia-cudnn-cu11==8.5.0.96; platform_system == \"Linux\""
],
```

I’m now testing dependency overrides in my BUILD files to see if I can use those instead. Curious whether you ever got this to work, @rhythmic-battery-45198?

gentle-painting-24549

01/26/2023, 7:52 PM

Side note: I’m seeing those dependencies exposed by PyPI’s JSON endpoint. I wonder what PyTorch is doing to expose them there but not where Pants needs them:

```
curl -s https://pypi.org/pypi/torch/1.13.1/json | jq -r ".info.requires_dist[]"
```
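For scripting the same check in Python, this sketch reads the PyPI JSON endpoint used above (`https://pypi.org/pypi/<project>/<version>/json`) with only the standard library. Because `info.requires_dist` reflects a single file's metadata (see the reply below), the helper also lists the per-file entries under `urls`, to show how many artifacts that one list stands in for. The function names are hypothetical, and parsing is kept separate from the network call so it can be exercised offline.

```python
from __future__ import annotations

import json
import urllib.request


def fetch_release_json(project: str, version: str) -> dict:
    """Download release metadata from PyPI's JSON API (network access required)."""
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def summarize(release: dict) -> tuple[list[str], list[str]]:
    """Return (info.requires_dist, filenames of every file in the release)."""
    # PyPI reports requires_dist as null when no dependencies are declared.
    requires = release.get("info", {}).get("requires_dist") or []
    filenames = [f["filename"] for f in release.get("urls", [])]
    return requires, filenames
```

Usage would be along the lines of `requires, files = summarize(fetch_release_json("torch", "1.13.1"))`; a long `files` list paired with one `requires` list is a reminder that the displayed dependencies were taken from just one of those files.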

enough-analyst-54434

01/26/2023, 8:01 PM

The JSON endpoint has the same problem, I'd guess: if the METADATA is not the same in each wheel, it picks one and displays it. And they picked one you happen to like.

enough-analyst-54434

01/26/2023, 8:02 PM

That might change tomorrow if some sorting (which you'd think is unimportant) changes: a new wheel gets picked as the arbitrary provider of METADATA for that version, and you lose.

enough-analyst-54434

01/26/2023, 8:03 PM

I remember a thread where Donald Stufft, who maintains PyPI, points out this problem.

rhythmic-battery-45198

01/26/2023, 8:03 PM

I added the nvidia packages to my requirements file and these overrides to my `python_requirements` target:

```
python_requirements(
    # ...other fields of the target unchanged...
    overrides={
        "torch": {
            "dependencies": [
                "#nvidia-cuda-runtime-cu11",
                "#nvidia-cudnn-cu11",
                "#nvidia-cublas-cu11",
                "#nvidia-cuda-nvrtc-cu11",
            ]
        }
    },
)
```

gentle-painting-24549

01/26/2023, 9:01 PM

Perfect, this worked for me and is much better than manually editing my lock file. Thank you! It’s not actually an issue, but I’m surprised that Pants doesn’t inject these dependencies into the underlying `setup.py` files for the resulting Python distribution: `torch==1.13.x` is the only dependency that shows up there.

gentle-painting-24549

01/26/2023, 10:15 PM

Even more painful, this list of requirements needs to be carefully maintained when upgrading to new torch versions. It looks like they are doubling the list of nvidia packages needed in the next release (https://github.com/pytorch/pytorch/pull/89944). It also looks like they inject dependencies at the wheel level instead of the package level, since they build CUDA, ROCm, and CPU versions of the wheels.
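Since the pinned nvidia list must be kept in sync with each torch release, one option is to derive both the requirements-file lines and the `overrides` entry from the Requires-Dist lines of the wheel you actually deploy, for example the x86_64 METADATA quoted earlier. This sketch assumes Requires-Dist values in the `name (specifier) ; marker` form shown in that excerpt, selects packages by a `nvidia-` name prefix (a heuristic), and assumes a `python_requirements` target whose generated targets are addressed as `#<project-name>`, as in the overrides above. The function name is hypothetical.

```python
from __future__ import annotations

import re

# Requires-Dist values from the torch 1.13.1 x86_64 wheel METADATA quoted above.
REQUIRES_DIST = [
    "typing-extensions",
    'nvidia-cuda-runtime-cu11 (==11.7.99) ; platform_system == "Linux"',
    'nvidia-cudnn-cu11 (==8.5.0.96) ; platform_system == "Linux"',
    'nvidia-cublas-cu11 (==11.10.3.66) ; platform_system == "Linux"',
    'nvidia-cuda-nvrtc-cu11 (==11.7.99) ; platform_system == "Linux"',
    "opt-einsum (>=3.3) ; extra == 'opt-einsum'",
]

# Project name, optional parenthesized specifier, optional "; marker".
LINE = re.compile(r"^([A-Za-z0-9._-]+)\s*(?:\(([^)]*)\))?\s*(?:;\s*(.*))?$")


def pinned_deps_and_overrides(requires_dist: list[str], prefix: str = "nvidia-"):
    """Build requirements.txt lines and a Pants-style `overrides` dict for `prefix` deps."""
    req_lines, addresses = [], []
    for value in requires_dist:
        m = LINE.match(value.strip())
        if m is None or not m.group(1).startswith(prefix):
            continue
        name, spec, marker = m.group(1), m.group(2) or "", m.group(3)
        req_lines.append(f"{name}{spec}" + (f"; {marker}" if marker else ""))
        addresses.append(f"#{name}")
    return req_lines, {"torch": {"dependencies": addresses}}


lines, overrides = pinned_deps_and_overrides(REQUIRES_DIST)
print("\n".join(lines))
print(overrides)
```

Regenerating both pieces from the same METADATA on each torch upgrade keeps the requirements file and the `overrides` entry from drifting apart, though it still relies on picking the right wheel as the source of truth.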

adamant-magazine-16751

03/16/2023, 11:40 AM

Hi. I faced this issue today and see that it has already been discussed. Manually specifying all requirements seems to solve the problem, but I'm looking for a way to simplify it. I don't mind building and running only on Linux. I saw that there are parameters to `pex3 lock create` that control lockfile generation, namely `--style=strict` and `--target-system=linux`. I couldn't find a way to set these in Pants, though. I'm also not sure what they do exactly, or what the potential caveats of using them are. Is this documented somewhere? For example, would running `pex3 lock create --style=strict` on CentOS make the lock unusable on other distros due to some manylinux compatibility issues? (I'm sorry if this sounds silly, but I'm not well-read in this area 😓)

enough-analyst-54434

03/16/2023, 1:53 PM

Yeah, Pants does not allow you to pick lock style or the list of target systems. If it did, the strict style would have the problem you guessed. As to documentation, there is just CLI help for locking:

```
$ pex.venv/bin/pex3 lock create --help
usage: pex3 lock create [-h] [--style {strict,sources,universal}]
                        [--target-system {linux,mac,windows}]
                        [--path-mapping PATH_MAPPINGS] [-o PATH]
                        [--indent INDENT] [-r FILE or URL]
                        [--constraints FILE or URL] [--python PYTHON]
                        [--python-path PYTHON_PATH]
                        [--interpreter-constraint INTERPRETER_CONSTRAINT]
                        [--platform PLATFORMS]
                        [--complete-platform COMPLETE_PLATFORMS]
                        [--manylinux [ASSUME_MANYLINUX]]
                        [--resolve-local-platforms]
                        [--resolver-version {pip-legacy-resolver,pip-2020-resolver}]
                        [--pip-version {vendored,20.3.4-patched,22.2.2,22.3,22.3.1,23.0,23.0.1}]
                        [--allow-pip-version-fallback] [--pypi] [-f PATH/URL]
                        [-i URL] [--retries RETRIES] [--timeout SECS]
                        [--proxy PROXY] [--cert PATH] [--client-cert PATH]
                        [--cache-ttl DEPRECATED] [-H DEPRECATED] [--pre]
                        [--wheel] [--build] [--prefer-wheel] [--force-pep517]
                        [--build-isolation] [--transitive] [-j JOBS]
                        [--preserve-pip-download-log] [-v] [--emit-warnings]
                        [--pex-root PEX_ROOT] [--disable-cache]
                        [--cache-dir CACHE_DIR] [--tmpdir TMPDIR]
                        [--rcfile RC_FILE]
                        [requirements ...]

optional arguments:
  -h, --help            show this help message and exit
  --style {strict,sources,universal}
                        The style of lock to generate. The 'strict' style is the
                        default and generates a lock file that contains exactly
                        the distributions that would be used in a local PEX
                        build. If an sdist would be used, the sdist is included,
                        but if a wheel would be used, an accompanying sdist will
                        not be included. The 'sources' style includes locks
                        containing both wheels and the associated sdists when
                        available. The 'universal' style generates a universal
                        lock for all possible target interpreters and platforms,
                        although the scope can be constrained via one or more
                        --interpreter-constraint. Of the three lock styles, only
                        'strict' can give you full confidence in the lock since
                        it includes exactly the artifacts that are included in
                        the local PEX you'll build to test the lock result with
                        before checking in the lock. With the other two styles
                        you lock un-vetted artifacts in addition to the 'strict'
                        ones; so, even though you can be sure to reproducibly
                        resolve those same un-vetted artifacts in the future,
                        they're still un-vetted and could be innocently or
                        maliciously different from the 'strict' artifacts you can
                        locally vet before committing the lock to version
                        control. The effects of the differences could range from
                        failing a resolve using the lock when the un-vetted
                        artifacts have different dependencies from their sibling
                        artifacts, to your application crashing due to different
                        code in the sibling artifacts to being compromised by
                        differing code in the sibling artifacts. So, although the
                        more permissive lock styles will allow the lock to work
                        on a wider range of machines /are apparently more
                        convenient, the convenience comes with a potential price
                        and using these styles should be considered carefully.
  --target-system {linux,mac,windows}
                        The target operating systems to generate the lock for.
                        This option applies only to `--style universal` locks and
                        restricts the locked artifacts to those compatible with
                        the specified target operating systems. By default,
                        'universal' style locks include artifacts for all
                        operating systems.
```
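On the manylinux question: a strict lock records only the artifacts chosen on the locking machine, so whether another host can use it depends on that host's Python version, CPU architecture, and glibc version. A manylinux platform tag encodes the minimum glibc a wheel needs: PEP 600 tags look like `manylinux_X_Y_arch` (the error at the top of the thread mentions `manylinux_2_35_x86_64`), and the legacy aliases map to `manylinux1` = glibc 2.5, `manylinux2010` = 2.12, `manylinux2014` = 2.17. This sketch checks one tag against a host's glibc and architecture; the host values in the usage note are illustrative, it does not split compressed tag sets such as `manylinux_2_17_x86_64.manylinux2014_x86_64`, and real installers compare against the full list of supported tags (for example via `packaging.tags`).

```python
from __future__ import annotations

import re

# PEP 600 legacy aliases mapped to the glibc (major, minor) they require.
LEGACY = {"manylinux1": (2, 5), "manylinux2010": (2, 12), "manylinux2014": (2, 17)}

PEP600 = re.compile(r"^manylinux_(\d+)_(\d+)_(\w+)$")
LEGACY_TAG = re.compile(r"^(manylinux1|manylinux2010|manylinux2014)_(\w+)$")


def parse_manylinux(tag: str) -> tuple[tuple[int, int], str]:
    """Split a manylinux platform tag into ((glibc_major, glibc_minor), arch)."""
    m = PEP600.match(tag)
    if m:
        return (int(m.group(1)), int(m.group(2))), m.group(3)
    m = LEGACY_TAG.match(tag)
    if m:
        return LEGACY[m.group(1)], m.group(2)
    raise ValueError(f"not a manylinux tag: {tag}")


def host_can_use(tag: str, host_glibc: tuple[int, int], host_arch: str) -> bool:
    """True if a host with this glibc version and arch satisfies the wheel's platform tag."""
    required_glibc, arch = parse_manylinux(tag)
    return arch == host_arch and host_glibc >= required_glibc
```

For example, `host_can_use("manylinux_2_35_x86_64", (2, 17), "x86_64")` is false while `host_can_use("manylinux2014_x86_64", (2, 17), "x86_64")` is true: a lock whose wheels were picked on a newer-glibc host can pin artifacts that an older distro cannot install, and a lock made on x86_64 never helps an aarch64 host.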

gentle-painting-24549

03/23/2023, 9:17 PM

🙃 Back again with this issue for a completely different package, `open3d`. The dependencies you’ll find at `curl -s https://pypi.org/pypi/open3d/0.17.0/json | jq -r ".info.requires_dist[]"` are wildly different from the dependencies in the wheel we need, `open3d-0.17.0-cp38-cp38-manylinux_2_27_x86_64.whl`. In our case, the results from `generate-lockfiles` changed in a matter of days (I believe this was triggered by a new wheel upload to PyPI), leading to a breaking change that removed a large number of packages from a lock file.
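Because a new upload can silently change what `generate-lockfiles` produces, one defensive option is to compare the set of locked projects before and after regeneration and flag removals, for example in CI. This sketch assumes the pex lock JSON layout `locked_resolves[*].locked_requirements[*].project_name` and drops leading `//` comment lines (Pants prepends a commented header to its lockfiles); the helper names and the sample lock contents are hypothetical.

```python
from __future__ import annotations

import json


def load_lock_text(text: str) -> dict:
    """Parse lock JSON after dropping '//' comment lines such as the Pants header."""
    body = "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("//")
    )
    return json.loads(body)


def locked_projects(lock: dict) -> set[str]:
    """Project names across all locked resolves."""
    return {
        req["project_name"]
        for resolve in lock.get("locked_resolves", [])
        for req in resolve.get("locked_requirements", [])
    }


def removed_projects(old_text: str, new_text: str) -> list[str]:
    """Projects locked in the old file but missing from the regenerated one."""
    old = locked_projects(load_lock_text(old_text))
    new = locked_projects(load_lock_text(new_text))
    return sorted(old - new)


# Hypothetical before/after lock contents.
OLD = """// hypothetical header
{"locked_resolves": [{"locked_requirements": [
  {"project_name": "open3d"}, {"project_name": "example-dep"}, {"project_name": "numpy"}]}]}"""
NEW = """// hypothetical header
{"locked_resolves": [{"locked_requirements": [
  {"project_name": "open3d"}, {"project_name": "numpy"}]}]}"""

print(removed_projects(OLD, NEW))
```

A non-empty result does not say which wheel's METADATA changed, but it turns a silent drop of transitive dependencies into a visible review item before the new lock is committed.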

enough-analyst-54434

03/23/2023, 9:59 PM

Not much I know how to do here. What are you hoping for, @gentle-painting-24549?

gentle-painting-24549

03/23/2023, 10:01 PM

Oh, nothing at all! I was able to resolve the issue using `overrides`. I just wanted to leave a note here in case someone runs into the same issue with open3d. The issue was super perplexing for the team; it was very useful to have known about this.

enough-analyst-54434

03/23/2023, 10:03 PM

Gotcha.