# general
g
Hi all, we're seeing an issue with using privately hosted git repos as sources for code. We can create a `.lock` file without any issues (example artifact):
```json
{
  "artifacts": [
    {
      "algorithm": "sha256",
      "hash": "03c92a0689e4ba4670c705d8acdcf57fa545ba6ead2bd12813e1499b192ad947",
      "url": "git+ssh://****@bitbucket.org/my_org/my_package@v3.9.0#egg=my_package[pydantic]"
    }
  ],
  "project_name": "my_package",
  "requires_dists": [
    "jsonref; extra == \"pydantic\"",
    "pydantic>=2.4.0; extra == \"pydantic\""
  ],
  "requires_python": ">=3.9",
  "version": "3.9.0"
}
```
However, when I run my tests I see the following issue when installing requirements:
```
stderr:

There was 1 error downloading required artifacts:

1. my_package 3.9 from git+ssh://****@bitbucket.org/my_org/my_package@v3.9.0#egg=my_package[pydantic]

    Expected sha256 hash of 03c92a0689e4ba4670c705d8acdcf57fa545ba6ead2bd12813e1499b192ad947 when downloading my_package but hashed to 4f4ef92aac4eaaf9559528b64085a2c7dee2ee45441e6a4bb4d83b35f85a7ae3.
```
If we replace the hash in the lockfile with the one computed at installation time (the "hashed to" value), the install works. But it breaks again every time someone regenerates the lockfile. Can anyone explain what is happening with the hashing, and what might differ between lockfile generation and installation?
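For reference, the failing step is a plain content-hash comparison: the installer hashes the bytes it ends up with and compares that digest to the one recorded in the lockfile, so any byte-level difference between what was obtained at lock time and what is obtained at install time triggers this error. A minimal sketch of such a check using only `hashlib`; `sha256_of` and `verify_sha256` are hypothetical helpers, not Pex's implementation:

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of some artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_sha256(data: bytes, expected: str) -> None:
    """Raise if the artifact bytes don't hash to the expected digest.

    Hypothetical helper illustrating a lockfile hash check; not Pex's code.
    """
    actual = sha256_of(data)
    if actual != expected:
        raise ValueError(f"Expected sha256 hash of {expected} but hashed to {actual}.")


# Two "builds" of the same package that differ by a single byte produce
# unrelated digests, so a recorded hash only matches a byte-identical artifact.
build_1 = b"my_package 3.9.0 built at 10:00"
build_2 = b"my_package 3.9.0 built at 10:01"
recorded = sha256_of(build_1)

verify_sha256(build_1, recorded)  # passes silently
try:
    verify_sha256(build_2, recorded)
except ValueError as err:
    print(err)
```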
b
Hmmmm, sorry for the trouble. I'm not sure exactly what differs between the two runs. Maybe different environment variables are leading to different packages being built, or something similar? To debug, I think you'll need to inspect the sandboxes for the processes in question and reduce the problem to a minimal example: https://www.pantsbuild.org/2.21/docs/using-pants/troubleshooting-common-issues#debug-tip-inspect-the-sandbox-with---keep-sandboxes
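Once two sandboxes are preserved with `--keep-sandboxes` (say, one from lockfile generation and one from installation), a small script that hashes every file in both directories and lists the differences can help narrow things down. This is a hypothetical helper, not part of Pants; the script name `diff_sandboxes.py` is made up, and the directory arguments are whatever sandbox paths Pants reports when it keeps them:

```python
import hashlib
import sys
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_trees(a: Path, b: Path) -> dict[str, str]:
    """Report files present in only one tree, or whose contents differ."""
    da, db = tree_digests(a), tree_digests(b)
    report: dict[str, str] = {}
    for rel in sorted(da.keys() | db.keys()):
        if rel not in db:
            report[rel] = "only in first"
        elif rel not in da:
            report[rel] = "only in second"
        elif da[rel] != db[rel]:
            report[rel] = "contents differ"
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python diff_sandboxes.py <sandbox_dir_1> <sandbox_dir_2>
    for rel, status in diff_trees(Path(sys.argv[1]), Path(sys.argv[2])).items():
        print(f"{status}: {rel}")
```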
p
Hm, I'm running into this too. Every time I run the following `__run.sh`:
```bash
#!/usr/bin/env bash
# This command line should execute the same process as pants did internally.
env \
  -i \
  CPPFLAGS= LANG=en_US.UTF-8 LDFLAGS= \
  PATH=$'/opt/homebrew/lib/ruby/gems/3.3.0/bin:/opt/homebrew/opt/ruby/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin' \
  PEX_IGNORE_RCFILES=true \
  PEX_PYTHON=/Users/jake/Library/Caches/nce/d52b8d4e72c7ef97d778057e19dcd2dcb2c048c9ac07385c5ea9cc2954831d4b/bindings/venvs/2.21.0/bin/python3.9 \
  PEX_ROOT=.cache/pex_root \
  PEX_SCRIPT=pex3 \
  /Users/jake/Library/Caches/nce/d52b8d4e72c7ef97d778057e19dcd2dcb2c048c9ac07385c5ea9cc2954831d4b/bindings/venvs/2.21.0/bin/python3.9 \
  ./pex lock create --tmpdir .tmp --no-emit-warnings \
  --python-path $'/opt/homebrew/lib/ruby/gems/3.3.0/bin:/opt/homebrew/opt/ruby/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin' \
  $'--output=lock.json' \
  $'--style=universal' \
  --pip-version \
  24.0 \
  --resolver-version pip-2020-resolver \
  --target-system linux \
  --target-system mac \
  $'--indent=2' \
  --no-pypi \
  $'--index=https://pypi.org/simple/' \
  --manylinux manylinux2014 \
  --interpreter-constraint $'CPython==3.12.*' \
  $'unstructured[pdf]@ git+ssh://git@github.com/jake-normal/unstructured@e1a652ad06f8fc1d819ceb2ace73bf0ee285cf9e'
```
followed by this, which gets the hash of `unstructured`:
```bash
cat lock.json | jq '.locked_resolves | map(.locked_requirements) | first | map(select(.project_name == "unstructured")) | first | .artifacts | first | .hash'
```
I get a different hash each time. Why might that be? Could it be nondeterminism in `unstructured`'s build script?
(I've slimmed down `__run.sh` a bit to narrow the problem space.)
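Rather than checking one hash at a time with `jq`, the same `locked_resolves[].locked_requirements[].artifacts[]` walk can be done for every project, and two lock runs diffed. A minimal Python sketch, assuming the Pex lock JSON layout implied by that `jq` query; the function names and the `lock-1.json`/`lock-2.json` file names are hypothetical:

```python
import json
from pathlib import Path


def artifact_hashes(lock: dict) -> dict[str, list[str]]:
    """Collect every artifact hash per project from a Pex lock's JSON.

    Unlike the jq query (first resolve, first artifact only), this walks
    all resolves and all artifacts.
    """
    hashes: dict[str, list[str]] = {}
    for resolve in lock.get("locked_resolves", []):
        for req in resolve.get("locked_requirements", []):
            for artifact in req.get("artifacts", []):
                hashes.setdefault(req["project_name"], []).append(artifact["hash"])
    return hashes


def changed_projects(old: dict, new: dict) -> list[str]:
    """Return the projects whose artifact hashes differ between two lock runs."""
    a, b = artifact_hashes(old), artifact_hashes(new)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


def load_lock(path: str) -> dict:
    """Read a lockfile from disk."""
    return json.loads(Path(path).read_text())


# Usage (hypothetical file names from two runs of __run.sh):
# print(changed_projects(load_lock("lock-1.json"), load_lock("lock-2.json")))
```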
I've thought this through a bit. I imagine many packages have some nondeterminism in their builds that would, in theory, produce a different sha256 on every build. Normally that doesn't cause problems because PyPI and similar indexes serve immutable artifacts: even if building the package yields a different hash each time, downloading an artifact always returns the same bytes. That guarantee falls apart with a git reference, where the artifact has to be produced locally on each run. Does this sound right? A counterpoint might be packages that don't publish wheels and so also get built locally; I would expect them to run into the same issue?
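One concrete way identical sources can yield different hashes is archive metadata such as file timestamps. The sketch below zips the same file twice with different modification times and compares the SHA-256 of the resulting archives. It only illustrates the general mechanism this hypothesis relies on; it doesn't show that this is what happens with `unstructured` or inside Pex, and `zip_digest` is a hypothetical helper:

```python
import hashlib
import io
import zipfile


def zip_digest(files: dict[str, bytes], date_time: tuple[int, int, int, int, int, int]) -> str:
    """Zip the files with one fixed timestamp and return the archive's SHA-256."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # fixed entry order, so only the timestamp varies
            info = zipfile.ZipInfo(name, date_time=date_time)
            zf.writestr(info, files[name])
    return hashlib.sha256(buf.getvalue()).hexdigest()


source = {"pkg/__init__.py": b"VERSION = '3.9.0'\n"}
morning = zip_digest(source, (2024, 1, 1, 10, 0, 0))
evening = zip_digest(source, (2024, 1, 1, 18, 0, 0))
pinned = zip_digest(source, (1980, 1, 1, 0, 0, 0))

print(morning == evening)  # False: same source, different archive bytes
print(pinned == zip_digest(source, (1980, 1, 1, 0, 0, 0)))  # True: pinned timestamp is reproducible
```

Pinning timestamps (the idea behind conventions like `SOURCE_DATE_EPOCH`) is the usual way builds are made reproducible in this respect.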
Does anyone with more experience with pex have a better sense of what is happening here?
f
We found a potential solution to this in the end. In the requirements.txt file, the requirement became:
```
my_package[package_name] @ git+ssh://git@bitbucket.org/path/to/repo@v3.9.0
```
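The visible syntax change here is from pip's legacy `#egg=name[extras]` URL fragment (as in the original lockfile URL) to a PEP 508 direct reference of the form `name[extras] @ url`. As a sketch of how the two spellings correspond, here is a hypothetical string-rewriting helper; it only handles the `#egg=` fragment, is not pip's parser, and doesn't by itself show why the hashes stabilised:

```python
from urllib.parse import parse_qs


def egg_url_to_pep508(url: str) -> str:
    """Rewrite 'VCS_URL#egg=name[extras]' as a PEP 508 'name[extras] @ VCS_URL'.

    Hypothetical helper: the scheme, host, path and @ref are kept verbatim.
    """
    base, _, fragment = url.partition("#")
    egg = parse_qs(fragment).get("egg")
    if not egg:
        raise ValueError(f"no #egg= fragment in {url!r}")
    return f"{egg[0]} @ {base}"


legacy = "git+ssh://git@bitbucket.org/my_org/my_package@v3.9.0#egg=my_package[pydantic]"
print(egg_url_to_pep508(legacy))
# my_package[pydantic] @ git+ssh://git@bitbucket.org/my_org/my_package@v3.9.0
```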
We also added the following to `pants.toml` and made sure users had an SSH key set up for pulling the package:
```toml
[subprocess-environment]
env_vars.add = ["SSH_AUTH_SOCK", "HOME"]
```
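The `env -i` in the `__run.sh` earlier in this thread shows that these processes run with a scrubbed environment, so a `git+ssh` fetch inside the sandbox can't see the user's SSH agent unless `SSH_AUTH_SOCK` is passed through explicitly, which is what the `env_vars` setting above is for. A small Python sketch of that allowlisting idea; `allowlisted_env` and `child_sees` are hypothetical helpers that only loosely mimic the behaviour (not Pants' implementation), and the socket path and home directory are made-up values:

```python
import os
import subprocess
import sys


def allowlisted_env(parent: dict[str, str], allow: list[str]) -> dict[str, str]:
    """Build a child environment containing only the allowlisted variables."""
    return {k: parent[k] for k in allow if k in parent}


def child_sees(env: dict[str, str], var: str) -> str:
    """Run a fresh Python process with exactly `env` and report the value of `var`."""
    out = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({var!r}, '<unset>'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


parent = dict(os.environ, SSH_AUTH_SOCK="/tmp/agent.sock", HOME="/home/dev")
print(child_sees(allowlisted_env(parent, []), "SSH_AUTH_SOCK"))  # <unset>
print(child_sees(allowlisted_env(parent, ["SSH_AUTH_SOCK", "HOME"]), "SSH_AUTH_SOCK"))  # /tmp/agent.sock
```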
good luck 👍