# general
r
Hello 👋 I’m getting this error when I run
./pants test ::
in a docker container:
02:23:24.27 [ERROR] 1 Exception encountered:

  ProcessExecutionFailure: Process 'Building 17 requirements for requirements.pex from the build-support/databricks_lock.txt resolve: boto3==1.16.7, enigma-namedframes~=1.0.2, matplotlib==3.4.2, mlflow==1.20.2, numpy<1.24,>=1.20, pandas==1.2.4, plotly==5.1.0, probablepeople, protobuf==3.17.2, pyarrow==4.0.0, pyspark-test, pyspark==3.1.2, pytest, scikit-learn==0.24.1, scipy~=1.6.0, tldextract, types-setuptools' failed with exit code 1.
stdout:

stderr:
Build of BuildRequest(target=LocalInterpreter(id='usr.bin.python3.8', platform=Platform(platform='manylinux_2_27_x86_64', impl='cp', version='3.8.0', version_info=(3, 8, 0), abi='cp38'), marker_environment=MarkerEnvironment(implementation_name='cpython', implementation_version='3.8.0', os_name='posix', platform_machine='x86_64', platform_python_implementation='CPython', platform_release='5.15.49-linuxkit', platform_system='Linux', platform_version='#1 SMP Tue Sep 13 07:51:46 UTC 2022', python_full_version='3.8.0', python_version='3.8', sys_platform='linux'), interpreter=PythonInterpreter('/usr/bin/python3.8', PythonIdentity('/usr/bin/python3.8', 'cp38', 'cp38', 'manylinux_2_27_x86_64', (3, 8, 0)))), source_path='/root/.cache/pants/named_caches/pex_root/downloads/5e25ebb18756e9715f4d26848cc7e558035025da74b4fc325a0ebc05ff538e65/pyspark-3.1.2.tar.gz', fingerprint='5e25ebb18756e9715f4d26848cc7e558035025da74b4fc325a0ebc05ff538e65') produced 2 artifacts; expected 1:
0. cp38-cp38-manylinux_2_27_x86_64.3ec80d51ce26438a86c97b71d562e96e
1. pyspark-3.1.2-py2.py3-none-any.whl
It looks like there is some function that is expected to create a single artifact (likely 1 above), but winds up creating 2 artifacts in this environment. If I include the parameter
--keep-sandboxes=on_failure
, then it preserves a directory with the following files:
./__run.sh
./source_files
./.tmp
./pex
./.cache
./.cache/pex_root
./build-support
./build-support/databricks_lock.txt
__run.sh
includes this command:
/usr/bin/python3.8 ./pex --tmpdir .tmp --jobs 6 --python-path $'/databricks/python3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' --output-file requirements.pex --no-emit-warnings --python /usr/bin/python3.8 $'--sources-directory=source_files' $'boto3==1.16.7' $'enigma-namedframes~=1.0.2' $'matplotlib==3.4.2' $'mlflow==1.20.2' $'numpy<1.24,>=1.20' $'pandas==1.2.4' $'plotly==5.1.0' probablepeople $'protobuf==3.17.2' $'pyarrow==4.0.0' pyspark-test $'pyspark==3.1.2' pytest $'scikit-learn==0.24.1' $'scipy~=1.6.0' tldextract types-setuptools --lock build-support/databricks_lock.txt --no-pypi $'--index=https://pypi.org/simple/' $'--index=https://*****:*****@**********/pypi/pypi-local/simple' --manylinux manylinux2014 --layout packed
which outputs the
stderr
message from above. Although this fails in a docker container locally, this same command in the same docker image works in CI/CD. No one else that I work with has reported the same error and I have tried clearing all of my caches. Installing all of the dependencies listed in that command in
__run.sh
with
pip install
works. Lastly, this all used to work for me until very recently (I think earlier this week). Any thoughts?
e
Can you `rm -rf` the directory
PEX_ROOT
is set to in
__run.sh
and then run
PEX_VERBOSE=9 ./__run.sh
and provide the full output?
FWIW 3.8.0 is an odd / scary version of Python to be using. Both very old and very new; i.e. 3.8 is all the way up to 3.8.16 - there have been many bugs fixed IOW.
And if you could provide the Pants version, that would be helpful.
@rich-london-74860 FWIW, pyenv 3.8.0 worked fine for me. This is using latest Pex (2.1.125):
$ pyenv install 3.8.0
Downloading Python-3.8.0.tar.xz...
-> https://www.python.org/ftp/python/3.8.0/Python-3.8.0.tar.xz
Installing Python-3.8.0...
Installed Python-3.8.0 to /home/jsirois/.pyenv/versions/3.8.0
$ pex --python ~/.pyenv/versions/3.8.0/bin/python "pyspark==3.1.2" --no-binary --intransitive --ignore-errors -o pyspark-no-deps.pex

# The error is expected - I built the PEX with no deps, but the backtrace proves that pyspark comes from a "wheel" within the PEX.
$ ~/.pyenv/versions/3.8.0/bin/python pyspark-no-deps.pex -c 'import pyspark'
Traceback (most recent call last):
  File "/home/jsirois/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/jsirois/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jsirois/.pex/unzipped_pexes/64a6095bab8c21d882462ac5fe40fb4c06230499/__main__.py", line 106, in <module>
    bootstrap_pex(__entry_point__, execute=__execute__, venv_dir=__venv_dir__)
  File "/home/jsirois/.pex/unzipped_pexes/64a6095bab8c21d882462ac5fe40fb4c06230499/.bootstrap/pex/pex_bootstrapper.py", line 615, in bootstrap_pex
    pex.PEX(entry_point).execute()
  File "/home/jsirois/.pex/unzipped_pexes/64a6095bab8c21d882462ac5fe40fb4c06230499/.bootstrap/pex/pex.py", line 560, in execute
    sys.exit(self._wrap_coverage(self._wrap_profiling, self._execute))
  File "/home/jsirois/.pex/unzipped_pexes/64a6095bab8c21d882462ac5fe40fb4c06230499/.bootstrap/pex/pex.py", line 467, in _wrap_coverage
    return runner(*args)
  File "/home/jsirois/.pex/unzipped_pexes/64a6095bab8c21d882462ac5fe40fb4c06230499/.bootstrap/pex/pex.py", line 498, in _wrap_profiling
    return runner(*args)
  File "/home/jsirois/.pex/unzipped_pexes/64a6095bab8c21d882462ac5fe40fb4c06230499/.bootstrap/pex/pex.py", line 581, in _execute
    return self.execute_interpreter()
  File "/home/jsirois/.pex/unzipped_pexes/64a6095bab8c21d882462ac5fe40fb4c06230499/.bootstrap/pex/pex.py", line 663, in execute_interpreter
    return self.execute_content("-c <cmd>", content, argv0="-c")
  File "/home/jsirois/.pex/unzipped_pexes/64a6095bab8c21d882462ac5fe40fb4c06230499/.bootstrap/pex/pex.py", line 774, in execute_content
    return cls.execute_ast(name, program, argv0=argv0)
  File "/home/jsirois/.pex/unzipped_pexes/64a6095bab8c21d882462ac5fe40fb4c06230499/.bootstrap/pex/pex.py", line 792, in execute_ast
    exec_function(program, globals_map)
  File "/home/jsirois/.pex/unzipped_pexes/64a6095bab8c21d882462ac5fe40fb4c06230499/.bootstrap/pex/compatibility.py", line 109, in exec_function
    exec (ast, globals_map, locals_map)
  File "-c <cmd>", line 1, in <module>
  File "/home/jsirois/.pex/installed_wheels/9490046e8f900d1d1cadb2ac3da6151c739a01c031079b8a4760254dba0ed3bd/pyspark-3.1.2-py2.py3-none-any.whl/pyspark/__init__.py", line 53, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/home/jsirois/.pex/installed_wheels/9490046e8f900d1d1cadb2ac3da6151c739a01c031079b8a4760254dba0ed3bd/pyspark-3.1.2-py2.py3-none-any.whl/pyspark/rdd.py", line 34, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/home/jsirois/.pex/installed_wheels/9490046e8f900d1d1cadb2ac3da6151c739a01c031079b8a4760254dba0ed3bd/pyspark-3.1.2-py2.py3-none-any.whl/pyspark/java_gateway.py", line 29, in <module>
    from py4j.java_gateway import java_import, JavaGateway, JavaObject, GatewayParameters
ModuleNotFoundError: No module named 'py4j'
r
Here’s the output from
PEX_VERBOSE=9 ./__run.sh
FWIW 3.8.0 is an odd / scary version of Python to be using. Both very old and very new; i.e. 3.8 is all the way up to 3.8.16 - there have been many bugs fixed IOW.
This is in fact not the first time that you’ve brought up the problems with 3.8 to me 😆 https://pantsbuild.slack.com/archives/C046T6T9U/p1676431474327309?thread_ts=1676429638.205149&cid=C046T6T9U Unfortunately, moving off of 3.8 would be a large endeavor
Pants version is 2.14.0
e
The prior point was about going from 3.8 to 3.9 or higher. This point is specifically about using 3.8.0 vs, say 3.8.16
You're using the oldest, buggiest, least secure (presumably) version of 3.8 possible.
@rich-london-74860 that output you provided contains no error. Is that the right output?
r
Yes, and as I mentioned in that thread, we do not really have a choice in the matter, this is the python version set by the platform we are using. Yes, that output does not contain an error and I am positive that it is the right output
e
Ok, I'm confused. The OP shows an error. Are you saying the error only happens for you when Pants runs but not when you execute the sandbox script manually?
r
When I run
./pants test ::
I get the error. If I
cd
to the sandbox temp directory and run
__run.sh
, then the error also happens. If I do this:
Can you `rm -rf` the directory
PEX_ROOT
is set to in
__run.sh
and then run
__run.sh
, it works
e
Ok. Can you do it your way (i.e. don't
rm -rf
the PEX_ROOT) then, but include an export of PEX_VERBOSE=9? Basically I'd like to see the error you're seeing, but with more detail. That's all I'm aiming for here.
r
Here is the output without removing
PEX_ROOT
e
Ok, thank you. I'll dig into this.
Ok, nothing obvious pops out @rich-london-74860. Could you try adding this snippet to your
pants.toml
and re-running though?:
[pex-cli]
version = "v2.1.125"
known_versions = [
    "v2.1.125|macos_arm64|1da1ef933429f15b218c98c6b960f30adfd0221fc5284c1d8facac09923692f8|4080732",
    "v2.1.125|macos_x86_64|1da1ef933429f15b218c98c6b960f30adfd0221fc5284c1d8facac09923692f8|4080732",
    "v2.1.125|linux_x86_64|1da1ef933429f15b218c98c6b960f30adfd0221fc5284c1d8facac09923692f8|4080732",
    "v2.1.125|linux_arm64|1da1ef933429f15b218c98c6b960f30adfd0221fc5284c1d8facac09923692f8|4080732"
]
This will just rule out some Pex fix between 2.1.108 and now. I don't remember anything related, but it seems worth a quick shot if you're game to try that.
Yes, and as I mentioned in that thread, we do not really have a choice in the matter, this is the python version set by the platform we are using.
Sorry about that. I forgot that context and skimmed quick. I didn't realize Databricks was not only stuck on 3.8, but stuck on 3.8.0!
r
Added the
pex-cli
configuration, but it doesn’t seem to make a difference
stderr3
e
Hrm. Ok. Can you provide the PEX_VERBOSE=9 output using that new Pex version? That should be all I need from you to investigate further this weekend.
Oh - you did it!
I'm a slow typist.
Thanks. This will probably take a while, but I'll respond back here with more info or a Pex issue when I know more.
Ok, yeah - the debug output you provided points straight at a design bug. This issue occurs when two separate Pants Process invocations try to build a PEX from the same lockfile and both of those PEX builds overlap on the same sdist. In that case there is an AtomicDirectory mechanism used, but it is used in race mode (vs exclusive lock mode which is what is used everywhere else in the codebase). This means the two processes can safely race. The issue is the work directory (
cp38-cp38-manylinux_2_27_x86_64.3ec80d51ce26438a86c97b71d562e96e
) of the racing process is visible when the failing process goes to collect its wheel. That code naively assumes the directory will be empty save for the built wheel, not taking into account the sibling workdirs of racing processes. I'll file an issue here shortly and get out a fix. Thanks for finding this one. It's very old (it goes back to roughly fall 2018), so I'm surprised no one has hit this yet!
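To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not Pex's actual code: the function names `collect_wheel_naive` and `collect_wheel_filtered` are hypothetical, and filtering on the `.whl` suffix is only one plausible way to ignore a racing process's work directory, not necessarily what the real fix does.

```python
import os
import tempfile


def collect_wheel_naive(dist_dir):
    """Assume the directory holds exactly one entry: the built wheel."""
    entries = sorted(os.listdir(dist_dir))
    if len(entries) != 1:
        raise RuntimeError(
            "produced {} artifacts; expected 1: {}".format(len(entries), entries)
        )
    return entries[0]


def collect_wheel_filtered(dist_dir):
    """Only count regular files ending in .whl, ignoring sibling workdirs."""
    wheels = sorted(
        name
        for name in os.listdir(dist_dir)
        if name.endswith(".whl") and os.path.isfile(os.path.join(dist_dir, name))
    )
    if len(wheels) != 1:
        raise RuntimeError(
            "produced {} wheels; expected 1: {}".format(len(wheels), wheels)
        )
    return wheels[0]


def demo():
    """Recreate the two 'artifacts' from the error message in a temp dir."""
    with tempfile.TemporaryDirectory() as dist_dir:
        # The wheel built by this process.
        wheel = os.path.join(dist_dir, "pyspark-3.1.2-py2.py3-none-any.whl")
        open(wheel, "w").close()
        # The work directory of a racing process, still visible to us.
        os.mkdir(
            os.path.join(
                dist_dir,
                "cp38-cp38-manylinux_2_27_x86_64.3ec80d51ce26438a86c97b71d562e96e",
            )
        )
        try:
            collect_wheel_naive(dist_dir)
            naive_failed = False
        except RuntimeError:
            naive_failed = True
        return naive_failed, collect_wheel_filtered(dist_dir)
```

Calling `demo()` shows the naive collector tripping over the sibling workdir while the filtered one still finds the single wheel.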
@rich-london-74860 the fix is out in Pex 2.1.126. You can use it by updating your
pants.toml
with:
[pex-cli]
version = "v2.1.126"
known_versions = [
    "v2.1.126|macos_arm64|3bfd60f037b2edd4149067266536e37b4c67263d0db681e492e6071cb1a9adda|4080751",
    "v2.1.126|macos_x86_64|3bfd60f037b2edd4149067266536e37b4c67263d0db681e492e6071cb1a9adda|4080751",
    "v2.1.126|linux_x86_64|3bfd60f037b2edd4149067266536e37b4c67263d0db681e492e6071cb1a9adda|4080751",
    "v2.1.126|linux_arm64|3bfd60f037b2edd4149067266536e37b4c67263d0db681e492e6071cb1a9adda|4080751"
]
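For anyone pinning a different Pex version: each `known_versions` entry above has the form `version|platform|sha256|size-in-bytes`, and the identical hash and size on every line suggest a single Pex file serves all four platforms. Below is a rough sketch for generating such entries from a Pex release file you have already downloaded. The helper name `known_version_entries` and the local file path are hypothetical, not part of Pants or Pex.

```python
import hashlib

# Platform names as they appear in the snippet above.
PLATFORMS = ("macos_arm64", "macos_x86_64", "linux_x86_64", "linux_arm64")


def known_version_entries(version, pex_path, platforms=PLATFORMS):
    """Build 'version|platform|sha256|size' strings for a local Pex file."""
    with open(pex_path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    size = len(data)
    return [
        "{}|{}|{}|{}".format(version, platform, digest, size)
        for platform in platforms
    ]
```

For example, `known_version_entries("v2.1.126", "./pex")` (assuming the release was saved as `./pex`) returns four strings that can be pasted into the `known_versions` list.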
r
That worked! Thank you!