# general
l
I am encountering errors when running pants commands in my local pyenv environment. The same commands work fine in CI/CD, and other users can run them locally. The errors occur when trying to build `requirements.pex` from lock files, and certain packages cannot be installed. Here's an example of the error logs:
```
21:45:36.76 [ERROR] 1 Exception encountered:

  ProcessExecutionFailure: Process 'Building 22 requirements for requirements.pex from the build-support/databricks_lock.txt resolve: boto3==1.16.7, cleanco<2.2, enigma-data-catalog-sdk, enigma-enrichment, enigma-namedframes~=1.0.2, enigma-pyspark-commons, fuzzywuzzy, jellyfish, matplotlib==3.4.2, mlflow==1.20.2, numpy<1.24,>=1.20, pandas==1.2.4, plotly==5.1.0, probablepeople, protobuf==3.17.2, pyarrow==4.0.0, pyspark-test, pyspark==3.1.2, pytest, scikit-learn==0.24.1, scipy~=1.6.0, tldextract' failed with exit code 1.
stdout:

stderr:
There were 4 errors downloading required artifacts:
1. pandas 1.2.4 from https://files.pythonhosted.org/packages/e8/81/f7be049fe887865200a0450b137f2c574647b9154503865502cfd720ab5d/pandas-1.2.4.tar.gz
    ERROR: Command errored out with exit status 1: /Users/tom/.cache/pants/named_caches/pex_root/venvs/cac1718c056bb509f51fcdcc0c376b33deaaa8ec/80d537a6fbdf98a843c78d59de5b2b2db73ee10d/bin/python /Users/tom/.cache/pants/named_caches/pex_root/venvs/cac1718c056bb509f51fcdcc0c376b33deaaa8ec/80d537a6fbdf98a843c78d59de5b2b2db73ee10d/lib/python3.8/site-packages/pip install --ignore-installed --no-user --prefix /Users/tom/.cache/pants/named_caches/pex_root/pip_cache/.tmp/pip-build-env-nq8fgxoc/overlay --no-warn-script-location -v --no-binary :none: --only-binary :none: -i https://pypi.org/simple/ --extra-index-url https://****@repo.artifactory.enigma.com/artifactory/api/pypi/pypi-local/simple -- setuptools wheel 'Cython>=0.29.21,<3' 'numpy==1.16.5; python_version=='"'"'3.7'"'"' and platform_system!='"'"'AIX'"'"'' 'numpy==1.17.3; python_version=='"'"'3.8'"'"' and platform_system!='"'"'AIX'"'"'' 'numpy==1.16.5; python_version=='"'"'3.7'"'"' and platform_system=='"'"'AIX'"'"'' 'numpy==1.17.3; python_version=='"'"'3.8'"'"' and platform_system=='"'"'AIX'"'"'' 'numpy; python_version>='"'"'3.9'"'"'' Check the logs for full command output.
```
The packages that fail are `pandas==1.2.4`, `pyarrow==4.0.0` and `scikit-learn==0.24.1`.
Any ideas how to debug this? So far I have tried:
• deleting the Pants cache
• creating a fresh `pyenv` virtualenv
• uninstalling/reinstalling `pyenv`
• regenerating lock files with `./pants generate-lockfiles`
e
So, let's take Pandas as the exemplar. First, notice the URL: it's an sdist (.tar.gz).
Pandas is platform specific (it ships compiled extensions), so pip falling back to the sdist means no platform-specific pre-built wheel matches your machine. Please verify that is true.
And, if true, your machine needs to be equipped to build the sdist. That means gcc or clang, various env vars, etc.
So, you are on Mac ARM, right?
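A minimal shell sketch for checking both points. It assumes a bash-like shell and that the interpreter you pass, for example `python3.8`, has `pip` available. `machine_summary` and `has_prebuilt_wheel` are hypothetical helper names. The second one asks `pip download` for wheels only (`--only-binary=:all:`), so it fails when pip can find only an sdist for that interpreter and platform.

```shell
# Sketch for a bash-like shell: check the machine type and whether pip can
# find a pre-built wheel instead of falling back to an sdist.

# Prints e.g. "Darwin arm64" on an Apple Silicon (M1) Mac.
machine_summary() {
  printf '%s %s\n' "$(uname -s)" "$(uname -m)"
}

# Hypothetical helper. Succeeds only if pip finds a wheel (never an sdist)
# for REQUIREMENT when resolving with the interpreter PYTHON.
# Usage: has_prebuilt_wheel python3.8 pandas==1.2.4
has_prebuilt_wheel() {
  local python="$1" requirement="$2" dest status
  dest="$(mktemp -d)" || return 1
  # Download only, into a throwaway directory; nothing gets installed.
  "$python" -m pip download --no-deps --only-binary=:all: \
    --dest "$dest" "$requirement" >/dev/null 2>&1
  status=$?
  rm -rf "$dest"
  return "$status"
}
```

For example, `has_prebuilt_wheel python3.8 pandas==1.2.4 || echo "no matching wheel; pip would build from the sdist"`. Running the same check with a different interpreter, such as `python3.9`, lets you compare the two versions.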
l
yes
verifying now... I created a new virtualenv and ran `pip install pandas==1.2.4`
e
Ok, if that succeeds, then you need to figure out which env vars are critical to that success and let Pants see them with: https://www.pantsbuild.org/docs/reference-subprocess-environment#env_vars
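One brute-force way to find those variables, sketched in shell. It assumes the build succeeds in your current shell. The idea is to re-run the build once per exported variable with that variable removed (`env -u`) and report the ones whose removal breaks it. `critical_env_vars` is a hypothetical helper. It is slow for a real build, because it runs the command once per variable.

```shell
# Hypothetical helper: print each exported variable whose removal makes the
# given command fail. Assumes the command succeeds with the full environment.
critical_env_vars() {
  local name
  # awk's ENVIRON holds exactly the variables this shell exports to children.
  for name in $(awk 'BEGIN { for (k in ENVIRON) print k }'); do
    if ! env -u "$name" "$@" >/dev/null 2>&1; then
      echo "$name"
    fi
  done
}
```

For example, inside the fresh virtualenv where the install worked, run `critical_env_vars pip wheel --no-deps --no-cache-dir -w /tmp/wheels pandas==1.2.4`. The `--no-cache-dir` flag keeps pip from reusing a wheel it already built. `PATH` will likely show up in the output, since removing it hides `pip` itself. The other names it prints are candidates for the `env_vars` option on the page linked above.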
But if your org has more than a few Mac ARM users, each one needs to go through this build. At that point you might consider pre-building M1 wheels and hosting them on an internal PyPI that you add here: https://www.pantsbuild.org/docs/reference-python-repos#indexes
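A rough sketch of the pre-building step. It assumes it runs on an M1 Mac, once for each Python version you need (3.8 here), and that the sdists build there. `build_wheels` and `list_arm64_wheels` are hypothetical helper names. The upload to the internal index (for example with `twine`) and the index URL depend on your setup and are not shown.

```shell
# Hypothetical helper: build wheels for the listed requirements with PYTHON
# into DEST. Run it on an Apple Silicon Mac so the wheels get an arm64 tag.
# Usage: build_wheels python3.8 ./wheelhouse pandas==1.2.4 pyarrow==4.0.0
build_wheels() {
  local python="$1" dest="$2"
  shift 2
  "$python" -m pip wheel --no-deps --wheel-dir "$dest" "$@"
}

# Hypothetical helper: list wheels in DIR whose platform tag targets arm64
# (or universal2), i.e. the ones worth publishing for M1 users.
list_arm64_wheels() {
  local whl
  for whl in "$1"/*.whl; do
    case "$whl" in
      *_arm64.whl|*_universal2.whl) basename "$whl" ;;
    esac
  done
}
```

After uploading, the internal index goes into the `indexes` option on the page linked above. You probably want to keep PyPI in that list too, so the other requirements still resolve.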
OSS folks will generally not do this for you, since Mac ARM is expensive and not CI-friendly. In other words, they have to shell out for Mac ARM resources.
l
the install succeeded, so I can try adding the env vars
I know we have other M1 users, and I don't think they had the same issue; I can confirm this
for context I work with @rich-london-74860 @polite-angle-82480 at Enigma
after some further testing, it turns out I can install `pandas==1.2.4` in a fresh 3.9 virtualenv, but it fails in 3.8 with the same errors as in Pants
e
Ok. So, what are your repo's interpreter constraints set at?
M1s seem to be generally known to work well only with Python 3.9+.
Does your repo need to work with 3.8?
Note the default interpreter constraints: far too wide a range for most projects.
@lively-zebra-24587 this implies your M1 co-workers aren't so unfortunate as to have Python 3.8 installed (or at least visible to Pants). Do you need 3.8 installed for some reason? It's much hackier to uninstall it, but that's another option. Ideally you have interpreter constraints (ICs) configured that work for your whole dev base.
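A quick way to see which interpreters are "visible" on a machine. It assumes Pants discovers interpreters on your `PATH` and through pyenv; check the `search_path` option of the `[python-bootstrap]` subsystem for your Pants version. `visible_pythons` is a hypothetical helper, and the version list in it is illustrative.

```shell
# Hypothetical helper: show which pythonX.Y executables are on PATH and,
# if pyenv is installed, which versions it manages.
visible_pythons() {
  local minor found
  for minor in 3.7 3.8 3.9 3.10 3.11; do
    # Assign separately from `local` so the exit status of `command -v` counts.
    if found="$(command -v "python$minor" 2>/dev/null)"; then
      echo "python$minor -> $found"
    fi
  done
  if command -v pyenv >/dev/null 2>&1; then
    echo "pyenv versions:"
    pyenv versions --bare
  fi
}
```

If `python3.8` shows up on one M1 machine but not another, that may explain why the same constraints behave differently for different co-workers.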
r
> Does your repo need to work with 3.8?
Unfortunately, yes. We have a lot of code that runs on Spark managed by Databricks, and they set the runtime environment. We’ve standardized on their 9.1 LTS runtime environment, which uses Python 3.8
@lively-zebra-24587 I think the simplest solution for you might be to run pants inside the docker image. If you need to install `libpostal`, that would have been the necessary solution anyway. That’s what runs in CI. @square-city-8441 also uses M1, and I think he may actually have encountered the same problem. He mostly runs the pants repo in the docker image.
e
So, one option folks use for M1s IIRC is https://www.pantsbuild.org/docs/options#pantsrc-file
Define your own personal IC and make it `">=3.9,<X"`, where X is your corporate upper bound.
The `~/.pants.rc` file uses `pants.toml` format, but it's just for you (not checked in), and it can override values. Since Python 3.8 is basically broken on all M1s, this is a safe global setting: never use 3.8 for any Pants-built project.
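A minimal sketch of that personal override. It assumes a Pants version where interpreter constraints live under `[python]`; older releases used `[python-setup]`. `add_personal_ics` is a hypothetical helper. The upper bound is passed in as an argument because it depends on your corporate constraint, so the `3.11` in the usage line is only a placeholder.

```shell
# Hypothetical helper: append a personal interpreter-constraint override to a
# pants.rc-style file. The file uses pants.toml (TOML) syntax, where a second
# [python] table would be invalid, so refuse rather than add a duplicate.
# Usage: add_personal_ics "$HOME/.pants.rc" 3.11   # 3.11 is a placeholder
add_personal_ics() {
  local rc_file="$1" upper="$2"
  if [ -f "$rc_file" ] && grep -q '^\[python\]' "$rc_file"; then
    echo "$rc_file already has a [python] section; edit it by hand" >&2
    return 1
  fi
  # Leading newline guards against a file that lacks a trailing newline.
  printf '\n[python]\ninterpreter_constraints = [">=3.9,<%s"]\n' "$upper" >> "$rc_file"
}
```

Because the file is not checked in, the override only affects your own Pants runs. The repo's `pants.toml` stays as it is for everyone else.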
@rich-london-74860 and @lively-zebra-24587 hopefully that works for you and all other M1 users.
Of course, if you need to deploy to Databricks from your machine, that won't work and you'll need a container. But for non-deploy work, it may be useful.
l
hmmm, I tried the `.pants.rc` fix but that didn't work either: still getting the same errors, even though it's now using the 3.9 `pip` (it was using 3.8 before)
```
ERROR: Command errored out with exit status 1: /Users/tom/.cache/pants/named_caches/pex_root/venvs/cac1718c056bb509f51fcdcc0c376b33deaaa8ec/bb37135616d3b3888558cb7ec70550dd4fca7685/bin/python /Users/tom/.cache/pants/named_caches/pex_root/venvs/cac1718c056bb509f51fcdcc0c376b33deaaa8ec/bb37135616d3b3888558cb7ec70550dd4fca7685/lib/python3.9/site-packages/pip install
```
e
You can add `--keep-sandboxes=on_failure` to your command line. Then, just before the backtrace, you'll see a log line about preserving a `/tmp/pants-sandbox*` dir. You can then cd there and edit `__run.sh` to include `--preserve-pip-download-log`. Now run `./__run.sh` and look for another "preserving ..." line at the top of the output. That will be the full pip log file, and it will give you details on the error.
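A small sketch of those steps. It assumes you take the sandbox path from the "preserving ..." log line, or pick the newest one, and that you add `--preserve-pip-download-log` to the command in `__run.sh` by hand, as described. `newest_sandbox` is a hypothetical helper for when several preserved sandboxes pile up. The parent directory is an argument because it depends on your machine's temp directory.

```shell
# Step 1 (manual): re-run the failing goal with --keep-sandboxes=on_failure.

# Step 2: hypothetical helper that prints the most recently modified
# pants-sandbox* directory under PARENT (e.g. /tmp).
newest_sandbox() {
  ls -dt "$1"/pants-sandbox* 2>/dev/null | head -n 1
}

# Step 3 (manual): cd "$(newest_sandbox /tmp)", edit __run.sh to add
# --preserve-pip-download-log, run ./__run.sh, and look for another
# "preserving ..." line pointing at the full pip log.
```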
Now, one thing different about `__run.sh` from real Pants sandboxing: it lets your env vars leak in. So if `__run.sh` succeeds where the same process failed under Pants, that pins it down as an env var issue, and you'll need further edits to `__run.sh` to use `env -i A=B C=D ... command line` to simulate the sandbox.
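A sketch of that comparison. It assumes `__run.sh` starts with a shebang line, so `env -i` can execute it directly. `compare_env_runs` is a hypothetical helper. It runs the script once with your full environment and once with an empty one, which means a real pip build runs twice.

```shell
# Hypothetical helper: run SCRIPT with the inherited environment (what
# __run.sh gets by default) and with an empty one (env -i), then report both.
# Usage: compare_env_runs ./__run.sh
compare_env_runs() {
  local script="$1" inherited empty
  if "$script" >/dev/null 2>&1; then inherited=pass; else inherited=fail; fi
  if env -i "$script" >/dev/null 2>&1; then empty=pass; else empty=fail; fi
  echo "inherited-env: $inherited, empty-env: $empty"
}
```

A result of `inherited-env: pass, empty-env: fail` points at an environment variable. You can then add candidates back one at a time, for example `env -i CFLAGS="$CFLAGS" ./__run.sh`, where `CFLAGS` is only an example name.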
l
Thanks for all your help, John. I will use the docker route for now and investigate the sandbox files in the morning