crooked-country-1937
01/23/2023, 1:58 PM
ModuleNotFoundError: No module named 'pyarrow'
when running tests.
Repo reproducing the issue: https://github.com/adityav/pants-python-tryouts
crooked-country-1937
01/23/2023, 2:01 PM
➜ ./pants test helloworld/sparkjob/hellospark_test.py
19:14:59.81 [INFO] Completed: Building 4 requirements for requirements.pex from the python-default.lock resolve: pandas==1.5.1, pyarrow==6.0.1, pyspark[sql]==3.3.1, pytest==6.2.5
19:15:01.36 [INFO] Completed: Building pytest_runner.pex
19:15:11.84 [INFO] Completed: Run Pytest - helloworld/sparkjob/hellospark_test.py:tests succeeded.
✓ helloworld/sparkjob/hellospark_test.py:tests succeeded in 10.35s.
On adding a constraints file in `pants.toml`:
[python.resolves_to_constraints_file]
python-default = "constraints-3.10.txt"
Getting error:
➜ ./pants generate-lockfiles
19:17:41.08 [INFO] Initializing scheduler...
19:17:41.40 [INFO] Scheduler initialized.
19:18:01.38 [INFO] Completed: Generate lockfile for python-default
19:18:01.39 [INFO] Wrote lockfile for the resolve `python-default` to python-default.lock
./pants test helloworld/sparkjob/hellospark_test.py
19:19:25.82 [ERROR] Completed: Run Pytest - helloworld/sparkjob/hellospark_test.py:tests failed (exit code 2).
============================= test session starts ==============================
platform darwin -- Python 3.10.9, pytest-7.0.1, pluggy-1.0.0
rootdir: /private/var/folders/0t/dmh8ynt13pbc2y2stvb0by6c0000gn/T/pants-sandbox-aA4jsU
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 0 items / 1 error
==================================== ERRORS ====================================
___________ ERROR collecting helloworld/sparkjob/hellospark_test.py ____________
ImportError while importing test module '/private/var/folders/0t/dmh8ynt13pbc2y2stvb0by6c0000gn/T/pants-sandbox-aA4jsU/helloworld/sparkjob/hellospark_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
helloworld/sparkjob/hellospark_test.py:1: in <module>
from helloworld.sparkjob import hellospark
helloworld/sparkjob/hellospark.py:5: in <module>
import pyarrow as pa
E ModuleNotFoundError: No module named 'pyarrow'
- generated xml file: /private/var/folders/0t/dmh8ynt13pbc2y2stvb0by6c0000gn/T/pants-sandbox-aA4jsU/helloworld.sparkjob.hellospark_test.py.tests.xml -
=========================== short test summary info ============================
ERROR helloworld/sparkjob/hellospark_test.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 0.82s ===============================
✕ helloworld/sparkjob/hellospark_test.py:tests failed in 1.65s.
pyarrow is shown as a dependency, so I'm not sure what's causing it:
➜ ./pants dependencies --transitive helloworld/sparkjob/hellospark_test.py
//:reqs#pandas
//:reqs#pyarrow
//:reqs#pyspark
//:reqs#pytest
//requirements.txt:reqs
helloworld/sparkjob/hellospark.py
crooked-country-1937
01/23/2023, 9:56 PM
pants.toml
.
[pytest]
version = "pytest==6.2.5"
lockfile = "pytest.lock"
I don’t know how pytest and the constraints file interact, but I noticed the tests were using pytest-7.0.1, while the constraints file specified 6.2.5.
enough-analyst-54434
01/24/2023, 12:18 AM
src/
. The key thing being the pytest tool PEX and the 3rdparty requirements PEX are separate. Spark does complicated things, and if it loads pytest from one PEX and not the other, I expect that affects what it can see in terms of other dependencies. By aligning the pytest versions, you don't force Spark to look in the wrong PEX for other dependencies. Again, a handwave. There are lots of details there to pin down and prove.
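A rough way to start pinning those details down is to check two things: which pytest version the constraints file pins, and which file a module such as `pyarrow` would actually be imported from inside the test sandbox. The sketch below is only illustrative. The helper names `pinned_version` and `module_origin` are hypothetical, and it assumes the constraints file uses plain `name==version` lines (optionally with extras, like `pyspark[sql]==3.3.1`).

```python
import importlib.util
import re
from typing import Optional


def pinned_version(constraints_text: str, package: str) -> Optional[str]:
    """Return the version pinned as `package==X` in constraints text, if any.

    Assumption: one requirement per line, in `name==version` form, with
    optional extras such as `pyspark[sql]==3.3.1`.
    """
    pattern = re.compile(
        r"^\s*" + re.escape(package) + r"(\[[^\]]*\])?\s*==\s*([^\s;#]+)",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(constraints_text)
    return match.group(2) if match else None


def module_origin(name: str) -> Optional[str]:
    """Return the location a top-level module would load from.

    Returns None when the module cannot be found on the current sys.path.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None
```

Calling `module_origin("pytest")` and `module_origin("pyarrow")` from inside a test would show whether both resolve at all and from which paths. Comparing `pinned_version(...)` for `pytest` against the `version` set under `[pytest]` in `pants.toml` would show whether the two are aligned.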