# general
Hey pants, some of our tests inconsistently fail in CI and I was wondering if I could get some guidance debugging this. We are running `./pants --print-stacktrace -ldebug --pex-verbosity=9 run ci/src/python/pynest_ci:runner -- regular_unit_test`, but it doesn’t seem to give enough information to figure out the issue. It looks like `requirements.pex` builds fine, but when the test tries to build `pytest_runner.pex` it sometimes fails.
17:25:57.58 [INFO] Starting: Building requirements.pex with 31 requirements: GPUtil==1.4.0, boto3-stubs[s3]==1.18.35, boto3==1.17.112, botocore==1.20.112, catboost==0.26.1, daemonize==2.5.0, dask-gateway==0.9.0, dask[dataframe]==... (434 characters truncated)
17:26:25.49 [INFO] Completed: Building requirements.pex with 31 requirements: GPUtil==1.4.0, boto3-stubs[s3]==1.18.35, boto3==1.17.112, botocore==1.20.112, catboost==0.26.1, daemonize==2.5.0, dask-gateway==0.9.0, dask[dataframe]==... (434 characters truncated)
17:26:25.49 [INFO] Starting: Building pytest_runner.pex
17:26:25.52 [ERROR] Exception caught: (pants.engine.internals.scheduler.ExecutionError)
...
pid 2787 -> /root/.cache/pants/named_caches/pex_root/venvs/8427a8787e07d8e0828aa91a7c0695bba322863d/f28b3dbba3c9dae1b4357adde5b079b8b3ca9fac/pex --disable-pip-version-check --no-python-version-warning --exists-action a --isolated -q --cache-dir /root/.cache/pants/named_caches/pex_root --log /tmp/process-executionHBj8Tl/.tmp/tmph9shut22/pip.log download --dest /tmp/process-executionHBj8Tl/.tmp/tmplh07wwwf/usr.local.bin.python3.7 GPUtil==1.4.0 boto3-stubs[s3]==1.18.35 boto3==1.17.112 botocore==1.20.112 catboost==0.26.1 daemonize==2.5.0 dask-gateway==0.9.0 dask[dataframe]==2021.7.2 distributed==2021.7.2 future==0.18.2 hypothesis==6.17.4 jellyfish==0.8.8 moto==1.3.14 numpy<1.21 pandas==1.2.5 parmap==1.5.2 protobuf==3.17.3 pyarrow==5.0.0 pydantic==1.7.4 python-json-logger==2.0.2 pyyaml==5.4.1 requests==2.22.0 scikit-learn==0.24.0 shap==0.31.0 simplejson==3.17.5 snowflake-sqlalchemy==1.2.3 torch==1.9.0 tqdm==4.62.2 types-PyYAML==5.4.6 types-protobuf==3.17.4 types-requests==2.25.6 --index-url https://pypi.org/simple/ --extra-index-url https://pypi.cbhq.net/ --find-links ~/wheelhouse/ --retries 5 --timeout 15 exited with -9 and STDERR:
None
I attached the output for the failing tests as well.
A `-9` exit code means the process was killed with SIGKILL, which on Linux is often a sign of the OOM Killer. You can tweak some Pants options specifically for CI to help get past this; see https://www.pantsbuild.org/docs/using-pants-in-ci#tuning-resource-consumption-advanced
But first, to verify this is what's going on, you could check your kernel logs for OOM Killer messages.
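A minimal Python sketch of that triage, assuming a Linux CI host. The `exited with -9` message appears to come from pex, which runs pip via Python's `subprocess`; there, a negative return code -N means the child was terminated by signal N. The helper names `describe_exit_code` and `find_oom_kills` are hypothetical (not part of Pants or pex), and the regex assumes the common `Killed process <pid> (<name>)` kernel wording, which varies between kernel versions.

```python
import re
import signal
import subprocess
import sys
from typing import List, Tuple

# Assumed kernel wording, e.g. "Out of memory: Killed process 2787 (python3.7) ...";
# the exact text differs between kernel versions and cgroup OOMs.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def describe_exit_code(code: int) -> str:
    """Describe a return code using Python's subprocess convention on POSIX:
    a negative value -N means the child was terminated by signal N."""
    if code >= 0:
        return f"exited with status {code}"
    try:
        return f"killed by {signal.Signals(-code).name}"
    except ValueError:
        return f"killed by unknown signal {-code}"


def find_oom_kills(kernel_log: str) -> List[Tuple[int, str]]:
    """Return (pid, command name) for each OOM-killer kill found in kernel log text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_KILL_RE.finditer(kernel_log)]


if __name__ == "__main__":
    # A child that SIGKILLs itself reports -9, like the pip download in the log above.
    child = subprocess.run(
        [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    )
    print(describe_exit_code(child.returncode))  # killed by SIGKILL

    # Reading the kernel ring buffer may need root and can be blocked in containers.
    try:
        log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    except OSError:
        log = ""
    for pid, comm in find_oom_kills(log):
        print(f"OOM killer terminated pid {pid} ({comm})")
```

The kernel log reports host PIDs, so inside a container they may not match `pid 2787` from the Pants output; matching on the command name and timestamp is likely more reliable. If `dmesg` is unavailable in the CI container, the host's kernel log (e.g. `journalctl -k`) is the place to look.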