# general
Hey pants, some of our tests inconsistently fail in CI and I was wondering if I could get some guidance debugging this. We are running
./pants --print-stacktrace -ldebug --pex-verbosity=9 run ci/src/python/pynest_ci:runner -- regular_unit_test
, but it doesn’t seem to give enough information to figure out the issue. It looks like
builds fine, but when the test tries to build
it can fail.
17:25:57.58 [INFO] Starting: Building requirements.pex with 31 requirements: GPUtil==1.4.0, boto3-stubs[s3]==1.18.35, boto3==1.17.112, botocore==1.20.112, catboost==0.26.1, daemonize==2.5.0, dask-gateway==0.9.0, dask[dataframe]==... (434 characters truncated)
17:26:25.49 [INFO] Completed: Building requirements.pex with 31 requirements: GPUtil==1.4.0, boto3-stubs[s3]==1.18.35, boto3==1.17.112, botocore==1.20.112, catboost==0.26.1, daemonize==2.5.0, dask-gateway==0.9.0, dask[dataframe]==... (434 characters truncated)
17:26:25.49 [INFO] Starting: Building pytest_runner.pex
17:26:25.52 [ERROR] Exception caught: (pants.engine.internals.scheduler.ExecutionError)
pid 2787 -> /root/.cache/pants/named_caches/pex_root/venvs/8427a8787e07d8e0828aa91a7c0695bba322863d/f28b3dbba3c9dae1b4357adde5b079b8b3ca9fac/pex --disable-pip-version-check --no-python-version-warning --exists-action a --isolated -q --cache-dir /root/.cache/pants/named_caches/pex_root --log /tmp/process-executionHBj8Tl/.tmp/tmph9shut22/pip.log download --dest /tmp/process-executionHBj8Tl/.tmp/tmplh07wwwf/usr.local.bin.python3.7 GPUtil==1.4.0 boto3-stubs[s3]==1.18.35 boto3==1.17.112 botocore==1.20.112 catboost==0.26.1 daemonize==2.5.0 dask-gateway==0.9.0 dask[dataframe]==2021.7.2 distributed==2021.7.2 future==0.18.2 hypothesis==6.17.4 jellyfish==0.8.8 moto==1.3.14 numpy<1.21 pandas==1.2.5 parmap==1.5.2 protobuf==3.17.3 pyarrow==5.0.0 pydantic==1.7.4 python-json-logger==2.0.2 pyyaml==5.4.1 requests==2.22.0 scikit-learn==0.24.0 shap==0.31.0 simplejson==3.17.5 snowflake-sqlalchemy==1.2.3 torch==1.9.0 tqdm==4.62.2 types-PyYAML==5.4.6 types-protobuf==3.17.4 types-requests==2.25.6 --index-url <https://pypi.org/simple/> --extra-index-url <https://pypi.cbhq.net/> --find-links ~/wheelhouse/ --retries 5 --timeout 15 exited with -9 and STDERR:
I attached the output for the failing tests as well.
The -9 exit code is a SIGKILL which, on Linux, is a sign of the OOM Killer. You can tweak some Pants options specifically for CI to help get past this: https://www.pantsbuild.org/docs/using-pants-in-ci#tuning-resource-consumption-advanced But, to first verify this is what's going on, you could check your kernel logs for OOM Killer messages.
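For the kernel-log check, here is a minimal sketch of one way to do it, assuming `dmesg` is available and readable on the CI machine (it may need root, and inside a container the messages usually land in the host's kernel log instead). The helper names `find_oom_kills` and `read_kernel_log` are made up for illustration, and the pattern only covers the common "Out of memory: Killed process ..." wording, which can vary between kernel versions.

```python
import re
import subprocess
from typing import List, Tuple

# Matches "Killed process" (newer kernels) and "Kill process" (older kernels).
# Assumption: the kernel on the CI host uses one of these two wordings.
OOM_KILL_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log: str) -> List[Tuple[int, str]]:
    """Return (pid, command name) for each OOM Killer victim found in the text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_KILL_RE.finditer(kernel_log)]


def read_kernel_log() -> str:
    """Read the kernel ring buffer via dmesg; returns '' if dmesg prints nothing."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    kills = find_oom_kills(read_kernel_log())
    if not kills:
        print("No OOM Killer messages found in the kernel log.")
    for pid, name in kills:
        print(f"OOM Killer killed pid {pid} ({name})")
```

If a reported pid matches the one in the Pants error (pid 2787 in the log above), that would support the OOM explanation.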