Has anyone seen this error in CI before? Our tests...
# general
o
Has anyone seen this error in CI before? Our tests run just fine locally on our desktops but we get this error in CI on 2.5.1rc1 and 2.5.2rc2.

Traceback (most recent call last):
  File "/cicache/.cache/pants/setup/bootstrap-Linux-x86_64/2.5.1rc2_py37/lib/python3.7/site-packages/pants/bin/local_pants_runner.py", line 234, in _run_inner
    return self._perform_run(goals)
  File "/cicache/.cache/pants/setup/bootstrap-Linux-x86_64/2.5.1rc2_py37/lib/python3.7/site-packages/pants/bin/local_pants_runner.py", line 173, in _perform_run
    return self._perform_run_body(goals, poll=False)
  File "/cicache/.cache/pants/setup/bootstrap-Linux-x86_64/2.5.1rc2_py37/lib/python3.7/site-packages/pants/bin/local_pants_runner.py", line 195, in _perform_run_body
    poll_delay=(0.1 if poll else None),
  File "/cicache/.cache/pants/setup/bootstrap-Linux-x86_64/2.5.1rc2_py37/lib/python3.7/site-packages/pants/init/engine_initializer.py", line 136, in run_goal_rules
    goal_product, params, poll=poll, poll_delay=poll_delay
  File "/cicache/.cache/pants/setup/bootstrap-Linux-x86_64/2.5.1rc2_py37/lib/python3.7/site-packages/pants/engine/internals/scheduler.py", line 530, in run_goal_rule
    self._raise_on_error([t for _, t in throws])
  File "/cicache/.cache/pants/setup/bootstrap-Linux-x86_64/2.5.1rc2_py37/lib/python3.7/site-packages/pants/engine/internals/scheduler.py", line 494, in _raise_on_error
    wrapped_exceptions=tuple(t.exc for t in throws),
pants.engine.internals.scheduler.ExecutionError: 1 Exception encountered:
Engine traceback:
  in select
  in pants.core.goals.test.run_tests
  in pants.core.goals.test.enrich_test_result (###)
  in pants.backend.python.goals.pytest_runner.run_python_test (###)
  in pants.backend.python.goals.pytest_runner.setup_pytest_for_target
  in pants.backend.python.util_rules.pex.create_pex
  in pants.backend.python.util_rules.pex.build_pex (requirements.pex)
  in pants.engine.process.fallible_to_exec_result_or_raise
Traceback (most recent call last):
  File "/cicache/.cache/pants/setup/bootstrap-Linux-x86_64/2.5.1rc2_py37/lib/python3.7/site-packages/pants/engine/process.py", line 254, in fallible_to_exec_result_or_raise
    description.value,
pants.engine.process.ProcessExecutionFailure: Process 'Building requirements.pex with 20 requirements: google-api-core==1.29.0; (python_version >= "2.7" and python_full_version < "3.0.0") or python_full_version >= "3.6.0", google-cloud-storage==1.38.0; (python_version >= "2.7" and python_full_version < "3.0.0") or python_full_version >= "3.6.0", grpcio==1.38.0, kfp==1.6.0; python_full_version >= "3.5.3", opentelemetry-api==0.17b0; python_version >= "3.5", opentelemetry-exporter-opencensus==0.17b0; python_version >= "3.5", opentelemetry-exporter-prometheus==0.17b0; python_version >= "3.5", opentelemetry-instrumentation-grpc==0.17b0; python_version >= "3.5", opentelemetry-instrumentation-sqlalchemy==0.17b0; python_version >= "3.5", opentelemetry-instrumentation==0.17b0; python_version >= "3.5", opentelemetry-sdk==0.17b0; python_version >= "3.5", prometheus-client==0.10.1; (python_version >= "2.7" and python_full_version < "3.0.0") or python_full_version >= "3.4.0", protobuf==3.17.2, pytest==6.2.4; python_version >= "3.6", python-json-logger==0.1.11; python_version >= "2.7", pytorch-lightning==1.2.7; python_version >= "3.6", pyyaml==5.3.1; python_full_version >= "3.5.3" and python_version >= "3.6", retrying==1.3.3, setuptools==57.0.0, torch==1.8.1; python_full_version >= "3.6.2"' failed with exit code 1.
stdout:
stderr:
pid 378 -> /cicache/.cache/pants/named_caches/pex_root/venvs/648cade9a5d2ba49503a6052c1f39aef29a05513/cc48858524bf3820a737c19c7f14d57d4a5c4208/pex --disable-pip-version-check --no-python-version-warning --exists-action a --isolated -q --cache-dir /cicache/.cache/pants/named_caches/pex_root --log /tmp/process-executionsnJ0oz/.tmp/tmpucx12map/pip.log download --dest /tmp/process-executionsnJ0oz/.tmp/tmp3u2ob2jw/usr.local.bin.python3.7 --constraint requirements.txt google-api-core==1.29.0; (python_version >= "2.7" and python_full_version < "3.0.0") or python_full_version >= "3.6.0" google-cloud-storage==1.38.0; (python_version >= "2.7" and python_full_version < "3.0.0") or python_full_version >= "3.6.0" grpcio==1.38.0 kfp==1.6.0; python_full_version >= "3.5.3" opentelemetry-api==0.17b0; python_version >= "3.5" opentelemetry-exporter-opencensus==0.17b0; python_version >= "3.5" opentelemetry-exporter-prometheus==0.17b0; python_version >= "3.5" opentelemetry-instrumentation-grpc==0.17b0; python_version >= "3.5" opentelemetry-instrumentation-sqlalchemy==0.17b0; python_version >= "3.5" opentelemetry-instrumentation==0.17b0; python_version >= "3.5" opentelemetry-sdk==0.17b0; python_version >= "3.5" prometheus-client==0.10.1; (python_version >= "2.7" and python_full_version < "3.0.0") or python_full_version >= "3.4.0" protobuf==3.17.2 pytest==6.2.4; python_version >= "3.6" python-json-logger==0.1.11; python_version >= "2.7" pytorch-lightning==1.2.7; python_version >= "3.6" pyyaml==5.3.1; python_full_version >= "3.5.3" and python_version >= "3.6" retrying==1.3.3 setuptools==57.0.0 torch==1.8.1; python_full_version >= "3.6.2" --index-url <https://pypi.org/simple/> --retries 5 --timeout 15 exited with -9 and STDERR:
None
FATAL: exception not rethrown
/bin/bash: line 191:    59 Aborted                 (core dumped) ./pants --print-stacktrace --tag="-integration_tests" test ::
w
it looks like the pex-embedded-pip crashed in native code, which is pretty surprising.
that’s most likely to do with the python install that is being used in this environment. do you know which python you’re expecting to use in the CI environment? https://www.pantsbuild.org/docs/python-interpreter-compatibility
e
@witty-crayon-22786 are you sure about that? It looks like Pip got SIGKILLed (OOM killer?) and then pants coredumped.
w
oh, yikes. yea, you’re right.
e
I'm trying to rig a repro of a Process that gets killed to see what happens.
w
@enough-analyst-54434: although, how do we know that pip was killed here? it exited with 1 rather than with a signal per se. i could believe that the Abort was due to low memory though.
e
exited with -9
w
ahh, pex exited with 1, but pip with -9. good call.
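For readers following along, the distinction between the two exit codes can be sketched in Python. This is a minimal illustration, not Pants or Pex code: `describe_exit` and `run_and_kill` are hypothetical helper names, and it assumes a POSIX system, where `subprocess` reports a child terminated by signal N as return code -N.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name}"
    return f"exited with code {returncode}"


def run_and_kill() -> int:
    """Start a sleeping child, SIGKILL it, and return its return code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(signal.SIGKILL)  # the same signal the OOM killer sends
    return child.wait()


if __name__ == "__main__":
    print(describe_exit(run_and_kill()))  # the killed child
    print(describe_exit(1))               # an ordinary failure
```

A return code of -9 only shows that the process received SIGKILL. It does not show who sent it, so attributing the kill to the OOM killer remains a guess until the kernel log on the CI host confirms it.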
e
I missed the outer Pex exit code 1 - that makes trying to repro unlikely. I thought maybe Pants was not handling an unexpected Process death well, but the Process death was fully normal.
w
yea. feels like two victims of low memory potentially
e
Yup.
@orange-beach-75711 - in conclusion we're guessing your CI run of Pants is running into a low-memory condition, resulting in the Linux OOM killer (https://www.kernel.org/doc/gorman/html/understand/understand016.html) killing processes to make memory available. In this case it looks like a Pip resolve spawned by Pex (which was spawned by Pants) got picked out for killing. Subsequently it seems Pants core dumped. If our guess is right, the 1st event, Pip being OOM-killed, is "normal", but can be avoided by ensuring either that more memory is available in CI or that Pants is configured to use less there (tune down various parallelism options). The 2nd event is a bit more troubling: Pants core dumped, which it shouldn't really ever do. This could be due to a failure to allocate memory not being handled well, but it's unclear.
See here for some tips for configuring Pants for CI if you haven't already. It points out some of the resource consumption knobs you can tweak: https://www.pantsbuild.org/docs/using-pants-in-ci#configuring-pants-for-ci-pantscitoml-optional
o
oh wow, this is so helpful @enough-analyst-54434 and @witty-crayon-22786! Really appreciate the super quick turnaround and the analysis. Let me try increasing the memory available to our CI hosts. Will also check out the CI doc! Will get back to you guys in a day or two after testing this out again. Thanks so much!