# general
a
Hello, I’m running into some issues in CI which I can’t reproduce locally when running tests. Wondering if someone here knows what the cause might be. More details in the thread.
When running:
./pants --no-pantsd --print-stacktrace -ldebug --pex-verbosity=9 --changed-since=origin/main --changed-dependees=transitive test
I get the error:
15:07:14.13 [ERROR] 1 Exception encountered:
Engine traceback:
  in select
  in pants.core.goals.test.run_tests
  in pants.backend.python.goals.pytest_runner.run_python_test (airflow-dags/tests/airflow/unit/utils/operators/dms_migration_task_operator_test.py:../../../../../tests)
  in pants.backend.python.goals.pytest_runner.setup_pytest_for_target
  in pants.backend.python.util_rules.pex.create_pex
  in pants.backend.python.util_rules.pex.build_pex (requirements.pex)
  in pants.engine.process.fallible_to_exec_result_or_raise
Traceback (most recent call last):
  File "/builds/datalake/sbp-datalake/.cache/pants/setup/bootstrap-Linux-x86_64/2.14.0_py39/lib/python3.9/site-packages/pants/engine/process.py", line 275, in fallible_to_exec_result_or_raise
    raise ProcessExecutionFailure(
pants.engine.process.ProcessExecutionFailure: Process 'Building 5 requirements for requirements.pex from the lockfiles/airflow-lock.txt resolve: apache-airflow-providers-amazon==2.4.0, apache-airflow==2.1.4, boto3<2,>=1.18.65, botocore<2,>=1.12.201, pytest==7.2.0' failed with exit code -9.
stdout:
stderr:
e
-9 is SIGKILL. Either a human ran kill -9 ..., or you got OOMKilled.
Certainly the latter!
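If you can add a quick diagnostic step to the job (and the image lets you read the kernel log / cgroup files), it's usually easy to confirm. A sketch; exact paths and permissions depend on the runner image:
# Look for OOM-killer activity (needs access to the kernel ring buffer).
dmesg | grep -iE 'out of memory|oom-killer|killed process' | tail -n 20
# On cgroup-v2 runners, the job's own cgroup counts OOM kills.
cat /sys/fs/cgroup/memory.events 2>/dev/null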
a
Interesting
Because when I run all tests for the main branch using ./pants test ::, the same test is run and there is no problem at all. I would expect it to fail then as well.
e
Also in CI with same hardware / VM / container?
a
yep
so I’m pretty confused
e
Well, the OOMKiller is a bit probabilistic/heuristic in its design.
Your induced differences here are:
+ lots of logging (via -ldebug)
+ lots of graph calculations (via --changed-since)
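If dropping -ldebug alone doesn't get you under the limit, one workaround sketch (untested, just reusing the flags from your command): resolve the changed targets in one invocation and run the tests in a second one, so the change/dependee calculation and the test run don't share a single peak memory footprint.
# 1) Compute the changed targets only (graph work, no test processes).
./pants --no-pantsd --changed-since=origin/main --changed-dependees=transitive list > changed-targets.txt
# 2) Run the tests against that explicit list in a fresh process.
xargs -r -a changed-targets.txt ./pants --no-pantsd test
No promises that it lowers the peak enough, but it at least keeps the two heavy phases in separate processes.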
a
Ah, well the -ldebug I only introduced for debugging this problem
but the changed since was there to begin with
didn’t think that could have any influence
e
Ok, then you vaguely have the answer: almost certainly extra memory overhead from --changed-since.
a
hmm
I read in the docs about the pantsd_max_memory_usage setting; could that be used to avoid the issue?
because I already tried some values for that setting but without any luck
e
yes, but the last time I looked was v2.12
a
we upgraded to 2.14 recently
e
Ok. Nothing fundamentally new.
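One caveat on pantsd_max_memory_usage, though: as far as I know it only caps the pantsd daemon's own memory (the daemon restarts itself when it grows past the limit), it doesn't limit subprocesses like the pex resolve that got killed here, and with --no-pantsd (as in the command you pasted) there is no daemon for it to apply to at all. Roughly, it is only meaningful in a daemon-enabled run; the value below is a placeholder:
# Only relevant when pantsd is actually running (i.e. without --no-pantsd).
# Older Pants versions take a byte count; newer ones also accept strings like 1GiB.
./pants --pantsd-max-memory-usage=1073741824 test ::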
Have you tried bumping CI mem limits / do you know what those are?
a
Wouldn’t know how to do that myself; I would have to ask one of the people who maintain our GitLab to do it
but he told me that the current memory limit is at 2GiB
not sure if that also means that 2GiB is available when running the job
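I guess I could check from inside the job itself what the container actually sees; something like this (guessing at the cgroup paths, they differ between v1 and v2):
# cgroup v2: memory limit of the job's cgroup ("max" means unlimited).
cat /sys/fs/cgroup/memory.max 2>/dev/null
# cgroup v1 fallback.
cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null
# and what the kernel reports overall
free -h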
but thank you for the help @enough-analyst-54434! At least I now understand what the issue is, how to work around it, and hopefully how to solve it in the long term 👍
one final question though: does pants need more memory since 2.13/2.14? I never had these issues when we were still on 2.12
e
By implication it does. I don't have the details at hand to explain that difference in terms of specific Pants changes, though.