flaky-artist-57016
08/25/2022, 9:44 PM
tailor, lint, and test in the same runner. We’ve found that this intermittently causes one or more of the jobs to fail with the error message ModuleNotFoundError: No module named 'pants'. Re-running the failed job typically lets it complete successfully, so we suspect the failures are caused by our use of concurrent stages. We run the jobs concurrently to shorten the pipeline, but perhaps it makes more sense to run these steps serially (i.e., to catch missing BUILD files before running the lint and test steps). Is there a better way we should be running pants in CI? Thanks in advance!
enough-analyst-54434
08/25/2022, 9:46 PM
enough-analyst-54434
08/25/2022, 9:46 PM
enough-analyst-54434
08/25/2022, 9:47 PM
enough-analyst-54434
08/25/2022, 9:50 PM
flaky-artist-57016
08/25/2022, 11:15 PM
$ ./pants --version
Traceback (most recent call last):
  File "/home/gitlab-runner/builds/s3DRJHKh/0/directory/subdirectory/.cache/pants/setup/bootstrap-Linux-x86_64/2.12.0_py38/bin/pants", line 7, in <module>
    from pants.bin.pants_loader import main
ModuleNotFoundError: No module named 'pants'
This is the output in the tailor job, and you can see it didn’t even get to the tailor command. The test job also failed with a different error:
$ ./pants check test ::
Traceback (most recent call last):
  File "/home/gitlab-runner/builds/s3DRJHKh/1/directory/subdirectory/.cache/pants/setup/bootstrap-Linux-x86_64/2.12.0_py38/bin/pants", line 10, in <module>
    sys.exit(main())
  File "/home/gitlab-runner/builds/s3DRJHKh/1/directory/subdirectory/.cache/pants/setup/bootstrap-Linux-x86_64/2.12.0_py38/lib/python3.8/site-packages/pants/bin/pants_loader.py", line 115, in main
    PantsLoader.main()
  File "/home/gitlab-runner/builds/s3DRJHKh/1/directory/subdirectory/.cache/pants/setup/bootstrap-Linux-x86_64/2.12.0_py38/lib/python3.8/site-packages/pants/bin/pants_loader.py", line 111, in main
    cls.run_default_entrypoint()
  File "/home/gitlab-runner/builds/s3DRJHKh/1/directory/subdirectory/.cache/pants/setup/bootstrap-Linux-x86_64/2.12.0_py38/lib/python3.8/site-packages/pants/bin/pants_loader.py", line 93, in run_default_entrypoint
    exit_code = runner.run(start_time)
  File "/home/gitlab-runner/builds/s3DRJHKh/1/directory/subdirectory/.cache/pants/setup/bootstrap-Linux-x86_64/2.12.0_py38/lib/python3.8/site-packages/pants/bin/pants_runner.py", line 89, in run
    return remote_runner.run(start_time)
  File "/home/gitlab-runner/builds/s3DRJHKh/1/directory/subdirectory/.cache/pants/setup/bootstrap-Linux-x86_64/2.12.0_py38/lib/python3.8/site-packages/pants/bin/remote_pants_runner.py", line 117, in run
    return self._connect_and_execute(pantsd_handle, start_time)
  File "/home/gitlab-runner/builds/s3DRJHKh/1/directory/subdirectory/.cache/pants/setup/bootstrap-Linux-x86_64/2.12.0_py38/lib/python3.8/site-packages/pants/bin/remote_pants_runner.py", line 151, in _connect_and_execute
    return PyNailgunClient(port, executor).execute(command, args, modified_env)
native_engine.PantsdClientException: The pantsd process was killed during the run.
If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). You can set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`). You can also disable pantsd with the global option `--no-pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching.
If neither of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (<https://www.pantsbuild.org/docs/getting-help>).
The third job (lint) completed successfully.
enough-analyst-54434
08/25/2022, 11:21 PM
.../.cache/pants/setup; so, if multiple jobs use that same directory, there is likely a bootstrap race we're vulnerable to. Is it in fact the case that all 3 jobs see /home/gitlab-runner/builds/s3DRJHKh/0/directory/subdirectory/.cache/pants/setup? (Excuse my GitLab CI ignorance.)
flaky-artist-57016
08/26/2022, 1:27 PM
pants.ci.toml we have set local_store_dir and named_caches_dir to .cache/pants/lmdb_store and .cache/pants/named_caches so they are within the git repository rather than in the default $HOME/.cache location. Could this be a problem?
enough-analyst-54434
08/26/2022, 1:39 PM
.cache/pants/setup) - the lmdb_store and named_caches have been battle tested. Those get hammered concurrently even within a single Pants run, let alone by concurrent Pants runs. It is just the Pants install bootstrap that can't handle concurrency here. We implicitly assume install is serial, and for you it is not: 3 jobs try to install Pants to the same directory in parallel, and that, and that alone, is what sometimes fails.
flaky-artist-57016
08/26/2022, 1:40 PM
enough-analyst-54434
08/26/2022, 1:40 PM
./pants -V?
flaky-artist-57016
08/26/2022, 1:40 PM
enough-analyst-54434
08/26/2022, 1:40 PM
enough-analyst-54434
08/26/2022, 1:41 PM
.../.cache/pants/setup directory.
flaky-artist-57016
08/26/2022, 1:43 PM
PANTS_SETUP_CACHE: "$CI_PROJECT_DIR/.cache/pants/setup" as a global variable for the pants jobs in our .gitlab-ci.yml
enough-analyst-54434
08/26/2022, 1:43 PM
flaky-artist-57016
08/26/2022, 2:20 PM
flaky-artist-57016
08/29/2022, 9:00 PM
flaky-artist-57016
08/29/2022, 9:05 PM
.gitlab-ci.yml file that was giving us trouble:
.pants_base:
  stage: pants
  # Global variables for pants goals
  variables:
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
    PANTS_SETUP_CACHE: "$CI_PROJECT_DIR/.cache/pants/setup"
    PANTS_CONFIG_FILES: pants.ci.toml
  cache:
    paths:
      - .cache/pip
      - .cache/pants/setup
      - .cache/pants/named_caches/pex_root/pip.pex
      - .cache/pants/named_caches/pex_root/http
      - .cache/pants/named_caches/pex_root/built_wheels
      - .cache/pants/lmdb_store
  tags:
    - bash
pants_bootstrap:
  extends: .pants_base
  script:
    - './pants -V'
# the following jobs run in parallel after the bootstrap job finishes successfully
pants_tailor:
  extends: .pants_base
  needs: ["pants_bootstrap"]
  script:
    - './pants --version'
    - './pants tailor --check update-build-files --check'
pants_test:
  extends: .pants_base
  needs: ["pants_bootstrap"]
  script:
    - './pants --version'
    - './pants check test ::'
  coverage: '/(?i)total.*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$/'
  artifacts:
    paths:
      - coverage.xml
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
pants_lint:
  needs: ["pants_bootstrap"]
  extends: .pants_base
  script:
    - './pants --version'
    - './pants lint ::'
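The ordering this configuration relies on (bootstrap Pants once, then let the remaining goals run in parallel) can be sketched outside of GitLab as a small shell helper. This is only an illustration under assumptions: run_after_bootstrap is a hypothetical name that belongs to neither Pants nor GitLab, and in the real pipeline each goal is a separate CI job rather than a background process.

```shell
# Hypothetical helper (not part of Pants or GitLab CI).
# Usage: run_after_bootstrap BOOTSTRAP_CMD GOAL_CMD [GOAL_CMD...]
# Runs BOOTSTRAP_CMD to completion first, then starts every GOAL_CMD
# in the background and waits for all of them.
run_after_bootstrap() {
  bootstrap_cmd="$1"
  shift

  # Serial step: nothing else starts until the bootstrap has succeeded.
  sh -c "$bootstrap_cmd" || return 1

  # Parallel step: launch each goal and remember its process ID.
  pids=""
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done

  # Report failure if any goal exited with a non-zero status.
  status=0
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

A call such as run_after_bootstrap "./pants -V" "./pants lint ::" mirrors the needs: ["pants_bootstrap"] dependency above; the commands are placeholders and can be swapped for whatever your jobs run.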
enough-analyst-54434
08/29/2022, 9:25 PM
.cache/pants/setup?
enough-analyst-54434
08/29/2022, 9:26 PM
flaky-artist-57016
08/29/2022, 9:40 PM
.pants_base section outlines the caches used by the subsequent jobs (via extends: .pants_base), so I believe that removing the .cache/pants/setup entry from that section would result in the tailor/test/lint jobs performing the bootstrap process as well, as they each restore the cache. Testing now to confirm.
flaky-artist-57016
08/30/2022, 1:14 PM
enough-analyst-54434
08/30/2022, 2:04 PM