# general
a
Hello, friends. This morning, for no apparent reason, pants started raising SIGSEGV.
❯ pants --no-pantsd --print-stacktrace -ldebug package src/dz:dz-deps
Bootstrapping Pants 2.16.0 using cpython 3.9.16
Installing pantsbuild.pants==2.16.0 into a virtual environment at /home/bob/.cache/nce/260e9f180e257368873660af8dd93ef1ae670cb61bde99eea1fd914ad6e534bb/bindings/venvs/2.16.0
New virtual environment successfully created at /home/bob/.cache/nce/260e9f180e257368873660af8dd93ef1ae670cb61bde99eea1fd914ad6e534bb/bindings/venvs/2.16.0.
10:49:50.67 [DEBUG] File handle limit is: 10000
10:49:50.69 [DEBUG] Using [cache::CommandRunner { inner: bounded::CommandRunner { inner: SwitchedCommandRunner { .. }, .. }, .. }, cache::CommandRunner { inner: bounded::CommandRunner { inner: SwitchedCommandRunner { .. }, .. }, .. }] for process execution.
10:49:54.11 [DEBUG] Launching 1 roots (poll=false).
10:49:54.13 [DEBUG] computed 1 nodes in 0.015810 seconds. there are 34 total nodes.
10:49:54.13 [DEBUG] Launching 1 roots (poll=false).
10:49:54.13 [DEBUG] computed 1 nodes in 0.000284 seconds. there are 36 total nodes.
10:49:54.14 [DEBUG] /home/bob/.cache/nce/260e9f180e257368873660af8dd93ef1ae670cb61bde99eea1fd914ad6e534bb/bindings/venvs/2.16.0/lib/python3.9/site-packages/pants/backend/awslambda/python/rules.py:105: Rule visitor failed to inspect assignment expression for ['py_major', 'py_minor'] - typing.Optional[typing.Tuple[int, int]]:

Parameters to generic types must be types. Got 0.
10:49:54.31 [DEBUG] File handle limit is: 10000
10:49:54.33 [DEBUG] Using [cache::CommandRunner { inner: bounded::CommandRunner { inner: SwitchedCommandRunner { .. }, .. }, .. }, cache::CommandRunner { inner: bounded::CommandRunner { inner: SwitchedCommandRunner { .. }, .. }, .. }] for process execution.
10:49:58.56 [DEBUG] specs are: Specs(includes=RawSpecs(description_of_origin='CLI arguments', address_literals=(AddressLiteralSpec(path_component='src/dz', target_component='dz-deps', generated_component=None, parameters=FrozenDict({})),), file_literals=(), file_globs=(), dir_literals=(), dir_globs=(), recursive_globs=(), ancestor_globs=(), unmatched_glob_behavior=<GlobMatchErrorBehavior.error: 'error'>, filter_by_global_options=True, from_change_detection=False), ignores=RawSpecs(description_of_origin='CLI arguments', address_literals=(), file_literals=(), file_globs=(), dir_literals=(), dir_globs=(), recursive_globs=(), ancestor_globs=(), unmatched_glob_behavior=<GlobMatchErrorBehavior.error: 'error'>, filter_by_global_options=False, from_change_detection=False))
10:49:58.56 [DEBUG] changed_options are: ChangedOptions(since=None, diffspec=None, dependents=<DependentsOption.NONE: 'none'>)
10:49:58.56 [DEBUG] Launching 1 roots (poll=false).
10:49:58.58 [DEBUG] computed 1 nodes in 0.014821 seconds. there are 34 total nodes.
10:49:58.58 [DEBUG] Launching 1 roots (poll=false).
10:49:58.58 [DEBUG] computed 1 nodes in 0.000780 seconds. there are 42 total nodes.
10:49:58.58 [DEBUG] Launching 1 roots (poll=false).
10:49:58.58 [DEBUG] computed 1 nodes in 0.000113 seconds. there are 42 total nodes.
10:49:58.58 [DEBUG] requesting <class 'pants.core.goals.package.Package'> to satisfy execution of `package` goal
10:49:58.58 [DEBUG] Launching 1 roots (poll=false).
10:49:58.58 [DEBUG] Completed: Find targets from input specs
10:49:58.58 [DEBUG] Completed: pants.backend.python.goals.package_pex_binary.package_pex_binary
10:49:58.63 [DEBUG] Completed: Generate `python_requirement` targets from requirements.txt or PEP 621 compliant pyproject.toml
10:49:58.67 [DEBUG] Completed: Extracting an archive file
10:49:58.67 [DEBUG] Completed: pants.core.util_rules.external_tool.download_external_tool
10:49:58.67 [DEBUG] Completed: Prepare environment for running PEXes
10:49:58.67 [DEBUG] Completed: acquire_command_runner_slot
10:49:58.67 [DEBUG] Running Find interpreter for constraints: CPython<3.10,>=3.9.16 under semaphore with concurrency id: 1, and concurrency: 1
10:49:58.68 [DEBUG] Completed: setup_sandbox
10:49:58.69 [DEBUG] spawned local process as Some(28095) for Process { argv: ["/home/bob/.cache/nce/260e9f180e257368873660af8dd93ef1ae670cb61bde99eea1fd914ad6e534bb/bindings/venvs/2.16.0/bin/python", "./pex", "--tmpdir", ".tmp", "--pip-version", "23.0.1", "--python-path", "/home/bob/.pyenv/versions/3.10.9/bin:/home/bob/.pyenv/versions/3.7.16/bin:/home/bob/.pyenv/versions/3.9.12/bin:/home/bob/.pyenv/versions/foo/bin:/home/bob/.pyenv/versions/modin/bin:/home/bob/.pyenv/versions/pants-dev/bin:/home/bob/.pyenv/versions/ray-test/bin:/home/bob/.pyenv/shims:/home/bob/.pyenv/bin:/home/bob/.cargo/bin:/home/bob/.npm-global/bin:/home/bob/.npm-global:/home/bob/.local/bin:/usr/local/bin:/usr/bin", "--interpreter-constraint", "CPython<3.10,>=3.9.16", "--", "-c", "import hashlib, os, sys\n\npython = os.path.realpath(sys.executable)\nprint(python)\n\nhasher = hashlib.sha256()\nwith open(python, \"rb\") as fp:\n  for chunk in iter(lambda: fp.read(8192), b\"\"):\n      hasher.update(chunk)\nprint(hasher.hexdigest())\n"], env: {"CPPFLAGS": "", "LDFLAGS": "", "OTEL_SDK_DISABLED": "true", "PATH": "/home/bob/.pyenv/shims:/home/bob/.pyenv/bin:/home/bob/.cargo/bin:/home/bob/.npm-global/bin:/home/bob/.npm-global:/home/bob/.local/bin:/usr/local/bin:/usr/bin", "PEX_IGNORE_RCFILES": "true", "PEX_PYTHON": "/home/bob/.cache/nce/260e9f180e257368873660af8dd93ef1ae670cb61bde99eea1fd914ad6e534bb/bindings/venvs/2.16.0/bin/python", "PEX_ROOT": ".cache/pex_root", "WR_ENGINE": "python", "WR_MEMORY_FORMAT": "pandas"}, working_directory: None, input_digests: InputDigests { complete: DirectoryDigest { digest: Digest { hash: Fingerprint<bdb5690b221578e63c3bfc6ab8fa2bcdf7ed0613c69bd1971241243a231079e0>, size_bytes: 254 }, tree: "Some(..)" }, nailgun: DirectoryDigest { digest: Digest { hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>, size_bytes: 0 }, tree: "Some(..)" }, input_files: DirectoryDigest { digest: Digest { hash: 
Fingerprint<0452d4b7e682d014f1385c692c97eca39a8bdf20668cc176b4c31d21311f1333>, size_bytes: 158 }, tree: "Some(..)" }, immutable_inputs: {RelativePath(".python-build-standalone"): DirectoryDigest { digest: Digest { hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>, size_bytes: 0 }, tree: "Some(..)" }}, use_nailgun: {} }, output_files: {}, output_directories: {}, timeout: None, execution_slot_variable: None, concurrency_available: 0, description: "Find interpreter for constraints: CPython<3.10,>=3.9.16", level: Debug, append_only_caches: {CacheName("pex_root"): RelativePath(".cache/pex_root")}, jdk_home: None, cache_scope: PerRestartSuccessful, execution_environment: ProcessExecutionEnvironment { name: None, platform: Linux_x86_64, strategy: Local }, remote_cache_speculation_delay: 0ns }
10:49:59.91 [DEBUG] Completed: Find interpreter for constraints: CPython<3.10,>=3.9.16
10:49:59.91 [DEBUG] Completed: Scheduling: Find interpreter for constraints: CPython<3.10,>=3.9.16
10:49:59.91 [DEBUG] Completed: Find Python interpreter for constraints - CPython<3.10,>=3.9.16 - Selected /usr/bin/python3.9 to run PEXes with.
10:49:59.91 [DEBUG] Completed: Scheduling: Determine Python dependencies for src/dz/recsys/customers/cui/features.py
10:49:59.91 [DEBUG] Completed: Scheduling: Determine Python dependencies for src/dz/recommend/handler.py
10:49:59.91 [DEBUG] Completed: Scheduling: Determine Python dependencies for src/learning/feature_transforms.py
10:49:59.91 [DEBUG] Completed: Scheduling: Determine Python dependencies for src/dz/recsys/customers/nie/features.py
10:49:59.91 [DEBUG] Completed: Scheduling: Determine Python dependencies for src/dz/soft_sensor/handler.py
⠄ 1.23s Find interpreter for constraints: CPython<3.10,>=3.9.16
fish: Job 1, 'pants --no-pantsd --print-stack…' terminated by signal SIGSEGV (Address boundary error)
I don't think I even changed anything; it just started failing. My first thought was that the plugin I added was borked, but if I remove it from pants.toml and build a target that doesn't mention it, I get the same problem.
I have tried changing the Pants version to 2.16, 2.17, and 2.18 with the same result. I'm currently killing random processes to see if something is interfering somehow.
Dec 22 10:47:11 fedora python3.12[26243]: detected unhandled Python exception in 'interactive mode (python -c ...)'
Dec 22 10:47:11 fedora python3.12[26238]: detected unhandled Python exception in 'interactive mode (python -c ...)'
Dec 22 10:47:11 fedora python3.12[26233]: detected unhandled Python exception in 'interactive mode (python -c ...)'
Dec 22 10:47:11 fedora python3.12[26279]: detected unhandled Python exception in 'interactive mode (python -c ...)'
Dec 22 10:47:12 fedora audit[26155]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=26155 comm="tokio-runtime-w" exe="/home/bob/.cache/nce/2b6e146234a4>
Dec 22 10:47:12 fedora audit: BPF prog-id=149 op=LOAD
Dec 22 10:47:12 fedora audit: BPF prog-id=150 op=LOAD
Dec 22 10:47:12 fedora audit: BPF prog-id=151 op=LOAD
Dec 22 10:47:12 fedora systemd[1]: Started systemd-coredump@18-26329-0.service - Process Core Dump (PID 26329/UID 0).
Dec 22 10:47:12 fedora audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@18-26329-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostn>
Dec 22 10:47:13 fedora (sd-parse-elf)[26335]: Could not parse number of program headers from core file: invalid `Elf' handle
Dec 22 10:47:13 fedora systemd-coredump[26330]: [🡕] Process 26155 (pants) of user 1000 dumped core.

                                                Module /home/bob/.cache/nce/2b6e146234a4ef2a8946081fc3fbfffe0765b80b690425a49ebe40b47c33445b/cpython-3.9.16+20230507-x86_64-unknown-linux-gnu-install_only.tar.gz/py>
                                                Module /home/bob/.cache/nce/2b6e146234a4ef2a8946081fc3fbfffe0765b80b690425a49ebe40b47c33445b/cpython-3.9.16+20230507-x86_64-unknown-linux-gnu-install_only.tar.gz/py>
                                                Module /home/bob/.cache/nce/2b6e146234a4ef2a8946081fc3fbfffe0765b80b690425a49ebe40b47c33445b/cpython-3.9.16+20230507-x86_64-unknown-linux-gnu-install_only.tar.gz/py>
                                                Module libcrypt.so.1 from rpm libxcrypt-4.4.36-2.fc39.x86_64
                                                Stack trace of thread 26171:
                                                #0  0x00007fdc57df2288 n/a (/home/bob/.cache/nce/260e9f180e257368873660af8dd93ef1ae670cb61bde99eea1fd914ad6e534bb/bindings/venvs/2.16.0/lib/python3.9/site-packages/>
                                                #1  0x00007fdc57df9d73 n/a (/home/bob/.cache/nce/260e9f180e257368873660af8dd93ef1ae670cb61bde99eea1fd914ad6e534bb/bindings/venvs/2.16.0/lib/python3.9/site-packages/>
                                                ELF object binary architecture: AMD x86-64
That is the journal entry.
It's not just package: `pants run` fails if I import anything, i.e. `pants run foo.py` works for foo ==
print("hello world")
and segfaults for foo==
import os
print("hello world")
It only fails when I run it in our monorepo. I have some other random projects lying around, and `pants run foo.py` works perfectly well in those... hmmm...
If I do a clean checkout of the monorepo, I can run foo.py.
So I guess I can just nuke this directory and everything will be fine, but ... why?
I've killed ~/.cache/pants and ~/.cache/nce, I've killed ~/myproject/.pants.d and .pants-bootstrap, and it still blows up. Maybe ... maybe it's haunted?
w
Did any part of your system or its packages get updated today?
a
Not that I'm aware of. I did update after the issue began, as part of a flurry of general troubleshooting.
w
Do you know which version of `pants` you’re using? Also, when wiping out pants stuff, you can delete the `.pids` as well.
a
I want to diff the files in that directory against the working one. It's still broken, so I renamed the dir and checked out again
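For the diff itself, a minimal sketch using only the Python standard library might look like the following. It is not from this thread: the `IGNORED` list and the `diff_trees` helper are hypothetical names chosen for illustration, and the ignore list is an assumption to adjust per repo.

```python
import filecmp
import os

# Assumption: directories that are regenerated anyway and would only add noise.
IGNORED = [".git", ".pants.d", ".pids", "__pycache__", "dist"]


def diff_trees(broken, clean, ignore=IGNORED):
    """Return (only_in_broken, only_in_clean, differing) as sorted relative paths."""
    only_broken, only_clean, differing = [], [], []

    def walk(cmp, rel):
        # left = broken checkout, right = clean checkout
        only_broken.extend(os.path.join(rel, name) for name in cmp.left_only)
        only_clean.extend(os.path.join(rel, name) for name in cmp.right_only)
        # Note: dircmp compares shallowly (type, size, mtime) before falling
        # back to contents, so identical stat signatures count as "same".
        differing.extend(os.path.join(rel, name) for name in cmp.diff_files)
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(broken, clean, ignore=ignore), "")
    return sorted(only_broken), sorted(only_clean), sorted(differing)
```

Calling `diff_trees("monorepo-broken", "monorepo")` (both directory names are placeholders) would list untracked leftovers in the broken tree, which is where a stray file shadowing a standard-library module would show up.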
w
Was there anything in the `.pants.d/logs`?
a
Nope
A bunch of empty exception.logs
I reproduced with 3 different versions
w
?
a
Plausible, but I wasn't using pants debug at the outset. I ran `pants run` on a bunch of things, then ran `pants package` on something else and got a segfault ¯\_(ツ)_/¯
w
What version of scie-pants are you using?
a
and then from that point on, all pants commands in that directory fail. I definitely want to open a bug, but I'd like to narrow it down beyond witchcraft or cosmic rays.
How do I ascertain that?
w
I believe it’s `PANTS_BOOTSTRAP_VERSION=report scie-pants` or `PANTS_BOOTSTRAP_VERSION=report pants`?
a
0.10.1
w
Think it’s worth trying an update? https://github.com/pantsbuild/scie-pants/releases
a
Same behaviour.
w
hmm, same error? I thought it bombed out on libcrypt?
a
Apparently so
Dec 22 15:19:33 fedora systemd-coredump[99692]: [🡕] Process 99524 (pantsd [/home/b) of user 1000 dumped core.

                                                Module /home/bob/.cache/nce/f3ff38b1ccae7dcebd8bbf2e533c9a984fac881de0ffd1636fbb61842bd924de/cpython-3.9.18+20231002-x86_64-unknown-linux-gnu-i>
                                                Module /home/bob/.cache/nce/f3ff38b1ccae7dcebd8bbf2e533c9a984fac881de0ffd1636fbb61842bd924de/cpython-3.9.18+20231002-x86_64-unknown-linux-gnu-i>
                                                Module /home/bob/.cache/nce/f3ff38b1ccae7dcebd8bbf2e533c9a984fac881de0ffd1636fbb61842bd924de/cpython-3.9.18+20231002-x86_64-unknown-linux-gnu-i>
                                                Module libcrypt.so.1 from rpm libxcrypt-4.4.36-2.fc39.x86_64
                                                Stack trace of thread 99529:
                                                #0  0x00007f47ae5f2288 n/a (/home/bob/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.16.0/lib/pyt>
                                                #1  0x00007f47ae5f9d73 n/a (/home/bob/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.16.0/lib/pyt>
                                                ELF object binary architecture: AMD x86-64
w
After updating pants, did you wipe out the cache and re-try?
a
I did not, but that doesn't seem to help
wtf on  fix/plant-data-deploy:main [$✘!?⇡] via  v20.10.0 via 🐍 v3.9.12 on ☁️  (eu-west-2) took 2h11m57s
❯ rm -rf ~/.cache/pants/ ~/.cache/nce/ .pants.d/ .pids/

wtf on  fix/plant-data-deploy:main [$✘!?⇡] via  v20.10.0 via 🐍 v3.9.12 on ☁️  (eu-west-2) took 5s
❯ pants run exp/users/bob/parquet_formats/hello.py
Bootstrapping Pants 2.16.0
Installing pantsbuild.pants==2.16.0 into a virtual environment at /home/bob/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.16.0
New virtual environment successfully created at /home/bob/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.16.0.
17:32:52.15 [INFO] waiting for pantsd to start...
17:32:55.26 [INFO] pantsd started
17:32:55.65 [INFO] Initializing scheduler...
17:33:03.48 [INFO] Scheduler initialized.
⠤ 0.01s Find all targets in the project
Traceback (most recent call last):
  File "/home/bob/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.16.0/bin/pants", line 8, in <module>
    sys.exit(main())
  File "/home/bob/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.16.0/lib/python3.9/site-packages/pants/bin/pants_loader.py", line 123, in main
    PantsLoader.main()
  File "/home/bob/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.16.0/lib/python3.9/site-packages/pants/bin/pants_loader.py", line 110, in main
    cls.run_default_entrypoint()
  File "/home/bob/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.16.0/lib/python3.9/site-packages/pants/bin/pants_loader.py", line 92, in run_default_entrypoint
    exit_code = runner.run(start_time)
  File "/home/bob/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.16.0/lib/python3.9/site-packages/pants/bin/pants_runner.py", line 89, in run
    return remote_runner.run(start_time)
  File "/home/bob/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.16.0/lib/python3.9/site-packages/pants/bin/remote_pants_runner.py", line 123, in run
    return self._connect_and_execute(pantsd_handle, executor, start_time)
  File "/home/bob/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.16.0/lib/python3.9/site-packages/pants/bin/remote_pants_runner.py", line 165, in _connect_and_execute
    return PyNailgunClient(port, executor).execute(command, args, modified_env)
native_engine.PantsdClientException: The pantsd process was killed during the run.

If this was not intentionally done by you, Pants may have been killed by the operating system due to memory overconsumption (i.e. OOM-killed). If you keep seeing this error message, try the troubleshooting steps below. If none of those help, please consider filing a GitHub issue or reaching out on Slack so that we can investigate the possible memory overconsumption (<https://www.pantsbuild.org/docs/getting-help>).
 - Exit other applications, including applications running in the background.
 - Set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`).
 - Disable pantsd with the global option `--no-pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching.

wtf on  fix/plant-data-deploy:main [$✘!?⇡] via  v20.10.0 via 🐍 v3.9.12 on ☁️  (eu-west-2) took 1m5s
❯ SCIE_BOOT=update pants
No new releases of scie-pants were found.
w
Is this the same stack trace as before? If it's different, that's a good start 🙂
And have you had a chance to try:
```
 - Set the global option `--pantsd-max-memory-usage` to reduce Pantsd's memory consumption by retaining less in its in-memory cache (run `./pants help-advanced global`).
 - Disable pantsd with the global option `--no-pantsd` to avoid persisting memory across Pants runs, although you will miss out on additional caching.
```
?
a
I think it's the same. The first instance I posted was with --no-pantsd. I've not tried to limit memory, but I don't think that's the issue: this looks like an address violation.
w
Yeah, just weird that it would be on "0.01s Find all targets in the project". And I believe you said there was no info in the logs otherwise. And I think you mentioned earlier that when you clone a fresh repo, you don't get this same problem? So it was particularly on this worked-on variant of your repo?
b
Have you resolved this? If not, can you file an issue with as much of a stack trace as you can get? It looks like the coredump output is truncated above?
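On the truncation: the trailing `>` in the journal excerpts is typically the pager chopping long lines. A minimal sketch for capturing the full record, assuming `coredumpctl` from systemd-coredump is installed (the helper names here are hypothetical, not part of Pants or systemd):

```python
import shutil
import subprocess


def coredump_info_command(pid):
    """Build a coredumpctl call that prints the stored record for one crashed PID."""
    # --no-pager keeps the pager from chopping long module paths.
    return ["coredumpctl", "--no-pager", "info", str(pid)]


def coredump_info(pid):
    """Return the full coredump record as text, suitable for pasting into an issue."""
    if shutil.which("coredumpctl") is None:
        raise RuntimeError("coredumpctl not found; is systemd-coredump installed?")
    result = subprocess.run(
        coredump_info_command(pid), capture_output=True, text=True, check=False
    )
    return result.stdout
```

For example, `coredump_info(99524)` would target the pantsd crash logged above, provided the dump is still retained on that machine.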