# development
c
@happy-kitchen-89482 A few sandboxer stack traces. They seem to happen "often" in my internal repo, but I have not been able to figure out a clean or consistent repro case 🧵
14:52:00.58 [ERROR] 1 Exception encountered:

Engine traceback:
  in root
    ..
  in pants.core.goals.check.check
    `check` goal

Traceback (most recent call last):
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/core/goals/check.py", line 221, in check
    all_results = await MultiGet(
                  ^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 363, in MultiGet
    return await _MultiGet(tuple(__arg0))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 170, in __await__
    result = yield self.gets
             ^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/backend/python/typecheck/mypy/rules.py", line 419, in mypy_typecheck
    partitioned_results = await concurrently(
                          ^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 363, in MultiGet
    return await _MultiGet(tuple(__arg0))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 170, in __await__
    result = yield self.gets
             ^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/rules.py", line 71, in wrapper
    return await call
           ^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 78, in __await__
    result = yield self
             ^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/backend/python/typecheck/mypy/rules.py", line 230, in mypy_typecheck_partition
    closure_sources, requirements_venv_pex, file_list_digest = await concurrently(
                                                               ^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 408, in MultiGet
    return await _MultiGet((__arg0, __arg1, __arg2))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 170, in __await__
    result = yield self.gets
             ^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/rules.py", line 71, in wrapper
    return await call
           ^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 78, in __await__
    result = yield self
             ^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/backend/python/util_rules/pex.py", line 1120, in create_venv_pex
    venv_pex_result = await build_pex(seeded_venv_request, **implicitly())
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/rules.py", line 71, in wrapper
    return await call
           ^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 78, in __await__
    result = yield self
             ^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/backend/python/util_rules/pex.py", line 801, in build_pex
    result = await fallible_to_exec_result_or_raise(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/rules.py", line 71, in wrapper
    return await call
           ^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 78, in __await__
    result = yield self
             ^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/intrinsics.py", line 107, in execute_process
    return await native_engine.execute_process(process, process_execution_environment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
native_engine.IntrinsicError: materialize_directory() request to sandboxer process failed: status: FailedPrecondition, message: "Failed to create hardlink to /home/ecsb/.cache/pants/lmdb_store/immutable/files/e6/e627c0c9fdaf3944a23c5bcd254ad07c86f62e414da4427ca34a099f0f6469c7 at /home/ecsb/src/e/helios/pants.d/tmp/pants-sandbox-N9W3aX/pex: File exists (os error 17)", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Tue, 09 Sep 2025 18:52:00 GMT", "content-length": "0"} }
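For context on the "File exists (os error 17)" above, here is a minimal repro sketch (plain Python, not Pants code) of the failure mode: hardlink-based materialization is not idempotent, because link(2) fails with EEXIST when the target path already exists, so a blind retry of a partially completed materialization hits this error.

```python
import errno
import os
import tempfile

# Hypothetical stand-in for one file of the sandboxer's
# materialize_directory(): hardlink a cached blob into the sandbox.
def materialize(src: str, dst: str) -> None:
    os.link(src, dst)  # link(2): errors if dst already exists

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "blob")
    dst = os.path.join(d, "sandbox-file")
    with open(src, "w") as f:
        f.write("cached file content")

    materialize(src, dst)      # first attempt succeeds
    try:
        materialize(src, dst)  # a retry hits "File exists (os error 17)"
        retried_ok = True
    except FileExistsError as e:
        retried_ok = False
        hit_eexist = e.errno == errno.EEXIST
```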


14:52:00.58 [INFO] Completed: Scheduling: Generating Python sources from libraries/proto/eq_types_proto/v1beta1/types.proto.
14:52:00.63 [INFO] Completed: Scheduling: Generating Python sources from libraries/proto/operational_plan_store/v1alpha1/award.proto.
4:56:05.33 [ERROR] 1 Exception encountered:

Engine traceback:
  in root
    ..
  in pants.core.goals.fmt.fmt
    `fmt` goal

Traceback (most recent call last):
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/core/goals/fmt.py", line 98, in fmt
    return await _do_fix(
           ^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/core/goals/fix.py", line 365, in _do_fix
    all_results = await MultiGet(
                  ^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 363, in MultiGet
    return await _MultiGet(tuple(__arg0))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 170, in __await__
    result = yield self.gets
             ^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/core/goals/fix.py", line 426, in fix_batch
    result = await Get(  # noqa: PNT30: this is inherently sequential
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 120, in __await__
    result = yield self
             ^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/backend/sql/lint/sqlfluff/rules.py", line 125, in sqlfluff_fmt
    result = await run_sqlfluff(
             ^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/backend/sql/lint/sqlfluff/rules.py", line 87, in run_sqlfluff
    result = await execute_process(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/rules.py", line 71, in wrapper
    return await call
           ^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/internals/selectors.py", line 78, in __await__
    result = yield self
             ^^^^^^^^^^
  File "/home/ecsb/.cache/nce/5c54a0c25255df2afb0525da856d49eb81b48cff0e32a9c868f77a296af24319/bindings/venvs/2.28.0/lib/python3.11/site-packages/pants/engine/intrinsics.py", line 107, in execute_process
    return await native_engine.execute_process(process, process_execution_environment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
native_engine.IntrinsicError: materialize_directory() request to sandboxer process failed: status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }
(@main)
15:05:52.62 [ERROR] 1 Exception encountered:

Engine traceback:
  in root
    ..
  in pants.core.goals.lint_goal.lint
    `lint` goal

Traceback (most recent call last):
  File "/home/ecsb/src/o/2alt-pants/src/python/pants/core/goals/lint_goal.py", line 173, in lint
    all_batch_results = await concurrently(
                        ^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/src/o/2alt-pants/src/python/pants/engine/internals/selectors.py", line 363, in MultiGet
    return await _MultiGet(tuple(__arg0))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/src/o/2alt-pants/src/python/pants/engine/internals/selectors.py", line 170, in __await__
    result = yield self.gets
             ^^^^^^^^^^^^^^^
  File "/home/ecsb/src/o/2alt-pants/src/python/pants/engine/rules.py", line 69, in wrapper
    return await call
           ^^^^^^^^^^
  File "/home/ecsb/src/o/2alt-pants/src/python/pants/engine/internals/selectors.py", line 78, in __await__
    result = yield self
             ^^^^^^^^^^
  File "/home/ecsb/src/o/2alt-pants/src/python/pants/backend/sql/lint/sqlfluff/rules.py", line 117, in sqlfluff_lint
    result = await run_sqlfluff(
             ^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/src/o/2alt-pants/src/python/pants/backend/sql/lint/sqlfluff/rules.py", line 87, in run_sqlfluff
    result = await execute_process(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ecsb/src/o/2alt-pants/src/python/pants/engine/rules.py", line 69, in wrapper
    return await call
           ^^^^^^^^^^
  File "/home/ecsb/src/o/2alt-pants/src/python/pants/engine/internals/selectors.py", line 78, in __await__
    result = yield self
             ^^^^^^^^^^
  File "/home/ecsb/src/o/2alt-pants/src/python/pants/engine/intrinsics.py", line 110, in execute_process
    return await native_engine.execute_process(process, process_execution_environment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
native_engine.IntrinsicError: materialize_directory() request to sandboxer process failed: status: FailedPrecondition, message: "Failed to create hardlink to /home/ecsb/.cache/pants/lmdb_store/immutable/files/ef/ef7cd3d21d6c694867a0fc7b3534f870e37bd4c16d745d4ce09bb8c057766344 at /home/ecsb/src/e/helios/pants.d/tmp/pants-sandbox-Qomifr/sqlfluff.pex/.bootstrap: File exists (os error 17)", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Tue, 09 Sep 2025 19:05:40 GMT", "content-length": "0"} }
Sorry if that doesn't help much :-/ but I wanted to post in case something jumped out at you to try
h
Hmm, nothing jumps out at me. Can you reliably reproduce? And if so, what happens if you turn sandboxer off?
c
• I hit an error (almost?) every time I do fmt fix lint check on my repo, but it is not always the same error.
• It goes away if I turn the sandboxer off.
A possible clue:
• I deleted local_execution_root_dir and .pants.d
• killed sandboxer
• Ran the above goals
• Got this error:
native_engine.IntrinsicError: materialize_directory() request to sandboxer process failed: status: FailedPrecondition, message: "Failed to create hardlink to /home/ecsb/.cache/pants/lmdb_store/immutable/files/ef/ef7cd3d21d6c694867a0fc7b3534f870e37bd4c16d745d4ce09bb8c057766344 at /home/ecsb/src/e/helios/pants.d/tmp/pants-sandbox-k0MFOk/mypy.pex/.bootstrap: File exists (os error 17)", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Wed, 10 Sep 2025 20:57:24 GMT", "content-length": "0"} }
The log makes it look like the sandboxer was started twice?
$ cat .pants.d/workdir/sandboxer/sandboxer.log 
8147 2025-09-10T20:56:27.284Z [INFO] Starting up sandboxer with RUST_LOG=INFO and these options: Opt {
    socket_path: "/tmp/run/pants/5c204249d71d9b4f/sandboxer.sock",
    store_options: StoreCliOpt {
        local_store_path: Some(
            "/home/ecsb/.cache/pants/lmdb_store",
        ),
        cas_server: None,
        remote_instance_name: None,
        cas_root_ca_cert_file: None,
        cas_client_certs_file: None,
        cas_client_key_file: None,
        cas_oauth_bearer_token_path: None,
        upload_chunk_bytes: 3145728,
        store_rpc_retries: 3,
        store_rpc_concurrency: 128,
        store_batch_api_size_limit: 4194304,
        store_batch_load_enabled: false,
        header: [],
    },
}
8147 2025-09-10T20:56:27.337Z [INFO] Sandboxer server started on socket path /tmp/run/pants/5c204249d71d9b4f/sandboxer.sock
8748 2025-09-10T20:57:24.033Z [INFO] Starting up sandboxer with RUST_LOG=INFO and these options: Opt {
    socket_path: "/tmp/run/pants/5c204249d71d9b4f/sandboxer.sock",
    store_options: StoreCliOpt {
        local_store_path: Some(
            "/home/ecsb/.cache/pants/lmdb_store",
        ),
        cas_server: None,
        remote_instance_name: None,
        cas_root_ca_cert_file: None,
        cas_client_certs_file: None,
        cas_client_key_file: None,
        cas_oauth_bearer_token_path: None,
        upload_chunk_bytes: 3145728,
        store_rpc_retries: 3,
        store_rpc_concurrency: 128,
        store_batch_api_size_limit: 4194304,
        store_batch_load_enabled: false,
        header: [],
    },
}
8748 2025-09-10T20:57:24.085Z [INFO] Sandboxer server started on socket path /tmp/run/pants/5c204249d71d9b4f/sandboxer.sock
h
Can you see if there are two sandboxer processes running?
via ps -ef | grep sandboxer or similar
c
A possibly related oddity is that sometimes I see
--socket-path /tmp/run/pants/5c204249d71d9b4f/sandboxer.sock --local-store-path /home/ecsb/.cache/pants/lmdb_store
and other times
--socket-path /run/user/1002/pants/5c204249d71d9b4f/sandboxer.sock --local-store-path /home/ecsb/.cache/pants/lmdb_store
and from looking at the code I'd expect it to always choose the same socket location, absent me doing something odd with permissions.
Added process::id() to the logs to confirm they are different; I have not caught two running at the same time with ps.
$ cat .pants.d/workdir/sandboxer/sandboxer.log 
21797 2025-09-11T00:28:22.845Z [INFO] Starting up sandboxer with RUST_LOG=INFO and these options: Opt {
    socket_path: "/run/user/1002/pants/5c204249d71d9b4f/sandboxer.sock",
    store_options: StoreCliOpt {
        local_store_path: Some(
            "/home/ecsb/.cache/pants/lmdb_store",
        ),
        cas_server: None,
        remote_instance_name: None,
        cas_root_ca_cert_file: None,
        cas_client_certs_file: None,
        cas_client_key_file: None,
        cas_oauth_bearer_token_path: None,
        upload_chunk_bytes: 3145728,
        store_rpc_retries: 3,
        store_rpc_concurrency: 128,
        store_batch_api_size_limit: 4194304,
        store_batch_load_enabled: false,
        header: [],
    },
}
21797 2025-09-11T00:28:22.896Z [INFO] Sandboxer server (21797) started on socket path /run/user/1002/pants/5c204249d71d9b4f/sandboxer.sock
22644 2025-09-11T00:29:08.259Z [INFO] Starting up sandboxer with RUST_LOG=INFO and these options: Opt {
    socket_path: "/run/user/1002/pants/5c204249d71d9b4f/sandboxer.sock",
    store_options: StoreCliOpt {
        local_store_path: Some(
            "/home/ecsb/.cache/pants/lmdb_store",
        ),
        cas_server: None,
        remote_instance_name: None,
        cas_root_ca_cert_file: None,
        cas_client_certs_file: None,
        cas_client_key_file: None,
        cas_oauth_bearer_token_path: None,
        upload_chunk_bytes: 3145728,
        store_rpc_retries: 3,
        store_rpc_concurrency: 128,
        store_batch_api_size_limit: 4194304,
        store_batch_load_enabled: false,
        header: [],
    },
}
22644 2025-09-11T00:29:08.310Z [INFO] Sandboxer server (22644) started on socket path /run/user/1002/pants/5c204249d71d9b4f/sandboxer.sock
h
Even if there were two concurrent processes, only one can bind to a given unix-domain socket
If a sandboxer process dies, a new one is started up; that may be what you're seeing
Those events are logged in the sandboxer log
Can you see a shutdown between those two startups?
This may be related to the File exists error (I'm not sure how, though), or it may be a red herring.
We retry the materialize_directory() if there was an error.
But it's possible the error happens after that file was created, hence File exists.
Hmm, yes, I am willing to bet that is the issue, and that would correlate with a restarting sandboxer
The underlying materialize_directory() is not atomic, nor can it be.
Well, I suppose it could be if we jumped through hoops.
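One such hoop, sketched here in Python (the actual sandboxer is Rust, and link_idempotent is a hypothetical name): hardlink the blob under a unique temp name in the same directory, then rename(2) it over the destination. rename atomically replaces an existing file, so a retried materialization converges instead of failing with EEXIST.

```python
import os
import tempfile
import uuid

def link_idempotent(src: str, dst: str) -> None:
    # Hypothetical scheme: link under a unique temp name first...
    tmp = f"{dst}.tmp-{uuid.uuid4().hex}"
    os.link(src, tmp)
    # ...then atomically rename over dst; succeeds even if dst exists.
    os.replace(tmp, dst)

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "blob")
    dst = os.path.join(d, "out")
    with open(src, "w") as f:
        f.write("content")

    link_idempotent(src, dst)
    link_idempotent(src, dst)  # a retry is now harmless
    same_inode = os.stat(src).st_ino == os.stat(dst).st_ino
```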
c
Here is a similar-looking example from CI, if that helps: https://github.com/pantsbuild/pants/actions/runs/17661412820/job/50195696813
Can you see a shutdown between those two startups?
That is the whole sandboxer log, do you mean something from the pantsd side?
h
That is all of .pants.d/workdir/sandboxer/sandboxer.log?
c
That is all of .pants.d/workdir/sandboxer/sandboxer.log?
correct
h
Hmm ok, so I think that retry is the culprit?
Just following up on this: any thoughts about whether that retry is the issue? I am reasonably confident it is.
c
I don't have a good intuition at this layer of the engine, but a retry would match the not-entirely-consistent behavior.
h
Yeah, I think it has to be this, I will figure out what to do
we kinda need a retry, but if it's not atomic, then we need to make it so
@curved-manchester-66006 revisiting this - was this happening on macOS or on Linux, or both?
The error messages above can only happen on Linux, but you mentioned that sometimes there are other errors
So I'm almost certain this is because, while the Store docs declare materialize_directory() to be idempotent, in practice on Linux it calls hard_link() to materialize hard links, and that is not idempotent (it errors if the target exists). So that function is not living up to its documentation.
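For illustration, a sketch (in Python, not the actual Rust fix, and hard_link_idempotent is a hypothetical name) of restoring the documented idempotence: treat EEXIST as success only when the destination is already a hard link to the source, and re-raise when it is genuinely a different file.

```python
import os
import tempfile

def hard_link_idempotent(src: str, dst: str) -> None:
    try:
        os.link(src, dst)
    except FileExistsError:
        s, t = os.stat(src), os.stat(dst)
        # Same (device, inode) means dst is already our link: a no-op.
        if (s.st_dev, s.st_ino) != (t.st_dev, t.st_ino):
            raise  # dst exists but isn't our link: a real conflict

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "blob")
    dst = os.path.join(d, "out")
    with open(src, "w") as f:
        f.write("x")

    hard_link_idempotent(src, dst)
    hard_link_idempotent(src, dst)  # second call is a no-op, not an error
    idempotent_ok = True

    other = os.path.join(d, "other")
    with open(other, "w") as f:
        f.write("y")
    try:
        hard_link_idempotent(other, dst)  # dst is a different file
        conflict_raised = False
    except FileExistsError:
        conflict_raised = True
```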
c
I've only tried on Linux. (I think I was referring to the various materialize_directory() error messages.) The sandboxer Rust tests have flaked out on macOS CI before (https://github.com/pantsbuild/pants/actions/runs/17880492681/job/50847493119), but I'm not sure that is related.
h
I have a fix pending
Turns out changes over the years ate away at the idempotence guarantee, and we never tested it. Now we do...