# development
h
@witty-crayon-22786 do you know what's up with this for remote execution? I'm surprised w/ the INFO logs
Copy code
17:26:53.00 [WARN] Making attempt 1 to backtrack and retry `Searching for `go` on PATH=/usr/local/bin:/usr/local/sbin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/jdk-11.0.2/bin:/usr/lib/postgresql/11/bin/`, due to missing digest Digest { hash: Fingerprint<c6081f9c65391b95ae6e881572c98dfec99b828429b11bb6237bf56f5e778c39>, size_bytes: 18 }.
17:26:53.00 [INFO] Dirtying @rule(pants.engine.process.fallible_to_exec_result_or_raise(environment:remote))
17:26:53.00 [INFO] Dirtying @rule(pants.core.util_rules.system_binaries.find_binary(environment:remote))
17:26:53.00 [INFO] Dirtying @rule(pants.backend.go.util_rules.goroot.setup_goroot(environment:remote))
17:26:53.00 [INFO] Dirtying @rule(pants.backend.go.util_rules.sdk.go_sdk_invoke_setup(environment:remote))
17:26:53.01 [INFO] Dirtying @rule(pants.backend.go.util_rules.sdk.setup_go_sdk_process(environment:remote))
17:26:53.01 [INFO] Dirtying @rule(pants.engine.process.get_multi_platform_request_description(environment:remote))
17:26:53.01 [INFO] Dirtying @rule(pants.engine.process.fallible_to_exec_result_or_raise(environment:remote))
17:26:53.01 [INFO] Dirtying @rule(pants.backend.go.util_rules.go_mod.determine_go_mod_info(src/go/src/toolchain/remoting:remoting, environment:remote))
w
backtracking means something disappeared from the server
is it reproducible?
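For context, a rough sketch of what the backtracking in those logs refers to, using simplified, made-up types rather than the real engine API: when a consumer finds that a digest referenced by a cached or remote process result can no longer be loaded, the engine dirties the producing node (and its dependent @rules) and retries, instead of failing immediately.
```rust
// Simplified, hypothetical sketch of "backtracking"; the real engine types and
// names differ.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Digest {
    hash: u64,
    size_bytes: usize,
}

#[derive(Debug)]
enum LoadError {
    MissingDigest(Digest),
}

trait ProcessNode {
    /// Run (or look up) the process and return its output digest.
    fn run(&self) -> Digest;
    /// Load the output bytes; may fail if the content has since disappeared.
    fn load_output(&self, d: Digest) -> Result<Vec<u8>, LoadError>;
}

fn run_with_backtracking(node: &dyn ProcessNode, max_attempts: u32) -> Option<Vec<u8>> {
    for attempt in 1..=max_attempts {
        let digest = node.run();
        match node.load_output(digest) {
            Ok(bytes) => return Some(bytes),
            Err(LoadError::MissingDigest(d)) => {
                // Corresponds to the "[WARN] Making attempt N to backtrack and
                // retry ... due to missing digest ..." lines above: dirty the
                // producer (and its dependent @rules) and try again.
                eprintln!("[WARN] attempt {attempt}: missing digest {d:?}; dirtying and retrying");
            }
        }
    }
    None
}
```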
h
pantsd log
yep. also I get this failure every time
Copy code
Exception in thread buildsense-reporter:
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.9/3.9.14/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/Cellar/python@3.9/3.9.14/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/ericarellano/code/toolchain/src/python/toolchain/pants/buildsense/reporter.py", line 394, in _report_loop
    self._build_state.send_final_report()
  File "/Users/ericarellano/code/toolchain/src/python/toolchain/pants/buildsense/state.py", line 181, in send_final_report
    all_workunits = converter.get_all_work_units(call_num=calls_count, last_update_timestamp=int(time.time()))
  File "/Users/ericarellano/code/toolchain/src/python/toolchain/pants/buildsense/converter.py", line 123, in get_all_work_units
    all_wus = [
  File "/Users/ericarellano/code/toolchain/src/python/toolchain/pants/buildsense/converter.py", line 124, in <listcomp>
    self._wu_dict_with_artifacts(wu, call_num, last_update_timestamp) for wu in self._workunit_map.values()
  File "/Users/ericarellano/code/toolchain/src/python/toolchain/pants/buildsense/converter.py", line 215, in _wu_dict_with_artifacts
    artifacts = self._get_artifacts(wu_json)
  File "/Users/ericarellano/code/toolchain/src/python/toolchain/pants/buildsense/converter.py", line 267, in _get_artifacts
    artifacts = self._get_digest_artifacts(wu_json, digest_keys, digests)
  File "/Users/ericarellano/code/toolchain/src/python/toolchain/pants/buildsense/converter.py", line 225, in _get_digest_artifacts
    artifacts = (artifact.decode() for artifact in self._context.single_file_digests_to_bytes(digests))
  File "/Users/ericarellano/code/pants/src/python/pants/engine/streaming_workunit_handler.py", line 77, in single_file_digests_to_bytes
    return self._scheduler.single_file_digests_to_bytes(digests)
  File "/Users/ericarellano/code/pants/src/python/pants/engine/internals/scheduler.py", line 579, in single_file_digests_to_bytes
    return native_engine.single_file_digests_to_bytes(self.py_scheduler, list(digests))
pants.base.exceptions.NativeEngineFailure: Missing digest: Was not present in the local store: Digest { hash: Fingerprint<c6081f9c65391b95ae6e881572c98dfec99b828429b11bb6237bf56f5e778c39>, size_bytes: 18 }
w
sounds like a bug then… whether it is a server or client bug will need more debugging though.
does it only occur with `validate`, or do you also see it with `fetch`?
h
I'm on fetch right now (this is Toolchain). I'll try different options
w
hm… yea, it should be impossible to hit this with `fetch`. so likely a clientside bug
h
re the BuildSense error calling `native_engine.single_file_digests_to_bytes`, or the backtracking with Go?
w
> hm… yea, it should be impossible to hit this with `fetch`.
because the content should be downloaded eagerly, and so we should miss caches rather than ever seeing a MissingDigest
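As a rough illustration of the distinction (simplified, hypothetical types, not the actual CacheContentBehavior code): with fetch, a remote cache hit's output contents are eagerly pulled into the local store, so a consumer should only ever see a cache miss; with validate, only remote existence is checked, so the bytes can still be absent locally when something later tries to read them.
```rust
// Illustrative sketch only; real Pants types and signatures differ.
#[derive(Clone, Copy)]
enum CacheContentBehavior {
    Fetch,    // eagerly download output contents on a cache hit
    Validate, // only check that the digests exist remotely
    Defer,    // trust the cache entry without checking
}

struct Digest;
struct Store;

impl Store {
    fn download_to_local(&self, _d: &Digest) -> Result<(), String> { Ok(()) }
    fn exists_remotely(&self, _d: &Digest) -> Result<bool, String> { Ok(true) }
}

/// Decide whether a remote cache hit is usable.
fn validate_cache_hit(
    behavior: CacheContentBehavior,
    store: &Store,
    output_digests: &[Digest],
) -> Result<bool, String> {
    match behavior {
        // `fetch`: if this returns true, the bytes are now in the local store,
        // so later consumers should never hit a MissingDigest; a download
        // failure simply becomes a cache miss.
        CacheContentBehavior::Fetch => {
            for d in output_digests {
                if store.download_to_local(d).is_err() {
                    return Ok(false);
                }
            }
            Ok(true)
        }
        // `validate`: only existence is checked, so the content may still be
        // absent from the local store when something tries to read it.
        CacheContentBehavior::Validate => {
            for d in output_digests {
                if !store.exists_remotely(d)? {
                    return Ok(false);
                }
            }
            Ok(true)
        }
        CacheContentBehavior::Defer => Ok(true),
    }
}
```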
h
If I disable the BuildSense plugin, then I get this failure
Copy code
in pants.backend.go.util_rules.goroot.setup_goroot
    Find Go binary - environment:remote
  in pants.core.util_rules.system_binaries.find_binary
    ..
  in pants.engine.process.fallible_to_exec_result_or_raise
    ..
  in process
    Scheduling: Searching for `go` on PATH=/usr/local/bin:/usr/local/sbin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/jdk-11.0.2/bin:/usr/lib/postgresql/11/bin/

Traceback (no traceback):
  <pants native internals>
Exception: Process Process { argv: ["./find_binary.sh", "go"], env: {"PATH": "/usr/local/bin:/usr/local/sbin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/jdk-11.0.2/bin:/usr/lib/postgresql/11/bin/"}, working_directory: None, input_digests: InputDigests { complete: DirectoryDigest { digest: Digest { hash: Fingerprint<01d3bc2a2bfde2bcd52525bc7dbd653a32324b11971e95849b330fe173a3c51c>, size_bytes: 91 }, tree: "Some(..)" }, nailgun: DirectoryDigest { digest: Digest { hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>, size_bytes: 0 }, tree: "Some(..)" }, input_files: DirectoryDigest { digest: Digest { hash: Fingerprint<01d3bc2a2bfde2bcd52525bc7dbd653a32324b11971e95849b330fe173a3c51c>, size_bytes: 91 }, tree: "Some(..)" }, immutable_inputs: {}, use_nailgun: {} }, output_files: {}, output_directories: {}, timeout: None, execution_slot_variable: None, concurrency_available: 0, description: "Searching for `go` on PATH=/usr/local/bin:/usr/local/sbin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/jdk-11.0.2/bin:/usr/lib/postgresql/11/bin/", level: Debug, append_only_caches: {}, jdk_home: None, platform: Linux_x86_64, cache_scope: PerRestartSuccessful, execution_strategy: RemoteExecution([("OSFamily", "linux"), ("execution-policy", "network"), ("execution-policy", "container-image=283194185447.dkr.ecr.us-east-1.amazonaws.com/ci:2022-07-05.17-49-33-a754b42b5da3"), ("execution-policy", "debug"), ("execution-policy", "tc-preserve-setup")]), remote_cache_speculation_delay: 0ns } produced an invalid result on all configured command runners.
*sorry, this is using `validate`
w
…oh: “was not present in the local store”… that might mean that we’re not using the remote store somewhere where we should be?
i’m guessing that the `CacheContentBehavior` has a bug with regard to using remote execution rather than only remote caching
h
generally, when remote execution is used, what is supposed to happen with the remote cache?
w
the remote_cache::CommandRunner is wrapped around the remote::CommandRunner
see https://github.com/pantsbuild/pants/issues/17142 for some explanation of that
i think that this issue is a question of where the “remote” or “local only” stores are used
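A minimal sketch of that layering, with made-up names standing in for remote_cache::CommandRunner and remote::CommandRunner (not the real API): the cache runner wraps an inner runner and delegates to it on a miss; under remote execution that inner runner is the remote-execution runner rather than the local one.
```rust
// Hypothetical, simplified sketch of the command-runner layering.
struct Process;
struct ProcessResult;

trait CommandRunner {
    fn run(&self, process: Process) -> ProcessResult;
}

/// Stand-in for remote::CommandRunner: executes the process on the remote
/// execution service.
struct RemoteExecutionRunner;
impl CommandRunner for RemoteExecutionRunner {
    fn run(&self, _process: Process) -> ProcessResult { ProcessResult }
}

/// Stand-in for remote_cache::CommandRunner: consults the remote cache first
/// and delegates to the wrapped runner on a miss.
struct RemoteCacheRunner<R: CommandRunner> {
    inner: R,
}
impl<R: CommandRunner> CommandRunner for RemoteCacheRunner<R> {
    fn run(&self, process: Process) -> ProcessResult {
        // 1. look up the Action in the remote cache
        // 2. on a hit, validate/fetch the result per CacheContentBehavior
        // 3. on a miss, fall through to the wrapped runner (remote execution here)
        self.inner.run(process)
    }
}

fn build_runner_for_remote_execution() -> impl CommandRunner {
    RemoteCacheRunner { inner: RemoteExecutionRunner }
}
```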
h
fyi disabling the remote cache results in things working, but the `Failed to expand digest: Digest` warning still appears
w
the “Was not present in the local store” bit in the error message above is specifically triggered in a case where there is no remote store configured… so probably something to do with https://github.com/pantsbuild/pants/blob/f218cdc166e7fb2db0f5a120ac2ab253e77728fb/src/rust/engine/src/context.rs#L524-L534
the `remote::CommandRunner` is probably not actually getting the remote store at all … oops.
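To make that hypothesis concrete, here is a hedged sketch (hypothetical names, loosely modeled on the store construction in the linked context.rs region) of how handing the local-only store to a component that needs remote content would surface exactly as "Was not present in the local store":
```rust
// Hypothetical sketch; the real construction lives in src/rust/engine/src/context.rs.
#[derive(Clone)]
struct Store {
    has_remote_fallback: bool,
}

impl Store {
    fn local_only() -> Store {
        Store { has_remote_fallback: false }
    }

    /// Layer a remote (CAS) store over the local one.
    fn into_with_remote(self /*, cas_address, ... */) -> Store {
        Store { has_remote_fallback: true }
    }

    fn load_bytes(&self, present_locally: bool) -> Result<Vec<u8>, String> {
        if present_locally {
            Ok(vec![])
        } else if self.has_remote_fallback {
            // fall back to fetching the bytes from the remote CAS
            Ok(vec![])
        } else {
            // this is the shape of the error in the traceback above
            Err("Missing digest: Was not present in the local store".to_string())
        }
    }
}

fn main() {
    let local_only = Store::local_only();
    let with_remote = local_only.clone().into_with_remote();

    // If remote execution (or the workunit/BuildSense consumer) is handed
    // `local_only` where it should get `with_remote`, a digest that only exists
    // on the server surfaces as "Was not present in the local store".
    assert!(local_only.load_bytes(false).is_err());
    assert!(with_remote.load_bytes(false).is_ok());
}
```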
h
w/ remote cache disabled
Copy code
in process
    Scheduling: Building pytest_runner.pex

Traceback (no traceback):
  <pants native internals>
Exception: Could not identify a process to backtrack to for: Missing digest: Failed to upload File: Not found in local store: Digest { hash: Fingerprint<b56fcc5934965c287599965a416a3f381b0d9f854a17474e14f0192357b2a0c9>, size_bytes: 548 }
w
@hundreds-father-404: let me know if you need help with this.
ah, so: the remote cache write failure is due to use of `ensure_remote_has_recursive`: https://github.com/pantsbuild/pants/blob/fe13e845e8368d3aaf3d332bb15addbc77263d82/src/rust/engine/fs/store/src/lib.rs#L734-L765 … which uses `expand_digests`, which only looks for digests locally.
👀 1
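A sketch of that failure mode, with simplified stand-in signatures (the real ones are in the linked store/src/lib.rs): the upload path first expands the top-level digests into every reachable file and tree by reading only the local store, so a digest that exists only on the remote CAS fails the expansion before any upload is attempted.
```rust
// Illustrative sketch only; real signatures in src/rust/engine/fs/store/src/lib.rs differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Digest {
    hash: u64,
    size_bytes: usize,
}

struct Store;

impl Store {
    /// Walk the trees/files referenced by `digests`, *reading only the local store*.
    fn expand_digests(&self, digests: &[Digest]) -> Result<Vec<Digest>, String> {
        for d in digests {
            if !self.in_local_store(d) {
                // Roughly where "Failed to upload File: Not found in local store"
                // in the message above originates.
                return Err(format!("Not found in local store: {d:?}"));
            }
        }
        Ok(digests.to_vec())
    }

    fn in_local_store(&self, _d: &Digest) -> bool { false }
    fn upload_to_remote(&self, _d: &Digest) -> Result<(), String> { Ok(()) }

    /// Ensure the remote CAS has everything reachable from `digests`.
    fn ensure_remote_has_recursive(&self, digests: &[Digest]) -> Result<(), String> {
        // Because expansion is local-only, content that lives only on the remote
        // CAS (e.g. outputs of a remotely executed process that were never
        // fetched) fails here even though the remote already has it.
        let expanded = self.expand_digests(digests)?;
        for d in &expanded {
            self.upload_to_remote(d)?;
        }
        Ok(())
    }
}
```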
h
Oh, thanks for finding that! What would be the best fix?
w
i’ll need to look a bit more, but i think that `expand_digests` doesn’t actually need to run local-only.
👍 1
@hundreds-father-404: do you have other things that you can look at, or are you blocked on this?
h
w
sounds good… that or https://github.com/pantsbuild/pants/issues/17142 … shouldn’t conflict.
ok, in that case i’ll work this one.
h
thanks!
w