<@U0N6C2Q9F> question on REAPI file system:
# development
h
@fast-nail-55400 question on REAPI file system:
At https://github.com/pantsbuild/pants/blob/0a43fdf296204e0b28e1473620328f0c8f485795/src/rust/engine/process_execution/src/remote_cache.rs#L148-L149, we intentionally set the Tree as not including the prefix of the output_directory. Over DM, you said
REAPI does not store the prefix of the output directory into the Tree and Directories therein.
I'm not finding why that is the case? I don't see it mentioned in
remote_execution.proto
Related, note that for
output_files
, we store the full path, which complies with the REAPI
Copy code
action_result.output_files.push({
        remexec::OutputFile {
          digest: Some(digest.into()),
          // Note that this is the full path from the workdir.
          path: output_file.to_owned(),
          is_executable: file_node.is_executable,
          ..remexec::OutputFile::default()
        }
      })
    }
Copy code
// An `OutputFile` is similar to a
// [FileNode][build.bazel.remote.execution.v2.FileNode], but it is used as an
// output in an `ActionResult`. It allows a full file path rather than
// only a name.
message OutputFile {
  // The full path of the file relative to the working directory, including the
  // filename. The path separator is a forward slash `/`. Since this is a
  // relative path, it MUST NOT begin with a leading forward slash.
  string path = 1;
f
I’m not finding why that is the case? I don’t see it mentioned in 
remote_execution.proto
Because it allows future uses of that Directory proto to mount it anywhere within an input root it wants without being held to the choice for the path to its root made when it was generated.
h
Do you remember where that is documented in REAPI? I'm seeing this:
Copy code
message OutputDirectory {
  // The full path of the directory relative to the working directory. The path
  // separator is a forward slash `/`. Since this is a relative path, it MUST
  // NOT begin with a leading forward slash. The empty string value is allowed,
  // and it denotes the entire working directory.
  string path = 1;
f
hmm I don’t think I ever handled that
“relative to the working directory”
and the archive code in pants sets a working directory
the existing code always sets it to the path from the top of the input root
h
My reading of "working directory" is more
/tmp/process-execution-123
, rather than the literal cwd. This was failing before we set the
working_directory
arg. My point is that I believe from REAPI we should be storing in the remote cache
__output/shellcheck/README.md
, not
shellcheck/README.md
. And indeed, were we using
output_files
instead of
output_directories
, we would be storing that full path.
output_directories
and
output_files
are behaving differently rn.
f
My point is that I believe from REAPI we should be storing in the remote cache 
__output/shellcheck/README.md
, not 
shellcheck/README.md
That is still wrong though. Pants merges all output directories and files into a single “Digest”. REAPI does not do that. The output directories and files are independent. The digest for the root of an output directory is the digest of Directory proto for that output directory itself, not any parent directories in the path.
h
The digest for the root of an output directory is the digest of Directory proto for that output directory itself, not any parent directories in the path.
Where is this documented? I'm trying to understand
h
Ah yes, that does. Thanks Daniel!
Oh!! So I think this is the issue:
Copy code
action_result
        .output_directories
        .push(remexec::OutputDirectory {
          path: String::new(),
          tree_digest: Some(tree_digest.into()),
        });
    }
The tree_digest is accurate, but the path needs to be the output_directory!
a
(IIRC we went through a few different interpretations of the protobuf when putting together the implementation initially because a bunch of stuff wasn't clear, and we were kind of trial and error-ing based on whatever server implementations happened to do 😄)
f
actually, I think there are two bugs. the code you just pointed out needs to make the path relative to the working directory.
and the code that decodes a Tree back into a merged Pants Digest needs to append the working directory back onto the front.
or maybe I’m missing something
h
Woot, looks like this one-line change fixes it! Thanks Tom and Daniel for talking this through! https://github.com/pantsbuild/pants/pull/11895
f
@hundreds-father-404: Should I wipe the remote cache?
h
It wasn't necessary to do that in my testing locally and in CI, for some reason. And I did inspect
-ldebug
logs locally to confirm that remote caching worked
f
I’m thinking about Pants trying to use existing cache entries which do not have the fix.
h
I'm fine with purging it. Some other code could be broken too like the new pytest extra output dir feature Benjy added
Copy code
❯ rg 'output_directories' -lg '*.py'
src/python/pants/engine/process.py
src/python/pants/backend/codegen/protobuf/python/rules.py
src/python/pants/backend/python/goals/coverage_py.py
src/python/pants/backend/python/goals/pytest_runner.py
src/python/pants/backend/python/goals/setup_py.py
src/python/pants/backend/python/util_rules/pex_cli.py
src/python/pants/backend/python/util_rules/pex.py
tests/python/pants_test/init/test_plugin_resolver.py
src/python/pants/core/util_rules/archive.py
f
did the PR and cherry-pick both land on
main
?
h
PR has landed, cherry-pick is still in CI but note that no code we know of was exercising this broken
archive.py
code path in 2.4. It was only us activating Shellcheck that surfaced this bug a few days ago
f
okay going to clear the cache