# plugins
b
I'm trying to write a formatter that does some very simple addition (copyright header) to the file contents. With the following code, Pants always says my subsystem made changes (likely my output digest isn't right)
    source_files = await Get(
        SourceFiles,
        SourceFilesRequest(field_set.source for field_set in request.field_sets),
    )
    source_files_snapshot = (
        source_files.snapshot
        if request.prior_formatter_result is None
        else request.prior_formatter_result
    )
    input_digest = source_files_snapshot.digest
    digest_contents = await Get(DigestContents, Digest, input_digest)
    output_digest = await Get(
        Digest,
        CreateDigest(
            FileContent(path=file_content.path, content=maybe_add_copyright(file_content.content))
            for file_content in digest_contents
        ),
    )
    return FmtResult(
        input=input_digest,
        output=output_digest,
        stdout="",
        stderr="",
        formatter_name=request.name,
    )
h
Maybe the difference is a trailing \n? Indeed, it decides whether anything changed by comparing input vs. output. So you could materialize both to memory with DigestContents and compare that way.
f
You may want to use these debug helpers I wrote for myself a while ago to see what the difference is:
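import difflib

from pants.core.goals.fmt import FmtResult
from pants.engine.fs import DigestContents, DigestEntries
from pants.testutil.rule_runner import RuleRunner

def diff_fmt_result(rule_runner: RuleRunner, fmt_result: FmtResult) -> None: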
    input_digest_contents = {fc.path: fc for fc in rule_runner.request(DigestContents, [fmt_result.input])}
    output_digest_contents = {fc.path: fc for fc in rule_runner.request(DigestContents, [fmt_result.output])}
    for path, input_fc in input_digest_contents.items():
        output_fc = output_digest_contents[path]
        input_content = input_fc.content.decode().splitlines()
        output_content = output_fc.content.decode().splitlines()
        unidiff = "\n".join(difflib.unified_diff(input_content, output_content, lineterm=""))
        print(f"DIFF for {path}:\n{unidiff}")

def diff_fmt_result(rule_runner: RuleRunner, fmt_result: FmtResult) -> None:
    input_digest_contents = {
        fc.path: fc.content for fc in rule_runner.request(DigestContents, [fmt_result.input])
    }
    input_digest_entries = rule_runner.request(DigestEntries, [fmt_result.input])
    print(f"input_digest_contents = {input_digest_contents}")
    print(f"input entries = {input_digest_entries}")
    print(f"input files = {','.join(sorted(input_digest_contents.keys()))}")

    output_digest_contents = {
        fc.path: fc.content for fc in rule_runner.request(DigestContents, [fmt_result.output])
    }
    output_digest_entries = rule_runner.request(DigestEntries, [fmt_result.output])
    print(f"output_digest_contents = {output_digest_contents}")
    print(f"output entries = {output_digest_entries}")
    print(f"output files = {','.join(sorted(output_digest_contents.keys()))}")

    for path, input_fc in input_digest_contents.items():
        output_fc = output_digest_contents[path]
        input_content = input_fc.decode().splitlines()
        output_content = output_fc.decode().splitlines()
        unidiff = "\n".join(difflib.unified_diff(input_content, output_content, lineterm=""))
        print(f"DIFF for {path}:\n{unidiff}")
b
logger.error(f"{[file_content.path for file_content in digest_contents if file_content.content != maybe_add_copyright(file_content.content)]}")
It logs an empty list each time
f
Also, what does your maybe_add_copyright function look like?
b
def maybe_add_copyright(content: bytes) -> bytes:
    if not has_copyright(content):
        return COPYRIGHT_HEADER.encode() + content
    return content
🙂
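For reference, a minimal self-contained sketch of the helper above, mainly to show that it is idempotent (a second pass leaves the content untouched, so the file contents themselves should not keep changing). The header text and the has_copyright check are hypothetical stand-ins, since the thread does not show them.

```python
# Hypothetical header text; the real COPYRIGHT_HEADER is not shown in the thread.
COPYRIGHT_HEADER = "# Copyright 2024 Example Corp.\n"


def has_copyright(content: bytes) -> bool:
    # Hypothetical check: treat the file as covered if it starts with the header.
    return content.startswith(COPYRIGHT_HEADER.encode())


def maybe_add_copyright(content: bytes) -> bytes:
    if not has_copyright(content):
        return COPYRIGHT_HEADER.encode() + content
    return content


once = maybe_add_copyright(b"print('hi')\n")
twice = maybe_add_copyright(once)  # second pass is a no-op
```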
h
Try printing the input digest and the output digest. Check whether the hash and/or the size are different.
I also wonder if is_executable is at play, like perms of the file
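A rough sketch of that kind of check, comparing every field of each entry rather than only the bytes. FileEntry is a stand-in written for this example, not Pants' FileContent class, and differing_fields is a hypothetical helper name.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FileEntry:
    # Stand-in for a digest entry; mirrors the fields discussed in the thread.
    path: str
    content: bytes
    is_executable: bool = False


def differing_fields(before, after):
    """Map each path to the names of the fields that differ between the two lists."""
    after_by_path = {entry.path: entry for entry in after}
    result = {}
    for entry in before:
        other = after_by_path.get(entry.path)
        if other is None:
            result[entry.path] = ["<missing in output>"]
            continue
        changed = [
            f.name for f in fields(entry) if getattr(entry, f.name) != getattr(other, f.name)
        ]
        if changed:
            result[entry.path] = changed
    return result


before = [FileEntry("run.sh", b"echo hi\n", is_executable=True)]
after = [FileEntry("run.sh", b"echo hi\n")]  # same bytes, flag dropped
report = differing_fields(before, after)
```

Here report names only is_executable for run.sh, which is the situation where a content-only comparison comes back empty even though the two entries are not equal.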
f
And the output from the diff_fmt_result helper that I pasted might be useful (although you need to add QueryRules to your RuleRunner for the calls it makes).
b
It's either a cache issue or exec perms as Eric said. Running with the input's contents has the same output
h
have you tried running on a single trivial file? that reduces the risk of exec perms being the issue
b
Yup, always passes 🙂
So likely a cache issue?
At the risk of nuking my PEXs which dir holds the rule cache?
f
Is pantsd enabled? If so, just pass --no-pantsd to avoid the rule memoization.
h
Run with --no-local-cache --no-pantsd
b
ooooh no cache has the same behavior!
spicy
is_executable=os.stat(file_content.path).st_mode % 2 == 1
That worked
f
Except that it directly accesses the filesystem, outside the view of the engine's core rules.
Maybe just copy is_executable over from the original FileContent?
dataclasses.replace(file_content, content=maybe_add_copyright(file_content.content))
That will preserve all fields except content on file_content.
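A small standalone illustration of the difference, using a stand-in frozen dataclass with the same three fields rather than the real Pants class. Rebuilding the object from path and content alone silently falls back to the default flag, while dataclasses.replace carries it over.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class FileContent:
    # Stand-in for illustration only; not imported from Pants.
    path: str
    content: bytes
    is_executable: bool = False


original = FileContent("run.sh", b"echo hi\n", is_executable=True)
new_content = b"# header\n" + original.content

# Rebuilding from scratch drops is_executable back to its default.
rebuilt = FileContent(path=original.path, content=new_content)

# dataclasses.replace copies every field and overrides only `content`.
replaced = dataclasses.replace(original, content=new_content)
```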
b
oh duh lol
AWESOME!
Admittedly, this is a bit of a thorn though. Why does the exec flag matter in this context? (I assume it doesn't but does matter if a plugin is trying to exec some file in the chroot?)
h
If the input digest said something was executable, and your output digest is now saying the file is not executable, that matters. They're different things. If you didn't handle this the right way, it would be like Pants is running chmod -x when calling Workspace.write_digest().
f
Recall that the Remote Execution API is content-addressed, which means anything that changes the hash of a protobuf changes the digest.
FileContent is eventually turned into a REAPI FileNode, which has is_executable as a field. Different value => different digest.
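A toy model of why that follows, under loud assumptions: the real REAPI hashes serialized protobuf messages, whereas this sketch hashes a JSON record, and toy_file_node_digest is a hypothetical name. The point is only that the flag is part of what gets hashed, so identical bytes with a different flag yield a different digest.

```python
import hashlib
import json


def toy_file_node_digest(name: str, content: bytes, is_executable: bool) -> str:
    # Toy stand-in for a file node: name, hash of the content, and the flag.
    record = {
        "name": name,
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "is_executable": is_executable,
    }
    # Hash the whole record, so every field contributes to the result.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


executable = toy_file_node_digest("run.sh", b"echo hi\n", True)
not_executable = toy_file_node_digest("run.sh", b"echo hi\n", False)
same_again = toy_file_node_digest("run.sh", b"echo hi\n", True)
```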
b
Out of curiosity, why does the rest of the mode not matter?
f
Because the proto only stores is_executable and none of the rest of the mode (technically Pants could store the full file mode in the NodeProperties proto, but Pants does not do that).
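To make the "one boolean instead of a full mode" point concrete, here is a sketch of collapsing a POSIX mode into a single flag. Which bit Pants actually consults is not stated in this thread; the owner-execute bit used below is an assumption for illustration, and to_is_executable is a hypothetical name. The last line also shows that the earlier `st_mode % 2 == 1` check looks only at the others-execute bit.

```python
import stat


def to_is_executable(mode: int) -> bool:
    # Assumption for illustration: only the owner-execute bit is consulted.
    return bool(mode & stat.S_IXUSR)


# Different modes collapse to the same flag, so the rest of the mode is lost.
flag_755 = to_is_executable(0o100755)
flag_700 = to_is_executable(0o100700)
flag_644 = to_is_executable(0o100644)

# `mode % 2` tests the lowest bit (others-execute), which 0o700 does not set.
others_bit_700 = 0o100700 % 2 == 1
```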