Agreed that the first priority should be removing ...
# development
h
Agreed that the first priority should be removing use of
FileContent
. The thing is, I have no idea what is causing that to get persisted. I don’t see it anywhere in the outputted graph or in our code related to Pytest.
a
We can try removing your new uses, at least :D https://github.com/pantsbuild/pants/pull/7739
🔥 1
h
Yay! Although I don’t think I had added a new use here? With the new inject rule, we go from Snapshot -> ExecuteProcessRequest -> Digest and avoid needing FilesContent
Any ideas for how to figure out why we’re serializing the entire files’ contents, given that the graph doesn’t include
FileContent
? Notably, we do serialize the
BUILD
file for
./pants list
, just it’s always never more than one file. Looking at the rule graph, I’m wondering if
Validate
is happening?
Hm, update. Realized I had not been running with the inject init rule PR applied. It does stop us from materializing
FileContent
for sources which is great. The remaining issue is that we materialize each
BUILD
file involved in the sources
a
Yeah, that’ll be in
parse_address_family
So I guess it comes down to: We need the graph to only reference the digest, and for the bytes to only be handed to the rule outside the graph somehow…
h
Yeah, that’ll be in
parse_address_family
Is the only fix for this to move that logic to Rust? Could we possibly do that? Sending materialized files seems very costly
and for the bytes to only be handed to the rule outside the graph somehow…
Have Python use
with open()
to grab the bytes itself?
a
I’m not entirely sure how we’d do it, tbh… I think @witty-crayon-22786 mentioned the possibility yesterday, though…
w
Well, materializing them is one thing... Keeping them around for long periods is another.
👍 1
If they're nowhere in the Graph, it might be that we have Keys for them...?
...but, probably not. We should only create Keys for things that we are referencing from the Graph.
One important thing to differentiate is whether there is a Handle that points directly to the object, vs the object being held by another python object
For example: if the AddressMapper was holding onto BUILD files after parsing them
h
So we would expect that
./pants list
also should not be keeping the
BUILD
file contents in the handles at the end of its run, right? If so, I’ll focus on solving that because it’s a simpler rule and would also by extension fix this
O(b)
behavior for test runner (where
b
is number of BUILD files)
w
I'm not sure that one is a blocker though... filing a ticket would be good, but the total size of all BUILD files in most repos is ~200MB at most
👍 1
(but yea, we wouldn't expect the content to be held, because that intrinsic doesn't even have an associated Node in the Graph: we lift the content and then pass it directly to the @rule)
h
Good idea to confirm first. I’ll try a CI run again with init rule applied.
O(t + b)
is very different than
O(t + n)
- keeping all the BUILD files in memory vs. the entire repo in memory
w
Can also compare memory usage locally by running with --enable-pantsd and watching top. But can do both in parallel
👍 1