https://pantsbuild.org/ logo
h

hundreds-father-404

05/16/2019, 5:49 AM
Agreed that the first priority should be removing use of
FileContent
. The thing is, I have no idea what is causing that to get persisted. I don’t see it anywhere in the outputted graph or in our code related to Pytest.
a

average-vr-56795

05/16/2019, 12:47 PM
We can try removing your new uses, at least :D https://github.com/pantsbuild/pants/pull/7739
🔥 1
h

hundreds-father-404

05/16/2019, 3:03 PM
Yay! Although I don’t think I had added a new use here? With the new inject rule, we go from Snapshot -> ExecuteProcessRequest -> Digest and avoid needing FilesContent
Any ideas for how to figure out why we’re serializing the entire files’ contents, given that the graph doesn’t include
FileContent
? Notably, we do serialize the
BUILD
file for
./pants list
, just it’s always never more than one file. Looking at the rule graph, I’m wondering if
Validate
is happening?
Hm, update. Realized I had not been running with the inject init rule PR applied. It does stop us from materializing
FileContent
for sources which is great. The remaining issue is that we materialize each
BUILD
file involved in the sources
a

average-vr-56795

05/16/2019, 4:19 PM
Yeah, that’ll be in
parse_address_family
So I guess it comes down to: We need the graph to only reference the digest, and for the bytes to only be handed to the rule outside the graph somehow…
h

hundreds-father-404

05/16/2019, 4:20 PM
Yeah, that’ll be in
parse_address_family
Is the only fix for this to move that logic to Rust? Could we possibly do that? Sending materialized files seems very costly
and for the bytes to only be handed to the rule outside the graph somehow…
Have Python use
with open()
to grab the bytes itself?
a

average-vr-56795

05/16/2019, 4:21 PM
I’m not entirely sure how we’d do it, tbh… I think @witty-crayon-22786 mentioned the possibility yesterday, though…
w

witty-crayon-22786

05/16/2019, 4:22 PM
Well, materializing them is one thing... Keeping them around for long periods is another.
👍 1
If they're nowhere in the Graph, it might be that we have Keys for them...?
...but, probably not. We should only create Keys for things that we are referencing from the Graph.
One important thing to differentiate is whether there is a Handle that points directly to the object, vs the object being held by another python object
For example: if the AddressMapper was holding onto BUILD files after parsing them
h

hundreds-father-404

05/16/2019, 4:30 PM
So we would expect that
./pants list
also should not be keeping the
BUILD
file contents in the handles at the end of its run, right? If so, I’ll focus on solving that because it’s a simpler rule and would also by extension fix this
O(b)
behavior for test runner (where
b
is number of BUILD files)
w

witty-crayon-22786

05/16/2019, 4:31 PM
I'm not sure that one is a blocker though... filing a ticket would be good, but the total size of all BUILD files in most repos is ~200MB at most
👍 1
(but yea, we wouldn't expect the content to be held, because that intrinsic doesn't even have an associated Node in the Graph: we lift the content and then pass it directly to the @rule)
h

hundreds-father-404

05/16/2019, 4:33 PM
Good idea to confirm first. I’ll try a CI run again with init rule applied.
O(t + b)
is very different than
O(t + n)
- keeping all the BUILD files in memory vs. the entire repo in memory
w

witty-crayon-22786

05/16/2019, 4:34 PM
Can also compare memory usage locally by running with --enable-pantsd and watching top. But can do both in parallel
👍 1