# general
b
Another odd case from one of our engs. They report touching a test file, then seeing us run the lint tools on the one file. Then a few pants runs (with the same exact cmd) later they see us run the lint tools on the "bucket" of files that test file is in.
@witty-crayon-22786 if need be I can share the workunit log JSON privately. My only suspicion so far is it has to do with pantsd invalidation, but I don't know why that would matter
w
hm, so: as it stands, the bucketing won’t coarsen up into larger buckets than have been specified. so if you ask to lint one file, one file will be linted (we won’t expand to a larger group)
so i think that that is currently expected.
h
Assuming other files in that bucket were modified, that is
w
no: regardless of that. the single file in the bucket was modified. there are two cache keys: one for the run with the single file, and one for the run with the bucket
we don’t convert the former into the latter
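To illustrate why the two runs do not share a cache entry, here is a minimal sketch. It assumes a cache key derived from the tool name plus the exact set of input files. The `cache_key` helper and the file paths are hypothetical, and this is not Pants' actual keying scheme (which also depends on file contents and other process details).

```python
import hashlib


def cache_key(tool: str, files: list[str]) -> str:
    """Hypothetical key: a digest over the tool name and the exact input set."""
    h = hashlib.sha256()
    h.update(tool.encode())
    # Sort so the key depends on the set of files, not on their order.
    for path in sorted(files):
        h.update(b"\0")
        h.update(path.encode())
    return h.hexdigest()


# Made-up paths: one run over a single file, one over the whole bucket.
single = cache_key("lint", ["src/a_test.py"])
bucket = cache_key("lint", ["src/a_test.py", "src/b_test.py", "src/c_test.py"])

# Different input sets give different keys, so a hit for one says nothing about the other.
print(single == bucket)
```

Under this assumption, a result cached for the bucket cannot be reused for the single file, or the other way around.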
b
So if there's 2 files modified, what happens?
Actually better question is if I only ever used the same --changed commands for specs, when would it bucket vs not
w
we always bucket: but we bucket the inputs, rather than expanding the inputs into some larger set of inputs and then bucketing those.
so if the input set is two files, we’ll bucket two files (not go and find larger buckets which contain those two files)
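A rough sketch of "bucket the inputs" as described above. It assumes bucketing simply means sorting the requested files and splitting them into fixed-size batches. The `bucket_inputs` name and the batch size are hypothetical, not Pants' real partitioning logic.

```python
def bucket_inputs(files: list[str], batch_size: int) -> list[list[str]]:
    """Split only the requested files into batches; never pull in extra files."""
    ordered = sorted(files)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Two input files yield one bucket of two files, whatever else lives in the repo.
print(bucket_inputs(["src/b_test.py", "src/a_test.py"], batch_size=128))
```

The output only ever contains files that were in the input, which matches the point that the input set is never expanded into a larger bucket.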
h
Oh right
This gets back to the larger potential project of splitting batched processes into individual "virtual processes" and caching the latter as-if they had actually run
b
So the only way we'd run the lint tools on 300-ish files is if pants thinks that many have been changed
w
correct
b
Now to figure out why that's our spec 😂
Would that be possible to suss out from the workunit JSON?
w
um, maybe not from the workunits, but from the raw run data, or debug logs maybe? at a fundamental level, all --changed is doing is finding some “roots” which directly changed, and then (if --…=transitive) including the transitive dependees
i think that what you will see in the raw run data is the specs after the changed calculation though, unfortunately
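The "roots plus transitive dependees" idea can be sketched as a graph walk. This assumes a plain mapping from each target to the targets it depends on. The `transitive_dependees` function and the target names are hypothetical and only illustrate the expansion, not how Pants computes it.

```python
from collections import defaultdict, deque


def transitive_dependees(roots: set[str], dependencies: dict[str, set[str]]) -> set[str]:
    """Return the roots plus everything that depends on them, directly or indirectly."""
    # Invert the graph: for each target, record who depends on it.
    dependees: dict[str, set[str]] = defaultdict(set)
    for target, deps in dependencies.items():
        for dep in deps:
            dependees[dep].add(target)

    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for parent in dependees[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Made-up graph: app -> lib -> util, and an unrelated target.
deps = {"app": {"lib"}, "lib": {"util"}, "util": set(), "other": set()}
print(sorted(transitive_dependees({"util"}, deps)))
```

A single changed root with many dependees can therefore grow into a large set of specs, which is one way a one-file change could end up linting hundreds of files.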
b
I noticed a file being fingerprinted that was unexpected in the workunits, and has lots of transitive deps. My current suspicion is for some reason Pants is using it to calculate specs
w
if you have direct access, i would suggest:
./pants --changed=.. list
to see what it thinks has been directly changed
(without the transitive)