I think I broke `lint` for formatters :thread:
# development
b
I think I broke `lint` for formatters 🧵
Still stewing on it, but `autoflake` is our first formatter, and if it makes changes, you'll see it "fail" in `lint` after converting to a `LintResult`. I'm guessing because the inputs are the same (`None` snapshot) the rule is getting memoized (whereas previously the snapshot would differ since `fmt` changed the file). ...I think. For whatever reason I'm feeling a bit fuzzy on the memoization today
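A toy sketch of the memoization point above (this is not the real Pants engine; `LintRequest` and its fields are hypothetical stand-ins). If the memo key is built only from the request's values, and the snapshot field is `None` both times, the second run never actually executes:

```python
# Toy model of rule memoization (NOT the real Pants engine):
# results are keyed on the request's values, so two requests with
# identical fields -- e.g. both carrying snapshot=None -- hit the cache.
from dataclasses import dataclass
from typing import Optional, Tuple

_memo: dict = {}
runs = []  # records actual executions

@dataclass(frozen=True)  # frozen -> hashable -> usable as a memo key
class LintRequest:
    paths: Tuple[str, ...]
    snapshot: Optional[str]  # None means "no snapshot captured"

def run_autoflake(request: LintRequest) -> str:
    if request in _memo:
        return _memo[request]
    runs.append(request)
    result = f"checked {request.paths}"
    _memo[request] = result
    return result

# The fmt pass and the lint pass build identical requests,
# so the rule body only runs once.
r1 = run_autoflake(LintRequest(paths=("app.py",), snapshot=None))
r2 = run_autoflake(LintRequest(paths=("app.py",), snapshot=None))
```

With a real snapshot in the request, the post-`fmt` request would differ and the memo lookup would miss.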
h
but `autoflake` is our first formatter, and if it makes changes, you'll see it "fail" in `lint` after converting to a `LintResult`.
we want that, right?
b
I should say I'm running `fmt lint ::`
h
Oh. So I think the issue is the inputs to the Autoflake rule aren't changing, maybe? After `fmt` does its thing, we would expect the inputs to have changed from when Autoflake ran for `fmt`; thus, it reruns for `lint`
b
Bingo. I think that's because the type of the input went from a `FieldSet` (which I'm guessing gets invalidated? Due to maybe the source field?) to a filepath
That's the fuzzy part
h
Hm, I'm not sure why the `FieldSet` would be any different; all we have is the raw source glob in the `FieldSet`
Maybe try logging the inputs to the Autoflake rule before and after? You'll want to log inside `lint.py`, I think, not the autoflake rule. Use `--only=autoflake`
b
The `FieldSet` has to be the magic, otherwise the lint rules would be memoized between runs when you touch a file, right?
h
I might need backup here. @witty-crayon-22786 why did that work before?
👍 1
b
As for the solution, I think we have to bring back the snapshot as an unconditionally already-populated field.
That, or maybe mark `SubPartition` as uncacheable, making the rule run, but its `Get`s are rightfully cached
w
the `FieldSet` doesn’t “contain” the file: instead, consuming it and requesting sources (capturing a `Snapshot` of the sources) depends on the files on disk. when they change, the dependees are invalidated
there isn’t any magic involved in `lint`/`fmt` around caching, except that the Result types are marked uncacheable for the purposes of their log messages
but yea. i don’t fully understand the issue, but it doesn’t sound like you “broke” the actual run: you just might not see the logged side-effect of `autoflake` running twice (in `fmt` and then in `lint`), because it only actually runs once (in `fmt`)
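The `Snapshot`-capture point above can be illustrated with a simplified stand-in: a content digest of the sources ties a cached result to what is actually on disk, so when `fmt` rewrites the file, the digest changes and anything keyed on the old digest is stale. (This is a sketch, not Pants' actual fingerprinting.)

```python
# Sketch of invalidation-by-content: a simplified stand-in for how
# capturing a Snapshot ties cached results to on-disk file contents.
import hashlib
import os
import tempfile

def snapshot_digest(path: str) -> str:
    """Content hash standing in for a Snapshot's fingerprint."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "app.py")
    with open(src, "w") as f:
        f.write("import os\n")      # an unused import, pre-fmt
    before = snapshot_digest(src)

    with open(src, "w") as f:       # "fmt" rewrites the file
        f.write("")
    after = snapshot_digest(src)

# different digests -> any node keyed on `before` is stale,
# so its dependees must re-run
```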
b
But since it changed the sources, it should ideally re-run in `lint` and succeed
w
how does it get the new sources in `lint`?
is it given a snapshot of the new sources as part of the request?
b
Nope, just file paths. And I think that's the crux of the issue
w
shouldn’t be. using filepaths via `PathGlobs` will trigger invalidation
between `fmt` and `lint`, files will have changed on disk, and nodes should be invalidated
if you’re not seeing that, it would be a bug.
b
Yeah, there are a lot of moving parts. Try it out. I think just adding an unused import and then running `fmt lint` on that file (and optionally using `--only`). `lint` should succeed after rerunning autoflake
w
yea, you’re right. i see the invalidation happening though:
```
+ autoflake made changes.
13:48:06.10 [INFO] Filesystem changed during run: retrying `Lint` in 500ms...
✕ autoflake failed.
```
so: what is being written to disk?
…the right thing is being written to disk. interesting.
@bitter-ability-32190: do you want to revert until you’re back from vacation?
b
I think making the `SubPartition` not cacheable might do it?
w
inputs can’t really be marked uncacheable (or, they can, but only if they’re marked `SideEffecting`)
i’m not going to have time to investigate this in the next few days, so we should probably revert until someone can
h
ditto that I won't have time to investigate, unfortunately
b
Oh, I see. Because the file changes, the `PathGlobs` invalidate, and so the subgraph's `Get`s should be invalidated before the memoization happens for the `fmt` `SubPartition` rule
w
yea
b
Hmm
w
and so the subgraph gets should be invalidated before the memoization happens for the fmt SubPartition rule
i mean, between `fmt` and `lint`, the writing of the files to disk will invalidate a bunch of nodes out of the graph, such that they need to re-run
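The invalidation flow being described can be modeled with a toy dependency graph (an illustration only; the node names are made up, and the real engine's graph is far richer): dirtying the file node transitively dirties everything that depends on it.

```python
# Toy dependency graph: when a file node is invalidated, everything
# that (transitively) depends on it is marked dirty and must re-run.
# Node names here are hypothetical, for illustration.
deps = {
    "lint_result": ["autoflake_run"],
    "autoflake_run": ["sources_snapshot"],
    "sources_snapshot": ["file:app.py"],
}

def invalidate(node: str, graph: dict) -> set:
    """Return the set of nodes dirtied by invalidating `node`."""
    dirty = {node}
    changed = True
    while changed:  # propagate until a fixed point is reached
        changed = False
        for dependee, dependencies in graph.items():
            if dependee not in dirty and any(d in dirty for d in dependencies):
                dirty.add(dependee)
                changed = True
    return dirty

# fmt writes app.py to disk -> the file node is invalidated, so the
# snapshot, the autoflake run, and the lint result all need to re-run.
dirty = invalidate("file:app.py", deps)
```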
b
So I'm "around" for an hour or so each day. I can take a look over the next few days. If that's too long, feel free to revert (but I'd like the reverter to be sure it "fixes" the issue and that it wasn't always underlying)
c
I don't think your changes caused this; I tried this on the commit before yours and I still get that `./pants fmt lint` will have a formatter make changes, fail its lint, and write the correct output to disk.
oh, wait, this other commit did it :(
w
Thanks! Yea, a bisect would be super super helpful
b
I wanna say I'm surprised, but also what the hell 😂
c
Doing a git bisect within the PR is a bit rough; most of the early commits hit an import error. But the issue is somewhere in these changes
b
I'm guessing the graph isn't being invalidated like we expect. And this worked before because we were using a snapshot, which would have a different value regardless of invalidation
Hmm, could it be that the graph is being invalidated, but we never get to the invalid subgraph? Since we're passing file paths which come from `FieldSet` sources across the rule boundary, we haven't yet hit a `PathGlobs` request. Everything up until the rule that runs the formatter is static. Either way, the fix is simple: we collect snapshots in `lint` for formatters and pass them to the `SubPartition` construction
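The proposed fix can be sketched roughly like this (the `SubPartition` fields and digest values shown are hypothetical; the real class lives in Pants and this only illustrates the equality argument): once the request itself carries the captured snapshot, the pre-`fmt` and post-`fmt` requests are no longer equal, so they cannot share a memo entry.

```python
# Hypothetical sketch of the proposed fix: carry the captured snapshot
# (here reduced to a digest string) inside the SubPartition request,
# so requests built before and after fmt's edits compare unequal
# and therefore get separate cache entries.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class SubPartition:
    files: Tuple[str, ...]
    snapshot_digest: str  # populated unconditionally, never None

# Same file list, but the on-disk contents (and hence digests) differ.
before_fmt = SubPartition(files=("app.py",), snapshot_digest="digest-before")
after_fmt = SubPartition(files=("app.py",), snapshot_digest="digest-after")

# unequal requests -> separate memo keys -> lint re-runs the rule
```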