Hello. I'm fixing [an issue](<https://github.com/p...
# development
h
Hello. I'm fixing [an issue](https://github.com/pantsbuild/pants/issues/12019) that RemoteCacheWritesStarted is recorded iff the write finishes. I see why: the closure is getting canceled before it can finish. Ik how to fix, but wondering if I should do a more robust fix Specifically, @witty-crayon-22786 started https://github.com/pantsbuild/pants/pull/11759 to fix the issue with workunits and dangling pointers. Does it make sense for me to finish that change?
Note that I'm on vacation starting wednesday, but looks like this could be a good small-scoped change
w
would the macro fix the issue with the closure, or is it a cleanup nearby?
a comment describing what you think is happening on #12019 might be good
h
I think it's more adjancent cleanup - the fix iiuc is to have a distinct workunit for the remote cache write starting, and a distinct one for the remote cache write itself
w
i would think about it in terms of “threads/tasks”: everything that happens in the foreground task should be in one workunit, everything in the background should be in another
the effect of the macro is just to use the lifetime of the workunit itself to enforce that it doesn’t leak into background work on a different task
👍 1
so, yea… the macro would enforce that you don’t get that wrong. if we want to land it, and then incrementally move to it where we can, that would work i think. i haven’t looked at the macro recently, but would want to confirm that the behavior of the old method is unchanged
i could also probably clean that up to land today
h
It doesn't look too difficult to do it in one big-swoop. Want me to take it over?
w
sounds good: thanks!
👍 1
a
This sounds like it'd be a fairly natural use-case for an
impl Drop
?
h
Ah, that makes sense also - like a
try finally
in Python land
w
yea, the “in progress Workunit” struct implements Drop. i’m not 100% sure how Eric’s edgecase works.
h
@witty-crayon-22786 for the macro, should we also fix
context.workunit_store.record_observation
?
w
yea, probably.
a
Interesting... Yeah, I'd be interested to know how that
Drop
impl isn't happening...
w
in general, i believe that it is. workunits are marked canceled when they are dropped
h
should we also fix context.workunit_store.record_observation
Oh hm, looks like histograms aren't associated with particular workunits, so no need
w
they sortof are. when the workunit completes, some data is aggregated into a global structure, iirc. if all histogram users migrated to this API, the internals could be refactored. cc @fast-nail-55400
👍 1
h
Hm bummer, https://github.com/pantsbuild/pants/pull/11759 does not fix the missing workunit issue. But dangling pointers should be fixed at least
w
thanks for working on that though.
h
Agreed. Also in case you missed it, I couldn't figure out how to get the
tokio::select !
happy and I'd love some help if you're free. I posted a diff with what I was trying to use
w
i’ll take a look
🙌 1
i expect that it was because the tokio::select had an early return sitting in the middle of it, and the macro introduces new blocks
still working on it.
h
I think so too. I'm not sure how to avoid that early return tho - we need to skip cache writes Maybe return a (FallibleProcessResult, bool) and return outside the macro
w
coke
h
I see why: the closure is getting canceled before it can finish
This was not true. The workunit is not getting dropped because we used
tokio::spawn
. It should still be running, but in a background thread after the Pants session has finished The fix is simple for this: add a new workunit that always finishes
w