# development
just observed that there seems to be a race between when a process result is cached and when the digest for the files involved is captured. 🧵
I’m not sure how to investigate this so I’ll just dump some facts first, see what comes out if anything…
hacking on the pants code base, I had an error in the source. however, I noticed it just as I ran a command, so the scheduler reinitialized, ran, spat out the exception, and failed. at the same time I fixed the error and saved. now, when I re-run, it still complains with the same error (cached result), but if I save another (dummy) change it re-runs and works.
anyone have any ideas or thoughts here?
Well, we use file watching in pantsd, and that sounds like a missed event, which can always happen. I believe across OSes these systems are designed to be lossy, with a ring buffer, to fend against badly performing watch clients. So this should all boil down to how we handle missed events, which I can't recall, but it should give you a narrow area in the codebase to look at.
But, just on the surface this problem sounds fundamentally racy. Did this just happen once?
So we invalidate all files on ring buffer overflow: https://github.com/pantsbuild/pants/blob/2983f166e18b1c1a480b85bf722d16d514b41b17/src/rust/engine/watch/src/lib.rs#L260-L262 But not all OSes support that event.
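For context, the overflow handling boils down to something like this (a minimal sketch, not the actual pants code; the `WatchEvent` and `Invalidation` enums here are made up for illustration):

```rust
// Most OS watch APIs (inotify, FSEvents, ReadDirectoryChangesW) buffer
// events and signal an overflow instead of blocking when the client falls
// behind, so dropped events are a designed-in possibility.
#[derive(Debug, PartialEq)]
enum WatchEvent {
    Changed(String), // a specific path changed
    Overflow,        // the OS ring buffer overflowed; events were dropped
}

#[derive(Debug, PartialEq)]
enum Invalidation {
    Paths(Vec<String>),
    All,
}

// On a normal event, invalidate just the affected paths; on overflow we
// can't know what we missed, so the only safe move is to invalidate
// everything.
fn handle_event(event: WatchEvent) -> Invalidation {
    match event {
        WatchEvent::Changed(path) => Invalidation::Paths(vec![path]),
        WatchEvent::Overflow => Invalidation::All,
    }
}
```

The catch, as noted above, is that on platforms that never deliver the overflow event, the invalidate-all path simply never fires.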
I’ve seen this before, but rarely, I think there’s a very narrow window for when the file mutation must take place in order to exhibit this behaviour.
I’m not sure I follow the reasoning for why a missed file watching event would explain this. I would expect the failed process result to be keyed by the fingerprint of the file prior to my saving the fixed version (i.e. the fingerprint used to launch the process), rather than by the fingerprint of the fixed version as it was when the process failed. But I guess that’s a narrow enough code path to actually dig into…