just observed that there seems to exist a race when a proces Pants #development

curved-television-6568

04/21/2023, 9:57 PM

just observed that there seems to exist a race when a process result is cached and the digest for the files involved is captured. 🧵

curved-television-6568

04/21/2023, 9:58 PM

I’m not sure how to investigate this so I’ll just dump some facts first, see what comes out if anything…

curved-television-6568

04/21/2023, 10:00 PM

Hacking on the Pants code base, I had a NameError in the source. I noticed it just as I ran a command, so the scheduler reinitialized, ran, spat out the exception, and failed. At the same time, I fixed the error and saved. Now, when I re-run, it still complains with the same error (cached result), but if I save another (dummy) change, it re-runs and works.

curved-television-6568

04/21/2023, 10:00 PM

anyone have any ideas or thoughts here?

enough-analyst-54434

04/21/2023, 10:33 PM

Well, we use file watching in pantsd, and that sounds like a missed event, which can always happen. I believe that across OSes these systems are designed to be lossy, with a ring buffer to guard against badly performing watch clients. So this should all boil down to how we handle missed events, which I can't recall, but that should give you a narrow area in the codebase to look at.

enough-analyst-54434

04/21/2023, 10:34 PM

But, just on the surface this problem sounds fundamentally racy. Did this just happen once?

enough-analyst-54434

04/21/2023, 10:43 PM

So we invalidate all files on ring buffer overflow: https://github.com/pantsbuild/pants/blob/2983f166e18b1c1a480b85bf722d16d514b41b17/src/rust/engine/watch/src/lib.rs#L260-L262 But not all OSes support that event.
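
To make the overflow handling concrete, here is a minimal Python sketch of the idea. It is not the Pants implementation (that lives in the Rust file linked above); `LossyWatchQueue`, `invalidate`, and the dict-based cache are hypothetical names used only for illustration. It assumes the watcher can tell us that events were dropped, which, as noted, not every OS reports.

```python
from collections import deque


class LossyWatchQueue:
    """Hypothetical bounded event buffer, loosely modelling an OS file watcher."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.overflowed = False

    def push(self, path):
        # When the buffer is full the event is dropped and only a flag remains.
        if len(self.events) >= self.capacity:
            self.overflowed = True
            return
        self.events.append(path)

    def drain(self):
        # Hand back everything seen so far and reset the buffer.
        events, overflowed = list(self.events), self.overflowed
        self.events.clear()
        self.overflowed = False
        return events, overflowed


def invalidate(cache, queue):
    """Drop cached digests for changed paths, or all of them after an overflow."""
    events, overflowed = queue.drain()
    if overflowed:
        # We no longer know which files changed, so nothing can be trusted.
        cache.clear()
        return
    for path in events:
        cache.pop(path, None)
```

If the watcher never signals the overflow, the dropped event is simply lost and the stale digest stays in the cache, which is the failure mode being suggested here.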

curved-television-6568

04/22/2023, 12:11 PM

I’ve seen this before, but rarely. I think there’s a very narrow window in which the file mutation must take place in order to exhibit this behaviour.

curved-television-6568

04/22/2023, 12:11 PM

I’m not sure I follow the reasoning for why a missed file watching event would explain this. I would expect the failed process to use the fingerprint of the file from before I saved the fixed version (the one that was used to launch the process), rather than the fingerprint of the fixed version as it was when the process failed. But I guess that’s a narrow enough code path described to actually dig into…
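
The two orderings being contrasted can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Pants computes cache keys: `fingerprint`, `run_process`, `run_and_cache`, and the `capture_before` switch are all hypothetical, and the "file system" is just a dict. Whether Pants ever captures the digest late is exactly the open question in this thread.

```python
import hashlib


def fingerprint(content):
    """Content digest used as the cache key in this toy model."""
    return hashlib.sha256(content.encode()).hexdigest()


def run_process(content):
    """Stand-in for the real process: fails if the source has the bad name."""
    return "NameError" if "undefined_name" in content else "ok"


def run_and_cache(files, path, cache, edit_during_run=None, capture_before=True):
    content = files[path]
    key_before = fingerprint(content)  # digest of what the process will see
    result = run_process(content)
    if edit_during_run is not None:
        files[path] = edit_during_run  # simulated save while the process runs
    # capture_before=False models the suspected race: the key is computed
    # from the file as it is when the process finishes, not when it started.
    key = key_before if capture_before else fingerprint(files[path])
    cache[key] = result
    return result
```

With `capture_before=False` and an edit during the run, the failure ends up cached under the digest of the fixed file, so a re-run hits the stale result until another change produces a new digest. With `capture_before=True` the failure is keyed to the broken content and the fixed file misses the cache, which is the expected behaviour described above.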
