# development
b
I opened a WIP draft PR for an RFC on (what I'm calling) "Coalesced Process Batching": https://github.com/pantsbuild/pants/pull/15648
Looking for feedback on:
• Naming. Naming is hard 😭
• Overall approach, specifically which data belongs in `SandboxInfo` vs. `CoalescedProcessBatch`
• Feasibility. Right now I'm confident this could work well for formatters and linters, because this really only works on "successful" process runs where it's OK to throw away stdout/stderr.
  ◦ For it to work with, say, dependency inference, we'd have to not throw away stdout/stderr. Or maybe get clever with output files (output the result JSON to a file with a unique name, then collect them)?
• Performance: How badly will this hurt the vanilla code, which now makes several `MultiGet` requests for little `SandboxInfo`s, only to be merged later?
• Thoughts on reducing boilerplate code in the engine between the new type and the existing `Process`
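The "output files" idea in the feasibility bullet could look roughly like the sketch below. It is plain Python, not Pants engine code: `result_path`, `run_batch`, and `collect` are hypothetical names, and the payload shape is made up for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def result_path(out_dir: Path, source: str) -> Path:
    # Hypothetical naming scheme: hash the source path so names never collide.
    digest = hashlib.sha256(source.encode()).hexdigest()[:16]
    return out_dir / f"{digest}.json"


def run_batch(sources: list[str], out_dir: Path) -> None:
    # Stand-in for the real batched process: writes one JSON file per input
    # instead of printing everything to a shared stdout.
    for source in sources:
        payload = {"source": source, "imports": []}  # made-up result shape
        result_path(out_dir, source).write_text(json.dumps(payload))


def collect(sources: list[str], out_dir: Path) -> dict[str, dict]:
    # Split the batch's output back into per-file results.
    return {s: json.loads(result_path(out_dir, s).read_text()) for s in sources}


with tempfile.TemporaryDirectory() as tmp:
    sources = ["a.py", "b.py"]
    run_batch(sources, Path(tmp))
    results = collect(sources, Path(tmp))
```

Because each result lives in its own file, the batch output can be attributed to individual inputs without parsing a shared stdout stream.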
😮 1
w
The ultimate goal is to have the cache be populated per-file when running a process for maximum cacheability, but still run processes on batches of files for performance.
How does it work today re: caching? Are they batch cached?
b
Right now caching is done 1:1 with the process we run. If we run a process using a digest with 200 files, the cache key is composed of those 200 files, and a single cache entry is inserted.
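A minimal sketch of why one changed file invalidates the whole batch, assuming a key derived from the command plus every input file. `batch_cache_key` is a hypothetical helper and is not how the Pants engine actually computes its keys.

```python
import hashlib


def batch_cache_key(argv: list[str], files: dict[str, bytes]) -> str:
    # One key for the whole process: the command plus every input file.
    h = hashlib.sha256()
    h.update("\0".join(argv).encode())
    for path in sorted(files):
        h.update(path.encode())
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


files = {f"src/f{i}.py": b"x = 1\n" for i in range(200)}
key_before = batch_cache_key(["fmt"], files)
key_unchanged = batch_cache_key(["fmt"], files)

# Editing a single file changes the key, so the one cache entry is missed
# and the process reruns over all 200 files.
files["src/f7.py"] = b"x = 2\n"
key_after = batch_cache_key(["fmt"], files)
```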
w
Ah, okay, and then if any of those 200 files are invalidated or if dependency inference determines one of those 200 files has been invalidated - all 200 run?
b
In a nutshell, yeah.
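For contrast, here is a rough sketch of the goal stated earlier: cache entries per file, but still one process run per batch of misses. Everything here (`file_key`, `run_tool_on_batch`, `run_coalesced`, the in-memory `cache`) is hypothetical and only illustrates the idea; it assumes the batch succeeded, since only successful runs are candidates for this scheme.

```python
import hashlib

cache: dict[str, str] = {}           # per-file key -> cached result
batches_run: list[list[str]] = []    # records what each batched run covered


def file_key(tool: str, path: str, content: bytes) -> str:
    # One key per (tool, file), independent of the rest of the batch.
    data = tool.encode() + b"\0" + path.encode() + b"\0" + content
    return hashlib.sha256(data).hexdigest()


def run_tool_on_batch(paths: list[str]) -> dict[str, str]:
    # Stand-in for a single batched process run that succeeded.
    batches_run.append(sorted(paths))
    return {p: "ok" for p in paths}


def run_coalesced(tool: str, files: dict[str, bytes]) -> dict[str, str]:
    keys = {p: file_key(tool, p, c) for p, c in files.items()}
    misses = [p for p in files if keys[p] not in cache]
    if misses:
        # Run once over all misses, then populate the cache per file.
        for path, result in run_tool_on_batch(misses).items():
            cache[keys[path]] = result
    return {p: cache[keys[p]] for p in files}


files = {f"src/f{i}.py": b"x = 1\n" for i in range(200)}
run_coalesced("fmt", files)          # cold cache: one batch of 200
files["src/f7.py"] = b"x = 2\n"
run_coalesced("fmt", files)          # warm cache: only the edited file reruns
```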
w
Okay, that's what I thought - just wanted to confirm. Thanks!
🙌 1
✅ 1
Would this also eventually support tests?
And as an add-on to that, would this be affected by sharding?
b
I can't ever see tests working with this behavior. The challenge is that when we run the batched process, it will have stdout/stderr for the batch as a whole. It's technically infeasible to split that into per-file output. We can live without stdout/stderr for formatters/linters, but for tests it's just too valuable (see `--output=all` on the test goal).
šŸ‘ 1