Anyone looked at batching gets from REAPI recently...
# development
f
Anyone looked at batching gets from REAPI recently? I'm investigating a performance issue with fetching a large number of files in a directory (300k, node_modules), and I think batching might help. I found https://github.com/pantsbuild/pants/issues/6990 and the linked stale/closed PR, which I can use as a starting point, but I'm open to input on how I should approach this. I'd also like to figure out how to confirm my hypothesis before implementing it fully.
b
Re the first question: I don't remember hearing about anyone else looking at this recently.
h
@fast-nail-55400 probably knows the most about this
f
Pants certainly should be batching its CAS read requests. The work to do so just needs to be designed and implemented.
(and not something I have had or will have time to do)
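For anyone picking this up, here is a rough sketch of what "batching CAS reads" could mean at the planning level. It is an illustration only, not Pants code and not what any existing PR does. REAPI does define a `BatchReadBlobs` call and a server-advertised `max_batch_total_size_bytes` limit; the `Digest` class and `plan_reads` helper below are hypothetical names, and the sketch ignores per-entry message overhead, which a real implementation would need to budget for.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Digest:
    """Hypothetical stand-in for an REAPI Digest (hash plus size in bytes)."""
    hash: str
    size_bytes: int


def plan_reads(
    digests: Iterable[Digest], max_batch_bytes: int
) -> Tuple[List[List[Digest]], List[Digest]]:
    """Group digests into batches whose summed blob size fits the limit.

    Returns (batches, oversized). Each batch could become one batched read;
    oversized blobs would have to be fetched individually (for example by
    streaming). Assumption: the limit applies to the sum of blob sizes only.
    """
    batches: List[List[Digest]] = []
    oversized: List[Digest] = []
    current: List[Digest] = []
    current_bytes = 0

    # Drop duplicate digests, then sort so the plan is deterministic.
    for d in sorted(set(digests), key=lambda d: (d.size_bytes, d.hash)):
        if d.size_bytes > max_batch_bytes:
            oversized.append(d)
            continue
        # Start a new batch when adding this blob would exceed the limit.
        if current and current_bytes + d.size_bytes > max_batch_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(d)
        current_bytes += d.size_bytes

    if current:
        batches.append(current)
    return batches, oversized
```

Comparing `len(batches) + len(oversized)` against the raw digest count gives a quick estimate of how many round trips batching could save for a given directory, which may help sanity-check the hypothesis before writing the real thing.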
f
I can work on this; I was able to get a first pass working already. How much design work do you anticipate is needed here? Would you prefer to take a look at a draft PR and provide feedback that way, or some other approach?
f
Depends on how far we want to take the batching. Simple would be to just batch for specific ensure_local_… calls. If multiple async tasks are requesting the same digests concurrently, we may also want to dedupe, but that would require something more like an actor task to receive read requests and batch them itself.
I’d vote simple first.
👍 2
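To make the actor-style option above more concrete, here is a minimal sketch of a coalescing reader: concurrent callers asking for the same digest share one pending future, and everything requested within a short window goes out as a single batch. It is written in Python with asyncio purely for illustration (the Pants engine side would look different), and `CoalescingReader`, `fetch_batch`, and `window_seconds` are all hypothetical names. It only dedupes requests that arrive before the batch is flushed.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Set

# Hypothetical batch fetcher: takes digest keys, returns {digest: bytes}.
FetchBatch = Callable[[List[str]], Awaitable[Dict[str, bytes]]]


class CoalescingReader:
    """Collects read requests for a short window and issues one batch."""

    def __init__(self, fetch_batch: FetchBatch, window_seconds: float = 0.01):
        self._fetch_batch = fetch_batch
        self._window = window_seconds
        self._pending: Dict[str, asyncio.Future] = {}
        self._flush_scheduled = False
        self._tasks: Set[asyncio.Task] = set()  # keep flush tasks alive

    async def read(self, digest: str) -> bytes:
        fut = self._pending.get(digest)
        if fut is None:
            # First request for this digest in the current window.
            fut = asyncio.get_running_loop().create_future()
            self._pending[digest] = fut
            if not self._flush_scheduled:
                self._flush_scheduled = True
                task = asyncio.create_task(self._flush_after_window())
                self._tasks.add(task)
                task.add_done_callback(self._tasks.discard)
        # shield() keeps one cancelled caller from cancelling the shared future.
        return await asyncio.shield(fut)

    async def _flush_after_window(self) -> None:
        await asyncio.sleep(self._window)
        # Take ownership of the current window; later requests start a new one.
        batch, self._pending = self._pending, {}
        self._flush_scheduled = False
        try:
            blobs = await self._fetch_batch(list(batch))
        except Exception as e:
            for fut in batch.values():
                if not fut.done():
                    fut.set_exception(e)
            return
        for digest, fut in batch.items():
            if fut.done():
                continue
            if digest in blobs:
                fut.set_result(blobs[digest])
            else:
                fut.set_exception(KeyError(digest))
```

The trade-off is the extra latency of the window on every read, which is one reason starting with the simpler per-call batching seems reasonable.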
h
At worst we’d be no worse off than we are today.
f
Early draft of the approach if anyone has time to leave feedback. In the PR description I've got some specific items. Early numbers are showing ~50% reduction for my use case: https://github.com/pantsbuild/pants/pull/22243
💥 1
c
early numbers are showing ~50% reduction for my use case
😻