# github-notifications
## #17477 batching dep inference

New discussion created by benjyw in pantsbuild/pants

I did a little bit of manual benchmarking of the Python dependency parser process. I took that script, modified it to accept multiple input files, and ran it on all 1227 .py files under src/python/pants, with various batch sizes. As expected, almost the entire runtime consists of process overhead.

Note that I ran this experiment outside of Pants, using `find` and `xargs`, so it doesn't include sandbox setup time. Also, the batches ran sequentially, so the wall time of the current one-file-per-process strategy in Pants is faster than that 99-second sequential figure, thanks to parallel execution. But it's still much slower and more CPU-expensive than it needs to be. E.g., we know we have users with larger repos for whom full-repo dep inference (e.g., in a call to `./pants peek`) takes several minutes.

So it seems reasonably clear that batching the dependency parsing is a big perf win. (I'm referring to Python here; it seems likely that similar benefits would obtain for the JVM, at least.) This discussion is to, er, discuss some options for doing so.
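For concreteness, here is a minimal sketch of what a batched parser could look like: a script that accepts many input files per process, so the process and interpreter startup cost is paid once per batch rather than once per file. This is not the actual Pants dependency-inference script; the use of `ast` for import extraction and the JSON output format are illustrative assumptions.

```python
# Hypothetical sketch of a batched import parser: one process handles a whole
# batch of .py files, amortizing interpreter startup across the batch.
import ast
import json
import sys


def imports_of(path: str) -> list[str]:
    """Return the sorted top-level module names imported by the given Python file."""
    with open(path, "rb") as f:
        tree = ast.parse(f.read(), filename=path)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return sorted(found)


if __name__ == "__main__":
    # Emit a single JSON object mapping each input file to its imports.
    print(json.dumps({path: imports_of(path) for path in sys.argv[1:]}, indent=2))
```

Driven with something like `find src/python/pants -name '*.py' | xargs -n 100 python parse_batch.py` (where `parse_batch.py` is a hypothetical name for the script above), varying the `-n` batch size reproduces roughly the kind of sweep described here.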