# general
g
Hey team, I have a couple of beginner questions about enhancing the performance of our processes that gather dependees. Would be awesome if someone could help me to understand these questions better. Thanks a lot! 🧡
👀 1
This is our current py-spy performance svg.
Questions:
1. We see `validate_python_dependencies` running at the end, after gathering dependees. Is it possible to skip those processes to save some time? Thanks!
2. For `find_owners`, do we have any recommendations for improving performance? I've seen a long wait before the code executes, and I've seen a couple of threads about this, but any advice would be helpful.
3. Overall, our current process for collecting dependees runs in 5~6 minutes; do we think remote caching could save some time here?
e
1 & 2 together are ~20% of the runtime in your flame graph, so even optimizing them to zero would still leave 4-5 minutes, which isn't much better. I guess the first question is: what Pants command are you running, and what are you trying to achieve with it?
b
For 3, pants 2.17 ships with experimental support for parsing dependencies in Rust. It's actually about as fast as pulling from the old process cache locally, and therefore is faster than looking up in the remote cache. If you can, try it out and report back 🙂
πŸ‘ 1
🫑 1
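For reference, the Rust parser can also be switched on in `pants.toml` rather than per command line; a minimal sketch, assuming the option names from the Pants 2.17 docs (double-check against your version):

```toml
# pants.toml -- sketch; verify option names against your Pants version
[python-infer]
# Experimental in Pants 2.17: parse Python imports with the built-in Rust
# parser instead of spawning Python parsing processes.
use_rust_parser = true
```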
g
@bitter-ability-32190 Thanks for the advice. Could you give me some advice on applying `parse_python_dependencies` to our use case? We build a dependency graph for our repo, then select the tests that need to run in our CI system based on the specs input. I'm having some difficulty connecting our current subsystem with the `PythonInferSubsystem`. Thanks a lot, any advice would be helpful!
b
I think I need more context, but I'm not really sure what you mean 😅
g
> For 3, pants 2.17 ships with experimental support for parsing dependencies in Rust. It's actually about as fast as pulling from the old process cache locally, and therefore is faster than looking up in the remote cache. If you can, try it out and report back
Sorry about that, just want to follow up on this comment: I did some research on how to use the Rust dependency parser, and it would be great if you could point me in the right direction for using it.
b
g
```shell
time ./pants dependents app/xxx --stats-log --python-infer-use-rust-parser
./pants dependents app/xxx --stats-log --python-infer-use-rust-parser   0.49s user 0.20s system 0% cpu 3:48.40 total
```
My current observations are:
1. `--python-infer-use-rust-parser` does not improve performance for `./pants dependents` or the `(MultiGet/Get dependencies)` work.
2. We use a local cache, but it does not help reruns of `./pants dependents` commands; we always spend 5+ minutes building the dependency graph.

Attached is the latest py-spy svg.
b
Do you have custom dependency inference rules? How many files do you have?
g
We have 50000+ files and yes we have our own custom dependency inference rules.
b
OK, then that timing doesn't surprise me 😅
(although I haven't seen the py-spy)
g
Just curious: if I run `./pants dependents xxx` twice, should the results from the first run be cached to speed up the second run? I'm wondering whether we could utilize more caches in our CI builds.
b
So there's the daemon, which makes things go super speedy fast (in-process caching). If you exceed the memory limit, or a few other things, the daemon restarts. Then there's the process cache. If Pants has run a particular process in the past it'll pull from this. Then the rust parser uses a separate on-disk cache.
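The three cache layers described above map to configuration roughly like this; a sketch, assuming standard option names from the Pants docs (defaults may differ across versions):

```toml
# pants.toml -- illustrative, not a drop-in config
[GLOBAL]
# Layer 1: the daemon keeps the build graph in memory between runs
# (restarts if it exceeds its memory limit, losing the warm graph).
pantsd = true

# Layer 2: the on-disk process cache that previously-run processes are
# pulled from.
local_store_dir = "~/.cache/pants/lmdb_store"

[python-infer]
# Layer 3: the Rust parser maintains its own separate on-disk cache.
use_rust_parser = true
```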
g
After bumping the pantsd cache from 2GiB to 12GiB, the duration went down from 240 secs to 48 secs. 🔥🔥🔥🔥🔥🔥🔥🔥
A follow-up question: how do we evaluate the right pantsd cache size for running in CI? We don't have a concrete number at the moment and would like to understand how to find one.
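One way to approach sizing, sketched under the assumption that `pantsd_max_memory_usage` is the option in play (recent Pants versions accept human-readable sizes; older ones take raw bytes): watch pantsd's resident memory during a representative CI run, then set the limit comfortably above the observed peak so the daemon doesn't restart mid-pipeline and drop its warm in-memory graph.

```toml
# pants.toml -- sketch; the right number depends on repo size and workload
[GLOBAL]
# If pantsd exceeds this limit, it restarts and the in-memory graph cache
# is lost, so size it above the peak observed on a representative run,
# plus headroom.
pantsd_max_memory_usage = "12GiB"
```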