# general
n
Hi everyone! New to Pants, but excited about the potential for us to manage our Python monorepo. I'm using the newest version just posted (`2.21.0a1`), with `pip_version = "24.0"`.
I'm seeing slow `export` and `generate-lockfiles` times (well, slow relative to `pip`, anyway). I know that a number of related questions have been asked, and I promise I've been reading them in an effort to debug. Here's what I've got so far.

1. I assume that I need to re-run `generate-lockfiles` and then `export` when I change my root universal requirements.txt? Is that correct?
2. I'm using Lightning Studios as my Linux dev environment. I have a hunch they are doing some things that slow down the hard drives a bit (in an effort to preserve the contents of the drives), but it is generally plenty fast for my needs with my previous pip-based workflows. It takes my system 246s to `generate-lockfiles` (`pants --keep-sandboxes=always --pex-verbosity=9 -ldebug generate-lockfiles --resolve=python-default`). It takes 231s to `export` (`pants -ldebug --pex-verbosity=9 export --resolve=python-default`).
I wanted to debug the slow `generate-lockfiles` first. This seems equivalent to `pip install -r requirements.txt` after having cached everything. Here is a gist with my requirements.txt and a filtered pip log from my `generate-lockfiles` run. (I preserved the pip log file and filtered out all of the noisy "Skipping link" and "Found link" lines.) https://gist.github.com/Taytay/492c12eaedce6c7999e0028fb6a9a50a

From that log, it looks like it spends the first 3 minutes or so just checking PyPI (even though everything is cached locally) and "pretending" to download cached files. Then, for about the next 15s (`11:10:57,542` to `11:11:13,909`), it copies wheel files into a temp folder, some of which are quite big, e.g.:
`2024-05-16T11:11:11,057 Saved ./.tmp/tmpjnt15mw0/system.conda.miniconda3.envs.cloudspace.bin.python3.10/nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl`
Then it wraps up pretty quickly. So my conclusion is: even if the modules are already downloaded, it's going to take pex and pip a few minutes to copy the cached .whl files around.

Qs continued:
3. Can I avoid pip and make this faster?
4. I'm hoping there are flags I can pass to force pip to stop checking the remote so often and just trust its cache? I feel like I'm missing something, because it's hard for me to imagine others are waiting 500s after updating their requirements.txt.
5. I know Pants is good at generating hermetic environments, and a lot of thought has been put into caching, but is there some way to speed this pex caching up further?
6. Do ramdisks help here to speed up the tmp folders?
7. How hard would it be to switch to uv? https://astral.sh/blog/uv 😁

Any help (on any Qs) would be super helpful! Thank you!
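The log analysis described above can be scripted rather than done by hand. This is a minimal sketch, assuming every pip log line starts with a timestamp like `2024-05-16T11:11:11,057` (as in the quoted line) and that "Saved " lines mark the wheel copies. `filter_noise`, `parse_timestamp` and `phase_duration` are hypothetical helper names, not part of pip or Pants.

```python
from datetime import datetime
from typing import Iterable, Iterator, List, Optional

# Noisy pip log messages that were filtered out by hand in the thread.
NOISY_PREFIXES = ("Skipping link", "Found link")


def parse_timestamp(line: str) -> Optional[datetime]:
    """Parse a leading timestamp such as '2024-05-16T11:11:11,057'; None if absent."""
    token = line.split(" ", 1)[0]
    try:
        return datetime.strptime(token, "%Y-%m-%dT%H:%M:%S,%f")
    except ValueError:
        return None


def filter_noise(lines: Iterable[str]) -> Iterator[str]:
    """Drop lines whose message (after the timestamp) starts with a noisy prefix."""
    for line in lines:
        parts = line.split(" ", 1)
        message = parts[1].lstrip() if len(parts) > 1 else ""
        if not message.startswith(NOISY_PREFIXES):
            yield line


def phase_duration(lines: Iterable[str], marker: str) -> float:
    """Seconds between the first and last timestamped lines containing `marker`."""
    stamps: List[datetime] = []
    for line in lines:
        if marker in line:
            ts = parse_timestamp(line)
            if ts is not None:
                stamps.append(ts)
    if len(stamps) < 2:
        return 0.0
    return (max(stamps) - min(stamps)).total_seconds()
```

For example, `with open(log_path) as f: kept = list(filter_noise(f))` followed by `phase_duration(kept, "Saved ")` gives the length of the copy phase, where `log_path` is wherever the pip log was preserved.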
c
Hi, and welcome! I'll just drop a few notes from what I know, and leave the rest for those with more experience on these matters than me 😉

First, regarding `uv`, there's this ticket: https://github.com/pantsbuild/pants/issues/20679

Secondly, Pants does cache a lot, but when it comes to third-party requirements we're kind of out of luck, as it would be bad if a new version were available that we didn't pick because we had an older one in the cache. Granted, when merely adding new requirements, it would be nice to be able to say that we don't care to upgrade and only want to fill in what's missing, but I don't think there's an option for that.

Sometimes long resolve times are due to very open constraints, which lead pip to go through a lot of version permutations to solve the dependency graph. If you're able to, narrowing down both the Python interpreter constraints and the constraints on potentially time-consuming libraries can speed up lockfile generation.

Speeding up the `export` will probably not be easy (using ramdisks could speed up the I/O, sure, but whether that's more than marginal... you'd have to try, I suppose 😉 🤷), and not using `pip` is also no small change. I have some vague memories that there may be an optimization that tells `pip` to download only the metadata rather than entire libraries when resolving, but I'm not sure whether it's already enabled by default.
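To make the "open constraints" point concrete, here is a toy sketch. The package names and version lists are hypothetical, and the search is a deliberately naive brute force, not pip's actual backtracking resolver. It shows how the number of candidate combinations multiplies with each loosely constrained package, and how pinning collapses it.

```python
from itertools import product
from math import prod
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical: package name -> candidate versions, sorted oldest to newest.
Candidates = Dict[str, List[str]]
Choice = Dict[str, str]


def search_space(candidates: Candidates) -> int:
    """Number of version combinations a brute-force search might have to consider."""
    return prod(len(versions) for versions in candidates.values())


def brute_force_resolve(
    candidates: Candidates, is_compatible: Callable[[Choice], bool]
) -> Tuple[Optional[Choice], int]:
    """Try combinations, newest versions first, until one is compatible.

    Returns the first compatible choice (or None) and the number of attempts made.
    """
    names = list(candidates)
    newest_first = [list(reversed(candidates[name])) for name in names]
    attempts = 0
    for combo in product(*newest_first):
        attempts += 1
        choice = dict(zip(names, combo))
        if is_compatible(choice):
            return choice, attempts
    return None, attempts


if __name__ == "__main__":
    open_constraints = {
        "lib_a": ["1.0", "1.1", "1.2", "2.0"],
        "lib_b": ["0.9", "1.0", "1.1"],
        "lib_c": ["3.0", "3.1"],
    }
    pinned = {"lib_a": ["1.2"], "lib_b": ["1.0"], "lib_c": ["3.1"]}
    print(search_space(open_constraints), search_space(pinned))  # 24 1
```

Real resolvers prune far more cleverly than this, but the multiplicative growth is why tightening constraints on heavy libraries tends to help.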
b
In addition to Andreas’s points, one reason why generating a lockfile might be slower than `pip install` is that `pip install` installs only the packages specific to your OS + architecture, whereas lockfiles are cross-platform and contain links to packages for other OSes/platforms as well. That’s mentioned in a comment on the linked issue here: https://github.com/pantsbuild/pants/issues/20679#issuecomment-2002692556
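A rough sketch of the difference: a single `pip install` only needs the artifacts matching one platform, while a cross-platform lockfile records artifacts for many. The code parses wheel filenames using the standard `name-version[-build]-python-abi-platform.whl` convention and keeps those for one OS/architecture. The first filename in the usage is the one quoted earlier in the thread. Any other filenames you try are your own. Real tag matching (for example via the `packaging.tags` module) also ranks Python/ABI tags and manylinux glibc versions, which this sketch ignores.

```python
from typing import Iterable, List, NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split 'name-version[-build]-python-abi-platform.whl' into parts (build tag dropped)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python, abi, platform = parts[-3:]
    return WheelTags(parts[0], parts[1], python, abi, platform)


def wheels_for_target(filenames: Iterable[str], os_prefix: str, arch: str) -> List[str]:
    """Keep pure wheels ('any') and wheels with a platform tag for one OS/arch.

    Simplification: ignores Python/ABI tags and manylinux/glibc version ranking.
    """
    kept = []
    for filename in filenames:
        # Platform tags may be compressed sets joined by '.',
        # e.g. 'manylinux_2_17_x86_64.manylinux2014_x86_64'.
        platforms = parse_wheel_filename(filename).platform.split(".")
        if any(p == "any" or (p.startswith(os_prefix) and p.endswith(arch)) for p in platforms):
            kept.append(filename)
    return kept
```

For instance, `wheels_for_target(["nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl"], "manylinux", "x86_64")` keeps that wheel, while a `macosx_..._arm64` or `win_amd64` artifact in the same list would be dropped. A lockfile has to keep all of them.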
n
Thank you both @curved-television-6568 and @better-van-82973 for the quick replies! My requirements file specifies the exact version of every package I'd like to download, and my interpreter constraint in python-default specifies 3.10.* only. I presume this is the equivalent of asking pip to install packages for a single Python version only. My pip log indicates that the time is mainly spent checking PyPI and copying cached files rather than resolving, so I think that's good news. Using a ramdisk did speed up `export` significantly: my most recent `export` took only 126s, down from 231s. (Based on this tip: https://docs.backend.ai/en/latest/dev/daily-workflows.html#boosting-the-performance-of-pants-commands) I think what's surprising to me is that adding new packages to the universe of packages in a monorepo must be an extremely common occurrence for other, much larger Pants users. I can't imagine that other teams are willing to wait almost 10 minutes to get a new module installed and then exported (so the IDE can see it). That makes me think my experience must be a massive outlier due to an issue of my own making, or that I'm making some other faulty assumption about how to use Pants.
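Before/after comparisons like the ramdisk one are easier to trust when they are timed the same way each run. Below is a small sketch: `time_command` and `percent_faster` are hypothetical helpers, and in practice `cmd` would be the `pants ... export --resolve=python-default` invocation from the first message, run once with the default tmp location and once with it on a ramdisk. Pants caches heavily, so back-to-back runs may not be comparable unless you control for that.

```python
import subprocess
import time
from typing import Sequence


def time_command(cmd: Sequence[str], repeats: int = 1) -> float:
    """Run cmd `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        best = min(best, time.perf_counter() - start)
    return best


def percent_faster(before_s: float, after_s: float) -> float:
    """Percentage reduction in wall-clock time going from before_s to after_s."""
    return 100.0 * (before_s - after_s) / before_s


if __name__ == "__main__":
    # Numbers reported in the thread: export took 231s, then 126s with a ramdisk.
    print(f"{percent_faster(231, 126):.1f}% faster")  # 45.5% faster
```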
b
I can’t speak for others, but in our own case we eventually hit a point of saturation where most libraries needed by any team were already installed (granted, this is with < 20 devs). So adding a new library to the mix really doesn’t happen that often. Of course, this can change with team size: depending on how much of the codebase is shared, you may want to split into multiple resolves, so that each resolve contains fewer packages and is faster to generate a lockfile for.
👍 1
g
Similar to @better-van-82973, I've found that we rarely need to regenerate lockfiles except in the following cases:
1. Add/remove deps: this rarely occurs at this point. We have about 100 Python modules across our monorepo.
2. Upgrade/downgrade deps: this is still relatively infrequent, but happens more often than add/remove. Most commonly this happens when we find a bug in a 3rd-party dep that's fixed in a newer version, or when a known CVE is published.
👍 1
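A minimal sketch for telling which of those cases a requirements change falls into, assuming a requirements.txt with exact `name==version` pins (as described earlier in the thread). `classify_changes` is a hypothetical helper. Version ordering here is a naive dotted-integer comparison rather than full PEP 440 parsing (the `packaging` library handles that properly). Any non-empty bucket means the pins changed, which is one of the two cases listed.

```python
import re
from typing import Dict, List, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Map normalized project name -> version for exact pins like 'name==1.2.3'."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        # Drop comments and environment markers; keep only 'name==version'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # blank or non-pinned line: ignored in this sketch
        name, version = (part.strip() for part in line.split("==", 1))
        pins[re.sub(r"[-_.]+", "-", name).lower()] = version  # PEP 503-style name normalization
    return pins


def _version_key(version: str) -> Tuple[int, ...]:
    """Naive ordering for purely numeric dotted versions; raises ValueError otherwise."""
    return tuple(int(part) for part in version.split("."))


def classify_changes(old_text: str, new_text: str) -> Dict[str, List[str]]:
    """Bucket the differences between two pinned requirements files."""
    old, new = parse_pins(old_text), parse_pins(new_text)
    changes: Dict[str, List[str]] = {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "upgraded": [],
        "downgraded": [],
        "changed": [],  # versions this naive key cannot order (e.g. '2.0rc1')
    }
    for name in sorted(old.keys() & new.keys()):
        if old[name] == new[name]:
            continue
        try:
            newer = _version_key(new[name]) > _version_key(old[name])
            bucket = "upgraded" if newer else "downgraded"
        except ValueError:
            bucket = "changed"
        changes[bucket].append(name)
    return changes
```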
n
Ah, fascinating! That does help explain why this isn’t as big a pain point for others. My last year of experiments with Python, and AI in particular, has involved adding/upgrading packages quite frequently. I am starting to stabilize, but was trying to optimize for 2 things: 1) quick dev machine setup (which used to just involve `pip install`) and 2) quick iteration and experimentation as new modules or versions are released (especially with an oft-updated module like transformers). (It’s quite likely that this extra setup time pays off in later usage of Pants. It’s just surprising that Pants is known for being so fast in other ways, yet this seemed slower than the alternative it replaces.) If anyone else knows tricks or cheats to speed this up or iterate more quickly (especially with the pip sequential download thing!), I’d love to hear them. I’ll hack around in the meantime and see if I can come up with a workaround for this use case.
g
Yeah it was a pain in the ass during pants adoption, not going to lie. ML is also a different beast...
n
That’s actually a relief to hear. I am worried I’m swimming upstream with an ML monorepo…
g
We do minimal ML, but it's definitely there and growing. I'm curious about your use case because at my company we use a decent amount of ML, but we aren't constantly updating or adding libs.
h
Pex does actually now support incremental updates of lockfiles, but it's not been wired up in Pants yet.
🙌 2
🔥 2
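Conceptually, an incremental update keeps existing pins and resolves only what is new, which is the "fill in what's missing" behavior wished for earlier in the thread. This is a toy sketch of that idea only. It is not how Pex implements its incremental lockfile updates, and `resolve_fn` is a hypothetical stand-in for a real resolver call. A real implementation must also pin transitive dependencies and check that newly resolved packages are compatible with the kept pins. This sketch does neither.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Lock = Dict[str, str]  # package name -> pinned version


def incremental_update(
    locked: Lock,
    requested: Iterable[str],
    resolve_fn: Callable[[List[str]], Lock],
) -> Tuple[Lock, Lock]:
    """Keep existing pins for still-requested packages and resolve only the missing ones.

    Returns (new_lock, newly_resolved). Packages no longer requested are dropped.
    Transitive dependencies are ignored in this toy.
    """
    requested = list(requested)
    kept = {name: locked[name] for name in requested if name in locked}
    missing = [name for name in requested if name not in locked]
    newly_resolved = resolve_fn(missing) if missing else {}
    return {**kept, **newly_resolved}, newly_resolved
```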
And I am looking into uv as the resolver for a putative new Python backend: https://github.com/pantsbuild/pants/discussions/20897
n
@happy-kitchen-89482: Awesome news on both! I look forward to hearing more. For raw speed reasons, I'm being drawn to the "new hotness" that is uv, and minimal tools like rye https://rye-up.com/guide/installation/. It handles Python installation and some build stuff for me, but it admittedly has MANY fewer features than Pants. I might not have Pants-sized problems yet. I wish I had that speed (honestly, I'd settle for 10x slower than that) along with Pants' feature set. 😉
h
That's what I'm hoping to bring to the new backend
❤️ 1
It might launch initially with ruff as the only linter/formatter, for example...
❤️ 1
n
Hi @happy-kitchen-89482! Thanks for mentioning that wishlist issue! I feel like you have your finger on the pulse of these issues, so you likely don't need more input, but I added a (long) comment that documents my experience with this and related issues: https://github.com/pantsbuild/pants/discussions/20897#discussioncomment-9469665 🙂
🙌 2
🙏 1
c
Thanks @nice-pillow-26422, honest feedback like that is invaluable and very much appreciated! ❤️
❤️ 1
h
Thanks, will read it carefully later, but this is extremely valuable.