I'm having a hard time understanding how pants wor...
# general
g
I'm having a hard time understanding how pants works together with poetry in the context of a large monorepo where every project has its own
pyproject
files. I have generated the BUILD files which contain the
poetry_requirements
statement. However, when I run
pants test
it fails due to missing dependencies. These dependencies are defined and installed in the venv (
poetry run pytest
works fine) but
pants test
is obviously not picking them up. Can someone advise on how best to approach this?
g
Pants and poetry dependencies are independent. The
poetry_requirements
doesn't use the Poetry lockfile or venv; it just reads the dependency declarations from your pyproject.toml. From there on out it's dependent on your Pants setup. https://www.pantsbuild.org/stable/docs/python/overview/third-party-dependencies is a good starting point for reading, and after that we can maybe help if you share some more info about your setup.
g
Thanks! Yes, I'm reading and re-reading that page. I run
pants generate-lockfiles
and it exits without outputting anything. So, I'm not sure if it has generated a lockfile outside of the project or if it just hasn't done anything. But in any case,
pants test ::
doesn't have access to the dependencies.
pants dependencies
doesn't seem to pick up on the dependencies. Given a file with
import pandas as pd
, running
pants dependencies  FILE
returns nothing.
I was missing the
enable_resolves
option in my pants.toml. So, now it tries to generate a lockfile (it would be nice if I got a warning when this is missing). But unfortunately, it manages to concoct a versioning clash for itself. This is most peculiar as there is no such clash in our pyproject.toml files, but I guess it resolves a more permissive version requirement first (resolving to the latest), then clashes with a more restrictive version pinning. Altogether very odd behaviour. I'm unfortunately coming to the conclusion that
pants
is not the right fit for our project. The Poetry integration still seems only partially complete and I don't have the stomach for migrating away from poetry on top of everything else.
If I'm understanding correctly, part of the problem is that pants is generating a unified lockfile for the whole project. This might save some disk space, but is not a good fit for a large monorepo consisting of many deployables. Instead, it'd be better if it created a single lockfile for each
pyproject.toml
. It looks like there is a way to generate each lockfile separately that requires manual configuration. In our case, we have 76
pyproject.toml
files so I'd rather not go down that road.
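One way the manual configuration could be scripted rather than typed out 76 times is sketched below. This is only a rough idea under assumptions: the resolve naming scheme and the `3rdparty/python/` lockfile location are made up for illustration, and the targets in each project would still need their `resolve` field set to the matching name.

```python
from pathlib import Path


def resolve_entries(repo_root: Path) -> dict[str, str]:
    """Map one resolve name to one lockfile path per pyproject.toml found."""
    entries = {}
    for pyproject in sorted(repo_root.rglob("pyproject.toml")):
        project_dir = pyproject.parent.relative_to(repo_root)
        # Hypothetical naming scheme: join the directory parts with "-".
        name = "-".join(project_dir.parts) or "root"
        # Hypothetical lockfile location, one file per resolve.
        entries[name] = f"3rdparty/python/{name}.lock"
    return entries


def render_toml(entries: dict[str, str]) -> str:
    """Render the entries as a [python.resolves] table for pants.toml."""
    lines = ["[python.resolves]"]
    lines += [f'{name} = "{path}"' for name, path in entries.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # Run from the repository root and paste the output into pants.toml.
    print(render_toml(resolve_entries(Path("."))))
```

Directory names containing characters that are not valid in bare TOML keys would need quoting, which this sketch does not handle.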
g
> But unfortunately, it manages to concoct a versioning clash for itself. This is most peculiar as there is no such clash in our pyproject.toml files, but I guess it resolves a more permissive version requirement first (resolving to the latest), then clashes with a more restrictive version pinning. Altogether very odd behaviour.
A few corrections are warranted here. Pants does most of its heavy lifting when it comes to Python through Pex, for example when generating a lockfile. Resolution happens across the whole closure of your dependencies. There's no permissive resolve etc.; it'll pick up your dependencies, then all of those dependencies' dependencies, and so on. With 76 pyproject.toml files and god knows how many direct and transitive dependencies they contain, it is not unlikely that there is a conflict. If Poetry can resolve all of those into one lockfile, then either one of the two tools has a bug or the setup is different.
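To make the "whole closure" point concrete, here is a toy sketch. It is not how Pex works internally; it only illustrates that every constraint on a package is collected first and a version has to satisfy all of them at once. The version list and constraints are hypothetical.

```python
import operator

# Supported comparison operators for this toy example.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.2' into (1, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfying(candidates, constraints):
    """Return the candidates that satisfy every (operator, version) pair."""
    return [
        c for c in candidates
        if all(OPS[op](parse(c), parse(bound)) for op, bound in constraints)
    ]


AVAILABLE = ["1.0", "1.1", "1.2", "2.0"]  # hypothetical released versions

# A permissive requirement in one project and a pin in another are compatible.
direct = [(">=", "1.0"), ("==", "1.1")]
print(satisfying(AVAILABLE, direct))  # ['1.1']

# A transitive dependency that needs >=2.0 leaves nothing to choose from.
print(satisfying(AVAILABLE, direct + [(">=", "2.0")]))  # []
```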
> I'm unfortunately coming to the conclusion that pants is not the right fit for our project. The Poetry integration still seems only partially complete and I don't have the stomach for migrating away from poetry on top of everything else.
That's a fine conclusion of course, and one I can understand with a codebase your size. I want to be clear that Pants does not integrate Poetry any more than it integrates PDM, hatch, or anything else. The only Poetry specific support I'm aware of is that we consume the poetry-specific additions to pyproject.toml.
> If I'm understanding correctly, part of the problem is that pants is generating a unified lockfile for the whole project. This might save some disk space, but is not a good fit for a large monorepo consisting of many deployables. Instead, it'd be better if it created a single lockfile for each pyproject.toml. It looks like there is a way to generate each lockfile separately that requires manual configuration. In our case, we have 76 pyproject.toml files so I'd rather not go down that road.
Your understanding is correct here; a single lockfile is the suggested way of working. It ensures all your libraries and all your entrypoints are compatible with each other. It's not a matter of disk space; it's easier and more correct. Pants will flat out refuse to work if you have one resolve for a library and a different resolve for an executable that uses that library, since those lockfiles could have conflicting requirements. See this blog post, for example: https://www.pantsbuild.org/blog/2022/05/25/multiple-lockfiles-python.
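A rough sketch of the constraint described above, assuming a simplified model where each target has exactly one resolve. The target names and the dictionary layout are hypothetical and are not Pants data structures.

```python
def resolve_mismatches(targets):
    """Return (target, dependency) pairs whose resolves differ."""
    return [
        (name, dep)
        for name, info in targets.items()
        for dep in info["deps"]
        if targets[dep]["resolve"] != info["resolve"]
    ]


# Hypothetical repo: lib_x lives in app_a's resolve but app_b also imports it.
TARGETS = {
    "app_a": {"resolve": "app_a", "deps": ["lib_x"]},
    "lib_x": {"resolve": "app_a", "deps": []},
    "app_b": {"resolve": "app_b", "deps": ["lib_x"]},
}

print(resolve_mismatches(TARGETS))  # [('app_b', 'lib_x')]
```

With one lockfile per pyproject.toml, every shared library ends up in this position as soon as a second application imports it.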
g
Hi Tom, thanks for your detailed response. I think I disagree that creating a unified lockfile is the more correct approach. The reason is that our shared library dependencies are specified in our pyproject files. So, bundling with Poetry will fail if we have a version conflict between a library and a dependee application. IMO, it is not incorrect if Application A with Library X shares the same version of fastapi (or whatever) whereas Application B with Library Y has a different version of fastapi. It's interesting that you talk about different entrypoints. What I'm managing here is multiple different applications all bundled together in the same monorepo. So, there is absolutely no need to have the same lockfile since we're spitting out dozens of different containers, running in different environments.
> there's no permissive resolve etc
I don't understand this. When I'm talking about permissive, I'm talking about permissive dependency specifications in a pyproject file, e.g.
>= 1.01
. I'm not saying that Pants is being permissive, rather that some of our dependencies are specified in a permissive fashion. What I saw was that there was no conflict across our 76 pyprojects for the dependency in question. However, there were both permissive and restrictive versioning requirements mixed across our subprojects (not ideal), so I think Pants resolved to the latest version permitted before running into the restrictive requirement. That's my assessment anyway, I could be wrong.
> That's a fine conclusion of course, and one I can understand with a codebase your size.
I think I'm confused by the Pants use case after reading your reply. My understanding was that it was built as a better Bazel for Python, i.e. one of the primary use cases is large heterogeneous codebases. Is that incorrect? I'm still in love with its automatic dependency inference and I'd love to be able to do conditional tests based on that. I think I'm going to have to set aside some more time soon to really dig in.
Thanks again for your response and apologies for the delay, we had a public holiday here in Denmark.
g
I think there's a linguistic error on my side here. I don't distinguish much between entrypoints and applications at this stage, and I don't think about "projects", "libraries" etc. in our Pants repo. We do have some semblance of that structure, but we have a codebase, and different ways of running the code. Some of those are completely disjoint, and some overlap with each other. This is different from our pre-Pants workflow, but we've found this easiest and it removed a lot of complexity. I'll try to be clearer 🙂
> IMO, it is not incorrect if Application A with Library X shares the same version of fastapi (or whatever) whereas Application B with Library Y has a different version of fastapi.
So what happens in your current codebase if application A also needs library Y (and Y needs fastapi)? Pants makes you declare and pay for this upfront, instead of when someone adds a new import that "breaks the world". We had this setup for a few Google helpers we wrote (GCS upload/download, auth, etc.). Then we realized we should use the same code for our deployed applications... so we had to merge the dependencies needed. Both setting this up as separate resolves and merging them was a lot more work; if I'd known better when we migrated, I'd have fixed them to use the same versions as everything else.
> I don't understand this. [snip]
OK, maybe I misunderstood. Pants/Pex shouldn't be resolving individual deps until we've gathered all of the ones you've declared, so I don't think this is a result of your direct dependencies. If you can show such a case, it seems like a clear bug. I do know this happens with transitive dependencies, however, mostly when those dependencies have unnecessarily narrow requirements of their own (it's effectively a graph coloring problem at that point).
> I think I'm confused by the Pants use case after reading your reply. My understanding was that it was built as a better Bazel for Python, i.e. one of the primary use cases is large heterogeneous codebases. Is that incorrect?
I meant that it's a large codebase to transition. Having gone through 4-5 setups before landing on Pants at work, I can definitely say it's the most different from PDM/Poetry/etc., and definitely better... but being different means migrating lots of code is potentially a lot of work and a lot of quirks. We found bugs in our own code, and we've had to work around esoteric cases in Pants where our domain has caused issues (ML, generally poorly packaged/built libraries we need).
I'd also say that changing your mindset from "this is a project" to "this is a folder with Python files" is quite helpful for thinking about Pants. Unless your goal is to publish your packages, the way you structure your code on disk is for your feeble human mind, primarily. Thanks to dependency inference, you'll only get the Python files and packages you depend on (transitively), not the whole project, not the whole dir. Just the closure. I.e., if your library Y has 20 files but actually you only want the JSON parsing helper for datetimes, and that has no local dependencies... you'll only get that single file from that library.
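A small sketch of that "just the closure" idea, assuming a hand-written file-level dependency map. The file names are made up, and in Pants the edges would come from dependency inference rather than a dictionary.

```python
from collections import deque


def closure(roots, deps):
    """Return every file reachable from the roots via the deps mapping."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for dep in deps.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical library Y with several files; the app imports only one helper.
DEPS = {
    "app_a/main.py": ["lib_y/json_dates.py"],
    "lib_y/json_dates.py": [],
    "lib_y/http_client.py": ["lib_y/auth.py"],
    "lib_y/auth.py": [],
}

print(sorted(closure(["app_a/main.py"], DEPS)))
# ['app_a/main.py', 'lib_y/json_dates.py']
```

The HTTP client and auth files never show up, even though they sit in the same directory as the helper.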
g
K, thanks for the clarifications! I might look into incrementally adopting Pants for a single project to begin with to get some more familiarity with it.