Hi all, I'm trying to convert our repository into...
# general
p
Hi all, I'm trying to convert our repository into a mono-repo by splitting code into multiple python packages. To this end, I've defined two packages under
packages/pkgA
and
packages/pkgB
with their individual
pyproject.toml
files, using poetry. However, while editing the dependencies, I just noticed that the code under
pkgB
imports
numpy
and
pants dependencies
shows
packages/pkgA:poetry#numpy
. This means that the tests of
pkgB
run successfully, even though a dependency is missing, because it is present in another package within the repo. Is it possible to disable this behavior and have them only look up
packages/pkgB:poetry#*
or fail if not found? PS. Currently on pants 2.12 if it matters and not using lockfiles (yet)
h
Hi there! It sounds like you probably want to use the "multiple resolves" feature, which lets you have different lockfiles for parts of your project. https://www.pantsbuild.org/docs/python-third-party-dependencies#multiple-lockfiles You'd define one "resolve" per Poetry project
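As a rough illustration of "one resolve per Poetry project", a BUILD file for `pkgB` might look like the sketch below. It assumes resolves are switched on in `pants.toml` (`enable_resolves = true` under `[python]`, plus a `[python.resolves]` entry mapping each resolve name to a lockfile path). The resolve name `pkgB` is a placeholder, not something taken from the thread, and the exact fields should be checked against the docs for your Pants version.

```python
# packages/pkgB/BUILD (sketch; the resolve name "pkgB" is a placeholder)

# Third-party requirements read from packages/pkgB/pyproject.toml,
# assigned to this package's own resolve.
poetry_requirements(
    name="poetry",
    resolve="pkgB",
)

# First-party code in this directory; with resolves enabled it can only
# depend on requirements that belong to the same resolve.
python_sources(
    resolve="pkgB",
)
```

With a setup like this, an `import numpy` in `pkgB` should no longer be satisfied by `packages/pkgA:poetry#numpy`, because that requirement lives in a different resolve.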
p
Would there be a way to sync those resolves, i.e. if a dependency is present in more than one, to always have the same version? If they're independent I'm afraid that the version management will become tedious with many packages in the repo (goal is >20 at the moment). For development, we're using poetry to do editable installs and only have a global
poetry.lock
at the top level, as a single source of truth for the dependency versions.
h
Ah, so sounds like you actually can have a single Pants lockfile
With Pants you wouldn’t need those individual pyproject.toml files
At least not for the 3rdparty requirements part (you may need them for other reasons)
You can have a single top-level Pants lockfile instead
Multiple lockfiles is for when you need to have incompatible deps in different parts of the codebase
if you actively want to keep them in sync (and that is definitely the right thing to do if you can) then you don’t want multiple lockfiles
Note that the lockfile is exactly that single source of truth. Pants only uses the subset of the lockfile that you actually need in any given situation, so having it be large is not a problem
Oh, but just saw that you’re not using lockfiles yet. I’d recommend using them if you can, since this is exactly what they’re for 🙂
And maybe switch to Pants 2.13 if you can, if you want the latest and greatest wrt lockfile support
p
That sounds good. Two follow up questions: • What would be the recommended way to specify the dependencies? E.g. a top-level pyproject.toml (poetry), requirements.txt or
python_requirements
instead of the multiple pyproject.toml files? • We're using poetry (per package) to allow developers to work with editable installs of some of the subpackages. Would there be a way to do this using this setup with pants if we omit the per-package pyproject.toml (and maybe poetry altogether)?
h
E.g. a top-level pyproject.toml (poetry), requirements.txt or python_requirements instead of the multiple pyproject.toml files?
exactly
Would there be a way to do this using this setup with pants if we omit the per-package pyproject.toml (and maybe poetry altogether)?
So, a key idea with Pants is that you are using HEAD for everything in the repo. You don't need editable installs of the other packages in the repo because you already have access to them. https://blog.pantsbuild.org/the-monorepo-approach-to-code-management/ talks about that conceptual difference, and Benjy could probably explain this better than I can
p
Thanks for the reply! On the topic of monorepo:
So, a key idea with Pants is that you are using HEAD for everything in
the repo. You don't need editable installs of the other packages in the
repo because you already have access to them. https://blog.pantsbuild.org/the-monorepo-approach-to-code-management/ talks about that conceptual difference, and Benjy could probably explain this better than I can
In our current development workflow, each developer has a venv with an editable install of our package (soon to be multiple packages) and the required dependencies, in order to develop/debug. I'm imagining having the same in the monorepo, e.g. one person having
pkgA
and
pkgB
, another
pkgA
and
pkgC
and so on, all on the same commit (either main branch or their own feature branches). It would be possible to create a venv with all dependencies with
pants export ::
, but I don't see a way to create a venv with the desired subset if you only want specific packages or to do an editable install for development through pants (only using poetry or setuptools). Could you share your perspective on that? Maybe there is a different workflow that I haven't considered...
h
So if using Pants idiomatically, you wouldn’t need any of these venvs or editable installs
pkgA, pkgB, pkgC are just code in the repo, and pants does the right thing
E.g., when you
./pants test pkgB
pants will see the dep from pkgB to pkgA, know that it needs to create a venv with pkgB and pkgA’s 3rdparty deps and their internal code, and will then use that to run the tests
You shouldn’t usually need to
./pants export
except for debugging
There is a question of how to package and deploy your code
Today, it sounds like each of
pkgA
,
pkgB
,
pkgC
is a separate distribution that you publish, maybe to an internal PyPI server?
If so, and you want to continue to do that, no problem, you give each package a
python_distribution
target
But you can also consider whether you actually need to publish these internal libraries at all?
Typically people do this because the standard Python tooling doesn’t support code reuse and sharing any other way. But Pants does!
Via, for example, PEX files.
So you only need to package and publish distributions if code outside the repo needs them
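If you do keep publishing, the `python_distribution` option mentioned above could look roughly like this. The target names, the distribution name, and the version are placeholders chosen for illustration, not values from the thread.

```python
# packages/pkgA/BUILD (sketch; names and version are placeholders)

# The first-party sources that make up the library.
python_sources(name="lib")

# A publishable distribution built from those sources.
python_distribution(
    name="dist",
    dependencies=[":lib"],
    provides=python_artifact(
        name="pkgA",
        version="0.1.0",
    ),
)
```

Building it would then be something like `./pants package packages/pkgA:dist`, again assuming the target address above.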
p
Hi @happy-kitchen-89482, thanks for the detailed answer. I agree with you on the matter of distribution, I could work with an internal PyPI server or directly in the repository. Both options are quite easy with pants. Also, when using pants directly, e.g. with
./pants test ...
it builds the right venvs automatically, so it's not a problem there. My point was more about some of the advantages of a venv + editable install workflow that cannot be replaced by functionality in pants, or for which I do not know pants well enough to find the alternatives. I saw the guide for the IDEs and will try autocompletion and debugging with the IDEs we use (mainly VS Code and PyCharm). Running
sphinx-apidoc
will be a bit more complicated, as it needs all first party code (either in python path or as editable install) and all 3rd party deps, as the code is unfortunately imported. One more case that we have is running code from the command line without the pants dependency resolution, i.e. with the complete first party code (not only what is explicitly imported) and all 3rd party dependencies. This is needed because the code imports modules dynamically based on a configuration file, so the dependency resolution of a
pants run
does not work. For the last two cases (
sphinx-apidoc
and dynamic imports) we'll probably opt for the full venv created by
pants export
and either python path tricks or some kind of editable install of the packages until we find better solutions. It just bothers me a little bit, because the idiomatic use of pants is so nice and elegant, that I want to find some solution for the rest of the cases.
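One possible form of the "python path tricks" mentioned above is a small helper that puts every package directory on `sys.path` before the dynamic imports run. This is only a sketch: it assumes each directory under `packages/` is itself a source root (the importable code sits directly inside it), which may not match the real layout, and the function name is made up for this example.

```python
import sys
from pathlib import Path


def add_first_party_roots(repo_root: Path, pattern: str = "packages/*") -> list:
    """Prepend every directory matching `pattern` to sys.path and return them.

    Assumption: each matched directory is a source root, i.e. the importable
    modules or packages live directly inside it.
    """
    roots = sorted(str(p) for p in repo_root.glob(pattern) if p.is_dir())
    # Insert in reverse so the final sys.path order matches the sorted order.
    for root in reversed(roots):
        if root not in sys.path:
            sys.path.insert(0, root)
    return roots
```

It would be called once at startup, for example `add_first_party_roots(Path.cwd())` from the repository root, inside the venv produced by `pants export`.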
h
Gotcha
So re
./pants run
with dynamic imports, can you add those configuration file dependencies manually in the BUILD file? Or does that configuration file change too frequently for that kind of redundant information to be easy to maintain?
And re sphinx, we have been talking about a sphinx plugin (possibly to use for generating Pants’s own documentation site!)
That would be the idiomatic thing
p
So re
./pants run
with dynamic imports, can you add those configuration file dependencies manually in the BUILD file? Or does that configuration file change too frequently for that kind of redundant information to be easy to maintain?
Unfortunately any valid python symbol can be present in the configuration, meaning that the dependencies would have to be the whole repo. I had tried with
dependencies="::"
or similar wildcard targets, but it's not supported. In the long term, I think probably a new goal (similar to
run
, as a custom plugin) would make sense in order to parse the configuration and automatically find the dependencies as is currently being done with python files.
And re sphinx, we have been talking about a sphinx plugin (possibly to use for generating Pants’s own documentation site!)
I saw that and would welcome it, especially with
sphinx-apidoc
support 👍
Anyway thanks for the answers, I think I will have to live with the workarounds for a bit. The benefits outweigh the inconvenience for my special cases 😉
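To make the dynamic-import problem concrete, here is a minimal sketch of a config-driven loader together with the kind of config parsing such a plugin would need. The config shape and the `"target"` key are hypothetical, and standard-library names stand in for first-party modules. The point is that the module names exist only as strings in the config, so import-based dependency inference never sees them.

```python
from __future__ import annotations

import importlib
import json


def modules_referenced(config) -> set[str]:
    """Collect the module part of every dotted "target" string in the config."""
    found: set[str] = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "target" and isinstance(value, str) and "." in value:
                    # "pkg.module.Symbol" -> "pkg.module"
                    found.add(value.rsplit(".", 1)[0])
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(config)
    return found


def load_symbol(dotted_path: str):
    """Import a symbol at run time, the way a config-driven program might."""
    module_name, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


# Hypothetical config; stdlib names stand in for first-party modules.
CONFIG = json.loads("""
{
  "model": {"target": "collections.OrderedDict"},
  "steps": [{"target": "json.dumps"}, {"target": "os.path.join"}]
}
""")

if __name__ == "__main__":
    print(sorted(modules_referenced(CONFIG)))
    print(load_symbol("collections.OrderedDict"))
```

A custom goal could feed the result of something like `modules_referenced` into its dependency calculation, but how that hooks into the Pants plugin API is not shown here.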
h
So the configuration isn’t something checked in, but something that can change a lot between runs?
What is an example of how a user passes in this configuration?