# general
a
I'm evaluating Pants as we transition to a monorepo in my org. Is it possible to have a single `pyproject.toml` file at the root of my repository and ensure that its dependencies are installed when running e.g. `pants test /path/to/project`? The folder structure is similar to this:
```
my-repository/
    pants.toml
    pyproject.toml
    poetry.lock

    projects/
        project-a/
            README.md
            project_a/
                routers/
                    ...

        project-b/
            README.md
            project_b/
                routers/
                    ...
```
g
Hello! This should be possible, though I haven't tested it myself yet. Pants should be able to use a `pyproject.toml` as the source for a `python_requirements` build target: https://www.pantsbuild.org/docs/python-third-party-dependencies#pep-621-compliant-pyprojecttoml The rest of the setup should work as usual: set up a default resolve, generate lockfiles, and run your tests.
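For reference, a minimal sketch of what that root BUILD file could look like with a PEP 621-style `pyproject.toml` (the target name is illustrative):
```python
# BUILD at the repository root (sketch)
python_requirements(
    name="reqs",
    source="pyproject.toml",  # read dependencies from the PEP 621 [project] table
)
```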
r
Adding to what Tom said, it looks like you are using Poetry; in that case you would need `poetry_requirements()`: https://www.pantsbuild.org/docs/python-third-party-dependencies#poetry
⬆️ 2
a
Yes, @refined-addition-53644, you are correct. However, if I run a target such as `pants test /path/to/subproject/tests/unit`, Pants gives me a ModuleNotFoundError. It appears as if having `poetry_requirements` at the top level is not enough. What would I put in the `BUILD` files of the subprojects?
I attempted to use the `source` parameter of `poetry_requirements`, but receive the following error:
```
InvalidFieldException: The 'source' field in target projects/my_project:my_project should not include `../` patterns because targets can only have sources in the current directory or subdirectories. It was set to ../../../pyproject.toml. Instead, use a normalized literal file path (relative to the BUILD file).
```
Which makes perfect sense, since this is inside a sub-folder of the root directory where the pyproject.toml is defined.
g
Having it at the top level should be sufficient in most situations. If you run `pants dependencies /path/to/subproject/tests/unit/some_file.py`, does it reference any of your dependencies at all? Do you have a default resolve set up, and do your `python_tests` and the `poetry_requirements` use the same resolve? Does generating lockfiles work (`pants generate-lockfiles`)?
a
I guess I have some reading to do before I can answer your questions accurately. In essence, we've installed Pants, configured it to use the Python backend, and run `pants tailor ::`. That generated `BUILD` files in all subdirectories containing Python sources, and the `BUILD` file at the top level has the following:
```python
poetry_requirements(
    name="root",
)
```
I had assumed that this would be sufficient for me to run `pants test` afterwards and have the necessary dependencies installed in the PEX. If I run `pants dependencies` on one of the files as you suggest, it does contain references to the relevant projects, in the form of `/path/to/project:poetry#requests`, for example.
g
`/path/to/project:poetry#requests` sounds like you have multiple sets of requirements here; not sure if that's your intention. If you have a root pyproject.toml with your dependencies, I'd expect that to say `//:root#requests`.
Just to make sure: is the package it complains about listed (exactly) in one or both of those dependency sets? Some packages have one PyPI name and a different import name, and that requires you to manually tell Pants how to map them.
a
Yes, sorry. Assume we have the following:
```
repository/
    libs/
        lib-1/
            lib_1/
    projects/
        project-1/
            project_1/
```
In this case, we have `lib-1` installed as an editable development dependency in the top-level pyproject.toml; however, `lib-1` also has its own `pyproject.toml` containing its own dependencies. I suspect this isn't the way to handle first-party dependencies in Pants. The top-level `BUILD` file in `lib-1` simply contains `poetry_requirements()`.
g
This seems fine (but as you say, probably a bit unusual). You can definitely combine dependencies from multiple sources as long as they are part of the same resolve. In-repo (path-based) dependencies in pyproject.toml should be ignored by Pants, as it has its own mechanism for resolving them. As noted above though, I haven't yet migrated our repo to use PEP 621 instead of requirements.txt, so I'm only speaking generally.
To back up a bit: is the module it complains about one of your local libraries or one of your dependencies? What is its name (in your pyproject.toml), and what is the import statement?
a
The module it complains about is one of our local libraries. In the pyproject.toml file it is defined as `lib-1 = {path = "libs/lib-1", develop = true}`.
g
What are your source roots set to? `pants roots` should list them all.
a
The output is simply:
```
libs
projects
```
Note: I updated the folder structure four messages above. All of my projects have a hyphenated top-level folder containing the package directory with an underscore. If I add `libs/lib-1` to my root_patterns, things start to look better.
g
Ah! Great. We have a winner, or at least a misconfiguration. So: Pants does things slightly differently from what you'd expect. It does not in any way build wheels for local code. It just looks at what you import, finds it in your source roots, and copies it into the appropriate location - recursively. So in this case, if you import `lib_foo.a` it'll try to find `libs/lib_foo/a.py`, `projects/lib_foo/a.py`, etc. This means that if your layout is `libs/lib-1/lib_1/a.py` or some such, it won't find it. The solution is to add each project as a source root. The easiest way to do this is to use a marker file, which lets you say that a `pyproject.toml` denotes a root you can import from.
```toml
[source]
marker_filenames = ["pyproject.toml"]
```
❤️ 1
(I'm simplifying a lot here - the mechanics are a lot smarter than I make them out to be.)
This page describes source roots in greater detail: https://www.pantsbuild.org/docs/source-roots
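With a marker-file setting like the one above, `pants roots` would be expected to list each project directory as its own source root, something along the lines of (paths taken from the tree earlier in the thread):
```
libs/lib-1
projects/project-1
```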
a
Ah yes, that makes sense! Now I'm (almost) able to get everything running. I use a pytest plugin, `pytest-env`, to define environment variables directly in the `pytest.ini` file. However, it is saying that all of the environment variables required for the tests are not defined.
I see that an `extra_env_vars` option exists on the `python_tests` target. I'll try using this first.
👆 1
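For reference, a minimal sketch of the `extra_env_vars` approach (the variable names are illustrative; a bare name passes the value through from the local environment):
```python
# BUILD next to the tests (sketch)
python_tests(
    name="tests",
    extra_env_vars=[
        "API_BASE_URL=http://localhost:8000",  # hardcoded value
        "API_TOKEN",                           # passed through from the caller's environment
    ],
)
```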
g
Sounds like a plan. (Never mind the other part; pytest.ini should be found automatically.)
Plugins for pytest are in a bit of flux depending on your Pants version, but this is the "easy" method: https://www.pantsbuild.org/docs/reference-pytest#extra_requirements
⬆️ 1
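A sketch of that option in pants.toml, assuming the plugin in question is `pytest-env` (pin the version as appropriate):
```toml
[pytest]
extra_requirements.add = ["pytest-env"]
```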
a
Sorry to keep bugging you. I'll check out `extra_requirements`; that seems to work. I was also able to get `extra_env_vars` to work. However, now I'm getting another ModuleNotFoundError, this time for a third-party package that is present in the top-level `pyproject.toml`. Since last time, I have removed the `pyproject.toml` from `lib-1`, since it is now a first-party dependency, and have updated my roots to include it correctly. If I run `pants dependencies` on the tests folder of the project, the module it complains about is not listed.
g
Is it named exactly the same? For example, `PIL` is provided by `pillow`, which Pants cannot know about easily.
a
Ah, indeed it is not. That should've occurred to me as well, to be honest.
The name of the package is `python-package`, but the import is simply `from package import ...`.
g
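Judging from the follow-up below, the fix was a `module_mapping` entry on the root `poetry_requirements`; a sketch (the package name is taken from the messages above):
```python
# BUILD at the repository root (sketch)
poetry_requirements(
    name="root",
    module_mapping={"python-package": ["package"]},  # PyPI name -> importable module name
)
```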
a
That works! It's becoming clear to me that Poetry has handled a great deal behind the scenes with regard to consuming packages.
It definitely seems like we're getting somewhere. One of my `patch` calls from `unittest.mock` is now failing with an AttributeError. The error message is: `lib_1.utils has no attribute 'something'`. However, `libs/lib-1/lib_1/utils/something.py` does exist. Have you experienced similar issues?
In addition, I'm a little concerned that `pants dependencies projects/project-1/tests/test_something.py` does not list the module that we needed to configure in `module_mapping` as a dependency.
g
This likely means the file isn't explicitly imported anywhere. Only files and dependencies that are explicitly used are included in the sandbox.
(This is transitive across all files, so it being explicitly imported from something you import is fine too.)
a
I see. That could be the issue, since this is a `patch` call inside a unit test. The actual code is never directly imported, since the test uses FastAPI's TestClient to carry out an HTTP request that calls another function. I guess I would have to inform Pants that all of the functions in the `routers` are dependencies of the tests(?)
g
Yeah, explicit dependencies are likely your go-to if you cannot import the actual code.
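A sketch of such an explicit dependency, assuming the default `python_sources` target that `pants tailor` generates in the routers directory:
```python
# BUILD in projects/project-1/tests/unit (sketch)
python_tests(
    name="tests",
    dependencies=["projects/project-1/project_1/routers"],  # pull in the router modules explicitly
)
```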
a
Is there a way to define dependencies as "all targets of this subfolder"? E.g. my tests need to depend on all files in the current project.
g
I think that'd be possible -- but can I ask why? It seems weird to depend on a ton of code you're not importing. Everything you import should be there.
a
I agree. However, we have the following issue: `fastapi.TestClient` is used to run an API query against `/my_endpoint`. This in turn calls `some_function()` in my code. However (AFAICT), this dependency cannot possibly be inferred by Pants, unless there is a plugin that tightly integrates with FastAPI. As a result, I have a bunch of tests that make API calls which in turn call functions in my project that aren't picked up automatically.
I'll try to get a better grasp on the dependencies and transitive dependencies, but this one does baffle me: `pants dependencies --transitive projects/project-1/tests/unit/`. One of the dependencies listed is `projects/project-1/project_1/routers/users.py`, which has the following on line 8: `from thefuzz import fuzz`. The package `thefuzz` is present in the top-level `pyproject.toml` file but is not included in the list of transitive dependencies. As far as I can tell, this should definitely have been picked up as a dependency by Pants.
g
You probably need `pants dependencies --transitive`.
a
Yes, I am using the `--transitive` flag.
g
Ah, sorry, yeah, read too quickly.
Does `pants dependencies projects/project-1/project_1/routers/users.py` find it?
a
Weird, it does not. Even with `--transitive` set (although that should be unnecessary).
But that's a good clue, nonetheless.
g
It might be necessary to up the log level with `-ldebug` and see what shows up. I'm not too familiar with the dependency inference, especially if it doesn't print any warnings.
a
Would you look at that.
```
Pants cannot infer owners for the following imports in the target /path/to/users.py

* thefuzz.fuzz (line: 8)
```
I suspect this is related to the `module_mapping` somehow?
g
That seems likely, though if your dependency is e.g. `thefuzz>=1.2.3`, that should be fine.
a
Yes... However, if I update the module mapping from:
```
"thefuzz": ["fuzz"]
```
to:
```
"thefuzz": ["fuzz", "thefuzz.fuzz"]
```
it is suddenly picked up correctly.
g
Oh! So the first one says "the package called thefuzz is imported as fuzz", which isn't what you want.
You only need a module mapping when the provided import name is different from the name you depend on. These are the ones we have, for example:
```python
MODULE_MAPPINGS = {
    "grpcio-reflection": ["grpc_reflection"],
    "grpcio-health-checking": ["grpc_health"],
    "google-cloud-storage": ["google.cloud"],
    "google-auth": ["google.auth", "google.oauth2"],
    "pyyaml": ["yaml"],
    "emote-rl": ["emote"],
    "pillow": ["PIL"],
}
```
The right-hand side lists what can be imported by having the left-hand side as a dependency. In the case of "thefuzz", it always gets imported as "thefuzz", so no mapping is needed. 🙂
a
Of course, my bad! That fixes the issue, thanks 🙂
Alright, all of the unit tests are now running! Thank you very much for all your help. I do have one (hopefully) final question. I see that there is a `pants export` command that can create a virtual environment. However, it doesn't seem to me that this environment contains the site-package dependencies (from Poetry) as well. Is this possible to achieve? I don't see any options I could pass to export that would allow for this. My use case is that I want to be able to introspect the code and jump to the source directly in my IDE.
g
I'd suggest following the guide here for setting up a .env file instead: https://www.pantsbuild.org/docs/setting-up-an-ide#first-party-sources
I'm not quite sure what you mean by "site-package dependencies", though. The export should include your third-party dependencies, but not your first-party code; Pants knows about nothing else.
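For the first-party side, the linked guide amounts to pointing the IDE's PYTHONPATH at the source roots; a sketch of such a .env file, using the roots listed earlier in the thread (paths illustrative):
```
# .env at the repository root (sketch)
PYTHONPATH=./libs/lib-1:./projects/project-1
```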
a
By site-package dependencies I mean my third-party dependencies, yes. E.g. if pyproject.toml includes `httpx` as a dependency, I would expect it to be present in the `dist/` folder somewhere.
g
It should be, hmm. You might have multiple exports depending on your setup, but I guess you've checked them all if you have many.
a
I tried to use the command as it is listed here for exporting third-party dependencies: https://www.pantsbuild.org/docs/setting-up-an-ide#python-third-party-dependencies-and-tools
```
pants export --py-resolve-format=symlinked_immutable_virtualenv --resolve=python-default
16:53:01.75 [ERROR] 1 Exception encountered:

Engine traceback:
  in `export` goal

IntrinsicError: Unmatched glob from the resolve `python-default`: "3rdparty/python/default.lock"
```
g
Hmm, have you enabled resolves? Set `[python] enable_resolves = true` in pants.toml, then run `pants generate-lockfiles`. Then you should get further. I don't think you can get a working export without resolves enabled.
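A sketch of that configuration in pants.toml, matching the default lockfile path from the error above (`python-default` is Pants' default resolve name):
```toml
# pants.toml (sketch)
[python]
enable_resolves = true

[python.resolves]
python-default = "3rdparty/python/default.lock"
```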
a
You're right, I had not enabled resolves. After doing so, it does try to generate a lockfile based on the `pyproject.toml` file. However, it errors out on all internal packages (published to our own internal package repository). The error is:
```
ERROR: Could not find a version that satisfies the requirement your-package<0.3.0,>=0.2.2
ERROR: No matching distribution found for your-package<0.3.0,>=0.2.2
```
This was installing fine when Pants created the PEX file to run tests, before we enabled resolves. It also installs just fine using plain Poetry and the same index URL. Using `-ldebug`, I see that the lock action runs with `--extra-index-url` set correctly.
If I remove this dependency, it errors out on the next internal package. I have verified in the package repository that both the `.tar.gz` and `.whl` distributions exist.
g
Is it compatible with the Python interpreter constraints you've set? Have you set any? Otherwise that sounds like a good start.
```toml
[python]
interpreter_constraints = [">=3.9,<3.10"]
```
and
```toml
[python.resolves_to_interpreter_constraints]
the-resolve = [">=3.9,<3.10"]
```
might both be necessary
a
Sorry, I forgot to update this thread. That was the issue indeed; I had set the interpreter constraint to a strict `==3.11.*`.
I was finally able to get lockfiles generated after also setting `"REQUESTS_CA_BUNDLE={chroot}/ca-certificates.crt"` in `env_vars.add`, in addition to fixing the constraints. I do feel like the documentation could use some love; how long have you been using Pants?
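A sketch of the combination described here, in pants.toml (the certificate path is illustrative; `ca_certs_path` and `[subprocess-environment].env_vars` are the relevant options):
```toml
# pants.toml (sketch)
[GLOBAL]
ca_certs_path = "/etc/ssl/certs/ca-certificates.crt"  # bundle that includes the internal root CA

[subprocess-environment]
env_vars.add = ["REQUESTS_CA_BUNDLE={chroot}/ca-certificates.crt"]
```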
g
I think the docs definitely need some love, yeah. I think if you follow them to the letter you'll get to a working setup, but it's spread across many pages and topics. There are a few GH issues around this too, e.g. https://github.com/pantsbuild/pants/issues/15274. If you want to summarize your issues there or in a new issue, I think that'd be very helpful - and of course, you're also welcome to contribute to the docs directly, but having an issue to relate to makes the PR easier 🙂
The CA bundle sounds interesting; it sounds like you might have custom certs for your package host? Does pip just use that automatically?
I started using Pants last year for my own needs, then adopted it for work early this year. In October I'll hit a year of usage. 😛
a
Yeah. I don't think we'd have been able to get a setup working without you, to be honest. Correct, we have certificates generated by our own custom CA. In order to work, pip needs to be pointed at a CA bundle that includes the root CA used to sign them.
g
a
Yeah, but that in itself isn't enough. We also needed to set REQUESTS_CA_BUNDLE, since pip relies heavily on `requests` internally.
g
So the setting alone didn't solve it? Just want to make sure you tried it, as that seems like a clear bug. Pex has `--cert` and `--client-cert`, so we should probably forward the setting in that case, if we don't already. In which case that'd probably be a Pex bug...
a
The setting alone does not solve it. I guess it's possible that you would want to set `ca_certs_path` and not want pip to use it, but I honestly don't know.
g
Hmm, right. I'll make a note and dig into what currently happens, as it seems like forwarding this setting to the pex command line should make the rest work without any env vars.
a
From the docs:
> By default, Pants will respect and pass through the SSL_CERT_DIR and SSL_CERT_FILE environment variables.
The underlying issue here is that `requests` doesn't respect the `SSL_CERT_FILE` variable and relies on the (much older) `REQUESTS_CA_BUNDLE`. Since we pass along `SSL_CERT_FILE`, it should also be safe to pass along `REQUESTS_CA_BUNDLE`. AFAIK they serve the same purpose, but some applications support both, or only one or the other.
g
Yeah, but wouldn't it make more sense to respect the `ca_certs_path`? I've never worked in the kind of environment where this ends up being necessary, so maybe I'm seeing the wrong problems.
a
It makes sense if the bundle in `ca_certs_path` contains all certificates installed on the system. However, it's entirely possible to point it at a file containing a single certificate as well, if you don't have access to modify the system bundle. A change that automatically propagates `ca_certs_path` and sets the necessary environment variables might cause unintended issues in such a configuration.
g
👍 Makes sense, yeah.
a
On an entirely different note: have you found a way to stream the output from Pants directly? I'm trying to run my integration tests and the pytest command hangs forever. No output is provided, even with `pants test -ldebug --output=all -- -s`.
g
It's not something I've tried to do for tests, but Pants is generally quite sparse with streaming output, since you might have a dozen things running in parallel.
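One option worth trying here, though it isn't mentioned above: `pants test --debug` runs a single test interactively in the foreground, which streams pytest's output directly (the test path is illustrative):
```
pants test --debug projects/project-1/tests/integration/test_api.py -- -s
```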