# general
b
I'm sorry if my question seems foolish. I have been using namespace packages (https://packaging.python.org/en/latest/guides/packaging-namespace-packages/) in my Python repository to ensure a consistent namespace while allowing me to choose from several smaller packages when building Docker images, thus avoiding the inclusion of a large and unnecessary set of libraries. I use Poetry to manage the dependencies among my libraries and third-party packages. The folder structure is organized as follows:
.
├── libs
│   ├── lib1
│   │   ├── company_name
│   │   │   └── lib1
│   │   │       ├── __init__.py
│   │   │       └── something.py
│   │   │   
│   │   └── tests
│   │       └── test1.py
│   └── lib2
│       ├── company_name
│       │   └── lib2
│       │       ├── __init__.py
│       │       └── something.py
│       └── tests
│           └── test2.py
│           
└── executable_apps
    ├── app1
    │   └── app1.py
    └── app2
        └── app2.py
If I were to adopt Pants, would there be any need for namespace packages? Would Pants be able to pull in only the necessary libraries or files?
e
Pants can select just the files needed regardless of namespace packages or repo layout. You could have a 10000 file repo with 100 "binaries" and for each individual binary Pants will only package the exact 1st party files that comprise that binary and the exact (transitive) 3rd party dependencies needed by those files.
The relevant features here are dependency inference and file-level dependency granularity: https://blog.pantsbuild.org/why-dependency-inference/
b
That is what I thought. I was just not sure if it would pick just the right files, or just the right libraries (which is basically what I am doing with poetry).
e
Yeah, your binary is a single entry point and import statements are parsed and recurse.
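To illustrate the "imports are parsed and recurse" idea, here is a minimal sketch using only the standard library. It is not how Pants is implemented; the in-memory SOURCES map, the helpers module, and the function names are hypothetical, chosen to mirror the tree above.

```python
import ast

# Hypothetical in-memory repo: dotted module name -> source text.
SOURCES = {
    "app1": "from company_name.lib1 import something\n",
    "company_name.lib1.something": (
        "import numpy\nfrom company_name.lib1 import helpers\n"
    ),
    "company_name.lib1.helpers": "VALUE = 1\n",
    "company_name.lib2.something": "import requests\n",
}

# Top-level names that belong to the repo itself.
FIRST_PARTY_ROOTS = {name.split(".")[0] for name in SOURCES}


def imported_names(source):
    """Return every dotted name the source text might be importing."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
            # `from pkg import x` may refer to the submodule pkg.x.
            names.update(f"{node.module}.{alias.name}" for alias in node.names)
    return names


def transitive_closure(entry_point):
    """Walk imports from one entry point; return (first-party modules, external packages)."""
    first_party, external = set(), set()
    pending = [entry_point]
    while pending:
        module = pending.pop()
        if module in first_party:
            continue
        first_party.add(module)
        for name in imported_names(SOURCES[module]):
            if name in SOURCES:
                pending.append(name)
            elif name.split(".")[0] not in FIRST_PARTY_ROOTS:
                external.add(name.split(".")[0])
    return first_party, external


first_party, external = transitive_closure("app1")
print(sorted(first_party))
print(sorted(external))
```

Starting from app1, the walk reaches only the lib1 modules and numpy; lib2 and its requests import are never visited, which is the file-level granularity described above.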
b
Thanks for the help; it is invaluable to get over the relatively steep learning curve. I will probably then remove the namespace package structure (just use company_name/lib1, company_name/lib2...) and let pants figure out the files needed for my binaries. But in that case, are there any best practices when it comes to test locations (specifically for pants)?
e
Not really. Whatever you like that works well with pytest, Pants aside. Pants is pretty un-opinionated about repo layout.
As an aside, the tree you show has 0 namespace packages in the Python sense. You have 4 unique packages: lib1, lib2, app1 & app2: https://packaging.python.org/en/latest/guides/packaging-namespace-packages/
Oh, company_name?
You used mixed PEP-420 and __init__.py, it seems.
b
Yes, in the binaries we are using company_name.lib1 and company_name.lib2; but I might have to look more carefully at the use of __init__.py files.
It seems to be OK like this. I should use __init__.py files in subdirectories of namespace packages.
e
Yeah, Pants doesn't care - it just makes guessing the repo layout hard when doing remote support. There is no way to look at a tree and guess where its sys.path roots should be in a PEP-420 world. An empty directory could be just a container for a project or it could be a Python package.
In other words you know company_name is a package and not a container dir, but it could be that libs/lib1 is too with the info in the tree ascii art. Just no way to know for sure.
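A small sketch of why the root matters: the same file gets a different dotted module name depending on which directory is treated as the sys.path root. The module_name helper is hypothetical and written only for illustration.

```python
from pathlib import PurePosixPath


def module_name(file_path, source_root):
    """Dotted module name of file_path if source_root were on sys.path."""
    relative = PurePosixPath(file_path).relative_to(source_root)
    parts = list(relative.with_suffix("").parts)
    # A package's __init__.py is imported under the package's own name.
    if parts and parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


path = "libs/lib1/company_name/lib1/something.py"
print(module_name(path, "libs/lib1"))  # company_name.lib1.something
print(module_name(path, "libs"))       # lib1.company_name.lib1.something
```

Only the first root matches the company_name.lib1 imports used in the binaries, and nothing in the tree alone says which root is the intended one.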
b
I see. This might be the reason I am having issues running my tests through pants 😕
e
You need to tell it using [source] root patterns. It can't guess either!
If ./pants roots does not match what you know they should be, you'll need to configure.
Python is a pain here. This would not be a problem with Rust, for example, where it is totally unambiguous.
Long ago Pants forced you to explicitly define source roots. We have since tried to guess with common patterns as the default. I'm not so sure guessing is great. If you don't fit the default you end up very confused. If you were forced to configure, it might make more sense.
b
What would be a good structure in this case, for making it easy for pants to guess the roots? Also, in the /libs/lib1 folder I have a pyproject.toml file which defines (with poetry) dev dependencies needed for the tests (hypothesis for example). If the source roots were correct, would pants load them up?
e
As to the latter - no, you need a BUILD target to tell Pants to load up dependencies. In this case poetry_requirements() in a BUILD sibling of the pyproject.toml would do it: https://www.pantsbuild.org/docs/reference-poetry_requirements
But one thing you need to shake is the impression that placement implies anything. It does not. If you have 13 pyproject.toml scattered about, each with a BUILD target, then all the deps from all 13 pyproject.toml will be visible to all source files anywhere in the repo. Since Pants parses imports to determine dependencies, this means everything gets exactly what it needs and no more.
What would be a good structure in this case, for making it easy for pants to guess the roots?
Unless you're adding new top-level roots at a high clip, structure just doesn't matter. If folks add 1 top-level ~project directory a week, that means adding 1 line to root_patterns a week. That is just not too bad! You could change your layout to suit Pants defaults, but it doesn't buy you much.
Have you used ./pants tailor? After configuring source roots you should. It auto-generates BUILD targets including the poetry_requirements ones. https://www.pantsbuild.org/docs/initial-configuration
FWIW I think your roots are as follows:
[source]
root_patterns = [
    "/libs/lib1",
    "/libs/lib2",
]
That said, I have no idea what to do with executable_apps. If they are as named, then no other code imports from them and source roots do not matter. If that is not true though, you need to configure for those too.
Another FWIW: none of this is foolish. This is exactly the same set of things that trip up most people when adopting Pants for a Python project. We've tried to make on-boarding easier and although we have, it's still too confusing on average. So you're not alone.
b
Thank you so much. I think this might be coming together. I am trying to get some of the tests running (I already get most of the linters and formatters working just fine). It constantly complains about missing source roots, so I am just adding those for now and hopefully will get them working. Once I have resolved the missing source roots, I plan to reorganize my repository. Specifically, I plan to remove the unnecessary nesting of my libraries. Instead of having the structure /libs/lib1/company_name/lib1, I will use the structure /libs/company_name/lib1. The current structure was created to make my packages smaller for building Docker images, but it seems like Pants handles that issue for me.
e
Sounds about right to me!
b
Well. Pants is no longer complaining about missing source roots when I run
> pants test libs/lib1::
but complains about missing dependencies, numpy used in lib1/something.py and hypothesis used in lib1/tests/test1. Both libraries are specified in /libs/lib1/pyproject.toml (numpy under dependencies and hypothesis under dev-dependencies). The pyproject.toml has a sibling BUILD file that looks like this
poetry_requirements(
    name="poetry",
)
Any ideas?
e
Is numpy used via an import statement or via something more dynamic like __import__("...")?
I assume you ran ./pants tailor and it plopped down some BUILD files with content. Could you share the contents of those if so? Can you point me at a repo or is this all private?
b
Yes, I used ./pants tailor for this. Unfortunately, this is all private. I shared above the content of the /libs/lib1/BUILD file. Other relevant files might be /libs/lib1/company_name/lib1/BUILD
python_sources()
and /libs/lib1/tests/BUILD
python_tests()
e
What does ./pants dependencies --transitive libs/lib1/tests/test1.py show?
b
It shows the list of files it depends on under /libs/lib1/company_name/lib1 but one thing looks weird.
libs/lib2/pyproject.toml:poetry
It seems to depend on the wrong pyproject.toml file, and not on the correct one, libs/lib1/pyproject.toml:poetry
e
Ok, I'm going to go slow here since I'm blind; so forgive me for that. Can you confirm at least 1 of the files in the transitive dependency list has an "import numpy" or "from numpy import ..." in it?
And 2 more:
+ What Pants version?
+ Are you using lockfiles / resolves? https://www.pantsbuild.org/docs/python-third-party-dependencies#user-lockfiles
b
Nothing to forgive, and again, thank you for your help. I am using pants version 2.15.0.rc5. No, I am not using any lockfiles or resolves.
Yes, there is a "import numpy" line in a few of the transitive dependency files.
e
Ok, and does libs/lib1/pyproject.toml declare a numpy dependency?
b
Yes, it does. But the wrong libs/lib2/pyproject.toml file does not.
e
Ok, well there is no right or wrong - Pants should be looking at all of them if they have an associated BUILD target.
Ok ... just a sec.
Is this a normal dependency? Pants doesn't handle Poetry groups. Just normal + dev.
b
This is a normal dependency, but I specify a particular version of numpy (^1.22), and there might be conflicting numpy versions used in different pyproject.toml files.
e
Ok. And what does this say?:
./pants list libs/lib1:
b
It includes both dependencies:
libs/lib1:poetry#hypothesis
libs/lib1:poetry#numpy
e
Ok. All looks good.
Now, the conflicting requirements are interesting.
Can you temporarily remove the BUILD targets for the pyproject.toml with numpy deps?
Pants handles multiple conflicting resolves, but it would be good to get the basics working before addressing multiple resolves.
The aside is that if dep inference is working, the ./pants dependencies --transitive ... should have included libs/lib1:poetry#numpy in the list to satisfy the numpy import.
b
Yes. I can remove the BUILD target, but what should that do?
e
If there is an issue due to numpy ambiguity it removes it temporarily.
Because then Pants can only "see" the 1 pyproject.toml
This is purely a debugging step.
b
It did not change anything. I was just trying to understand how you are approaching this 🙂
For some reason, if I run pants dependencies --transitive /libs/lib1/company_name/lib1 I get a dependence on the wrong "pyproject.toml" file from lib2 (although lib1 does not depend on lib2 at all). The same happens if I run pants dependencies --transitive /libs/lib1/tests
e
Well, you've removed the lib2 BUILD hopefully so that we can ignore that for now?
It should no longer happen with the target removed.
Oh! Maybe you need to kill pantsd? Or run with --no-pantsd - again as a debugging sanity check.
Again - almost certainly not needed - but this is a sanity preservation step. It forces Pants to use no in-memory cached data.
b
I misunderstood you a bit when you told me to remove the BUILD file. I did it again, and now I get a log warning message from pants concerning numpy. Something along the lines of "The target libs/lib1/company_name/lib1/somefile.py imports numpy, but Pants cannot safely infer a dependency because more than one target owns this module, so it is ambiguous which to use ..." And then a list of other libraries which use numpy. This might be useful, but I will have to take a better look at this tomorrow. It is getting late here.
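The "more than one target owns this module" condition can be sketched in a few lines. This only illustrates the ownership rule and is not Pants code; the libs/lib3 address is a hypothetical stand-in for one of the other libraries that also declare numpy.

```python
from collections import defaultdict

# Requirement target address -> module it provides.
# The libs/lib3 entry is hypothetical.
REQUIREMENT_TARGETS = {
    "libs/lib1:poetry#numpy": "numpy",
    "libs/lib1:poetry#hypothesis": "hypothesis",
    "libs/lib3:poetry#numpy": "numpy",
}


def owners_by_module(targets):
    """Group target addresses by the module they provide."""
    owners = defaultdict(list)
    for address, module in targets.items():
        owners[module].append(address)
    return owners


def infer_dependency(module, targets):
    """Return the single owning address, or None when missing or ambiguous."""
    owners = owners_by_module(targets).get(module, [])
    return owners[0] if len(owners) == 1 else None


print(infer_dependency("hypothesis", REQUIREMENT_TARGETS))
print(infer_dependency("numpy", REQUIREMENT_TARGETS))
```

With two owners for numpy the lookup gives up instead of picking one, so the import is left unsatisfied. Dropping the second owner makes the inference unambiguous again.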
e
Aha!
Yes, this was the ambiguity I was trying to get you to remove.
So, you will eventually need multiple resolves, but I was trying to avoid cognitive overload for both of us and start simple.
How many conflicts like this do you have?
And are they artificial or real?
b
Close to 20 🤯
e
And are they real, or project devs being capricious? Could you force any of these 20 conflicts to merge?
b
I am using numpy all over the place; and I probably can merge most of them. Most of them are probably not real.
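One way to triage which conflicts are real is to tally the distinct version constraints declared for each requirement. The sketch below assumes the constraints have already been read out of each pyproject.toml. Only the ^1.22 constraint for lib1 comes from this thread; the other paths and constraints are hypothetical.

```python
from collections import defaultdict

# pyproject.toml path -> {requirement name: version constraint}.
# Only lib1's numpy constraint is from the thread; the rest is made up.
DECLARED = {
    "libs/lib1/pyproject.toml": {"numpy": "^1.22", "hypothesis": "^6.0"},
    "libs/lib3/pyproject.toml": {"numpy": "^1.22"},
    "libs/lib4/pyproject.toml": {"numpy": "^1.19"},
}


def conflicting_constraints(declared):
    """Return requirements declared with more than one distinct constraint."""
    seen = defaultdict(lambda: defaultdict(list))
    for path, requirements in declared.items():
        for name, constraint in requirements.items():
            seen[name][constraint].append(path)
    return {
        name: dict(by_constraint)
        for name, by_constraint in seen.items()
        if len(by_constraint) > 1
    }


for name, by_constraint in conflicting_constraints(DECLARED).items():
    print(name, by_constraint)
```

Requirements that never show up here are declared identically everywhere and can be merged into one declaration. The ones that do show up need a human decision on whether the difference is intentional.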
e
Ok, great.
b
I will have to take a look at this again tomorrow. Thank you so much for your help. I now have something to actually look at 🙂 I will try to check in again tomorrow and let you know how it goes.
e
Eventually you'll be left with - say 3 conflicts and will need 3 resolves, read here: https://www.pantsbuild.org/docs/python-third-party-dependencies#multiple-lockfiles And definitely start using lockfiles!
Sounds good.
a
Not sure if this is helpful, but responding to the original question: I'm using a bunch of published (PyPI) namespaced packages with Pants. To set it up I had to figure out a few things in terms of my Pants understanding, but also in terms of how I wanted the packages to interact with each other, i.e. using repo versions <> published versions. In the end I added a couple of Pants plugins to simplify the setup and, as is often the case, there are trade-offs with the decisions I made, but it works pretty well for our needs, if sometimes imperfectly/quirkily. The repo is https://github.com/envoyproxy/pytooling