TL;DR: Too many `BUILD` files. Hi, I’m trying to ...
# general
h
TL;DR: Too many `BUILD` files. Hi, I’m trying to set up Pantsbuild (2.13) for the first time in an existing Python monorepo. At the root of the repo, I’ve added an empty `BUILD_ROOT` file and a `pants.toml` file with:
[GLOBAL]
backend_packages = ["pants.backend.python"]

[source]
root_patterns = ["/X_*/"] # "X" is the name of the company and "X_" is the prefix of each project
There are about 50 projects in the repo. I’ve run `pants tailor ::` and then `git status | wc -l` and got 644. Do I really need to commit 640 `BUILD` files? Is there any way around it? (maybe creating one per project?) Thanks in advance 🙂
h
Hi Tal! As you’ve noticed, tailor creates a BUILD file in each directory that has code, as that has historically proven to be useful. But now that we have features like target generation and `__defaults__` (see https://www.pantsbuild.org/v2.14/docs/targets) it’s less necessary TBH.
You could also have one BUILD file per project, so 50 total, that could absolutely make sense for your layout.
However then the default `sources=` for `python_sources` wouldn’t suffice, and you’d need to be explicit: `sources=["**/*.py", "!**/*_test.py"]` or similar.
You can of course simplify that with a macro.
(the default sources= doesn’t recurse into subdirs)
But you’d have to either do this manually or modify tailor
Or, now that I think of it, tell tailor to use your macro instead of the regular `python_sources`/`python_tests`, let it do its thing, and then only check in the 50 you care about.
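The difference between a non-recursive default and an explicit recursive glob with a test exclusion can be illustrated outside Pants. The sketch below uses Python’s `pathlib` globbing as a stand-in. Pants has its own glob engine, so this is only an approximation, and the file names are made up for the example.

```python
import tempfile
from pathlib import Path


def make_tree(root):
    """Create a tiny hypothetical project layout under root."""
    for rel in ("app.py", "app_test.py", "pkg/util.py", "pkg/util_test.py"):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


def select(root, pattern):
    """Return paths matching pattern (relative to root), minus *_test.py files."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(pattern)
        if not p.name.endswith("_test.py")
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_tree(root)
    non_recursive = select(root, "*.py")    # like a glob that stays in one directory
    recursive = select(root, "**/*.py")     # like sources=["**/*.py", "!**/*_test.py"]

print(non_recursive)  # ['app.py']
print(recursive)      # ['app.py', 'pkg/util.py']
```

With one BUILD file per project, only the recursive form picks up `pkg/util.py`, which is why the explicit `sources=` is needed in that layout.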
c
It does mean that dependencies are less fine grained. If you have an asset (for example, some json file with some data loaded at runtime), you'd create an asset or file target and add a dependency of the python_sources to that target. But now the dependency is attached to all source files: if you change that asset, all the sources are marked as changed, so all the tests need to run. Which may or may not be a problem.
h
Not necessarily - definitely not the case for inferred dependencies, which are always at the file level regardless of target definitions. And you can write explicit dependencies at the file level too, although that gets laborious if you have many files.
And note that you can manually write fine-grained targets in that project-level BUILD file
E.g.,
resources(name="configs", sources=["path/to/configs/**/*.json"])
And then depend on that, or you could add an ad-hoc BUILD file in `path/to/configs` if you wanted the target to be more local to the files.
So you have all the flexibility, it’s really down to how you like to organize your BUILD files
c
you’d create an asset or file target and add a dependency of the python_sources to that target.
python_sources(
    sources=["proj/**/*.py"],
    overrides={
        ("proj/a/src1.py", "proj/b/src33.py"): {"dependencies": [":resource-tgt"]},
    },
)
lets you add the dependency to only a select set of source files.
h
Wow, lots to think about, thanks! I thought I got lots of them because I’d misconfigured something. Now I’m wondering whether to keep it as lots of files or not. Will give it a go. Thanks again!
c
Yea, that’s the downside of a powerful tool with many sharp edges to cut your problem in any of a number of ways.
Which is also its strength, but it comes with a bit of a learning curve.
w
i will say though: you are definitely most likely to have a good time IMO if you stick to the happy path of BUILD per directory: it involves no custom code, and in the end can actually result in less boilerplate overall (because overrides in parent directories are necessarily pretty verbose)
h
So I actually opted to start with the 600+ `BUILD` files to first understand how things go, but I’m somewhat dumbfounded by an error that repeats 100s or 1000s of times when I’m running `pants lint ::` (after the initial `pants tailor ::`). The message is of the form:
12:36:15.30 [WARN] The target proj_1/test/my_dir_1/my_dir_2/test_my_file.py:tests imports `proj_1.my_dir_1.my_dir_2.my_file.MyClass`, but Pants cannot safely infer a dependency because more than one target owns this module, so it is ambiguous which to use: ['proj_2:reqs0#proj_1', 'proj_3:reqs0#proj_1', 'proj_4:reqs0#proj_1', 'proj_5:reqs0#proj_1', 'proj_6:reqs0#proj_1', 'proj_7:reqs1#proj_1', 'proj_1/proj_1/my_dir_1/my_dir_2/my_file.py'].
Please explicitly include the dependency you want in the `dependencies` field of proj_1/test/my_dir_1/my_dir_2/test_my_file.py:tests, or ignore the ones you do not want by prefixing with `!` or `!!` so that one or no targets are left.
I’m not sure why a target owns a module, or what a target is in the first place. I’m pretty sure I have some config issues, but I haven’t found them yet.
w
does the error contain a link to a page with more information?
at least in this case, the relevant explanation is here: https://www.pantsbuild.org/docs/python-third-party-dependencies#warning-multiple-versions-of-the-same-dependency … it looks like you have lots of redundant (in terms of the requirements they contain) requirements.txt files.
This ambiguity is often a problem when you have 2+ `requirements.txt` or `pyproject.toml` files in your project, such as `project1/requirements.txt` and `project2/requirements.txt` both specifying `django`. You may want to set up each `poetry_requirements`/`python_requirements` target generator to use a distinct resolve so that there is no overlap. Alternatively, if the versions are the same, you may want to consolidate the requirements into a common file.
one option in this case would be to manually create a merged requirements.txt, and disable tailoring (i.e. “hide”) all of the other files
h
Yes, well, I’ve read through that segment multiple times, but I still don’t understand the meaning. I have several projects, and they depend on each other and there is no loop between them. And yes, each project has 2 or 3 `*-requirements.txt` files. I don’t understand what part of it is an issue though, since it looks like the `tailor` command (surprisingly) did pick that up without any hints from me.
To clarify, each project has:
1. `test-external-requirements.txt` for test-only dependencies (like `pytest`)
2. `prod-external-requirements.txt` for dependencies that are used outside of tests and are external to the mono-repo (like `boto3`)
3. `prod-internal-requirements.txt` for dependencies that are used outside of tests and are internal to the mono-repo (like `proj_1` and `proj_2`)
w
the issue is that the scope of dependency inference in pants is “your entire monorepo”, rather than only the nearby project scope: that means that if a bunch of requirements.txt declare the same dependency, then they are ambiguous across the entire repository.
if you’re already using multiple requirements files, then one approach would be to extract a set of `common-requirements.txt` essentially, which are used in more than one project, and then only declare those in one place
that would remove the ambiguity, because there would be one declaration of the dep
h
So IIUC, do I need pants to ignore the `prod-internal-requirements.txt` files since they only include packages that are in the repo?
w
@high-magician-46188: partially, yes. the ambiguity warnings might be noisy, but they do fully describe the problem in this case. this list of ambiguous sources:
'proj_2:reqs0#proj_1', 'proj_3:reqs0#proj_1', 'proj_4:reqs0#proj_1', 'proj_5:reqs0#proj_1', 'proj_6:reqs0#proj_1', 'proj_7:reqs1#proj_1', 'proj_1/proj_1/my_dir_1/my_dir_2/my_file.py'
…means that 6 different projects have a requirement in a requirements.txt file which is ambiguous with a 7th source which is a firstparty source file
So IIUC, do I need pants to ignore the `prod-internal-requirements.txt` files since they only include packages that are in the repo?
so yes: you should skip tailoring for a file that declares the firstparty code as thirdparty code. after that, take a new look at the warnings and see what you see.
(and then delete the targets that tailor previously created, which might be easiest accomplished by just deleting all BUILD files and re-running it)
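The ambiguity described above can be pictured with a toy model. This is a simplified sketch, not how Pants implements dependency inference: it assumes every target "claims" a top-level module name, and that an owner is only inferred when exactly one target claims it. The addresses are taken from the warning in the thread; the function names are hypothetical.

```python
from collections import defaultdict

# (target address, top-level module it provides), loosely modeled on the warning.
CLAIMS = [
    ("proj_2:reqs0#proj_1", "proj_1"),  # third-party requirement target
    ("proj_3:reqs0#proj_1", "proj_1"),  # third-party requirement target
    ("proj_1/proj_1/my_dir_1/my_dir_2/my_file.py", "proj_1"),  # first-party file
]


def build_index(claims):
    """Map each top-level module name to every address that claims it."""
    index = defaultdict(list)
    for address, module in claims:
        index[module].append(address)
    return index


def infer_owner(index, imported_module):
    """Return the single owner of the import, or None when it is ambiguous or unknown."""
    top_level = imported_module.split(".")[0]
    owners = index.get(top_level, [])
    return owners[0] if len(owners) == 1 else None


imported = "proj_1.my_dir_1.my_dir_2.my_file"
before = infer_owner(build_index(CLAIMS), imported)

# Dropping the requirement targets (addresses containing '#') mimics no longer
# declaring first-party code as third-party requirements.
first_party_only = [claim for claim in CLAIMS if "#" not in claim[0]]
after = infer_owner(build_index(first_party_only), imported)

print(before)  # None
print(after)   # proj_1/proj_1/my_dir_1/my_dir_2/my_file.py
```

Once the requirement targets that duplicate first-party code are gone, only one owner remains and the import resolves.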
h
One piece of maybe-missing context here: having a requirements.txt (or more than one) per-project in a monorepo is de-facto necessary when you don’t have monorepo tooling like Pants, because each project acts as its own “repo” within the repo, from a tooling perspective. BUT, with Pants you don’t need those manually-specified per-project requirements. The idiomatic usage is to have a single global set of requirements, generate a lockfile from it, and let Pants select the subset that is actually needed in any given situation. This way you have one consistent set of requirements across the repo, so you don’t get version conflicts between different parts of your codebase, which makes it easier to share and reuse code.
You can have multiple global lockfiles, for the case when different parts of your codebase genuinely need different, conflicting, versions of some requirements. But that number can still be a lot smaller than 50…
If you do nonetheless want independent requirements.txt for each project, and for Pants to use them without ambiguity, then you’ll need to customize dependency inference to know about selecting the requirement that is “closest” (in a filesystem sense) to the importing file.
I do think that might be a desirable feature in general, for such cases
h
TYVM to both of you. I’ve excluded the `prod-internal-requirements.txt` files and am re-running; it takes some time. Actually, in the current implementation, there is a global `constraints.txt` to specify the conditions and each project declares its own direct dependencies in the various `*-requirements.txt` files, without the constraints. That is, the internal req file of `proj_1` will have:
proj_2
proj_3
The external req file will have:
boto3
requests
And the global constraints.txt will have:
boto3>1.1.1
requests<=1.2.3
h
OK, so basically Pants can do all that for you automatically. Instead of a global `constraints.txt` you have a global `requirements.txt` with relevant version bounds as needed. Then use Pants to generate a lockfile from that, which locks down specific versions (with sha256s) of your entire transitive requirement tree. That lockfile is now the global source of truth for which versions are actually used. Then Pants automatically brings in just the subset that is needed in any situation (running a test, building a pex, building a distribution and so on).
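The idea of "bringing in just the subset that is needed" can be sketched as a transitive walk over a locked dependency graph. This is a toy model under stated assumptions, not Pants’ actual resolver: `LOCKED_DEPS` is a hand-written stand-in for a lockfile, its dependency edges are illustrative only, and `needed_subset` is a hypothetical helper name.

```python
from collections import deque

# Hypothetical stand-in for a lockfile: each locked package maps to the
# packages it depends on directly. Edges are illustrative, versions omitted.
LOCKED_DEPS = {
    "boto3": ["botocore", "jmespath", "s3transfer"],
    "botocore": ["jmespath", "python-dateutil", "urllib3"],
    "s3transfer": ["botocore"],
    "python-dateutil": ["six"],
    "requests": ["certifi", "charset-normalizer", "idna", "urllib3"],
    "jmespath": [],
    "six": [],
    "urllib3": [],
    "certifi": [],
    "charset-normalizer": [],
    "idna": [],
}


def needed_subset(direct_requirements):
    """Return every locked package reachable from the direct requirements."""
    seen = set()
    queue = deque(direct_requirements)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        queue.extend(LOCKED_DEPS[name])
    return seen


# A test that only imports `requests` pulls in only that part of the lockfile.
subset = sorted(needed_subset(["requests"]))
print(subset)
# ['certifi', 'charset-normalizer', 'idna', 'requests', 'urllib3']
```

In this model the lockfile stays global, while each test or package only sees the slice reachable from what its code actually imports.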
h
👍 Since I’m doing a gradual migration, is there a way to tell pants to use the constraints.txt in the same way `pip` would handle it with the `-c` flag?
c
If you’ve not enabled the Pants resolves feature, you can provide a constraints file using: https://www.pantsbuild.org/docs/reference-python#requirement_constraints In Pants 2.14, you can also provide constraints files for resolves: https://www.pantsbuild.org/v2.14/docs/reference-python#resolves_to_constraints_file
h
OK, so I’ve:
1. Added a `pants-requirements.txt` that combines all of the `*requirements*.txt` files.
2. Put all of the old `*requirements*.txt` files under `pants_ignore.add`.
3. Upgraded to version `2.14.rc2`.
4. Set `__default__ = "constraints.txt"` (under `[python.resolves_to_constraints_file]`).
I think that I’m getting far fewer warnings. I’m now trying to tackle the following case: There is a company-internal package, let’s call it “utils”. To install it, we do `company-utils` but to import we do `from utils import whatever`. I think that Pants either doesn’t know how to automatically handle these cases, or it is somehow related to the resolution part, which it is also having a hard time with at the moment… Any suggestions?
So it looks like I need to define `module_mapping = {"company-utils": ["utils"]}` under `python_requirements`. Can it be done in `pants.toml` for it to be globally available? Also, if it is indeed the solution, it could be good to add it to https://www.pantsbuild.org/v2.14/docs/troubleshooting#import-errors-and-missing-dependencies.
c
Thanks for the suggestion of adding to the troubleshooting section. The information is available here: https://www.pantsbuild.org/v2.14/docs/python-third-party-dependencies#use-modules-and-module_mapping-when-the-module-name-is-not-standard but I can see how it would help to have it in more places to make it easier to find.
Regarding module_mapping, normally you don’t have many `python_requirements` targets, so there will not be much duplication. But in case you do, you can do this at a top-level BUILD file (or where it suits your project best, as it applies to all BUILD files in its subtree):
# BUILD
__defaults__({
  python_requirements: {
    "module_mapping": {"company-utils":["utils"]}
  }
})
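Why the mapping is needed at all can be shown with a toy resolver. This is a simplified model, not Pants’ real logic: it assumes the default guess for a requirement’s module is just its name lowercased with `-` and `.` turned into `_`, and that an explicit `module_mapping` entry overrides that guess. The function names are hypothetical.

```python
def default_module(requirement):
    """Guess the importable module from the requirement name (simplified assumption)."""
    return requirement.lower().replace("-", "_").replace(".", "_")


def modules_for(requirement, module_mapping=None):
    """Return the modules a requirement provides, preferring an explicit mapping."""
    mapping = module_mapping or {}
    return mapping.get(requirement, [default_module(requirement)])


# Without a mapping, `from utils import whatever` cannot be tied to company-utils.
guessed = modules_for("company-utils")
# With the mapping from the thread, the import name matches.
mapped = modules_for("company-utils", {"company-utils": ["utils"]})

print(guessed)  # ['company_utils']
print(mapped)   # ['utils']
```

In this model, an import of `utils` only finds an owner once the explicit mapping is supplied, which mirrors the behavior described in the thread.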
h
Thanks, I will try it. But can’t it be defined in the `pants.toml` file?
h
No, as it’s a property of the requirement, which already lives in the BUILD file.
But I don’t see why there would be duplication? You only need to define this on the single `python_requirements` target that owns your `pants-requirements.txt`, AFAICT?
h
I’m confused again. Shouldn’t there be only a single req file in the whole repo? Also, after running `pants tailor ::`, `grep python_requirements -r .` returns no result. Should it?
h
It should recognize anything that looks like `*requirements*.txt` as a requirements file, yes
Weird
w
did you adjust any settings for tailor? when you started the other day, you had 6+ python_requirements targets, and i suggested excluding some of them.
h
My comment about duplication was that since you have a single pants-requirements.txt, its accompanying `python_requirements` (once we create one) is the only place you would need to add the module mapping. So there is no duplication necessary.