# general
i
Hey all, I'm trying to gauge the best way to contend with dependencies in our monorepo. Today we don't have many teams on PantsV2 yet, but we envision a day where we'll have hundreds of projects and hundreds of teams (who don't communicate much, per se) in our monorepo using PantsV2. We're already hitting a wall with 2 teams: migrating our second team onto PantsV2 involved them registering a requirements.txt file with Python requirements, and it caused team 1's project to start failing because of dependency collisions. We know we're probably not handling dependencies optimally for Pants, but we're unsure exactly what "optimally" is. Given where things stand with 2.9, it seems like we should be planning on explicitly registering every dependency manually in BUILD files as opposed to registering with `python_requirements()`. @hundreds-father-404 was showing me some work he was doing in 2.10 to make things a little more manageable, but I don't know if there's anything else we're missing
h
I too dislike the limit of one monolithic requirements file. The only alternative I know of is to pepper `python_requirement` targets throughout your repo and declare them closer to the targets that use them (though not so finely that it's prohibitive to bump your usages of click from v6 to v7, for example).
However, you lose the ability to constrain with a lockfile in that scenario so I'm hoping the more native lockfile support on the roadmap helps with that.
h
I've also struggled philosophically with what is better. While it's nice to have segmented dependencies, it's probably also nice to have everyone developing with common things so that crossing over between projects is easier.
h
Usually the "optimal" thing from a codebase management standpoint is to have a single consistent resolve for the entire repo, and Pants selects from that whatever subset is needed in each given scenario. This gives you maximal flexibility in terms of code integration and re-use, as you don't silo your codebase based on incompatible requirements.
However, this isn't always possible, and it does have some downsides (e.g., what if the global resolve contains rarely used but very expensive-to-resolve requirements?).
So this is where the upcoming multiple lockfile support comes in
But what I would say is - try not to have a requirements.txt for every specific project or binary or deployable artifact. That is no longer necessary, and involves a lot of maintenance burden to keep in sync.
Instead, let Pants do the work
But you can have separate requirements.txt for every project if you prefer to work that way
And I think the new multiple lockfile support will let you specify which one applies where
@icy-account-9671 I don't see that you'd have to abandon `python_requirements()`?
i
I think earlier on I was just making the assumption that Pants could cleanly infer dependencies... like once I'd registered them, my Python sources would easily figure out which dependency made sense (i.e. if pyyaml is in 3rdparty as well as my local list of dependencies, it would use my local project's defined version of that module). But instead Pants (maybe smartly so) just tells you "I don't know which version you want, so I'm not going to set you up with this dependency at all"... so I think I'm learning we just always have to register the dependencies and explicitly call each one out in `python_sources` and `python_tests`
i
I want code portability. We are moving hundreds of repos across our org into one repo. They might be using requirements.txt or poetry or whatever. If these projects eventually split off (it has happened already) to another repo, we want them to just `mv ./monorepo/my-project ./my-new-repo`. Keeping the requirements.txt in sync w/ Pants is essential. Note: the projects we are moving into our monorepo are years old. Not all are Python. Eventually, we'll likely have similar concerns when Pants supports npm, yarn, etc.
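For what it's worth, the setup we're hoping to preserve looks roughly like this (assuming a `source` field on `python_requirements`, which may differ across Pants versions -- check your version's docs):

```python
# monorepo/my-project/BUILD -- requirements.txt stays the project's source of truth,
# so the whole directory can later be moved out with a plain `mv`
python_requirements(
    name="reqs",
    source="requirements.txt",
)
```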
h
Chris, that is how "multiple resolves" works. You can say, for example, that everything from one `python_requirements()` uses the resolve 'projectA', whereas another one uses 'projectB'. (A resolve is a logical name for a particular lockfile.) Then you say which resolves particular code uses via the `python_sources` and `python_tests` targets, and Pants will only infer deps on requirements coming from that resolve
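Concretely, the wiring looks roughly like this (resolve names are made up, and the exact option names may vary by Pants version):

```toml
# pants.toml
[python]
enable_resolves = true

[python.resolves]
projectA = "3rdparty/python/projectA.lock"
projectB = "3rdparty/python/projectB.lock"
```

```python
# projectA/BUILD -- requirements and sources opt into the same resolve,
# so inference only considers requirements from 'projectA'
python_requirements(
    name="reqs",
    resolve="projectA",
)

python_sources(
    resolve="projectA",
)
```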
i
that definitely makes life easier... i imagine `tailor` won't necessarily be able to populate those sources with the correct resolves though... significantly less boilerplate though, indeed
i think that would help with @incalculable-yacht-75851's point above too
h
> i imagine `tailor` won't necessarily be able to populate those sources with the correct resolves though
For now, at least, not yet. I personally really want to see more powerful `tailor` features, like being able to set default args. Maybe even plugin hooks that let you insert your own logic. The other angle is whether we can make it easier to apply metadata to multiple targets: https://github.com/pantsbuild/pants/issues/13767
h
Huge +1 to default args
h
Yeah i was surprised it wasn't voted higher in the 2022 survey! See "More powerful BUILD file management" in https://docs.google.com/forms/d/e/1FAIpQLSd5HBUDNKQCxmPORcdwAOYoU9fHjrmRCKyM1R3JZvtig0Y5HQ/viewanalytics
r
I would just like to add another data point here. When I started working with a monorepo, my idea was something like what @incalculable-yacht-75851 suggested. The whole reason we switched to a monorepo was to avoid infrastructure (CI/CD) work while keeping the modularity of different packages, so that we can move things easily if needed.
h
One thing we've toyed with is resolving dep inference ambiguity via filesystem closeness. For example, choose the provider of a dep that is in the closest ancestor dir to the target that refers to the dep.
🙏 1