Hi y’all! My company (medium/large SaaS, a couple ...
# general
i
Hi y’all! My company (medium/large SaaS, a couple hundred engineers total) has been using Pants in a new Python monorepo for several months now. We currently have ~2 dozen projects and hope to grow to ~100 projects in the monorepo. We have been using a single dependency resolve since starting, with fairly good results (with the option for a second if there is ever a dependency conflict). However, recently, there have been concerns raised around the single resolve, mostly by engineers who are more familiar and comfortable with each project maintaining its own, independent dependencies. So I’m curious what others’ experiences have been like with a single resolve (or a few resolves to deal with conflicts), especially in a large codebase? I’d really like to be able to point to some examples of how it has worked out elsewhere to help calm people down.
b
What are their specific concerns? "I'm not comfortable with this" is a valid thing to air, but without specific, hopefully technical, points it likely doesn't bear much fruit. It might just be a framing issue. Each project is managing its set of dependencies. It's just that the union of those sets makes up the resolve 🙂
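To make the "union of sets" framing concrete, here is a minimal toy sketch. The project names and dependency sets are hypothetical, and it ignores transitive dependencies entirely; it only shows that the resolve covers everything while each project still sees just what it declares.

```python
# Hypothetical project names and dependency sets, for illustration only.
project_deps = {
    "billing": {"requests", "pydantic"},
    "search": {"requests", "numpy"},
    "reports": {"pydantic", "jinja2"},
}

# The single resolve covers the union of everything any project declares.
resolve = set().union(*project_deps.values())


def artifact_deps(project: str) -> set[str]:
    """Dependencies that belong to one project: only what it declares."""
    return project_deps[project]


print(sorted(resolve))
print(sorted(artifact_deps("billing")))
```

In this toy model, "billing" never picks up numpy or jinja2 even though both are part of the shared resolve.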
l
+1 Maybe if you could demonstrate to them how the executables in each project only pull in the required dependencies it would help folks be calmer. You could do that by showing the contents of some of the built artifacts. But otoh, Greg, I agree that as a community here in pants we are in a unique position to assemble a playbook and collect stories around the socio-technical aspects of the mono-repo style which is not without its tradeoffs. For example, suppose some other team needs to bump the version of a dependency, with a single resolve - does that mean we are forced to take the bumped dependency and on which timeline, and maybe it breaks our service but not theirs, how will the testing go, who is responsible for it? You can see Nate and Tom's responses to a very similar question relating to sharing internal libraries via code rather than via versioned artifacts.
• https://pantsbuild.slack.com/archives/C046T6T9U/p1696782701162249?thread_ts=1696769859.850689&cid=C046T6T9U
• https://pantsbuild.slack.com/archives/C046T6T9U/p1696789957433149?thread_ts=1696769859.850689&cid=C046T6T9U
The pants/monorepo method is to lean in the direction of "Whatever comes out of these gates, we've got a better chance of survival if we work together" (if you tolerate the Gladiator quote). Like if dependencies are hard, let's stick together about them.
❤️ 3
i
For example, suppose some other team needs to bump the version of a dependency, with a single resolve - does that mean we are forced to take the bumped dependency and on which timeline, and maybe it breaks our service but not theirs, how will the testing go, who is responsible for it?
Gautham, this basically sums up their concerns. It’s less about what the final artifact contains, more about limiting the universe of versions to a single version and then requiring all users to upgrade at essentially the same time. Personally I prefer the Gladiator approach 😄
l
I second your request - I would love to hear more stories about how this works in practice.
John's recommendation and the details that Tom added in the other thread are worth checking out.
❤️ 1
h
Is each of these "projects" an independently deployable binary? Do they share any code?
If they are literally 100 orthogonal binaries then maybe 100 resolves is fine. But once they start sharing code... Oy
That's where the rigor of a single resolve saves you
from JAR hell
or its Python equivalent
i
Is each of these “projects” an independently deployable binary? Do they share any code?
one of our objectives with the monorepo is to promote code reuse/sharing, so yes we do have shared code. This is also proving to be a sticking point, though (one that I think is purely because they prefer polyrepo and haven’t worked in a monorepo) - they think the shared code should be packaged. Having done that, I vastly prefer the way Pants does it…. but ‘proving’ that it works at scale / doesn’t ‘cause problems’ is becoming a chore (to be clear, it has worked fine so far and I vastly prefer it to managing lots of versions of packages)
h
Oh lord, no, sharing first party code via publishing is an absolute nightmare
Now you have JAR hell in your first party code!! That is a total own goal
Imagine A depending on B and C, and those in turn depend on two different versions of D...
Plus the drag of having to publish every time you update your deps is a big disincentive to code sharing
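The "A depends on B and C, which depend on two different versions of D" scenario can be sketched as a small graph walk. This is only an illustration: the package names and pinned versions are hypothetical, and real resolvers work with version ranges rather than exact pins.

```python
from collections import defaultdict

# Hypothetical published packages: name -> {dependency: pinned version}.
published = {
    "A": {"B": "1.0", "C": "1.0"},
    "B": {"D": "1.4"},
    "C": {"D": "2.0"},
}


def find_conflicts(root: str) -> dict[str, set[str]]:
    """Walk the graph from root, collect every version requested for each
    package, and return the packages requested at more than one version."""
    requested = defaultdict(set)
    stack = [root]
    seen = set()
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        for dep, version in published.get(pkg, {}).items():
            requested[dep].add(version)
            stack.append(dep)
    return {dep: versions for dep, versions in requested.items() if len(versions) > 1}


print(find_conflicts("A"))
```

Here D is requested at both 1.4 and 2.0, so A cannot be installed with one consistent set of versions unless B or C is republished.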
f
You could make the argument that you get much greater team autonomy that way though, and with some discipline around required versions where library packages are required to work with a range of dependencies the worst bits of dependency hell can be side-stepped. You can argue that the boost from autonomy is worth the friction at the boundaries. A few companies make this an important policy point. I wish we had someone with Amazon experience around, because I know they're big into operating this way and have a lot of process and tooling built around it. And a pretty wild internal build system that expects these kinds of things and apparently has ways of dealing with them
Monorepos generally seek to avoid those problems entirely in favor of a wider cohesiveness. It's not without drawbacks, but it avoids the kinda combinatorial explosion of compatibility you sometimes need to deal with when you have multiple versions of everything floating around. But "team #1 can't upgrade dependency A because team #2 uses dependency B which needs A at that version" does become a thing. Pants and other tools like it tend to have some ways of dealing with this kind of problem, but the pressure is definitely to avoid it. It reduces the "surface area" of problems you have to deal with collectively at the expense of some raw autonomy
h
Different companies have different experiences, but I've seen this autonomy devolve into chaos, with teams being more concerned about protecting the boundaries of their repos than about breaking other teams... So instead of unity of purpose you get a loose affiliation of warring tribes...
Admittedly that was an extreme example
f
Yeah, managing this stuff is hard. But I think with monorepos it's also easy to wind up with a big noodle soup of dependencies with few real contract boundaries and a system only a few people really understand at any level. Managing large systems at scale is difficult and none of these approaches are silver bullets.
I used Amazon as an example of a company that was successful with this but I mean you can see some of the downsides of it if you use AWS: there's little to no API consistency, there are tons of competing products with limited differentiation, it can be hard to have a clear idea of "what product should I use for the thing I want to do?" That said, I'd probably use AWS if I started my own company tomorrow. And a lot of people would do so as well. So it's clearly possible to have a functioning model that values autonomy. You just need the right processes and management around it.
My point is to try to understand different people's views. Engineers wanting autonomy or business leaders wanting that for their teams are not wrong to feel that. The argument should be made about the trade-offs between the approaches and the facts on the ground because that draws the focus away from people's feelings on the matter. "Deliver packaged libs" as a requirement is probably a lesson someone learned out of pain of doing it some other way.
But even AWS build tools (which prefer using teams' packages) have facilities for cloning source, patching and building. So you do end up with even more versions when you do things like that. And it puts a burden on the library team to vend their versions to other teams that don't have much incentive to update, especially when it's risky. Library teams end up supporting a huge range of versions because of this, which is a big source of wasted work. You take hard problems like factoring and versioning and put an organizational structure around them.
With a shared-source model, it's a library team's burden to ensure that their library doesn't break downstream consumers. In practice this ends up with feature flags or abstract branches or other things supporting multiple code paths in a single code version to support the different use cases. When managed well, I think this becomes a lot cleaner, but it can become its own nightmare of dead code. Tooling and discipline are key to making this work, but the monorepo approach at least puts all the code that needs to be reviewed in one place.
c
FWIW We are very early in the whole "actually using a monorepo" thing and are sub 100 engineers at the moment but... for the original question of managing 3rdparty dependencies, we are toying with ideas around establishing ownership of pinned dependencies like 1stparty code. That is "anyone" can add a "foo>2" dependency in the primary resolve, but if you want to pin foo==1.2.3 then a specific team needs to own that pin and be responsible for updating it like they would if they maintained an internal library.
👍 1
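One way the "every pin needs an owner" idea could be checked is sketched below. It is an assumption about how such a policy might be enforced, not something described in the thread: the requirement strings, the owners map, and the team name are all hypothetical, and the regex handles only simple `name==version` strings.

```python
import re

# Hypothetical data: requirement strings in the primary resolve and a map
# from pinned package name to the team that owns the pin.
requirements = ["foo==1.2.3", "bar>2", "baz==0.9"]
pin_owners = {"foo": "team-payments"}

# Matches simple exact pins such as "foo==1.2.3" (no extras or URLs).
PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*([^\s;]+)")


def unowned_pins(reqs, owners):
    """Return the names of exact (==) pins that no team has claimed."""
    missing = []
    for req in reqs:
        match = PIN.match(req)
        if match and match.group(1) not in owners:
            missing.append(match.group(1))
    return missing


print(unowned_pins(requirements, pin_owners))
```

A check like this could run in CI so that an unowned pin fails the build, while loose constraints such as "bar>2" pass without an owner.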
f
Pants actually supports multiple versions of third-party dependencies, via resolves, but it does increase the complexity of managing it quite a bit, since you have to "parametrize" targets that get used by code/targets from different resolves. But you can do exactly what you described. There can be a "default resolve" where you define a default pinned version for everything, but other teams can create other resolves if they need.
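As a rough sketch of what that looks like in a BUILD file: the fragment below assumes resolves are enabled in pants.toml (`[python].enable_resolves`) and that two resolves with the hypothetical names "default" and "team-b" are declared under `[python].resolves`, with "default" set as `[python].default_resolve`. The package name and versions are also made up. This is a BUILD fragment evaluated by Pants, not standalone Python.

```python
# BUILD (sketch; the resolve names "default" and "team-b" are hypothetical
# and need matching entries under [python].resolves in pants.toml)

# The same third-party package, pinned differently in each resolve.
python_requirement(
    name="foo-default",
    requirements=["foo==1.2.3"],
    resolve="default",
)
python_requirement(
    name="foo-team-b",
    requirements=["foo==2.0.0"],
    resolve="team-b",
)

# Shared first-party code that is used from both resolves is parametrized,
# which produces one target per resolve.
python_sources(
    name="lib",
    resolve=parametrize("default", "team-b"),
)
```

The extra bookkeeping is that shared code has to work against both pinned versions, which is where the added complexity mentioned above comes from.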
i
Lot of great stuff in this thread, thanks everyone!