# general
h
Curious about some repo management philosophy: has anyone developed or come across best practices for monorepo management as it relates to dependency graphs? I can think of two extremes (to simplify, let's just talk about functions):
1. A module contains a huge number of functions
2. Functions are 1:1 with modules
#1 is mostly a benefit for the developer: it's easy to remember where to import what they need and to pick up new features as they become available. #2 is beneficial for build systems, since you'd guarantee that you're only dependent on the functions you're actually using. You'd see the benefits of test caching, package sizing, etc. This will be very subjective, but where should the line be drawn between the two? Separately, is there tooling that can give a sense of the ratio of things included vs. things used? I think this would be a mix of static analysis and build tooling working together. It's just a different flavor of code coverage analysis, I think.
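To sketch the kind of "included vs. used" analysis I mean, here's a toy single-module version using Python's `ast` module. The `import_usage_ratio` function and its heuristics are mine, not any real tool; real tooling would also need to resolve attribute access, re-exports, and cross-module usage:

```python
import ast

def import_usage_ratio(source: str) -> tuple[int, int]:
    """Return (names_imported, names_used) for one module's source.

    Rough sketch: collect names brought in via `import x` and
    `from x import y`, then count how many of them are actually
    loaded somewhere else in the module.
    """
    tree = ast.parse(source)
    imported: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            imported.update(alias.asname or alias.name for alias in node.names)
        elif isinstance(node, ast.Import):
            # `import a.b` binds the top-level name `a`
            imported.update((alias.asname or alias.name).split(".")[0]
                            for alias in node.names)
    used = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return len(imported), len(imported & used)
```

A module importing three names but loading only one would score 3 imported / 1 used, which is the kind of rough signal I'm imagining, aggregated across a package.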
f
It's worth questioning which developers #1 is really beneficial for. It makes it easy to import, sure, but it can make it really hard to edit, modify, or test some modules. For compiled languages, a package is often the minimal compilation unit as well, so this can make working with them quite difficult.
h
Yup, I agree. In the Python ecosystem, I think it's easy to lump everything together, since you incur minimal cost when running code successively. It's only when you get to the testing and packaging side of development that #1 becomes a burden.
👍 1
f
Yeah, I'm living this right now. But treating the testing and packaging side of development as something you "only get to" is a huge mistake, in my opinion. Developing anything without testing, packaging, and deployment as first-class design concerns results in unmaintainable messes.
👍 1
h
I think one struggle is that it's often not clear how much testing/packaging burden a given code structure imposes. That's why I mentioned some type of coverage analysis to give your package a rough score. It's probably difficult for third-party things (I doubt many codebases import a third-party dependency and thoroughly use every aspect of it), but first-party analysis could be insightful.
Not all of our developers are as aware of the repo management tooling we have in place. The developer experience is heavily dependent on which parts of the repo you work in. If you work in a very well-maintained portion of our repo, things are generally good. If you work on lesser parts, maybe that's not the case.
f
To wit: we have a number of files that are imported extremely widely and each contain thousands of constants, type aliases, and enums. The idea that "this will make it easy to reach for" has clearly been appealing, but once you start doing dependency analysis, or want to split up code that relies on these files, you find yourself in a real pit. It's still not clear how we can divide all this stuff up without creating a ton of circular deps. I don't think either of the two extremes mentioned is a great idea, but I understand the impulse to tend toward #1, even though I hate it. I don't know what automated analysis should be run on these things, but I do think some common-sense reasoning of "do these things lumped together have a similar reason to change?" is a decent razor.
If you find a good way to clean this stuff up retroactively, please let me know, because it's a challenge I still don't have a good answer for.