The second issue is conceptual: We currently have ...
# general
m
The second issue is conceptual: We currently have one repository per library and build up pipeline stages by declaring the internal and external dependencies in `requirements.txt`. For some projects or some branches, we also pin internal dependencies. With a Pants-based monorepo, it makes sense to develop and test against the union of internal libraries, i.e., the source code as is. I can see how to cut different libraries from the monorepo as well. But I don't see a good solution to distributing libraries with pinned internal dependencies. Essentially, I'm asking how I can layer a packaged and possibly pinned view on top of the "all sources in one big soup" view of the monorepo with Pants. Any help in resolving this last remaining issue would be fantastic.
h
I'm not 100% sure I understand this, but have you looked at using generated setup.py files to create dists from your monorepo? There is logic there for overlaying a coarser-grained dist layout on top of the more fine-grained internal library layout, with each library being published by one owning dist. Is that in the neighborhood of what you're trying to achieve?
m
@happy-kitchen-89482, I read that a few weeks ago. The way I understood it is that it covers how to create individual distributions. The problem I'm trying to solve is the graph of distributions. Let's say we have two libraries A and B, both created from the monorepo, and A depends on B. How can I pin B at a particular release? Also, let's assume there is another library C that also depends on B. But I'd like to pin C's B to a different version. Also, D may just prefer the latest B and not require pinning. I'm not clear on how to do that, in particular how to capture those extra version constraints.
Hey @happy-kitchen-89482, when you have a chance, I'd still love to figure out how to handle this case. Thank you!
h
Hey sorry, was traveling and this slipped below the fold, thanks for the reminder!
So if you use generated setup.pys, then when you build the wheel for A, it will look at the `version` in the `provides={}` dict of B's `python_distribution`, and make the wheel for A declare that version in its `install_requires`.
So each `python_distribution` declares its own distribution name and version, and when Pants builds some dist A and notices that its libraries depend on other libraries in some other dist B, it "zooms out" those deps to make A depend on B at the version B currently declares for itself.
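As a rough illustration of that "zoom out" step, here is a toy model in plain Python. It is not Pants's actual implementation: the `Dist` class, the library names, the versions, and the exact `==` specifier format are all assumptions made for the sketch.

```python
# Toy model with hypothetical names; this is NOT how Pants is implemented.
from dataclasses import dataclass


@dataclass(frozen=True)
class Dist:
    name: str
    version: str            # the version this dist declares for itself
    owned_libs: frozenset   # fine-grained libraries published by this dist


def computed_install_requires(dist, lib_deps, all_dists):
    """Turn library-level deps of `dist` into requirements on the owning dists."""
    owner = {lib: d for d in all_dists for lib in d.owned_libs}
    required = set()
    for lib in dist.owned_libs:
        for dep in lib_deps.get(lib, ()):
            dep_owner = owner[dep]
            if dep_owner.name != dist.name:
                # "Zoom out": depend on the other dist at its declared version.
                required.add(f"{dep_owner.name}=={dep_owner.version}")
    return sorted(required)


dist_a = Dist("A", "1.4.0", frozenset({"a/core"}))
dist_b = Dist("B", "2.1.0", frozenset({"b/util"}))
lib_deps = {"a/core": ["b/util"]}  # library a/core imports library b/util

print(computed_install_requires(dist_a, lib_deps, [dist_a, dist_b]))
```

In this model A can only ever get whatever version B declares at the current SHA, which is why per-consumer pins do not fit.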
This of course does not let you pin A and C to two different versions of B.
That is a slightly unusual case, because you're allowing code in the repo to be mutually incompatible - in your example, D is compatible with the current state of B, but A and C are not, even though all are at the same SHA.
So you're giving up one of the big advantages of a monorepo, namely all your code is mutually compatible at a given SHA. But I totally understand that this is the reality 🙂
So in your situation, A and C need binary dependencies on B, each at some compatible version.
(D can probably still have a source-level dependency, because that's ~the same as depending on "latest")
One way to do this might be for A and C to have handwritten setup.py instead of generated ones. Then you have full control over `install_requires`.
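A minimal sketch of what the handwritten route could look like, assuming made-up dist names, package names, and version numbers. Each dict holds the keyword arguments that dist's own setup.py would pass to `setuptools.setup`.

```python
# Hypothetical names and versions throughout.
# Kwargs for A's handwritten setup.py: A is pinned to one release of B.
A_SETUP_KWARGS = dict(
    name="A",
    version="1.4.0",
    packages=["a"],
    install_requires=["B==2.0.3"],
)

# Kwargs for C's handwritten setup.py: C is pinned to a different release of B.
C_SETUP_KWARGS = dict(
    name="C",
    version="0.9.0",
    packages=["c"],
    install_requires=["B==1.9.0"],
)

# In A's setup.py this would be used as:
#     from setuptools import setup
#     setup(**A_SETUP_KWARGS)
```

D would keep its source-level dependency on B and would not need a handwritten file in this arrangement.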
If that works, but ends up being onerous, I think it's reasonable to modify Pants to allow for overriding the computed version of a requirement.
Right now we're very strict - there are keys that we compute for you in a generated setup.py, and if you try to set them manually in `provides={}` we barf: https://github.com/pantsbuild/pants/blob/372fa36efb09d7d477436dbf927edb28537bc2f5/src/python/pants/backend/python/goals/setup_py.py#L221
But I think a change to be more subtle about it makes sense (e.g., merge `install_requires` and allow a user-supplied version of foo to override the computed version)
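To make the proposed merge concrete, here is a sketch of the semantics in standalone Python. This is not existing Pants behavior; `merge_install_requires` and `project_name` are hypothetical helpers, and the name parsing is deliberately simplistic (it reads only the leading project name of a requirement string).

```python
import re

# Leading project name of a requirement string such as "foo==1.2" or "foo[x]>=1".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def project_name(requirement):
    """Extract and normalize the project name from a requirement string."""
    match = _NAME_RE.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # Normalize so that "Foo_bar" and "foo-bar" are treated as the same project.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def merge_install_requires(computed, user_supplied):
    """Merge both lists; a user-supplied entry replaces the computed one."""
    merged = {project_name(r): r for r in computed}
    merged.update({project_name(r): r for r in user_supplied})
    return sorted(merged.values())


print(merge_install_requires(["foo==2.1.0", "bar==0.3.0"], ["foo==1.9.0"]))
```

Here the computed `bar` requirement is kept as is, while the user-supplied pin on `foo` wins over the computed one.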
In your real-world case, what are the numbers involved? I.e., how many A, B, C, D are you dealing with? Is it feasible to start with handwritten setup.py and see how onerous that is? And if it works but is annoying we can discuss a change to allow partial generation of setup.py?
m
Thank you, @happy-kitchen-89482. That was super-helpful. You just helped me realize that I've been making the problem more complex than it need be — in the context of our existing practices: First, even the possibility of a monorepo on the horizon has already started an effort towards eliminating unnecessary internal variability from the code base. There are about 10 or so internal libraries that are re-used across most data processing pipelines. Any pins have rapidly been vanishing. Yay! Second, there is one version of those pipelines that must evolve at a much slower speed. We currently use branches for the purpose. But I just realized that nothing about Pants prevents us from continuing to use a dedicated branch. In that branch, we can do all the pinning we want, but it needn't spill over into the main branch. In short, nothing really changes besides that we don't deal with a repo per library anymore, but slowly try to converge on one repo. Thank you!!!
h
@freezing-photographer-88553 I think this is a similar problem to the one we discussed earlier, and are converging on a similar solution!
I.e., managing "stale" projects in a branch