# general
b
How does one express "I depend on a package elsewhere in this repo, and at a particular version of it"?
👖 1
h
What do you mean by “version” here? Are you depending on your own code or third-party requirements? (Also a reminder to try to use threads, please, to help things stay organized.)
b
this would be an "own code"
e.g. ns/foo depends on ns/core
but I want to say it depends on ns.core==1.2.3
(preferably without ns/core/BUILD -> python_distribution() having to say version=1.2.3)
because it might not be at that version at that time...
h
Do you mean that you want to be able to pin to your own first-party code at a particular version? That is, if another team goes and changes that code, then you will still use the old pinned version of it and won’t get their changes until you manually update the version?
b
mmmm, basically
more about order of rollout of dependencies
but the effect is the same
h
Okay, Pants does not support a workflow like that where you can depend on a frozen portion of your first-party code. I don’t believe other monorepo build tools like Bazel or Please do, and would be a little surprised if they do. A major motivation for a monorepo is that many orgs have found it’s difficult to deal with internal versioning, where you have to reason about a change in library A breaking library B which then breaks app C. And those all live in different repos, so you have to somehow keep track of it, and then go chase down the team who broke your app when you try upgrading internal versions. In a monorepo, when A makes a change, you’ll immediately know that C is now broken. And a build tool like Pants or Bazel makes that easier to know when it happens because you can do something like
`./pants --changed-since=<branch> --changed-dependees=transitive test`
in CI. For example, some orgs set up their CI so that when library A changes, the team owning app C will be notified and will have to approve the change.
b
I'm thinking about the case where a leaf lib is updated, and then users pull (pip install -U) that thing
but they don't know the "core" one it depends on has also been updated to match
h
If you're instead talking about a workflow where you want to dynamically compute the `version` used in your `python_distribution`, that is a workflow that can be supported in several different ways. For example, one user is using Git to update versions based on what files change. To do this type of thing, you would write a plugin for `setup_py`.
That is a sound way to deal with versioning because you are not trying to "pin" your own internal code. You're instead talking about what code gets exported externally. Because that code is getting exported externally, by definition, it's a "snapshot" of the code at a certain time.
b
hang on,
the problem is that both "foo" and "core" get updated. End user does `pip install -U foo`
h
Taking a step back, it sounds like your org is not currently using one monorepo, perhaps? You're using multiple repos, and exporting wheels for each project that the other projects then install via `pip`?
b
it is a monorepo BUT
it is just libs
ns/core ns/this ns/that
where this and that depend on core
enduser does `pip install -U foo`
but during the build of foo, it wrote whatever version was currently in core's BUILD
and THAT we need to dynamically calculate at setup-py time
h
Okay. What percentage of the monorepo is using a build tool like Pants or Bazel?
b
we want to NOT synchronize the BUILD version with git (source of truth) except at setup-py invocation (we'd sed the darn things)
no idea about Pants or Bazel or any other
oh, sorry, misunderstood
Pants would govern the whole of this monorepo
h
What I’m trying to figure out is how feasible it would be for y’all to change your workflow such that you stop using internal wheels to express dependencies between projects, and instead have Pants manage those dependencies. If only, say, 3% of your project is using a build tool, then for now you might need to keep that workflow of installing from internal wheels. But otherwise, build tools like Pants exist so that you don’t need to handle things like project A installing the wheel for project B. project A can simply use the code for project B like it uses any other first-party code. The build tool helps empower you to do that, and track that a change in A doesn’t break B, etc
b
sorry... umm
I think you've got other orgs layouts in mind
right now there is one repo ("project")
which is a very fat py lib (which we build into a pip module that lots of other projects install and use)
we're exploring turning that into a cluster of related pip packages and managing the source in a (mono)repo
but there will be intra-repo dependencies
in this universe, 100% of this (mono)repo is managed by the build tool
it ONLY has py libs - meant to be under the same namespace
ns/core ns/foo ns/bar
etc
where foo and bar will depend on core
typical usage would be that endusers would install ns/foo and ns/bar and those would pull in core without the enduser explicitly asking for them
much like `pip install boto3` gets you botocore
the thorn here is that we need git tags to be the source of truth of the pkg versioning
because we can't have devs racing PRs with each other where they're separately bumping the versions in different ways (major, minor, or patch)
so we have the tooling to let git manage the per-package versions, and then, at the last moment before calling some `setup-py` or similar, `sed` the version into the BUILD file
h
Okay, let me see if I have this right. So you have `ns/core`, `ns/foo`, and `ns/bar` all living in the same monorepo. Each of those top-level namespaces is a distinct `python_distribution`/wheel, right? And there might be dependencies between those packages within that same repo. At the same time, you have several other projects in your org which are not under that monorepo? They live in distinct repositories, and they install from your "lib monorepo" via internal wheels. It sounds like these are more your apps, rather than your core/libs. Is that right?
b
but this is a problem if I'm building ns/foo pkg, and it looks at ns/core/BUILD and sees a meaningless version="...." in there
Your paragraph 1 is correct
both paragraphs are correct 🙂
❤️ 1
h
Cool, thanks for explaining that. For context, are there any plans to move those other repos into your monorepo, where everything lives together?
b
No 🙂
the 2 drivers are: 1. it's too sprawling; make it tighter/cleaner. 2. some pkgs have heavyweight 3rd-party deps, and we'd rather isolate those so that if you don't need that (e.g. cassandra) then we don't force you to install those 3rd-party deps
but we like keeping the code for these libs/pkgs/wheels next to each other - easier for reading and developing
(also for PR flow)
h
Okay. So your objective is to make sure that the version used by `ns/foo` truly is correct, that it reflects the state of things. For example, if `ns/core` has any changes, then `ns/foo`'s version should be bumped. Is that correct?
b
not quite
let's assume a pair of PRs where someone changes foo and core - in a related fashion
foo will now need a bumped version of core
enduser upgrades their ns.foo, and that should pull down the appropriate (newer) version of ns.core
the problem is that we DON'T want to track the versions in the BUILD files - because only git should be doing that for us - to avoid races between engineers
👍 1
now, this may be a thing worth mentioning - our build pipelines DO NOT support sending in the version of BOTH core and foo when we build foo
(build foo pkg)
so we can't do the last-moment (but not committed to git) sed on both ns/core/BUILD and ns/foo/BUILD
we'll only have the new version for foo
h
Okay, thank you for walking through that. It sounds like you would want to use a plugin for your `setup-py` kwargs then, which would read from Git and dynamically set the value without ever writing it to a BUILD file: https://www.pantsbuild.org/docs/python-setup-py-goal Do you already have the logic written for how you read from Git to determine the version, so that you only need to teach Pants how to read the value, rather than sed-ing your BUILD file?
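A minimal sketch of what such a plugin might look like, following the shape of that guide; the import paths, class names, and the git invocation below are assumptions and may differ by Pants version:
```python
# Hypothetical sketch only: derive the distribution version from git instead of BUILD files.
import subprocess

from pants.backend.python.goals.setup_py import SetupKwargs, SetupKwargsRequest
from pants.engine.rules import collect_rules, rule
from pants.engine.target import Target
from pants.engine.unions import UnionRule


class GitVersionSetupKwargsRequest(SetupKwargsRequest):
    @classmethod
    def is_applicable(cls, _: Target) -> bool:
        # Apply to every python_distribution in the repo.
        return True


@rule
async def git_version_setup_kwargs(request: GitVersionSetupKwargsRequest) -> SetupKwargs:
    # Shelling out directly here for brevity; a real plugin would more likely run git
    # through the engine's Process API with caching disabled, as discussed above.
    version = subprocess.run(
        ["git", "describe", "--tags"],  # placeholder for your real version logic
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    return SetupKwargs(
        {**request.explicit_kwargs, "version": version},
        address=request.target.address,
    )


def rules():
    return [
        *collect_rules(),
        UnionRule(SetupKwargsRequest, GitVersionSetupKwargsRequest),
    ]
```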
b
oh yeah, the git version is easily available
h
Cool. So, it sounds like you’ll want to follow that guide to write a plugin. With the engine, you can run a subprocess like Git, and also tell the engine to not cache it because of course Git is going to change. The end effect will be like you’re doing this in your build file:
```
python_distribution(
    ...
    provides=setup_py(
        ..
        version=subprocess.run(["git", "version-logic"]),
    ),
)
```
Does that sound right?
b
wow
âť“ 1
that would be `git tag parent-repo | grep core | sort -V` during an `ns.foo` build
except.. well, most of the time that would be right
but it would be nicer to be able to pin in a file
h
Okay, cool. Plugins are written in typed Python 3 code, so you would want to run a process with `git tag parent-repo`, and then you could use Python to search and sort.
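For illustration, a rough Python equivalent of that grep/sort, assuming PEP 440-style versions embedded in tags like `core-1.2.3` (the tag format and helper name are made up):
```python
# Illustrative sketch: pick the latest "core" version from git tags.
import subprocess

from packaging.version import Version


def latest_core_version() -> str:
    tags = subprocess.run(
        ["git", "tag", "--list"], capture_output=True, text=True, check=True
    ).stdout.splitlines()
    # Assumes tags look like "core-1.2.3"; adjust to your real tag scheme.
    versions = [tag.split("-", 1)[1] for tag in tags if tag.startswith("core-")]
    return str(max(versions, key=Version))
```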
b
can one invent other macros? (targets, I guess?)
like python_library() ?
for consumption by a plugin?
h
> but it would be nicer to be able to pin in a file
To pin the computed version to a BUILD file? Would that have the same challenge you were originally trying to solve of not wanting the BUILD file persisted to disk?
> can one invent other macros? (targets, I guess?)
Yes, absolutely. And you can also add new fields to pre-existing target types for consumption by your plugin. See https://www.pantsbuild.org/docs/target-api-concepts for the Target API. I'm not sure if that would be what you want to do here, though. I'm trying to figure out the best way to solve the goal you're after.
b
I want that in ns/foo/BUILD to pin the version of ns/core
but neither ns/foo/BUILD nor ns/core/BUILD speaks of its own version
it's not unlike specifying the version of a 3rd-party lib
and thanks for the pointer to creating targets
❤️ 1
h
> I want that in ns/foo/BUILD to pin the version of ns/core
This goes back to my original comment. I don’t think that this type of workflow works with Pants (or other build tools afaict). You can’t pin to certain versions of your own first-party code in the same monorepo. Your build always represents the current state of things.
b
I'm starting to suspect that monorepo semantics (as they're currently being socialized) may not be as good a fit for this project as one might think
h
Are you thinking you would want to split up your current monorepo into several smaller repos? I'm curious how you were solving this versioning problem beforehand, i.e. pinning to a certain version of code within the same repo?
b
no, same repo
currently, it is one repo -> one pip pkg
no problem to solve 🙂
but the project (in the sense of work) at hand is to split it so that one repo provides a related cluster of pip packages
👍 1
h
Let us know if you'd be interested in help with writing that setup-py plugin that reads from Git to dynamically set the `version` (but not write to BUILD). We'd be happy to help with it. Also happy to keep talking through your org's code structure and whether a build tool like Pants or Bazel is a good fit. Even if y'all do end up going back to a single monolithic distribution, we hope Pants might be helpful in managing that monorepo, like running your tests and linters. And if Pants isn't a good fit, that's totally valid too. Our underlying motivation is to help people have a solid build experience.
b
I honestly don't get your point and question there... (may be valid, but I don't get it)
and.. he deleted what I was replying to
a
i didn't think it was a good response to your thoughts and i'm gonna chill out and scroll back a bit to get a better idea. sorry
b
np 🙂
my situation is NOT a "classic" monorepo like some web-fe / js things (which tend to snapshot app+libs state and bundle together)
✍🏻 1
its just meant to be a collection of related py libs
as if boto3 and botocore were in the same tree/repo
and you said `pip install boto3.s3` because that was the only thing you wanted - and it knew to pull in a pinned (or at least minimum) version of botocore
maybe that's a better (relatable) example?
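In wheel-metadata terms, the analogy is that `ns.foo`'s generated setup would need to declare `ns.core` as a requirement, roughly like the following (package names and versions here are illustrative):
```python
# Illustrative only: what ns.foo's generated setup metadata would carry so that
# `pip install -U ns.foo` also pulls in a compatible ns.core.
from setuptools import setup

setup(
    name="ns.foo",
    version="2.0.0",                      # foo's own version, derived from git
    install_requires=["ns.core>=1.2.3"],  # the first-party pin under discussion
)
```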
Ok, I must roll for the day. Thanks much for all your patient help
❤️ 1
h
Hmm "one repo provides a related cluster of pip packages" is something Pants supports today. It sounds like you just need a custom way to generate version numbers based on git state? (and I think you're at least the second person to ask about that, so maybe it should be a core thing at some point)
Is the idea that at deploy time dependencies are consumed via pip artifacts, but at build time via direct in-repo dependencies?
h
> It sounds like you just need a custom way to generate version numbers based on git state?
Yes, but iiuc, the desire is to further be able to pin a portion of the monorepo to a certain state of the rest of the monorepo. Like how you would pin a 3rdparty req to ==x.y.z, but now you’d be doing it with your own first-party code in the same repo.
h
For example, if lib B depends on lib A then at deploy time B is a wheel that declares `A==1.2.3` in its requirements metadata, but when you run B's tests it simply grabs A's code from within the repo?
OK, so even at build time (say when running tests) you consume the wheel version, not the raw source?
h
> OK, so even at build time (say when running tests) you consume the wheel version, not the raw source?
Michael had to log off, but this is my understanding of what he’s after. That’s how the rest of their organization does it; they have several smaller repos that consume this one monorepo of libraries. Those each pin to the particular version they want of the monorepo.
b
@happy-kitchen-89482 had it correctly with:
> if lib B depends on lib A then at deploy time B is a wheel that declares A==1.2.3 in its requirements metadata, but when you run B's tests it simply grabs A's code from within the repo?
the problem comes up in that we don't want to maintain version info for A or B about themselves in the BUILD files - this will be dynamically set just before running `setup-py`. But we do have cases, much like for 3rdparty, of wanting to pin an internal dep, just like benjyw said. This sort of 1stparty pin we would explicitly write into B's BUILD file and commit that to version control (just like requirements.txt, etc.)
it seems as if, when naively running `setup-py`, Pants goes and infers "B depends on A; let's look at the `version="..."` in A/BUILD and use that value when generating the setup.py `install_requires`". I need to short-circuit THAT part and have Pants take an explicit 1stparty dep version from B's BUILD (at any rate, something other than the version field in A/BUILD's `python_distribution`)
come to think of it....
the 1stparty-ness of the source of "A" is what is confusing matters here
as far as B is concerned, it has no real idea and doesn't care that A was built from code in the same repo tree. B expects to be installed on a machine where A of sufficient version is already installed, or where A can be pulled in as dictated by `install_requires`
AHA!
```python
python_requirement_library(
    name="installed_core",
    requirements=["ns.core==1.0.0"],
)

python_library(
    name="cassandra_leader_election_lib",
    sources=['*.py', '!tests'],
    dependencies=[
        ":installed_core",
        # "gh/core"
    ],
)

python_distribution(
    name="leader_election_cassandra",
    dependencies=[
        ":cassandra_leader_election_lib",
    ],
    provides=setup_py(
        ....
    ),
)
```

python_distribution(
    name="leader_election_cassandra",
    dependencies=[
        ":cassandra_leader_election_lib",
    ],
    provides=setup_py(
        ....
    )
I "tricked" it by using the
python_requirement_library()
to create an alias of sorts
@hundreds-father-404 FYI ^
h
So unless I'm misunderstanding, I think things already work this way today?
Ah yes, exactly, what I was thinking was that you need a target to represent "library A as a published wheel", which is your `python_requirement_library`, I guess.
b
so you'd say my trick above is the "right way" to approach my situation?
h
I think so, but I'll reread the thread to be sure I understand all the details.
b
(obvs least amount of hackery... compared to writing plugins)
you can start from where I confirmed that you understood correctly with your summary from yesterday evening 🙂
h
Thanks for the shortcut 🙂
I think this is right: the `python_distribution` reflects "how to publish A", the `python_requirement_library` is "A after it has been published", and the `python_library` is "the code of A".
b
nod
if I have an unrelated q, should I start a new thread or ...?
h
If B depends on the `python_library` of A then it pulls in A's code directly from the repo at the current git sha. If it depends on the `python_requirement_library` then it pulls in A's code as if it were a third-party dep.
But you don't want to depend on both.
New thread for new q is great, thanks!
b
yeah, basically I want B's python_library to treat A as a 3rd party (assumption that it has been published) irrespective of the fact that they're in the same git repo
h
Then just depending on the `python_requirement_library` instead of the `python_library` sounds like exactly the right move.