# general
f
I'm trying to think of ways to incrementally add pants to a chaotic monorepo, and I'd like to validate an idea I had. The plan would be to deeply restrict where BUILD files go to a few components that sit at the leaves of our messy dependency graph, and then just ask Pants if it knows about those files in CI... If it knows about the full set of files in the changeset, we could use Pants to direct the CI pipeline. If there are files Pants doesn't know about, then we'd give up and just run the old pipeline (which basically runs all tests in the monorepo). Does this approach make sense?
b
@hundreds-father-404 is OOO today, but I bet he'll have thoughts on this tomorrow.
šŸ‘ŒšŸ» 1
f
An alternative to leaving files out of BUILD files might be to group files into different targets based on what we're ready to deal with, something like
```python
SOURCES_DEFAULT = (
    '*.py', '*.pyi', '!test_*.py', '!*_test.py', '!tests.py', '!conftest.py', '!test_*.pyi', '!*_test.pyi', '!tests.pyi',
)

READY_FILES = ["alert_rule_verifier.py"]

python_sources(
    sources=READY_FILES,
)

python_sources(
    name="unready",
    sources=[
        *SOURCES_DEFAULT,
        *(f"!{name}" for name in READY_FILES),
    ],
    tags=["unready"],
)
```
I suppose this could be written in a macro if it works
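Roughly something like this, maybe (a sketch only; `partitioned_python_sources` and the prelude path are made-up names, and it assumes the file is registered via `[GLOBAL].build_file_prelude_globs` in `pants.toml`):
```python
# pants-plugins/macros.py (hypothetical path). Prelude files see the same
# symbols as BUILD files, so `python_sources` is available here.

SOURCES_DEFAULT = (
    "*.py", "*.pyi",
    "!test_*.py", "!*_test.py", "!tests.py", "!conftest.py",
    "!test_*.pyi", "!*_test.pyi", "!tests.pyi",
)


def partitioned_python_sources(ready_files):
    # The "ready" files become the directory's default target.
    python_sources(sources=list(ready_files))
    # Everything else gets lumped into an `unready` target we can filter on by tag.
    python_sources(
        name="unready",
        sources=[
            *SOURCES_DEFAULT,
            *(f"!{name}" for name in ready_files),
        ],
        tags=["unready"],
    )
```
and then a BUILD file reduces to `partitioned_python_sources(ready_files=["alert_rule_verifier.py"])`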
b
Ideally, you should be able to switch over code for `format`ting right? As dependencies shouldn't matter. Then, switching over `lint` and `check` shouldn't be too hard (at least that was my experience), as the dependency inference is quite good. I just turned on `unowned_dependency_behavior` and played whack-a-mole with the errors until they stopped. (We haven't gotten there yet, but) once you have `lint`, I think `test` really isn't hard to migrate either šŸ¤”. Really, the challenge is verifying the built things (docker images or Python binaries).
šŸ‘ 1
(But in our use case, we're not doing any "switching" per-se. We run both Pants and Bazel, and as things migrate they only ever live in one bucket)
f
We don't use pip for python requirements. We get all requirements from our system packaging ecosystem (RPM). So it will be a long time until we can even think about using `test`. Switching to {test, run, package} will be a long journey of integrating Pants with RPM and doing hermetic builds in lightweight containers rather than just pexes in temp dirs. And in general our dependency graph is a total mess.
šŸ‘€ 1
We're switching from a bunch of Makefiles and a kitchen sink approach to dependencies and packaging that "just works" but is deeply inefficient
We're hoping that demonstrating the advantages of Pants (and really foregrounding dependency stuff a bit in our devs' minds) will incentivize the refactoring we need to accomplish our release goals (independent deployment of separate services, ability to do canaries and reverts, etc.) by giving developers a way around our 1 hr PR pipelines
āž• 1
But my idea is to start at the leaves here: essentially, if `./pants dependees --changed-since=$MERGE_TARGET` fits in the blessed set of files we trust to run through Pants, we can skip whole sections of the legacy pipeline
c
In our case, moving a legacy repo to Pants, there were essentially many small problems that fell into two categories:
1. BUILD file wrong/doesn't exist
2. circular dep
Both of these can be solved in a non-Pants repo: BUILD files can exist in any codebase and just be ignored, and there is nothing special about Pants that fixes circular deps; it just makes them a problem you have to deal with. So basically we just started picking away at these errors on our main branch until we were ready to switch to Pants. We wrote a shell script that did the last mile of converting the repo over (basically a bunch of sed commands) and then used that to check our progress along the way.
šŸ¤” 2
We had a few smaller apps that we moved over first but eventually had one large app that comprised the majority of our code.
h
> we could use Pants to direct the CI pipeline.
To do what, in particular? I agree with Joshua that `fmt` and `lint` are good entry points and realistic to run over the whole repo. Unless you use Pylint, there's no need to teach Pants about any third-party dependencies at all.
f
`fmt` and `lint` don't have to think about deps, which also makes them easy to run on changed files without Pants...
```sh
CHANGED_PY_FILES="$(git diff --name-only --diff-filter=MRA $TARGET_REF '*.py')"
```
which is how we do this already. So it's not much of a value-add for us to start with this.
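For example (a sketch; `black`/`flake8` are just stand-ins for whatever the Makefiles actually call):
```sh
# Lint/format only the changed files, no Pants involved.
CHANGED_PY_FILES="$(git diff --name-only --diff-filter=MRA $TARGET_REF -- '*.py')"
if [ -n "$CHANGED_PY_FILES" ]; then
    echo "$CHANGED_PY_FILES" | xargs black --check
    echo "$CHANGED_PY_FILES" | xargs flake8
fi
```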
āž• 1
> To do what, in particular?
To run only the tests that transitively depend on the changed set of code. Something like
```sh
./pants dependees --changed-since=$TARGET_REF | xargs ./pants filter --target-type=python_test --granularity=file | cut -d: -f1
```
will give you paths you can pass to `pytest` directly.
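i.e., roughly (a sketch):
```sh
# Run pytest directly on just the test files affected by the change.
TEST_FILES="$(./pants dependees --changed-since=$TARGET_REF \
  | xargs ./pants filter --target-type=python_test --granularity=file \
  | cut -d: -f1)"
[ -n "$TEST_FILES" ] && pytest $TEST_FILES
```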
The dep inference works like a charm, even without specifying third party, but:
• We have some places that use dynamic imports or have other kinds of implicit dependencies, and I don't want to let relationships like that skate through CI untested until we've "cleared" a component for use in Pants
• There are ad-hoc build mechanisms scattered throughout the code base; Pants doesn't know about these files (yet), so if these files get changed, we need to just abort and use the legacy "build everything" pipeline, until we've taught Pants how to deal with these components
I guess my idea is that by starting at some more-or-less well-isolated components that work well with dep inference, we can demonstrate the value of a tool like this much more clearly than we could with "easy" things like fmt and lint
šŸ‘ 1
Btw, "does Pants know about the files in this changeset?" seems easy to answer with
```sh
./pants list $(git diff --name-only --diff-filter=MRAC $TARGET_REF)
```
which will fail if you pass it files that aren't assigned to a Pants target
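So the CI guard could be roughly this (a sketch; the fallback is just whatever the legacy pipeline does today):
```sh
# Use the exit code of `./pants list` to decide which pipeline to run.
CHANGED_FILES="$(git diff --name-only --diff-filter=MRAC $TARGET_REF)"
if ./pants list $CHANGED_FILES >/dev/null 2>&1; then
    echo "Pants owns every changed file; use the Pants-directed pipeline"
else
    echo "Unknown files in the changeset; fall back to the legacy pipeline"
fi
```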
I guess I just need to answer the "are you a 'cleared' component?" question... which I think I might be able to do with plain `target()`s with tags šŸ¤”
h
f
That might work, or I could also just tag tests and use `filter`
h
Yeah, although a benefit of using `skip_tests` is that `./pants test ::` will Just Work
f
I can't really use `./pants test` though without a good bit of work to either create a hermetic RPM-based environment on the fly, or to create some hack to make it run non-hermetically
not using pip basically means not using pex (except internally for tools maybe)
h
Ah sorry okay, I misinterpreted how you're using Pants. My bad. Yeah, `skip_tests` isn't as useful because they all need to be skipped.
answering "does Pants know about the files in this changeset?" seems to be easy to answer with
That will answer if the changed file is known about, but you still need to answer if the dependees of that file are blessed. It might be useful to error on unrecognized imports so that you can be more confident dependencies are valid: https://www.pantsbuild.org/docs/reference-python-infer#section-unowned-dependency-behavior
Maybe only create targets for blessed code? Use the new `tailor` ignore options so Pants doesn't try to add targets for those: https://www.pantsbuild.org/docs/create-initial-build-files
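Depending on your Pants version, the relevant `pants.toml` knobs look roughly like this (a sketch; the ignored path is made up):
```toml
[python-infer]
# Error when an import can't be mapped to any first- or third-party target.
unowned_dependency_behavior = "error"

[tailor]
# Keep `./pants tailor` from generating targets for not-yet-blessed code.
ignore_paths = ["legacy/**"]
```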
f
I tried creating targets for blessed code only... I ended up with a lot of ignore patterns; I tried using `!inverted_patterns` but that didn't work (have it on my todo to report this as a bug/issue)
And I thought about this and realized that the visualization of dependencies (and other things) that comes with having all the stuff in BUILD files actually works, so I'd kinda like to keep them
šŸ‘ 1
I guess what maybe I could do is create some "components" with higher-level dependencies, and make sure that the transitive closure of the changeset all map to a known "component"
I'm just struggling to visualize all these graph problems in my head... I need to get a whiteboard šŸ˜…
šŸ’Æ 1
h
Yeah, that's a good point. So it sounds like you're debating one of these two approaches then?
1) In each BUILD file, have a `ready` target vs. a `not ready` target. You'd use a macro, or could use the new Target Generation plugin API, to approach that. You'd use `tags` to indicate which is which, or alternatively could add a new plugin field like "blessed": https://www.pantsbuild.org/docs/target-api-extending-targets
2) Treat BUILD files like normal, i.e. what `./pants tailor` would do. But also have some `target`s and put in their `dependencies` what is blessed.
That then seems like a philosophical question of where the "blessed" metadata should live: do you want it centralized? Or decentralized to each directory?
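For the centralized flavor, the "blessed" metadata could be a single plain `target` somewhere your team owns, something like (hypothetical paths):
```python
# BUILD file in a directory the RE team controls; the dependencies list is the blessing.
target(
    name="blessed",
    dependencies=[
        "src/python/component_a",
        "src/python/component_b",
    ],
)
```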
f
Probably centralized for now, I think I can switch it
h
To help answer that, who will be doing the blessing? If it's only you and maybe someone else, centralized makes the most sense to me. If you want to empower everyone to decide what to bless, decentralized is better
f
For now, just the RE team
šŸ‘ 1
So yeah, centralized makes more sense
And the graph question becomes... "Do all paths in this transitive closure of dependees terminate in a blessed target?", just need to translate that into a query I ask pants
Thanks for being a sounding board @hundreds-father-404, it's really helpful! (and for suggesting plenty of ideas even though my use case is pretty out there)
ā¤ļø 1
h
> And the graph question becomes... "Do all paths in this transitive closure of dependees terminate in a blessed target?", just need to translate that into a query I ask pants
Are you going to explicitly list every file in the `target`'s `dependencies`? If so, you could run `./pants dependencies path/to:target_blesser` (not `--transitive`) and save that list. Then, I think you run your earlier command of `./pants --changed-since --changed-dependees=transitive` and check that everything is in the earlier list. And then you'll probably want to use that `unowned_dependency_behavior` option.
Note that this will not handle `files`/`resources`, though! If someone changes a JSON file, for example, Pants will not naively know about that dependency. I'm not sure how you can mitigate that risk. Normally you do it by running `test`, which is a non-starter.
f
For files/resources, I was just going to make targets like
```python
python_sources(name="sources")
files(name="conf", sources=["*.yml"])
target(name="dummy_linker", dependencies=[":sources", ":conf"])
```
šŸ‘ 1
and then the higher level component can depend on that linker target
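e.g. something like this in the component's BUILD file (hypothetical path for the linker target):
```python
# The component target pulls in both the code and the config it needs.
target(name="component", dependencies=["path/to/subdir:dummy_linker"])
```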
I don't want to explicitly list every file if I can avoid it
I think I can get away with not listing every file if I do this check: `./pants dependees --changed-since=$branch --transitive` must be a subset of `./pants dependencies --transitive path/to:blessed_targets`
šŸ‘ 1
h
Yeah, I think that's right. You can use `./pants --changed-since=$branch --changed-dependees=transitive list` for the first query btw. Otherwise, if you use `dependees`, I think you want to use `--closed` to ensure the input targets are included.
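Putting the pieces together, the gate might look roughly like this (a bash sketch; `$branch` and `path/to:blessed_targets` are the placeholders from above):
```bash
# Every target affected by the change must fall inside the blessed closure.
AFFECTED="$(./pants --changed-since=$branch --changed-dependees=transitive list | sort -u)"
# Include the blessed target itself, since it is a dependee of everything it blesses.
BLESSED="$({ ./pants dependencies --transitive path/to:blessed_targets;
             ./pants list path/to:blessed_targets; } | sort -u)"

# `comm -23` prints targets that are affected but not blessed.
UNBLESSED="$(comm -23 <(echo "$AFFECTED") <(echo "$BLESSED"))"
if [ -z "$UNBLESSED" ]; then
    echo "Changeset is fully blessed; skip the legacy pipeline"
else
    printf 'Not blessed:\n%s\n' "$UNBLESSED"
    echo "Falling back to the legacy pipeline"
fi
```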
šŸ™šŸ» 1