# general
f
I'm trying to think of ways to incrementally add pants to a chaotic monorepo, and I'd like to validate an idea I had. The plan would be to deeply restrict where BUILD files go to a few components that sit at the leaves of our messy dependency graph, and then just ask Pants if it knows about those files in CI... If it knows about the full set of files in the changeset, we could use Pants to direct the CI pipeline. If there are files Pants doesn't know about, then we'd give up and just run the old pipeline (which basically runs all tests in the monorepo). Does this approach make sense?
b
@hundreds-father-404 is OOO today, but I bet he'll have thoughts on this tomorrow.
šŸ‘ŒšŸ» 1
f
An alternative to leaving files out of BUILD files might be to group files into different targets based on what we're ready to deal with, something like
```python
SOURCES_DEFAULT = (
    '*.py', '*.pyi', '!test_*.py', '!*_test.py', '!tests.py', '!conftest.py', '!test_*.pyi', '!*_test.pyi', '!tests.pyi',
)

READY_FILES = ["alert_rule_verifier.py"]

python_sources(
    sources=READY_FILES,
)

python_sources(
    name="unready",
    sources=[
        *SOURCES_DEFAULT,
        *(f"!{name}" for name in READY_FILES),
    ],
    tags=["unready"],
)
```
I suppose this could be written in a macro if it works
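Roughly something like this, maybe (a sketch only; `partitioned_python_sources` and the prelude path are made-up names, and it assumes the file is registered via `[GLOBAL].build_file_prelude_globs` in `pants.toml`):
```python
# pants-plugins/macros.py (hypothetical path). Prelude files see the same
# symbols as BUILD files, so `python_sources` is available here.

SOURCES_DEFAULT = (
    "*.py", "*.pyi",
    "!test_*.py", "!*_test.py", "!tests.py", "!conftest.py",
    "!test_*.pyi", "!*_test.pyi", "!tests.pyi",
)


def partitioned_python_sources(ready_files):
    # The "ready" files become the directory's default target.
    python_sources(sources=list(ready_files))
    # Everything else gets lumped into an `unready` target we can filter on by tag.
    python_sources(
        name="unready",
        sources=[
            *SOURCES_DEFAULT,
            *(f"!{name}" for name in ready_files),
        ],
        tags=["unready"],
    )
```
and then a BUILD file reduces to `partitioned_python_sources(ready_files=["alert_rule_verifier.py"])`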
b
Ideally, you should be able to switch over code for `format`ting right? As dependencies shouldn't matter. Then, switching over `lint` and `check` shouldn't be too hard (at least that was my experience), as the dependency inference is quite good. I just turned on `unowned_dependency_behavior` and played whack-a-mole with the errors until they stopped. (We haven't gotten there yet, but) once you have `lint`, I think `test` really isn't hard to migrate either šŸ¤”. Really, the challenge is verifying the built things (docker images or Python binaries).
šŸ‘ 1
(But in our use case, we're not doing any "switching" per-se. We run both Pants and Bazel, and as things migrate they only ever live in one bucket)
f
We don't use pip for python requirements. We get all requirements from our system packaging ecosystem (RPM). So it will be a long time until we can even think about using `test`. Switching to {test, run, package} will be a long journey of integrating Pants with RPM and doing hermetic builds in lightweight containers rather than just pexes in temp dirs. And in general our dependency graph is a total mess.
šŸ‘€ 1
We're switching from a bunch of Makefiles and a kitchen sink approach to dependencies and packaging that "just works" but is deeply inefficient
We're hoping that demonstrating the advantages of Pants (and really foregrounding dependency stuff a bit in our devs' minds) will incentivize the refactoring we need to accomplish our release goals (independent deployment of separate services, ability to do canaries and reverts, etc.) by giving developers a way around our 1 hr PR pipelines
āž• 1
But my idea is to start at the leaves here: essentially, if `./pants dependees --changed-since=$MERGE_TARGET` fits in the blessed set of files we trust to run through Pants, we can skip whole sections of the legacy pipeline
c
In our case, moving a legacy repo to Pants, there were essentially many small problems that fell into two categories:
1. BUILD file wrong/doesn't exist
2. circular dep
Both of these can be solved in a non-Pants repo: BUILD files can exist in any codebase and just be ignored, and there is nothing special about Pants that fixes circular deps; it just makes them a problem you have to deal with. So basically we just started picking away at these errors on our main branch until we were ready to switch to Pants. We wrote a shell script that did the last mile of converting the repo over (basically a bunch of sed commands) and then used that to check our progress along the way.
šŸ¤” 2
We had a few smaller apps that we moved over first but eventually had one large app that comprised the majority of our code.
h
> we could use Pants to direct the CI pipeline.
To do what, in particular? I agree with Joshua that `fmt` and `lint` are good entry points and realistic to run over the whole repo. Unless you use Pylint, there's no need to teach Pants about any third-party dependencies at all.
f
`fmt` and `lint` don't have to think about deps, which also makes them easy to run on changed files without Pants...
```sh
CHANGED_PY_FILES="$(git diff --name-only --diff-filter=MRA $TARGET_REF '*.py')"
```
which is how we do this already. So it's not much of a value-add for us to start with this.
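For example (a sketch; `black`/`flake8` are just stand-ins for whatever the Makefiles actually call):
```sh
# Lint/format only the changed files, no Pants involved.
CHANGED_PY_FILES="$(git diff --name-only --diff-filter=MRA $TARGET_REF -- '*.py')"
if [ -n "$CHANGED_PY_FILES" ]; then
    echo "$CHANGED_PY_FILES" | xargs black --check
    echo "$CHANGED_PY_FILES" | xargs flake8
fi
```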
āž• 1
> To do what, in particular?
To run only the tests that transitively depend on the changed set of code. Something like
```sh
./pants dependees --changed-since=$TARGET_REF | xargs ./pants filter --target-type=python_test --granularity=file | cut -d: -f1
```
will give you paths you can pass to `pytest` directly.
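i.e., roughly (a sketch):
```sh
# Run pytest directly on just the test files affected by the change.
TEST_FILES="$(./pants dependees --changed-since=$TARGET_REF \
  | xargs ./pants filter --target-type=python_test --granularity=file \
  | cut -d: -f1)"
[ -n "$TEST_FILES" ] && pytest $TEST_FILES
```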
The dep inference works like a charm, even without specifying third party, but:
• We have some places that use dynamic imports or have other kinds of implicit dependencies, and I don't want to let relationships like that skate through CI untested until we've "cleared" a component for use in Pants
• There are ad-hoc build mechanisms scattered throughout the code base; Pants doesn't know about these files (yet), so if these files get changed, we need to just abort and use the legacy "build everything" pipeline, until we've taught Pants how to deal with these components
I guess my idea is that by starting at some more-or-less well-isolated components that work well with dep inference, we can demonstrate the value of a tool like this much more clearly than we could with "easy" things like fmt and lint
šŸ‘ 1
Btw, "does Pants know about the files in this changeset?" seems easy to answer with
```sh
./pants list $(git diff --name-only --diff-filter=MRAC $TARGET_REF)
```
which will fail if you pass it files that aren't assigned to a Pants target
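So the CI guard could be roughly this (a sketch; the fallback is just whatever the legacy pipeline does today):
```sh
# Use the exit code of `./pants list` to decide which pipeline to run.
CHANGED_FILES="$(git diff --name-only --diff-filter=MRAC $TARGET_REF)"
if ./pants list $CHANGED_FILES >/dev/null 2>&1; then
    echo "Pants owns every changed file; use the Pants-directed pipeline"
else
    echo "Unknown files in the changeset; fall back to the legacy pipeline"
fi
```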
I guess I just need to answer the "are you a 'cleared' component?" question... which I think I might be able to do with plain `target()`s with tags šŸ¤”
h
f
That might work, or I could also just tag tests and use `filter`
h
Yeah, although a benefit of using `skip_tests` is that `./pants test ::` will Just Work
f
I can't really use `./pants test` though without a good bit of work to either create a hermetic RPM-based environment on the fly, or to create some hack to make it run non-hermetically
not using pip basically means not using pex (except internally for tools maybe)
h
Ah sorry okay, I misinterpreted how you're using Pants. My bad. Yeah, `skip_tests` isn't as useful because they all need to be skipped.
answering "does Pants know about the files in this changeset?" seems to be easy to answer with
That will answer if the changed file is known about, but you still need to answer if the dependees of that file are blessed. It might be useful to error on unrecognized imports so that you can be more confident dependencies are valid: https://www.pantsbuild.org/docs/reference-python-infer#section-unowned-dependency-behavior
Maybe only create targets for blessed code? Use the new `tailor` ignore options so Pants doesn't try to add targets for those: https://www.pantsbuild.org/docs/create-initial-build-files
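Depending on your Pants version, the relevant `pants.toml` knobs look roughly like this (a sketch; the ignored path is made up):
```toml
[python-infer]
# Error when an import can't be mapped to any first- or third-party target.
unowned_dependency_behavior = "error"

[tailor]
# Keep `./pants tailor` from generating targets for not-yet-blessed code.
ignore_paths = ["legacy/**"]
```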
f
I tried creating targets for blessed code only... I ended up with a lot of ignore patterns; I tried using `!inverted_patterns` but that didn't work (have it on my todo to report this as a bug/issue)
And I thought about this and realized that the visualization of dependencies (and other things) that comes with having all the stuff in BUILD files actually works, so I'd kinda like to keep them
šŸ‘ 1
I guess what maybe I could do is create some "components" with higher-level dependencies, and make sure that the transitive closure of the changeset all map to a known "component"
I'm just struggling to visualize all these graph problems in my head... I need to get a whiteboard šŸ˜…
šŸ’Æ 1
h
Yeah, that's a good point. So it sounds like you're debating one of these two approaches then?
1) In each BUILD file, have a `ready` target vs. a `not ready` target. You'd use a macro, or could use the new Target Generation plugin API, to approach that. You'd use `tags` to indicate which is which, or alternatively could add a new plugin field like "blessed": https://www.pantsbuild.org/docs/target-api-extending-targets
2) Treat BUILD files like normal, i.e. what `./pants tailor` would do. But also have some `target`s and put in their `dependencies` what is blessed.
That then seems like a philosophical question of where the "blessed" metadata should live: do you want it centralized? Or decentralized to each directory?
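For the centralized flavor, the "blessed" metadata could be a single plain `target` somewhere your team owns, something like (hypothetical paths):
```python
# BUILD file in a directory the RE team controls; the dependencies list is the blessing.
target(
    name="blessed",
    dependencies=[
        "src/python/component_a",
        "src/python/component_b",
    ],
)
```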
f
Probably centralized for now, I think I can switch it
h
To help answer that, who will be doing the blessing? If it's only you and maybe someone else, centralized makes the most sense to me. If you want to empower everyone to decide what to bless, decentralized is better
f
For now, just the RE team
šŸ‘ 1
So yeah, centralized makes more sense
And the graph question becomes... "Do all paths in this transitive closure of dependees terminate in a blessed target?", just need to translate that into a query I ask pants
Thanks for being a sounding board @hundreds-father-404, it's really helpful! (and for suggesting plenty of ideas even though my use case is pretty out there)
ā¤ļø 1
h
> And the graph question becomes... "Do all paths in this transitive closure of dependees terminate in a blessed target?", just need to translate that into a query I ask pants
Are you going to explicitly list every file in the `target`'s `dependencies`? If so, you could run `./pants dependencies path/to:target_blesser` (not `--transitive`) and save that list. Then, I think you run your earlier command of `./pants --changed-since --changed-dependees=transitive` and check that everything is in the earlier list. And then you'll probably want to use that `unowned_dependency_behavior` option.
Note that this will not handle `files`/`resources`, though! If someone changes a JSON file, for example, Pants will not naively know about that dependency. I'm not sure how you can mitigate that risk. Normally you do it by running `test`, which is a non-starter.
f
For files/resources, I was just going to make targets like
```python
python_sources(name="sources")
files(name="conf", sources=["*.yml"])
target(name="dummy_linker", dependencies=[":sources", ":conf"])
```
šŸ‘ 1
and then the higher level component can depend on that linker target
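e.g. something like this in the component's BUILD file (hypothetical path for the linker target):
```python
# The component target pulls in both the code and the config it needs.
target(name="component", dependencies=["path/to/subdir:dummy_linker"])
```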
I don't want to explicitly list every file if I can avoid it
I think I can get away with not listing every file if I do this check: `./pants dependees --changed-since=$branch --transitive` must be a subset of `./pants dependencies --transitive path/to:blessed_targets`
šŸ‘ 1
h
Yeah, I think that's right. You can use `./pants --changed-since=$branch --changed-dependees=transitive list` for the first query btw. Otherwise, if you use `dependees`, I think you want to use `--closed` to ensure the input targets are included.
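Putting the pieces together, the gate might look roughly like this (a bash sketch; `$branch` and `path/to:blessed_targets` are the placeholders from above):
```bash
# Every target affected by the change must fall inside the blessed closure.
AFFECTED="$(./pants --changed-since=$branch --changed-dependees=transitive list | sort -u)"
# Include the blessed target itself, since it is a dependee of everything it blesses.
BLESSED="$({ ./pants dependencies --transitive path/to:blessed_targets;
             ./pants list path/to:blessed_targets; } | sort -u)"

# `comm -23` prints targets that are affected but not blessed.
UNBLESSED="$(comm -23 <(echo "$AFFECTED") <(echo "$BLESSED"))"
if [ -z "$UNBLESSED" ]; then
    echo "Changeset is fully blessed; skip the legacy pipeline"
else
    printf 'Not blessed:\n%s\n' "$UNBLESSED"
    echo "Falling back to the legacy pipeline"
fi
```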
šŸ™šŸ» 1