# general
j
Problem: I want our CICD tool to kick off a project pipeline when
./pants --changed-since="${REF}"
contains that pipeline. If multiple project pipelines have changed, then multiple triggers will be fired. The atomic unit of v2 is the file. So that implies that when looking at the output of
changed-since
, the tool needs to create a list of all the projects involved, then fire the triggers. That raises two questions: * How do I define a project? * How do I map a file to a project?
For how to define a project, I am leaning towards using the top of the source roots. In other words for
nyc/src/python/brooklyn
nyc/src/python/queens
nyc/tests/python/brooklyn_tests
nyc/tests/python/queens_tests
then the project would be
nyc
and not
brooklyn
and
queens
.
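A minimal sketch of the file-to-project mapping described above, assuming a project is simply the first path component of each changed file. The function name `projects_from_paths` and the example file names are hypothetical; root-level files with no directory component would need separate handling.

```shell
# Reduce a list of changed paths (one per line on stdin) to the unique
# set of top-level directories, treated here as "projects".
projects_from_paths() {
    cut -d/ -f1 | sort -u
}

# Example using the layout from the message above (file names are made up).
printf '%s\n' \
    'nyc/src/python/brooklyn/app.py' \
    'nyc/tests/python/queens_tests/test_app.py' \
    | projects_from_paths
```

Both paths collapse to the single project `nyc`, so only one trigger would fire.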
h
Do you mean that you have multiple top-level projects, and each project has a corresponding CI pipeline (~set of CI jobs to run) specific to that project? You want to design a way to know when to run which pipeline, right?
j
Yes.
We currently have a single job that runs all the tests all in one pipeline.
And we have no visual method for seeing the test status for a particular project.
h
Okay, and it looks like you’re using this project structure: https://www.pantsbuild.org/v2.1/docs/source-roots#multiple-top-level-projects Note that warning:
Note that even though the projects live in different top-level folders, you are still able to import from other projects. If you would like to limit this, you can use ./pants dependees or ./pants dependencies in CI to track where imports are being used. See Project introspection.
(Someone was also wondering about a Pants feature to automate some of that) Tangibly, you’d want to use
--changed-dependees=transitive
here I think
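As a rough sketch, the flag could be combined with `--changed-since` and the `list` goal so that targets depending on the changed files are included too. The wrapper name `changed_with_dependees` and the `PANTS` variable are assumptions for illustration, not part of Pants.

```shell
# Path to the Pants launcher; overridable so the wrapper can be stubbed.
PANTS="${PANTS:-./pants}"

# List changed targets plus everything that transitively depends on them.
# $1 is the git ref to compare against (for example a branch or commit).
changed_with_dependees() {
    "${PANTS}" --changed-since="$1" --changed-dependees=transitive list
}

# Usage (requires a Pants repo): changed_with_dependees "${REF}"
```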
j
./pants roots
🀩
h
Overall, it sounds sensible to use the top-level source root to define what a project is. It’s hard to give a solid answer without knowing more about your org, but it sounds reasonable. Does each top-level folder generally correspond to a particular team? Another approach I’ve seen is to use OWNERS files for more granularity, so that you can say this particular subfolder belongs to the Ads team, and this one to the Billing team, etc
w
if you defined a target type associated with the pipeline, you could filter to just that type. but if you don’t have other metadata associated with a pipeline, and they’re all uniform, then maybe that’s more boilerplate than you’re looking for.
🤔 1
j
Yeah. Top level folders correspond to services. (Teams are empowered to modify any and all services that impact their business goals. We are small enough to coordinate when overlap happens.)
So in a
nyc/BUILD
file have a "cicd_pipeline" target?
w
yea.
j
I smell having to write another plugin.
But not as complex as the jupyter one.
h
if you defined a target type associated with the pipeline, you could filter to just that type
What would this look like in terms of
--changed-since
? Users would have to add an explicit
dependencies
field to the pipeline target so that it knows which files belong to it? (The target having a
sources
field that globs over the whole project isn’t a great idea, but possible. It means that things like file args will resolve to both targets, the Python one and the pipeline one)
But not as complex as the jupyter one.
Yeah this would only be using the Target API, which is much much easier. No Rules API this time
w
ah. true, it would. the target would need to depend on the inputs to the pipeline.
j
That's a great idea. The CICD tool we are considering can take YAML on stdin to create the pipeline on the fly. So instead of boilerplating the YAML file, we could run
./pants pipeline target
.
h
we could run ./pants pipeline target.
That part would require the Rules API hehe. But we’re happy to help with that
w
^Eric’s point is very relevant. using targets like i was suggesting would require that the target had useful dependencies on code in the project.
j
Yeah. I want to avoid people editing multiple files when creating new code.
This is my alpha version of the script:
File: build-support/buildkite/generate-project-tests-pipeline.sh

#!/bin/bash

set -eu

echo "steps:"

for atest in $(./pants filter --filter-target-type="python_tests" "${1}::")
do
    echo "  - command: \"./pants test ${atest}\""
    echo "    label: \"${1} test $(basename "${atest}")\""
done
w
i expect that having filename/layout conventions would be easier.
j
which defines a project pipeline.
generate-project-tests-pipeline.sh nyc
would create a pipeline that tests ALL the stuff in the
nyc
project. I'd like this to only test the
changed-since
stuff.
h
using targets like i was suggesting would require that the target had useful dependencies on code in the project.
That’s a pretty major restriction and will be easy for your coworkers to forget to do, which is dangerous with CI. There are two possibilities I see with the custom target approach: 1) Set the
sources
field to glob over everything. It’s not a huge deal if a non-Python target and a Python target refer to the same thing afaict. (Two Python targets means dep inference doesn’t work). Things like
test
will just ignore the irrelevant file. The downside is that the output of
--changed-since
and
list
will be longer, which may confuse your coworkers. 2) Benjy speculated the other day if we should add
dependencies
globs. I highly doubt we want that, including for perf reasons, but it’s possible perhaps.
w
3) do it with inference.
h
Oh, that would work actually!
Oh for sure that’s what we should do!
w
but using inference still amounts to having a layout convention: if you don’t follow it (“a ci_cd target depends on all targets below it in the directory structure”) you run into trouble
👍 2
h
Yeah, true. It wouldn’t be that much different than
--changed-since
mixed with the script, except that parts of it would be automated, like using Pants to filter out irrelevant things
j
The
changed-since
is mutually exclusive of target arguments. Couldn't this be solved by
./pants --changed-since=${REF} --target-regex="nyc*"
or something like that? I could just use
grep
on the output of the
changed-since
, but this seems like something
filter
could own, too.
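The grep approach could look roughly like this. The helper name `only_project` is hypothetical, and the sketch assumes project names contain no regex metacharacters, since the name is interpolated into the pattern.

```shell
# Keep only the addresses (one per line on stdin) that sit under the
# given top-level project directory ($1).
only_project() {
    # grep exits non-zero when nothing matches; "|| true" keeps callers
    # that run under "set -e" from aborting on an empty result.
    grep -E "^${1}/" || true
}

# Usage sketch: ./pants --changed-since="${REF}" list | only_project nyc
```

As noted in the thread, the pattern just encodes the directory layout convention.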
w
that regex would encode the filename/layout convention, basically
but yes
j
Ok. Thank you for helping me think out loud. Also keeping me from going down the
./pants pipeline
path prematurely.
I'm going to see how convoluted it gets using the existing
pants
methods. I suspect there is an easy pattern in here that I just haven't kneaded out yet.
👍 1
lesson: reasonable layout of repo pays dividends