# general
f
Is there a way I can easily run multiple `experimental_run_shell_command`s in parallel? For example, both of these `experimental_run_shell_command`s were identified as changed:
./pants --changed-since=HEAD --changed-dependees=transitive --filter-tag-regex='^cdk$' list
aws/projects/project_1:cdk
aws/projects/project_2:cdk
But if I try to `run` them, I get an error:
./pants --changed-since=HEAD --changed-dependees=transitive --filter-tag-regex='^cdk$' run
12:32:43.46 [ERROR] 1 Exception encountered:

  TooManyTargetsException: The `run` goal only works with one valid target, but was given multiple valid targets:

  * aws/projects/project_1:cdk
  * aws/projects/project_2:cdk

Please select one of these targets to run.
I can hack something with `xargs` like this:
export PANTS_CONCURRENT=True && ./pants --changed-since=HEAD --changed-dependees=transitive --filter-tag-regex='^cdk$' list | xargs -L1 -P 2 ./pants run
but I was wondering if there's a "better"/more `pants`-y way to do this?
Looks like maybe GNU `parallel` would be better than `xargs` because it batches up output as if commands were run sequentially.
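For reference, the same fan-out can be sketched in Python rather than with `xargs` or GNU `parallel`. This is only an illustration, not a Pants feature: `run_target`, `run_all`, and `main` are hypothetical names, and the sketch assumes `./pants run <target>` is the command to fan out and that `PANTS_CONCURRENT=True` is set as in the one-liner above. Each target's output is captured and printed as one contiguous block in input order, which is closer to `parallel --keep-order` than to plain `xargs -P`.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_target(target, command_prefix=("./pants", "run")):
    """Run one command for `target` and capture its combined stdout/stderr."""
    # Assumption: PANTS_CONCURRENT=True mirrors the xargs one-liner.
    env = dict(os.environ, PANTS_CONCURRENT="True")
    result = subprocess.run(
        [*command_prefix, target],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        env=env,
    )
    return target, result.returncode, result.stdout


def run_all(targets, jobs=2, command_prefix=("./pants", "run")):
    """Run every target with at most `jobs` processes in flight at once."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() returns results in input order, so output stays grouped per target.
        return list(pool.map(lambda t: run_target(t, command_prefix), targets))


def main(lines=None):
    """Read targets (one per line) and print each target's output as a block."""
    lines = sys.stdin if lines is None else lines
    targets = [line.strip() for line in lines if line.strip()]
    results = run_all(targets)
    for target, code, output in results:
        print(f"=== {target} (exit {code}) ===")
        print(output, end="")
    # Non-zero if any target failed.
    return 1 if any(code != 0 for _, code, _ in results) else 0
```

With `sys.exit(main())` added at the bottom and the file saved as, say, `run_parallel.py` (a hypothetical name), the `list` output could be piped in: `./pants --changed-since=HEAD --changed-dependees=transitive --filter-tag-regex='^cdk$' list | python run_parallel.py`.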
b
Interactive Processes can only be run serially. So running multiple pants commands (without using the daemon) is the only way.
f
Interactive Processes
My command doesn't need to take any input. Does that mean it's non-interactive and I should be using a method other than `experimental_run_shell_command`?
b
That's the internal name for anything being `run`. What do your processes do? AFAIK we don't have a mechanism to have the user request multiple processes run in parallel on-demand and without caching 🤔
f
Ah, gotcha. That's helpful context. My script calls out to `cdk` (the AWS CDK infrastructure-as-code tool) to run a command (`diff` or `deploy` infrastructure). That command needs the Python context from Pants since the CDK code itself is defined in Python.
So the command looks like this in the CDK project's `BUILD` file:
experimental_run_shell_command(
  name="cdk",
  tags=["cdk"],
  command="../../scripts/cdk-deploy.sh",
  dependencies=["aws/projects/project_1:project_1"],
  workdir="aws/projects/project_1",
)
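and then the script (simplified) looks like this:
#!/bin/bash
# Assumption: SCRIPT_DIR is the directory containing this script (its definition was omitted from the simplified version).
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# Activate the virtualenv created by `pants export`.
source "$SCRIPT_DIR/../../dist/export/python/virtualenvs/cdk_dependencies/3.8.16/bin/activate"

# Make the first-party project code importable by the CDK app.
export PYTHONPATH="$SCRIPT_DIR/../projects:$PYTHONPATH"
npx -y cdk synth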
So it's a little hacky. I have to run `pants export` to get the virtualenv and set the PYTHONPATH myself. There might be a better way.
CDK is kinda funky, the CDK "binary" is written in Node, so I need to use `npx` (NPM execute) to run it. Sadly, I can't just "run a Python file"
b
Hmmm you could maybe plug into the `package` command. There's a `deploy` one as well. That'd involve a plugin today, but could also be extended
👀 1
f
Let me look at the docs for those. IIRC I looked at `deploy` and I thought it was helm-specific.
I don't think I looked into `package` at all for this
b
It is today, but anything is pluggable 😌
f
Ah, so `experimental-deploy` is how I'd plug in? https://www.pantsbuild.org/docs/reference-experimental-deploy (not really any docs, but maybe I can reference the Helm impl for an example)
b
There's a bit of a paradigm shift going on in regard to shell processes. I think we're ooching towards really opening up the floodgates with those. I could easily see, from recent changes, a way to specify an `experimental_shell_publish_command` which can run in parallel, as part of `publish`
CC @ancient-vegetable-10556 /@happy-kitchen-89482 /@witty-crayon-22786 while we're splashing at `shell` stuff
👍 1
f
Sure. For some more perspective, I'm coming from `yarn`/`lerna` monorepos and they can `run` any arbitrary `script` in the `package.json` with a concurrency flag. It's super nice!
yarn lerna run --since '' --concurrency 10 cdk -- deploy '**'
b
(Oh my bad, I mean `publish` not `package`. brain fog got me there)
👀 1
a
b
^ Yup that was my mental reference
a
For what it’s worth, and I haven’t spent too much time reading whether this needs to run outside the sandbox, but you can do this:
experimental_shell_command(name="a", command="first_command")

experimental_shell_command(name="b", command="second_command")

experimental_run_shell_command(name="c", dependencies=[":a", ":b"], command="/bin/true")
and then `./pants run path/to:c`
but the above assumes that `first_command` and `second_command` can be run inside the sandbox
but if so, they’d run in parallel
b
They'd also be cached, which isn't ideal 😕
a
They’re only cached so long as their input dependencies don’t change
f
I think this might be the behavior I want. If none of the following change:
• the Python source code (that defines the infrastructure)
• the underlying 3rdparty dependencies
• the shared "library" methods
then the CDK shell command(s) should not be run at all
a
then try a format like the above; noting that you’ll probably need some hackery to handle reverts
(specifically: reverts could set the state of the repo back to one where the results of those tasks were cached, and therefore wouldn’t run)
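To make the revert caveat concrete, here is a toy model of a cache keyed on a digest of the inputs. It is not how Pants implements its cache; `InputKeyedCache`, `deploy`, and the file contents are hypothetical and exist only to show why going from state A to B and back to A skips the third run, leaving the deployed infrastructure matching B.

```python
import hashlib


class InputKeyedCache:
    """Toy cache: a command runs only when its input digest has not been seen."""

    def __init__(self):
        self._results = {}
        self.runs = 0  # how many times the command actually executed

    @staticmethod
    def digest(inputs):
        # Hash file names and contents in a stable (sorted) order.
        h = hashlib.sha256()
        for name in sorted(inputs):
            h.update(name.encode())
            h.update(b"\0")
            h.update(inputs[name].encode())
            h.update(b"\0")
        return h.hexdigest()

    def run(self, inputs, command):
        key = self.digest(inputs)
        if key not in self._results:
            self.runs += 1
            self._results[key] = command(inputs)
        return self._results[key]


def deploy(inputs):
    # Stand-in for the real side effect (e.g. a CDK deploy).
    return f"deployed {inputs['stack.py']}"


cache = InputKeyedCache()
state_a = {"stack.py": "bucket_count = 1"}
state_b = {"stack.py": "bucket_count = 2"}

cache.run(state_a, deploy)  # runs: infrastructure now matches A
cache.run(state_b, deploy)  # runs: infrastructure now matches B
cache.run(state_a, deploy)  # revert to A: cache hit, nothing is deployed
print(cache.runs)  # 2
```

The "hackery" mentioned above would amount to adding something to the inputs that differs between the first and second time the repo is in state A, so the digest changes and the command runs again.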
f
Hmm, gotcha. Yeah the caching kinda scares me. With infra-as-code, I'd rather run "too often" and let the provisioning engine determine it's a no-op than not trigger when we should.
I think the way I have it now, using GNU `parallel`, is probably safer in that respect?
a
Certainly.
f
Alright, cool. I think I'll go that route for now. @ancient-vegetable-10556 and @bitter-ability-32190, thank you for taking the time to help me out! I hope that, in exchange, this use-case is useful for consideration (it sounds like maybe you're thinking about this anyway).
a
I have considered making it possible to mark `experimental_shell_command`s as non-cacheable, but that is for a later date!
👍 1
f
FWIW, I looked at `bazel` before `pants` and `pants` was so much more understandable for me!
🙌 3
I got about halfway through a `bazel` tutorial video and I was like...

https://media.giphy.com/media/wYyTHMm50f4Dm/giphy.gif

😄 1
b
I migrated our monorepo and haven't looked back lol
🙌 1
b
@famous-river-94971 glad it's working out for you! Would it be okay to quote you, with or without attribution, on Twitter?
f
Hey Carina - sure, feel free! I don't want to ruffle any feathers with Bazel lovers, so just use the quote, no attribution necessary.
I'm happy to tell you all how I really feel in our Pants slack safe-space, but not trying to upset folks RE: bazel. Every tool has its place and purpose 🙂
b
Sure!
We very much agree, by the way. We're rooting for people to have whatever tool best fits their use case.
b
There are definitely situations where bazel is a better choice for build+test. And even in those cases pants can still help out with fmt+lint+check. While we were migrating, that's the boat we were in.