# github-notifications
#17345 `experimental_shell_command`'s sandbox and invalidation includes transitive dependencies of `experimental_shell_command` dependencies, not just outputs
New issue created by huonw

Describe the bug
When an `experimental_shell_command` (`:second`) depends on another `experimental_shell_command` (`:first`), the sandbox for `:second` includes the dependencies of `:first`, rather than just the output of `:first`.

https://gist.github.com/huonw/c55d0b387ed6030cda611898ee2d0361 provides a reproducer, where `:first` generates `output.txt` by 'consuming' `input.txt` (but does nothing with it), and `:second` includes a `sleep` to demonstrate when it is running. The `archive` package includes all the `.txt` files that ended up in the sandbox for `:second`: I'd expect it to only be the direct output of `:first` (`output.txt`), not the `input.txt` dependency of `:first`.

BUILD file from that gist for convenience:
```python
file(name="input", source="input.txt")
experimental_shell_command(
    name="first",
    command="""
    echo "doesn't affect output"
    echo contents > output.txt
    """,
    tools=["echo"],
    dependencies=[":input"],
    outputs=["output.txt"]
)

experimental_shell_command(
    name="second",
    command="""
    sleep 3 # make 'actually running' obvious
    """,
    dependencies=[":first"],
    tools=["sleep"],
    outputs=["*.txt"],
)
archive(name="archive", files=[":second"], format="zip")
```
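To make the expected-versus-observed difference concrete, here is a minimal Python sketch of the BUILD file above. It is only a toy model: the `TARGETS` table and both function names are hypothetical and say nothing about how Pants actually builds sandboxes.

```python
from __future__ import annotations

# Hypothetical model of the BUILD file above; not Pants internals.
# Each target lists its dependencies and the files it contributes itself.
TARGETS = {
    ":input": {"deps": [], "files": {"input.txt"}},
    ":first": {"deps": [":input"], "files": {"output.txt"}},
    ":second": {"deps": [":first"], "files": set()},
}


def transitive_sandbox(target: str) -> set[str]:
    """Observed behaviour: files from every transitive dependency."""
    files: set[str] = set()
    seen: set[str] = set()
    stack = list(TARGETS[target]["deps"])
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        files |= TARGETS[dep]["files"]
        stack.extend(TARGETS[dep]["deps"])
    return files


def direct_sandbox(target: str) -> set[str]:
    """Expected behaviour: only files produced by direct dependencies."""
    files: set[str] = set()
    for dep in TARGETS[target]["deps"]:
        files |= TARGETS[dep]["files"]
    return files


print(sorted(transitive_sandbox(":second")))  # ['input.txt', 'output.txt']
print(sorted(direct_sandbox(":second")))      # ['output.txt']
```

The first result matches what ends up in `archive.zip` in the reproducer; the second is what I'd expect.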
```shell
git clone git@gist.github.com:c55d0b387ed6030cda611898ee2d0361.git
cd c55d0b387ed6030cda611898ee2d0361
./pants version # 2.13.0

# initial build (takes about 3 seconds to run :second, due to sleep)
./pants package ::

# check package output:
unzip -l dist/archive.zip # two files: input.txt output.txt

# validating the cache works as expected (~instantaneous)
./pants package ::

# BUG: change to :first dependencies, that doesn't affect output...
echo 'new contents' > input.txt
# ... reruns :second, and thus takes ~3 seconds
./pants package ::

# EXPECTED: change to :first script that doesn't affect output...
sed -i '' 's/affect output/affect output still/' BUILD
# .... doesn't rerun :second, only :first
./pants package ::
```
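The `unzip -l` check above can also be scripted. The following is a small sketch using only Python's standard `zipfile` module; the helper names `archive_members` and `unexpected_members` are made up for illustration, and the `dist/archive.zip` path in the usage note is taken from the steps above.

```python
from __future__ import annotations

import zipfile


def archive_members(path: str) -> list[str]:
    """Return the sorted file names stored in a zip archive."""
    with zipfile.ZipFile(path) as zf:
        return sorted(zf.namelist())


def unexpected_members(members: list[str], expected: set[str]) -> list[str]:
    """Return the members that are not in the expected set, in order."""
    return [name for name in members if name not in expected]
```

After `./pants package ::`, calling `unexpected_members(archive_members("dist/archive.zip"), {"output.txt"})` should list `input.txt` while the bug is present, and return an empty list if only the direct output of `:first` were in the sandbox.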
AFAICT, there's no way to break `:second`'s importing of `:input`, e.g. adjusting to `dependencies=[":first", "!:input"]` doesn't do anything: the behaviour is the same (and using `!!:input` isn't supported).

The `:input` being a `file` is just for convenience. This appears to apply to all other target types, e.g. a `pex_binary` or yet another `experimental_shell_command`. I haven't checked how this behaves when depending on explicit codegen targets, only these adhoc `experimental_shell_command` ones.

Pants version
2.13.0

OS
macOS

Additional info
Background: how are we using `experimental_shell_command` to hit this?

We're using `experimental_shell_command` to try to bridge the gap between Pants supported code and unsupported code (JS/TS), as well as for adhoc codegen tasks (to avoid having to write and maintain a plugin). This can result in 'long' chains of `experimental_shell_command`s that depend on each other (and other resources), e.g.:
```python
# some app:
python_sources()
pex_binary(name="app", ...)

# codegen the schema itself:
experimental_shell_command(name="schema", dependencies=[":app"], outputs=["schema.json"], command="./app.pex export-schema > schema.json", tools=["python3.9"])

# set up the NPM dependencies:
experimental_shell_command(name="node_modules", outputs=["node_modules/**"], command="npm ci", ...)

# use the schema to do codegen or generate docs or whatever:
experimental_shell_command(name="codegen-from-schema", dependencies=[":node_modules", ":schema"], outputs=["codegen/**"], command="npm run do-codegen < schema.json", ...)
experimental_shell_command(name="docs-from-schema", dependencies=[":node_modules", ":schema"], outputs=["docs/**"], command="npm run generate-docs < schema.json", ...)

# continues with targets using :codegen-from-schema and :docs-from-schema... (`archive`, `experimental_shell_command` and `experimental_run_shell_command`)
```
The Python code that goes into `pex_binary` changes regularly and thus the PEX itself changes too, but the exported `schema.json` doesn't change so much (i.e. we're often refactoring/adding features/fixing bugs without changing the schema). In theory, if `:app` changes but the schema doesn't, `:schema`'s dependees (`:codegen-from-schema`, `:docs-from-schema`) don't need to run, but those dependees pull in the `app.pex` file from `:app`, and thus do rerun. Rerunning can take significant time.

(See also: https://pantsbuild.slack.com/archives/C046T6T9U/p1666315386420799)

pantsbuild/pants
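The invalidation effect described here can be illustrated with a toy cache key. This is a simplified sketch and not how Pants computes its keys: `cache_key` is a hypothetical helper, the file contents are placeholders, and `node_modules` is left out for brevity. The only assumption is that the key covers the command plus every file present in the sandbox.

```python
from __future__ import annotations

import hashlib


def cache_key(command: str, sandbox: dict[str, bytes]) -> str:
    """Hash the command plus the name and content of every sandbox file."""
    h = hashlib.sha256()
    h.update(command.encode())
    for name in sorted(sandbox):
        h.update(b"\0" + name.encode() + b"\0")
        h.update(hashlib.sha256(sandbox[name]).digest())
    return h.hexdigest()


COMMAND = "npm run do-codegen < schema.json"
SCHEMA = b'{"type": "object"}'  # placeholder; unchanged across the refactor

# Observed: app.pex is also in the sandbox, so a refactor changes the key.
before = cache_key(COMMAND, {"app.pex": b"pex build 1", "schema.json": SCHEMA})
after = cache_key(COMMAND, {"app.pex": b"pex build 2", "schema.json": SCHEMA})
print(before == after)  # False: :codegen-from-schema reruns

# Expected: only the direct output of :schema is present, so the key is stable.
before = cache_key(COMMAND, {"schema.json": SCHEMA})
after = cache_key(COMMAND, {"schema.json": SCHEMA})
print(before == after)  # True: cache hit
```

In this model, `:schema` still reruns whenever `app.pex` changes, but if it produces an identical `schema.json`, everything downstream of it is served from cache.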