https://github.com/pantsbuild/pants/issues/17345 | Pants #github-notifications

quaint-telephone-89068

10/26/2022, 2:02 AM

#17345 `experimental_shell_command`'s sandbox and invalidation includes transitive dependencies of `experimental_shell_command` dependencies, not just outputs

New issue created by huonw

Describe the bug

When an `experimental_shell_command` (`:second`) depends on another `experimental_shell_command` (`:first`), the sandbox for `:second` includes the dependencies of `:first`, rather than just the output of `:first`.

https://gist.github.com/huonw/c55d0b387ed6030cda611898ee2d0361 provides a reproducer, where `:first` generates `output.txt` by 'consuming' `input.txt` (but does nothing with it), and `:second` includes a `sleep` to demonstrate when it is running. The `archive` package includes all the `.txt` files that ended up in the sandbox for `:second`: I'd expect it to only be the direct output of `:first` (`output.txt`), not the `input.txt` dependency of `:first`.

BUILD file from that gist for convenience:

file(name="input", source="input.txt")
experimental_shell_command(
    name="first",
    command="""
    echo "doesn't affect output"
    echo contents > output.txt
    """,
    tools=["echo"],
    dependencies=[":input"],
    outputs=["output.txt"]
)

experimental_shell_command(
    name="second",
    command="""
    sleep 3 # make 'actually running' obvious
    """,
    dependencies=[":first"],
    tools=["sleep"],
    outputs=["*.txt"],
)
archive(name="archive", files=[":second"], format="zip")

git clone git@gist.github.com:c55d0b387ed6030cda611898ee2d0361.git
cd c55d0b387ed6030cda611898ee2d0361
./pants version # 2.13.0

# initial build (takes about 3 seconds to run :second, due to sleep)
./pants package ::

# check package output:
unzip -l dist/archive.zip # two files: input.txt output.txt

# validating the cache works as expected (~instantaneous)
./pants package ::

# BUG: change to :first dependencies, that doesn't affect output... 
echo 'new contents' > input.txt
# ... reruns :second, and thus takes ~3 seconds
./pants package ::

# EXPECTED: change to :first script that doesn't affect output...
sed -i '' 's/affect output/affect output still/' BUILD
# .... doesn't rerun :second, only :first
./pants package ::
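
The `unzip -l` check in the steps above can also be scripted. This is only a minimal sketch using Python's standard `zipfile` module; the helper name `unexpected_members` is hypothetical and not part of Pants, and the `dist/archive.zip` path is taken from the steps above.

```python
import os
import zipfile


def unexpected_members(archive_path, expected):
    """Return archive entries (excluding directories) that are not in `expected`."""
    with zipfile.ZipFile(archive_path) as zf:
        names = {name for name in zf.namelist() if not name.endswith("/")}
    return sorted(names - set(expected))


if __name__ == "__main__":
    path = "dist/archive.zip"  # produced by `./pants package ::` in the repro
    if os.path.exists(path):
        # Only output.txt is expected; anything else leaked into :second's sandbox.
        print(unexpected_members(path, {"output.txt"}))
```

Run after `./pants package ::`, a non-empty list would indicate that files other than the direct output of `:first` reached the archive.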

AFAICT, there's no way to break `:second`'s importing of `:input`, e.g. adjusting to `dependencies=[":first", "!:input"]` doesn't do anything: the behaviour is the same (and using `!!:input` isn't supported).

The `:input` being a `file` is just for convenience. This appears to apply to all other target types, e.g. a `pex_binary` or yet another `experimental_shell_command`. I haven't checked how this behaves when depending on explicit codegen targets, only these adhoc `experimental_shell_command` ones.

Pants version

2.13.0

OS

macOS

Additional info

Background: how are we using `experimental_shell_command` to hit this?

We're using `experimental_shell_command` to try to bridge the gap between Pants supported code and unsupported code (JS/TS), as well as for adhoc codegen tasks (to avoid having to write and maintain a plugin). This can result in 'long' chains of `experimental_shell_command`s that depend on each other (and other resources), e.g.:

# some app:
python_sources()
pex_binary(name="app", ...)

# codegen the schema itself:
experimental_shell_command(name="schema", dependencies=[":app"], outputs=["schema.json"], command="./app.pex export-schema > schema.json", tools=["python3.9"])

# set up the NPM dependencies:
experimental_shell_command(name="node_modules", outputs=["node_modules/**"], command="npm ci", ...)

# use the schema to do codegen or generate docs or whatever:
experimental_shell_command(name="codegen-from-schema", dependencies=[":node_modules", ":schema"], outputs=["codegen/**"], command="npm run do-codegen < schema.json", ...)
experimental_shell_command(name="docs-from-schema", dependencies=[":node_modules", ":schema"], outputs=["docs/**"], command="npm run generate-docs < schema.json", ...)

# continues with targets using :codegen-from-schema and :docs-from-schema... (`archive`, `experimental_shell_command` and `experimental_run_shell_command`)

The Python code that goes into the `pex_binary` changes regularly and thus the PEX itself changes too, but the exported `schema.json` doesn't change so much (i.e. we're often refactoring/adding features/fixing bugs without changing the schema). In theory, if `:app` changes but the schema doesn't, `:schema`'s dependees (`:codegen-from-schema`, `:docs-from-schema`) don't need to run, but those dependees pull in the `app.pex` file from `:app`, and thus do rerun. Rerunning can take significant time.

(See also: https://pantsbuild.slack.com/archives/C046T6T9U/p1666315386420799)

pantsbuild/pants
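
The invalidation argument above can be illustrated with a toy model. This is not how Pants computes cache keys; it is a sketch that assumes a step's key is a hash of its command plus the files in its sandbox. The function names and file contents are made up for illustration.

```python
import hashlib


def digest(*parts):
    """Hash byte strings into one hex digest (length-prefixed to avoid ambiguity)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def downstream_key(command, dep_outputs, dep_inputs, include_transitive):
    """Toy cache key for a step that depends on another step.

    dep_outputs: {filename: bytes} produced by the dependency
    dep_inputs:  {filename: bytes} that the dependency itself consumed
    """
    files = dict(dep_outputs)
    if include_transitive:
        files.update(dep_inputs)  # models the behaviour reported in the issue
    parts = [command.encode()]
    for name in sorted(files):
        parts.extend([name.encode(), files[name]])
    return digest(*parts)


schema = {"schema.json": b'{"version": 1}'}  # unchanged across both builds
app_v1 = {"app.pex": b"build 1"}
app_v2 = {"app.pex": b"build 2 (refactor, same schema)"}
cmd = "npm run do-codegen < schema.json"

# Reported behaviour: app.pex is part of the sandbox, so the key changes.
observed_changed = (
    downstream_key(cmd, schema, app_v1, True) != downstream_key(cmd, schema, app_v2, True)
)
# Expected behaviour: only schema.json is an input, so the key is stable.
expected_changed = (
    downstream_key(cmd, schema, app_v1, False) != downstream_key(cmd, schema, app_v2, False)
)

print(observed_changed, expected_changed)  # True False
```

Under this model, keying `:codegen-from-schema` on only the outputs of `:schema` would let a change to `:app` that leaves `schema.json` untouched stop at `:schema`, which is the behaviour the issue asks for.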
