Only outstanding TODO on the Python `run` <PR> is ...
# development
b
Only outstanding TODO on the Python
run
PR is how to switch between sandbox mode. The friendliest solution IMO is a new
run
flag, however this flag is only relevant when `run`ning a "scripting" language ("compiled" languages wouldn't benefit). So pros/cons. The alternative is a field on
python_source
and
python_test
targets, which is a bit funkier and has no obvious CLI support (we could add a "default" CLI option to
python
, but now we're talking even more machinery) Thoughts y'all?
👀 1
h
cc @sparse-lifeguard-95737 (we were just DMing about the importance of the sandbox mode option for them)
1
b
I'm pretty sure the default value will (eventually) be the "run in repo" value, as it follows the element of least surprise. But again, sad to have a flag that is irrelevant for a
docker_image
or
go_binary
. 😞
Another thing to consider is we might want a field as well? Some scripts might not work well in-repo? The likelihood of this should be rare though, IMO and can usually be made to work in-repo with a bit of tweaking
s
As a user it would make sense to me if “run” on a pex target was always sandboxed and “run” on a python file was always in-repo
👆 1
But I have put much less thought into it than you all 😄 I just want to stop explaining why the Django management CLI doesn't Just Work in pants to my wider eng team lol
🙌 1
2
b
FWIW
run
on a PEX target will be sandboxed, but that really won't matter anymore in the long run, as we'll be running the built PEX. We'll still set
CWD
to repo root as well
FWIW A naive person's opinion matters significantly. There are few people who understand the nuance, and so Pants should do the obvious thing, as that what the majority will experience 😉
1
💯 1
w
I’m pretty sure the default value will (eventually) be the “run in repo” value, as it follows the element of least surprise.
because it is not sandboxed, i don’t think that it will be a good default.
b
@witty-crayon-22786 you'll have to be more specific, because to me, I'd prefer very strongly we don't have this "gotcha" for our users to stumble on
w
when we need to make tradeoffs between correctness and ergonomics, i think that we need to continue to bias toward correctness.
@bitter-ability-32190: the gotcha when it comes to correctness is “hm, why is
--loop
or
--changed
broken?”
b
Can you ELI5 why
--loop
or
--changed
would be broken?
w
because the entire pythonpath entry is added, rather than only the detected files/dependencies
and only the detected files will cause the run to be invalidated
b
ELI3? 🙂 I still don't follow. Also how often are people using
run
standalone vs
run
with those flags? I'm guessing at least an order of magnitude in difference
w
the “run without sandbox” flag adds `PYTHONPATH=some/dir:…`… inside of
some/dir
, inference has detected that you depend on
some/dir/a.py
, but not on
some/dir/b.py
. when
some/dir/b.py
changes,
--loop
or
--changed
will not detect your script, because the file wasn’t a dep.
basically: sandboxing works at the file level… but running without the sandbox works at the directory level.
b
If I don't depend on
b.py
and I change it, why would I expect the run to be restarted?
Also, it really seems we're making this right for the 2% chance people are using those flags, and wrong for the 98% chance they aren't 😅
w
the reason dependency inference is safe is because sandboxing is enforcing that if we get dependency inference wrong and fail to detect that you need something, the run fails
so, yes: inference would need to fail to detect
b.py
as a dep as well. perhaps easier to imagine as a resource which needed to be explicitly declared
and wrong for the 98% chance they aren’t
--changed
is one of the two recommended modes of CI, so it’s more common than that.
b
Ah excellent, yes, let's talk about resources! This is a huge win for us for our scripts to not HAVE to declare resources for those! Why declare the dependency for depencencies sake? Why would we want the user to have their
run
fail complaining about a missing file when we could make it succeed?
--changed
is one of the two recommended modes of CI, so it’s more common than that.
--changed
with goals like
fmt
and
lint
? sure. Goals like
run
? Seems sketchy.
👍 1
h
I don't know what the right tradeoff here is. But
run
is a very weird goal that already violates several of our hermeticity expectations. For example, if you use a filesystem API like
open()
, you don't need to declare `file`/`resource` targets for those deps. We already do "bad" things
1
b
A golden example is Django's
migrate.py
script. It isn't used in prod, it isn't a part of a test. Let's allow users to fall into the Pit of Success.
w
--changed-.. --filter=… list | xargs -L1 ./pants run
is easier to imagine: “run changed deployment scripts”
h
--changed-.. --filter=… list | xargs -L1 ./pants run is easier to imagine: “run changed deployment scripts”
We error if you don't give exactly 1 argument to
run
1
w
that’s what the -L1 does
h
and ignore the rest? When this came up last week w/ a user, I recommended that they save the output of
--changed-since list
and then loop over it calling
./pants run
multiple times
b
Also I fail to see how
--changed
with
list
is part of a conversation about `run`'s sandbox behavior 😐
w
anyway. this whole thing was off topic for this thread i think. i just think that there should be more discussion before making an unsandboxed mode the default.
b
True, OP was about "should we make this a
run
flag"
w
and ignore the rest?
no, it runs them sequentially, one at a time.
Also I fail to see how
--changed
with
list
is part of a conversation about `run`’s sandbox behavior
because you won’t detect that it needs to
run
… it’s the same dependencies list.
b
because you won’t detect that it needs to
run
I'm really not following this 😅 You're just providing an arg to run and we run it.
h
"undeclared dependencies" will mean that
--changed-dependees
and
--loop
does not work properly. Even if the script executes, our understanding of your repo is incomplete, and we might not show something has been changed properly My point is that this is already the case with the
run
goal. It's already an impure goal Unclear to me what to decide here, though
b
Ah I see. Yeah I don't think we should be enforcing
run
purity for `run`'s sake. If you want metadata purity,
test
your code. "We don't run your code like normal because we might not have gotten all your dependencies right, and in the event you really rely on those dependencies being right, but aren't validating it through tests, we want to ensure you know it by failing your
run
" doesn't quite strike me as something ergonomic.
And it still isn't true if you don't use source roots and use namespace packages (like I do) because the CWD ends up in
sys.path
anyways so we still "find" the in-repo file 🤷‍♂️
h
If you want metadata purity, test your code.
Or, your unowned-dependencies mechanism! I generally agree that
run
is the wrong medium to be enforcing deps are correct exhaustive. We already don't do that We should document that though as a warning
1
b
We should document that though as a warning
Yeah. Both options have thorns which need thorough documentation. IMO one options has less thorns 😛 --- Back on topic though.
run
flag or field? 🙂
(And back off topic, just hit another datapoint for sandbox-by-default) After adding
--debug-adapter
support, it will only work if we run the user's code in-repo. More frustratingly, nothing will crash or no errors or warnings will appear (unless we implement them) after connecting the client. Just silently runs the code in the sandbox to completion (because of the path mismatch).
s
Checking in on this - since
2.13.0a0
was released without reverting the
run_in_sandbox
field on the
pex_binary
target, does that mean it will definitely be present in the 2.13.0 release? or there’s still a chance it’ll end up reverted?
b
I think we should probably decide that. @happy-kitchen-89482? Of we could rename it to experimental so we can drop it without deprecating?
Dan, you could do what I did in the meantime and add an in-repo plugin with this support
h
I should mention that running unsandboxed should not be very commonly necessary. Using this to escape having to declare deps properly could be misuse. The legitimate use is the Django makemigrations case - Django uses
__path__
magic to determine a write location based on the location of a loaded module. But this is not super common.
s
django is the reason why I want it
h
Right, and so
run_in_sandbox
solves that case for you
If we remove it, then we have to offer the alternative of always running unsandboxed
At first blush that seems maybe dangerous to me
OTOH I see the argument that we might want
./pants run path/to/file.py
to be equivalent to
python path/to/file.py
, where things ~just work
b
But this is not super common.
It's very common for a lot of our scripts in-house 🙂 Even in Pants,
generate-docs
would now be writing in-repo. As is
generate_github_workflows
and
contributors.py
😉
OTOH I see the argument that we might want
./pants run path/to/file.py
to be equivalent to
python path/to/file.py
, where things ~just work
So that's my proposal (in a nutshell) 😉 Remember Winston? He would be wasting time trying to figure out why running the script through Pants isn't working like when he runs it through his venv. We got off topic here, let me open a new thread
s
to be clear, I don’t really care either way about
run_in_sandbox
on
pex_binary
vs. un-sandboxed
run
directly on a python script - I can make either one work for my purposes. I’ve had people breathing down my neck about django management CLI + pants not working as expected, and was going to point at the new
run_in_sandbox
field and say “look it’s going to work soon!” when I saw chatter here about possibly reverting the addition of that field in 2.13. I don’t mind if it’s reverted, I just want to know if it’s going to be reverted so I don’t bait-and-switch my team
1
b
FWIW Dan, you can do the "right thing" today with an in-repo plugin if that floats your boat. I can assist. We have this in our repo at work 💪
Then when Pants catches up upstream you can upgrade-and-remove plugin and it should "just work" (TM)
s
I might take you up on that depending on how the other thread goes 🙂
👍 1
h
OK, so to clarify - there will definitely be a way to achieve this. We won't remove
run_in_sandbox
until we have some alternative.
b
Since this thread already got hijacked to talk about the default value: https://github.com/pantsbuild/pants/pull/15849/ is out of "draft" and has the default set to "in-repo".