# development
h
For the feature of running builds inside Docker containers, it sounds like a given that people want this to be customizable per target, right? This test runs in CentOS 7, this one in CentOS 8, this other one in both? https://github.com/pantsbuild/pants/issues/13682
s
per-target customization could be useful but IMO we’d get most of the value from a single global config to say “execute everything in image X instead of directly on the host”
that would save us from weird Mac/Linux differences
then per-target images could be a polish/improvement so we could have a thin image for most targets and a fat one for things like our bioinformatics tests that need many more system libraries
w
Out of curiosity, does a global config differ from just running a Docker container containing Pants against your repo?
s
🤔 I guess not in the core functionality, just the UX of it (being able to
./pants <goal>
instead of
docker run ….. ./pants <goal>
)
👍 1
been awhile since I thought about this, need to refresh my mental wishlist 😂
maybe a per-goal / per-subsystem config instead of a global one? i.e. I don’t need to run
fmt
in a container, but I do want to run `test`s and
package
of
pex_binary
targets in one
w
I don't disagree with that 🙂 I use a docker container that pulls in everything, but it would be nice to avoid that... "nice" rather than necessary. Where I find a lot of value is packaging and testing for stuff that I'm deploying to microservices, or certain architectures, or stuff like that.
s
and while I could run
package
of
docker_image
targets in an image, the docker-in-docker setup can get messy so I’d probably prefer to run it natively as long as it was possible to do so while still building any linked PEXes inside the image
w
Ahhh, right - there is a possible docker in docker workflow there. I'd second that
package
and
test
have the most obvious value, goal-wise, over lints/formatters/introspection tools being inside containers.
b
check
maybe as well? Requires compilers for some backends
s
On the flip side not really needed for
mypy
- makes me lean towards a per-subsystem option instead of a per-goal option in the ideal
Per-subsystem defaults with per-target overrides? 🤔
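A minimal sketch of what "per-subsystem defaults with per-target overrides" could mean in practice. This is plain Python rather than Pants code; the `SUBSYSTEM_DEFAULTS` table, the `resolve_environment` helper, and the environment names are all hypothetical and only illustrate the lookup order.

```python
from typing import Dict, Optional

# Hypothetical per-subsystem defaults. None means "run directly on the host".
SUBSYSTEM_DEFAULTS: Dict[str, Optional[str]] = {
    "pytest": "centos7_docker",  # tests run in a container by default
    "pex": "centos7_docker",     # so does packaging of pex_binary targets
    "black": None,               # formatters stay on the host
    "mypy": None,                # mypy does not need a container either
}


def resolve_environment(
    subsystem: str, target_override: Optional[str] = None
) -> Optional[str]:
    """Return the environment name to use, or None for the host.

    A per-target override wins; otherwise fall back to the subsystem default.
    """
    if target_override is not None:
        return target_override
    return SUBSYSTEM_DEFAULTS.get(subsystem)
```

With this lookup order, a single test target could opt into a fatter image without changing where every other test runs.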
w
there is a lot of subtlety here, but my hope is that an initial version which only supports (1) running the code (i.e., invoking a Python interpreter) and (2) packaging the code in Docker would be sufficient (in which case it might be triggered by
test_in_environment=X
and
package_for_environment=Y
fields for tests and packages, respectively). but it definitely makes sense to think through what it might look like to control the environment of other tools
h
but IMO we’d get most of the value from a single global config to say “execute everything in image X instead of directly on the host”
This is specifically what I'm trying to figure out: would it be sufficient to only have a single global toggle that is one environment, or will we need to allow more granularity? That impacts the design. I strongly suspect people will want more granularity.
👍 1
p
The choice of running in a container vs. on the host depends on the host too, so it might have to be an option in ~/.pantsrc. E.g., a macOS host vs. a native Linux host: I have everything available on my Linux host, so I wouldn't want to use Docker there, but it would be immensely useful on my Mac. So I think configuring this in
pants.toml
or BUILD files is problematic.
h
Oof that's a good point, Jacob. Stu and I also discussed how some environments are not possible to run from your machine: on macOS, I can run on macOS or Linux via Docker. On Linux, I can't run on macOS (outside of remote execution)
p
So maybe targets need
platform_constraints
(and/or
os_constraints
) similar to
interpreter_constraints
?
Then the host can have config about which platforms are available locally vs via docker.
h
Right now, the thinking is an
environment
field. The "Environment" will have all the relevant config like
[python-bootstrap].search_paths
and env vars. It will also likely say whether to run on localhost vs remote execution vs Docker image. So, I'm thinking that if you say
python_test(environment="centos7_docker")
, that implies running in the Docker container. I'm trying to figure this modeling out today. It's tricky... so many use cases, and ideally we don't make Pants harder to use if you don't need this Docker feature
p
I can imagine something similar (config wise) being used to wire up
qemu
or other virtualization tech to use virtualized arm64 from an x86_64 host.
w
Oof that’s a good point, Jacob. Stu and I also discussed how some environments are not possible to run from your machine: on macOS, I can run on macOS or Linux via Docker. On Linux, I can’t run on macOS (outside of remote execution)
yea: there will almost certainly need to be a mechanism for matching the current host “fuzzily” against the configured Environments: for example, if an Environment doesn’t specify a platform, then it applies to all platforms, etc.
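The "fuzzy" matching described above could look roughly like the following sketch. It is not Pants code: the `Environment` dataclass, its `platform_name` field, and both helper functions are hypothetical names. The only rule it encodes is the one stated in the message: an Environment with no platform applies to every host.

```python
import platform
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Environment:
    name: str
    # e.g. "linux" or "darwin"; None means "applies to all platforms".
    platform_name: Optional[str] = None


def current_host_platform() -> str:
    """Lower-cased OS name of the current host, e.g. "linux" or "darwin"."""
    return platform.system().lower()


def matching_environments(
    envs: List[Environment], host_platform: str
) -> List[Environment]:
    """Keep environments that either name no platform or name this host's."""
    return [
        env
        for env in envs
        if env.platform_name is None or env.platform_name == host_platform
    ]
```

Calling `matching_environments(envs, current_host_platform())` would then give the candidates eligible to run locally on whatever machine invoked the build.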
@proud-dentist-22844: yea, that and remote execution 🤐 … that’s part of the reason why an Environment is not == to a docker image, per-se.
👍 2
w
I can imagine something similar (config wise) being used to wire up
qemu
or other virtualization tech to use virtualized arm64 from an x86_64 host.
Oh how direly I would like this... My mishmash of qemu scripts building arm OSes and packages is ... There's no emoji to describe my pain
👍 1
p
How do we disambiguate
Environment
and env vars?
w
In many cases they are very related: an Environment contains the config for all env vars. It also contains other types of config.
But although variable names are frequently
env
, when they mean environment variables, I'm not too worried about confusion.
(although if we come up with a better name than Environment for this concept, that's fine with me)
h
@proud-dentist-22844 why would you not want to use Docker when you're on Linux? Part of the motivation for this feature is that everyone can use the identical environment, including CI
w
docker is relatively cheap on linux, but it’s still indirection (particularly through an extra virtual filesystem)… imo, it is still a goal to support running without docker if the local environment matches.
👍 1
h
Alright, do these user stories look accurate? Any suggestions for stories to add? https://docs.google.com/document/d/1vXRHWK7ZjAIp2BxWLwRYm1QOKDeXx02ONQWvXDloxkg/edit#heading=h.fqc8ud3uhszq This project makes my head hurt, hah. So many competing priorities
Oh oh oh oh oh idea. Differentiate between "local environments" vs force-to-run-on-this-environment, like Docker and Remote Execution. For example:
• define 2 "local" environments, one for Linux and one for macOS
• define a Docker environment that no matter what uses a particular Docker image
• if `run_in_container` is not set, then use a local environment: whichever is compatible
• if `run_in_container` is set, it needs to be a docker environment
The main idea here is that "local" environments let you set platform-specific config, without forcing all targets to say when to use what. Pants can figure it out. If we go with the target approach, we could even have `local_environment` and `docker_environment` be distinct targets
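The selection rule in that idea can be sketched as below. This is an illustration in plain Python, not the Pants implementation: the two dataclasses, the `choose_environment` function, and the example image names are hypothetical, while `run_in_container` and the local/docker split come from the message above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class LocalEnvironment:
    name: str
    compatible_platforms: Tuple[str, ...]  # e.g. ("linux",) or ("darwin",)


@dataclass(frozen=True)
class DockerEnvironment:
    name: str
    image: str  # always used, regardless of the host platform


def choose_environment(
    run_in_container: Optional[str],
    local_envs: List[LocalEnvironment],
    docker_envs: Dict[str, DockerEnvironment],
    host_platform: str,
) -> Union[LocalEnvironment, DockerEnvironment]:
    # If the target sets run_in_container, it must name a Docker environment.
    if run_in_container is not None:
        if run_in_container not in docker_envs:
            raise ValueError(f"{run_in_container!r} is not a Docker environment")
        return docker_envs[run_in_container]
    # Otherwise pick whichever local environment is compatible with the host.
    for env in local_envs:
        if host_platform in env.compatible_platforms:
            return env
    raise ValueError(f"no local environment is compatible with {host_platform!r}")
```

The point of the split is that targets which never set `run_in_container` need no annotation at all; the host platform alone decides which local environment applies.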
w
yea, probably. i think that i have assumed that any Environment which doesn’t have an associated docker image is eligible to be matched locally.
👍 1
p
Calling it a "runtime environment" makes more sense than just "environment". And yes, this is a headache inducing set of problems to solve.
On my Gentoo box, I have many different versions of python installed which I would like to reuse. Adding docker makes development more difficult because hooking up a debugger is more complex. Hermeticity is excellent, but it has the downside of environmental assumptions leaking into the code when it should be able to run in a wider variety of environments. My developing on Gentoo, a system few others use, really helps at pushing against the faulty assumptions about "all systems" that will run the software. I've also worked on software that was fundamentally incompatible with network namespaces, and still other software that leveraged creating many network namespaces, so these could not run in a container. These goals don't apply equally to every project I work on--they are some of the reasons I avoid or have avoided docker.
👍 1
❤️ 1
h
Thanks! I came up with a sketch that will satisfy that 🙂
docker_environment(prefer_local_environment=True)
If you have a
local_environment
with
compatible_platforms
that match yours, we won't use Docker
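One way to read the `prefer_local_environment` behavior, as a small standalone sketch. The `uses_docker` function and its parameters are hypothetical; only the rule itself (skip Docker when a local environment's `compatible_platforms` includes the host) is taken from the message above.

```python
from typing import Iterable, Sequence


def uses_docker(
    prefer_local_environment: bool,
    local_env_platforms: Iterable[Sequence[str]],
    host_platform: str,
) -> bool:
    """Decide whether a docker_environment actually runs in Docker.

    local_env_platforms holds the compatible_platforms of each configured
    local environment.
    """
    if not prefer_local_environment:
        return True  # the Docker image is always used
    has_local_match = any(
        host_platform in platforms for platforms in local_env_platforms
    )
    return not has_local_match
```

Under this reading, a Linux developer with a matching local environment stays on the host, while a macOS developer with no matching local environment falls back to the container.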
😎 1
p
Cool!