# general
g
I’ve got a predicament with our local Mac and Linux-based CI/CD processes. We have a PEX artifact that gets packaged into a Docker container. Due to some wheel compatibility issues, the PEX must be built for a Linux x86 system. To get around this, we started using environments (in particular a `docker_environment`) to build the PEX and the subsequent `docker_image` target. Everything worked great locally until it came time to publish our `docker_image` from CI/CD (GitHub Actions self-hosted runners). The build fails with the following error when I specify a `docker_environment` on the `pex_binary` target:
Exception: Failed to obtain version from local Docker: error trying to connect: No such file or directory (os error 2)
As best as I can tell, the `DOCKER_HOST` environment variable isn’t getting passed down into the build process, and it’s not finding the docker binary when trying to run `docker -v`. I’ve specified `DOCKER_HOST` and a number of other vars in the `docker.env_vars` table and have also tried supplying them to the `docker_environment` target itself. I’ve seen a few somewhat-related issues around this in the Slack history, so I’m wondering if there’s an issue with my Pants config or if `docker_environment` just isn’t ready for primetime. (I’m on Pants 2.15.0, but have tried other versions too.)
w
i believe that `2.15.1rc2` fixes this.
g
👏 oh nice, I’ll give this a shot right now. Thank you @witty-crayon-22786.
Actually, I tested with `2.15.1rc2` yesterday and still ran into that error. I’m also not seeing any mention of `DOCKER_HOST` in the debug logs either.
18:12:17.69 [ERROR] 1 Exception encountered:

Engine traceback:
  in select
    ..
  in pants.core.goals.publish.run_publish
    `publish` goal
  in pants.core.goals.publish.package_for_publish
    ..
  in pants.core.goals.package.environment_aware_package
    ..
  in pants.backend.docker.goals.package_image.build_docker_image
    ..
  in pants.backend.docker.util_rules.docker_build_context.create_docker_build_context
    ..
  in pants.core.goals.package.environment_aware_package
    ..
  in pants.backend.python.goals.package_pex_binary.package_pex_binary
    ..
  in pants.backend.python.util_rules.pex.create_pex
    ..
  in pants.backend.python.util_rules.pex_from_targets.create_pex_from_targets
    ..
  in pants.backend.python.util_rules.python_sources.prepare_python_sources
    ..
  in pants.core.util_rules.source_files.determine_source_files
    Get all relevant source files - environment:docker_x86
  in pants.engine.internals.graph.hydrate_sources
    Hydrate the `sources` field - (environment:docker_x86, packages/moz-service-proto/moz_service_proto/central_point.proto:../proto)
  in pants.backend.codegen.protobuf.python.rules.generate_python_from_protobuf
    Generate Python from Protobuf - environment:docker_x86
  in pants.backend.python.util_rules.pex_environment.find_pex_python
    Prepare environment for running PEXes - environment:docker_x86
  in pants.core.util_rules.subprocess_environment.get_subprocess_environment
    ..
  in pants.engine.internals.platform_rules.environment_vars_subset
    ..
  in pants.engine.internals.platform_rules.complete_environment_vars
    ..
  in pants.engine.process.fallible_to_exec_result_or_raise
    ..
  in pants.core.util_rules.environments.extract_process_config_from_environment
    ..

Traceback (no traceback):
  <pants native internals>
Exception: Failed to obtain version from local Docker: error trying to connect: No such file or directory (os error 2)
w
what is `DOCKER_HOST` on the relevant machine?
g
`tcp://localhost:2376`
w
got it. it looks like we might only support `unix://` domain sockets currently
g
And that’s via the `docker_environment`, right? Everything works as expected before adding the `docker_environment` (except we can’t build PEX/Docker images on our Macs).
w
yes. the `docker_environment` implementation uses a native client library rather than forking the `docker` cli
g
Gotcha, that makes sense, although it’s kind of frustrating. We’re using docker-in-docker inside of GHA, which is likely why this is so complicated. The fix for our situation is probably some kind of logic that says to only use `docker_environment` targets when running on our Macs, and to use the default behavior when inside of CI/CD.
w
i’ll be honest: i don’t know the pros/cons of `tcp` vs `unix` sockets in this context. is that a choice that you made when setting up Docker here?
it looks like i could add a flag here (or attempt to do some automatic fallback), but some more context would help
mm… according to https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-socket-option , it also seems like because you’re using port `2376`, an encrypted connection might be necessary
g
I’m not exactly sure either. We’re using https://github.com/actions/actions-runner-controller/ for our self-hosted runners. Although we’re using an outdated version of it and it looks like they may have switched away from the TCP socket recently.
w
ok. it looks like some very basic detection would fix the `unix` vs `tcp` aspect of this, but the TLS/SSL aspect is still unclear to me (it doesn’t get a unique address string, afaict).
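For readers following along, the "very basic detection" mentioned above could look roughly like the sketch below. It is illustrative Python only, not how Pants does it (Pants uses a native Docker client, per the thread), and the function name `classify_docker_host` is hypothetical. It only tells `unix://` from `tcp://` addresses and does not touch the TLS question, since the address string alone doesn’t say whether encryption is required.

```python
from typing import Tuple
from urllib.parse import urlparse


def classify_docker_host(docker_host: str) -> Tuple[str, str]:
    """Split a DOCKER_HOST value into (transport, address).

    Hypothetical helper for illustration; only handles unix:// and tcp://.
    """
    parsed = urlparse(docker_host)
    if parsed.scheme == "unix":
        # unix:///run/docker/docker.sock -> path of the domain socket
        return ("unix", parsed.path)
    if parsed.scheme == "tcp":
        # tcp://localhost:2376 -> host:port of the daemon
        return ("tcp", parsed.netloc)
    raise ValueError(f"unsupported DOCKER_HOST scheme: {docker_host!r}")


# The two values that come up in this thread:
print(classify_docker_host("tcp://localhost:2376"))
print(classify_docker_host("unix:///run/docker/docker.sock"))
```

A client could branch on the first element to pick a connection method and report a clear error for anything else.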
g
Thank you Stu!
Follow-up here: I went through the process of upgrading our actions-runner-controller helm chart to the latest version, 0.18.0 -> 0.23.3, and the `DOCKER_HOST` variable is updated to `unix:///run/docker/docker.sock` inside our runners now. However, I’m still running into issues on `2.15.1rc2`. When I have both a `local_environment` and a `docker_environment` specified (only my `pex_binary` targets are specifically using this environment), I get this issue:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
When I remove the `local_environment` and leave only the `docker_environment`, I get the following issue:
Engine traceback:
  in `publish` goal
  in Building local distributions - environment:docker_x86

ProcessExecutionFailure: Process 'Extract environment variables from the Docker image python:3.8.16' failed with exit code 126.
stdout:
OCI runtime exec failed: exec failed: unable to start container process: chdir to cwd ("/pants-sandbox/pants-sandbox-nsMAG4") set in config.json failed: no such file or directory: unknown
w
hm. are you sure that this is “docker in docker”, and not “docker from docker”?
if you’re connecting to a docker daemon outside of your container, the container won’t be able to expose files to other spawned containers
g
🤔 I’m not sure about that. It’s likely docker-from-docker since we’re using K8s to run our GHA runners. I think my first solution was the right one, where I had one `local_environment` (linux_x86) and one `docker_environment`. The `local_environment` falls back to the `docker_environment` so that CI/CD uses the `local_environment` to build the PEX and our Macs use the `docker_environment`. The issue there is that the `DOCKER_HOST` environment variable doesn’t seem to get passed through, so it’s looking for `unix:///var/run/docker.sock` instead of `unix:///run/docker/docker.sock` inside of CI/CD.
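For context, the setup described here might look something like the BUILD-file sketch below. This is a guess at the shape, not the poster’s actual config: every target name and the entry point are hypothetical, and it assumes `pants.toml` registers the names `linux_x86` and `docker_x86` for these targets under `[environments-preview.names]`.

```python
# BUILD (sketch): all target names and the entry point are hypothetical.
# Assumes pants.toml maps the names "linux_x86" and "docker_x86" to these
# two targets under [environments-preview.names].

local_environment(
    name="local_linux_x86",
    compatible_platforms=["linux_x86_64"],
    # On an incompatible host (e.g. a Mac), fall back to the Docker environment.
    fallback_environment="docker_x86",
)

docker_environment(
    name="docker_linux_x86",
    platform="linux_x86_64",
    image="python:3.8.16",  # image name taken from the error output in this thread
)

pex_binary(
    name="service",
    entry_point="main.py",
    # Linux x86 CI runners build locally; Macs go through the fallback.
    environment="linux_x86",
)
```

With this arrangement only the Macs should need a reachable Docker daemon for the PEX build, which is the behavior being described.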
All of this just worked inside of CI/CD before we tried to enable environments. Environments have unblocked us locally, but blocked us in CI/CD. Is there a way to enable environments only when running outside of CI/CD and disable them inside of CI/CD, via `pants.ci.toml` or something like that?
w
Yes: there is an example of that in the docs.
g
oh, I read through that early on but haven’t revisited it. I think that may be the magic solution here. Thanks Stu, I’ll give that a try here shortly.
w
> The `local_environment` falls back to the `docker_environment` so that CI/CD uses the `local_environment` to build the pex and our Macs use the `docker_environment`.
> The issue there is that the `DOCKER_HOST` environment variable doesn’t seem to get passed through so it’s looking for `unix:///var/run/docker.sock` instead of `unix:///run/docker/docker.sock` inside of CI/CD

i’m not sure whether i understand the setup for this one: would you mind filing an issue about it?
g
Yeah I can do that
Thanks for your help again Stu. I tried overriding which environment to use but still ran into the same issue. I raised an issue about it here: https://github.com/pantsbuild/pants/issues/18915