# general
a
I've recently started using environments, but one thing I find weird is that it leaves docker containers around, is there a way to clean them up?
$ docker ps
CONTAINER ID   IMAGE          COMMAND     CREATED          STATUS          PORTS     NAMES
e8fa94310526   740597b0607a   "/bin/sh"   6 minutes ago    Up 6 minutes              distracted_goodall
f3a5fd43045b   740597b0607a   "/bin/sh"   14 minutes ago   Up 14 minutes             compassionate_benz
a2af13d93af7   740597b0607a   "/bin/sh"   23 minutes ago   Up 23 minutes             adoring_joliot
5ec3a6301923   740597b0607a   "/bin/sh"   28 minutes ago   Up 28 minutes             elated_moser
839f4194c6d7   740597b0607a   "/bin/sh"   2 hours ago      Up 2 hours                optimistic_goldwasser
5fdb7e77c4e3   740597b0607a   "/bin/sh"   4 hours ago      Up 4 hours                boring_chebyshev
0871cf8a7d80   740597b0607a   "/bin/sh"   4 hours ago      Up 4 hours                agitated_mcclintock
0268f7be9469   740597b0607a   "/bin/sh"   4 hours ago      Up 4 hours                busy_murdock
d64c5190284e   740597b0607a   "/bin/sh"   4 hours ago      Up 4 hours                sharp_colden
f
I assume you use the pantsd daemon?
If so, has pantsd been running a long time (or at least as long as those containers)?
a
I mean, I do, but I'm not sure that running without pantsd doesn't do the same
f
My point is that the container removal should happen on pantsd shutdown. So if pantsd never restarts, then the containers would just accumulate.
a
Yeah that's not true
f
But this could be a bug. That understanding on my part is just supposition which needs investigation.
a
Sorry, let me rephrase, it could be that it's supposed to happen, but it doesn't.
And even if it was, what if the pantsd thing dies?
It's a container, let it go 🙂
What's worse is that if you restart docker, you have to restart pantsd too
Hm, it does kill them sometimes.
f
if pantsd dies it no longer has the container ID in memory and the containers are not currently marked as having come from Pants
(in any way which some sort of orphan cleanup algorithm could operate)
a
Or... and hear me out, the containers could be just made to die after building
Okay, I got it. If you run pants with --no-pantsd, which you need to in order for it to pick up a new image, this happens
Not sure if the --no-pantsd leaves something around, or it just causes pantsd to get confused.
f
when I wrote this feature, I recall container startup overhead was significant which is why I wrote the container caching in https://github.com/pantsbuild/pants/pull/16801
a
haha, significant. it takes 2 minutes to resolve dependencies so it can start the package job 😄
Okay, anyway, --no-pantsd leaves stuff around.
f
well, that doesn't mean Pants has no other performance issues. But not caching containers will make performance worse.
a workaround for now could be to add some metadata to the Docker containers, allowing them to be purged easily
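(Not from the thread: a minimal sketch of what such a purge could look like. The first function needs no Pants changes and removes containers by the image they were created from, e.g. the 740597b0607a image in the docker ps output above. The second assumes Pants were changed to label its containers; the label name pants.owner=docker_environment is hypothetical, since per the discussion above Pants does not currently mark its containers in any way.)

```shell
#!/bin/sh
# Remove every container (running or stopped) created from a given image.
# Usage: purge_containers_from_image IMAGE_ID
purge_containers_from_image() {
    image="$1"
    # The `ancestor` filter matches containers created from this image.
    ids=$(docker ps -aq --filter "ancestor=${image}")
    if [ -z "$ids" ]; then
        echo "no containers found for image ${image}"
        return 0
    fi
    # Intentionally unquoted so that each container ID becomes its own argument.
    docker rm -f $ids
}

# Remove every container carrying a given label.
# ASSUMPTION: the label is hypothetical; Pants does not set one today.
# Usage: purge_containers_with_label pants.owner=docker_environment
purge_containers_with_label() {
    label="$1"
    ids=$(docker ps -aq --filter "label=${label}")
    if [ -z "$ids" ]; then
        echo "no containers found with label ${label}"
        return 0
    fi
    docker rm -f $ids
}
```

Running purge_containers_from_image 740597b0607a would force-remove all of the leftover containers listed earlier. Note that it also removes containers a live pantsd is still using, so it is only safe when no Pants run is in progress.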
a
I'll go have a shower and make a ticket
Though, to be fair, we're on 2.18, it might've been fixed
f
maybe try with one of the 2.22rc's but I imagine that code has not seen updates in a while.
git log says I have not touched the Docker code since October 2022
a
Yeah, I have a 250k line change PR trying to update pants to 2.21, it's not even half way there 😄
f
wowzer. (side note: I think our official recommendation is to go version by version to ease such large PRs, but I don't know how much of your PR is related to particular changes.)
a
Started in April, still trying to get it to work, heh
Ah, the problem isn't that, it's the fact that we have to use resolves.
So whichever the version that forces you to do that is, that's the issue.
f
got it
a
Sorry, forgot which one it is, I started this in April, heh
Or, well, technically, a coworker started in August of last year, heh
Anyway, that's a whole different saga. As I said, shower, then ticket. Thanks for the info, not sure I would've made the connection between --no-pantsd and these containers, been running this hundreds of times today
f
There is also the fact that the shutdown code is run on a timeout. If it takes too long, Pants will skip the shutdown tasks.
You can try increasing the --session-end-tasks-timeout option's value and see if it helps.
anyway, will respond further on the issue you are writing
a
Let me try that
Nope, no effect.
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$ IMAGE_TAG=test pants --no-pantsd  --session-end-tasks-timeout=30000 package apps/airflow_db_reader/::
21:24:41.46 [INFO] Reading /Users/cbirzan/PycharmProjects/cr_python/.python-version to determine desired version for [python-bootstrap].search_path.
21:24:48.60 [INFO] Reading /Users/cbirzan/PycharmProjects/cr_python/.python-version to determine desired version for [python-bootstrap].search_path.
21:24:50.31 [INFO] Canceled: Building apps.airflow_db_reader/bin.pex with 44 requirements: Flask==2.2.5, Jinja2==3.1.4, SQLAlchemy==1.4.47, aioredis==1.3.1, aioredlock==0.7.2, async_generator==1.10, bcrypt==3.2.1, boto3-stubs[cog... (855 characters truncated)
21:25:29.33 [INFO] Completed: Building apps.airflow_db_reader/bin.pex with 44 requirements: Flask==2.2.5, Jinja2==3.1.4, SQLAlchemy==1.4.47, aioredis==1.3.1, aioredlock==0.7.2, async_generator==1.10, bcrypt==3.2.1, boto3-stubs[cog... (855 characters truncated)
21:25:31.57 [INFO] Completed: Building docker image 614596319632.dkr.ecr.eu-west-1.amazonaws.com/cr-airflow-db-reader:test
21:25:31.57 [INFO] Wrote dist/apps.airflow_db_reader/bin.pex
21:25:31.57 [INFO] Wrote dist/apps.airflow_db_reader/docker.docker-info.json
Built docker image: 614596319632.dkr.ecr.eu-west-1.amazonaws.com/cr-airflow-db-reader:test
Docker image ID: sha256:dee2899d916e4a9c8186890773b782b7a8114d90316af37854c4d2cee0904765
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$
Not sure if that's seconds or milliseconds, but it added another container
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$ IMAGE_TAG=test pants --no-pantsd  --session-end-tasks-timeout=30000 package apps/airflow_db_reader/::
21:27:02.88 [INFO] Reading /Users/cbirzan/PycharmProjects/cr_python/.python-version to determine desired version for [python-bootstrap].search_path.
21:27:10.06 [INFO] Reading /Users/cbirzan/PycharmProjects/cr_python/.python-version to determine desired version for [python-bootstrap].search_path.
21:27:14.83 [INFO] Completed: Building docker image 614596319632.dkr.ecr.eu-west-1.amazonaws.com/cr-airflow-db-reader:test
21:27:14.83 [INFO] Wrote dist/apps.airflow_db_reader/bin.pex
21:27:14.83 [INFO] Wrote dist/apps.airflow_db_reader/docker.docker-info.json
Built docker image: 614596319632.dkr.ecr.eu-west-1.amazonaws.com/cr-airflow-db-reader:test
Docker image ID: sha256:dee2899d916e4a9c8186890773b782b7a8114d90316af37854c4d2cee0904765
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$ docker ps
CONTAINER ID   IMAGE          COMMAND     CREATED          STATUS          PORTS     NAMES
7fc73d78040f   740597b0607a   "/bin/sh"   5 seconds ago    Up 4 seconds              distracted_jackson
fa438395b029   740597b0607a   "/bin/sh"   2 minutes ago    Up 2 minutes              sleepy_sanderson
0eaa0bacc764   740597b0607a   "/bin/sh"   4 minutes ago    Up 4 minutes              interesting_margulis
2a22184da2e4   740597b0607a   "/bin/sh"   5 minutes ago    Up 5 minutes              festive_kare
84ac58338527   740597b0607a   "/bin/sh"   16 minutes ago   Up 16 minutes             flamboyant_kapitsa
fad6602caf80   740597b0607a   "/bin/sh"   19 minutes ago   Up 19 minutes             charming_cori
8064619bd141   740597b0607a   "/bin/sh"   26 minutes ago   Up 26 minutes             epic_swanson
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$ IMAGE_TAG=test pants --no-pantsd  --session-end-tasks-timeout=30000 package apps/airflow_db_reader/::
21:27:30.31 [INFO] Reading /Users/cbirzan/PycharmProjects/cr_python/.python-version to determine desired version for [python-bootstrap].search_path.
21:27:37.33 [INFO] Reading /Users/cbirzan/PycharmProjects/cr_python/.python-version to determine desired version for [python-bootstrap].search_path.
21:27:41.85 [INFO] Completed: Building docker image 614596319632.dkr.ecr.eu-west-1.amazonaws.com/cr-airflow-db-reader:test
21:27:41.85 [INFO] Wrote dist/apps.airflow_db_reader/bin.pex
21:27:41.85 [INFO] Wrote dist/apps.airflow_db_reader/docker.docker-info.json
Built docker image: 614596319632.dkr.ecr.eu-west-1.amazonaws.com/cr-airflow-db-reader:test
Docker image ID: sha256:dee2899d916e4a9c8186890773b782b7a8114d90316af37854c4d2cee0904765
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$ docker ps
CONTAINER ID   IMAGE          COMMAND     CREATED          STATUS          PORTS     NAMES
cdbddcf74077   740597b0607a   "/bin/sh"   9 seconds ago    Up 8 seconds              hardcore_chebyshev
7fc73d78040f   740597b0607a   "/bin/sh"   36 seconds ago   Up 36 seconds             distracted_jackson
fa438395b029   740597b0607a   "/bin/sh"   2 minutes ago    Up 2 minutes              sleepy_sanderson
0eaa0bacc764   740597b0607a   "/bin/sh"   4 minutes ago    Up 4 minutes              interesting_margulis
2a22184da2e4   740597b0607a   "/bin/sh"   5 minutes ago    Up 5 minutes              festive_kare
84ac58338527   740597b0607a   "/bin/sh"   17 minutes ago   Up 17 minutes             flamboyant_kapitsa
fad6602caf80   740597b0607a   "/bin/sh"   20 minutes ago   Up 20 minutes             charming_cori
8064619bd141   740597b0607a   "/bin/sh"   26 minutes ago   Up 26 minutes             epic_swanson
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$
f
You are building Docker images with Pants? The Docker invocations for that are separate from the docker_environment logic.
a
Yeah, I'm just too lazy to figure out what the name of the pex_binary is in each app
We... don't really standardise this 😞
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$ docker ps | wc -l
      10
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$ pants --no-pantsd  --session-end-tasks-timeout=30000 package apps/airflow_db_reader:bin
21:31:20.72 [INFO] Reading /Users/cbirzan/PycharmProjects/cr_python/.python-version to determine desired version for [python-bootstrap].search_path.
21:31:29.38 [INFO] Wrote dist/apps.airflow_db_reader/bin.pex
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$ docker ps | wc -l
      11
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$ pants --no-pantsd  --session-end-tasks-timeout=30000 package apps/airflow_db_reader:bin
21:31:36.80 [INFO] Reading /Users/cbirzan/PycharmProjects/cr_python/.python-version to determine desired version for [python-bootstrap].search_path.
21:31:45.50 [INFO] Wrote dist/apps.airflow_db_reader/bin.pex
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$ docker ps | wc -l
      12
cbirzan@GP3CMXYV9V:~/PycharmProjects/cr_python$
I could argue that it didn't even need a docker container for that 😄
Actually, hm, where's the cache for this stored...
f
The container is "cached" but still running in Docker during the Pants run.
a
Yeah, must be in my local cache directory, otherwise it couldn't build that in seconds.
f
Ah yeah, Pants still caches process outputs locally even with docker_environment
a
Macs are soooo slow for anything IO related 😞
okay, yeah, 28GB of cache in ~/.cache/pants
heh
f
garbage collection of the Pants local cache is its own basket of fun 🙂
a
As is trying to make a cache key for circleci, the literal opposite of this issue 😄
I've made https://github.com/pantsbuild/pants/issues/21328, but it was a rush job, really, I'm trying now to get a reproducible example in a repo
Okay, not gonna be able to do this tonight, will try tomorrow. Feel free to ask me for more info there, will take care of it tomorrow morning(ish) EU time.
I'm trying to create a reproducible repo, but...
The docker integration is really cache-y, I have to restart pantsd every time I do anything
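(Not from the thread: for a reproducible example, the manual before/after docker ps | wc -l check shown earlier could be wrapped in a small helper. This is a sketch only; the function names are made up for illustration. It uses docker ps -q so the header line is not counted, which means its numbers are one lower than the 10, 11, 12 in the paste above.)

```shell
#!/bin/sh
# Count running containers. `docker ps -q` prints one ID per line, no header.
count_running_containers() {
    # `tr` strips the padding that some `wc` implementations (e.g. macOS) add.
    docker ps -q | wc -l | tr -d ' '
}

# Run a command and report how many extra containers it left running.
# Usage: report_container_leak pants --no-pantsd package path/to:target
report_container_leak() {
    before=$(count_running_containers)
    "$@"
    status=$?
    after=$(count_running_containers)
    echo "containers before=${before} after=${after} leaked=$((after - before))"
    # Preserve the exit status of the wrapped command.
    return "$status"
}
```

For example, report_container_leak pants --no-pantsd package apps/airflow_db_reader:bin would print the container counts around a single run, using the same target as in the paste above.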