# general
d
Hey all! awesome to see the questions here, and wanted to throw one out there; happy to provide a minimal example with some more time. Basically, we'd like to be able to bake the dependencies of our repos into a base docker image so that the CI/CD pipeline will be as fast as possible. So for example, for the purpose of CI/CD pipelines:
1. we have an image that has all dependencies for our project-v1.0 installed into it, but not project-v1.0
   a. Specifically,
poetry install --no-root
or whatever the equivalent command is (forgive me, going off the top of my head)
2. During merge request pipelines, the pipeline adds the current state of the repo to the container
3. (today) we run
poetry install --sync
in the container
   a. This allows any new dependencies to be added dynamically
   b. old dependencies to be removed
   c. Installs the library under test
   d. Is faster than a "clean" build because poetry knows that several of the dependencies are already installed in the virtual environment
4. We then run linting / security scanning / tests / etc

We'd really like to be able to replicate the above with pants, while keeping the benefits of how pants normally operates (with the hermetic execution, dep inference, etc). I tried building a pex and working with that, but i'm running into a few issues (with and without the pex; it's all a bit messy atm while i've been trying to debug / explore):
•
pants repl ::
(just used for my interactive testing) in the container fails with a message saying manylinux versions of the dependent libraries are not available
◦ This is technically true; they aren't (it's numpy and such + a few custom deps we don't currently build manylinux versions of)
◦ What i'd expect / want to happen is for pants to automatically build the required wheels from the sdist (which is available on the private mirror, and pants is also able to reach pypi)
◦ Oddly, I think this might be working if I do something like
./pants test ::
in that it does exactly what i'd expect (build the whls from sdist) but I don't know that it's actually baking/caching the whl files into the image as opposed to re-downloading them
•
./pants export ::
does indeed work, and if I
source ./path/to/venv/bin/activate
and
python -i; import my_library; import my_library_dep_that_won't_work_with_repl
that all works as expected. Additionally, my question at a high level is how can I bake the required whl files into the image so that they are available and will be used and also support ./pants downloading and using new deps as needed in the pipeline? And how can I prove / verify that they are actually there / being used? I thought I might need to run
./pants (re?)generate-lockfiles
but after opening them up they actually aren't platform specific (which was a duh! moment once I realized it). For what it's worth, I was seeing the above (no manylinux versions available) messages both with and without the pex_build. I kind of suspect this is something related to how i'm doing things (probably wrongly, although i've been poring over the (well-written!) docs)? So happy to provide a minimal example once I have some time 🙂 I think in current state I arguably could just ignore the issues with
./pants repl ::
in the container because I believe
./pants test ::
and
./pants package
etc are working? but I don't want to just ignore it since it seems like this is something that should work. Also, with current state, i'm not certain that it isn't re-downloading the whl files on each new command; is there info about how/where pants caches the whl files it uses? Can someone confirm if pants is creating a separate .pex for each command / sub-portion of each command? • This actually would make a lot of sense for the hermetic execution reasons, and appears to be what's happening given the output messages
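One rough way to check whether wheels are actually present in the image is to scan the Pants named caches directory for anything ending in `.whl` and print its platform tag. This is a minimal sketch, not an official Pants tool. It assumes the default location `~/.cache/pants/named_caches` (or the `PANTS_NAMED_CACHES_DIR` override), and it treats the directory layout underneath as an implementation detail by matching on the `.whl` suffix only. The helper names (`parse_wheel_name`, `find_wheels`, `is_manylinux`) are made up for illustration; the file name parsing follows the PEP 427 wheel naming convention.

```python
import os
from pathlib import Path
from typing import Iterator, NamedTuple


class WheelInfo(NamedTuple):
    name: str
    version: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelInfo:
    """Split a PEP 427 wheel name: name-version[-build]-python-abi-platform.whl"""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    return WheelInfo(parts[0], parts[1], parts[-1])


def is_manylinux(wheel: WheelInfo) -> bool:
    """True if any of the (possibly dot-compressed) platform tags is manylinux."""
    return any(tag.startswith("manylinux") for tag in wheel.platform_tag.split("."))


def find_wheels(cache_root: Path) -> Iterator[WheelInfo]:
    """Yield every file or directory under cache_root whose name ends in .whl."""
    for _dirpath, dirnames, filenames in os.walk(cache_root):
        for entry in dirnames + filenames:
            if entry.endswith(".whl"):
                try:
                    yield parse_wheel_name(entry)
                except ValueError:
                    continue  # skip names that do not follow PEP 427
        # Do not descend into unpacked wheel directories.
        dirnames[:] = [d for d in dirnames if not d.endswith(".whl")]


def default_named_caches_dir() -> Path:
    # Assumption: the Pants default location, unless overridden by env var.
    override = os.environ.get("PANTS_NAMED_CACHES_DIR")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "pants" / "named_caches"


if __name__ == "__main__":
    root = default_named_caches_dir()
    for wheel in sorted(set(find_wheels(root))):
        kind = "manylinux" if is_manylinux(wheel) else "other (e.g. built from sdist)"
        print(f"{wheel.name} {wheel.version} [{wheel.platform_tag}] -> {kind}")
```

Running it once right after the image is built and again after a pipeline run, then comparing the two listings, should show which wheels were already baked in and which ones were added during the run. A wheel built locally from an sdist would typically show a plain `linux_x86_64` style tag rather than a manylinux one.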
g
I am curious to understand how this would be more effective than caching pants. Presumably you'd also have to re-build the docker image when dependencies diverge enough that it starts taking too long to build again. I would have thought that caching, particularly with REAPI, would be more effective.
d
Thanks for taking the time to respond! TBH, caching just hasn't made it yet on my list; it very well may end up being better, but I knew there were some concerns I had (more me / us problems than pants problems) so as a first step was simply trying to replicate what I knew worked 🙂 Spot on with needing to rebuild the docker image when the deps diverge; this is a very real fact of life. Typically done when a deployment is cut, which is fairly often anyhow. Additionally, thanks to pants, building the image and pushing it is no longer the manual pain that it used to be
result of the above is that all changes currently being worked are using the image from the last deployment's dependencies as their baseline, with the ability to always update the CI deps image if there's a strong reason to out-of-cycle
also ACK that we're iterating on this process currently, so definitely not suggesting that it's necessarily the best way of doing things
g
Thanks for answering my question - it all makes sense 🙂 I don't know enough about
pants
to answer your questions. However, if I was tackling it, I might start with a
Dockerfile
with the minimal base libs and mount the repo into it and build it from there so it pulls everything in before committing/pushing. I also don't know enough about docker to know if this would work 😕 So this may be of zero use! Presumably, you could include it as part of the build process to rebuild and push out the docker image as well so that it's up to date. Good luck in any case - and hope you get it working 🙂