# general
b
Hi everyone! We've been using Pants for our Python-based machine learning/data science monorepo to test and build multiple libraries and PEX executables that share common dependencies. Now we are running the pants test and binary goals as part of our CI/CD pipeline (Gitlab CI). To take advantage of Pants' caching system, we persist the
.cache
and
.pants.d
directories from one CI pipeline to the next. However, we are now facing the issue that the
.pants.d
directory regularly grows to 15+GB, which is a lot to download/upload in each CI stage and we are actually running into disk space issues with shared Gitlab runners. Does someone know if there is a way to purge or clean the cache directories without losing the ability to only rebuild the changed projects within the monorepo? We've naively tried to only persist the
.cache
directory from one run to the next, however, that led to every project being rebuilt in every CI run. Any help or input would be appreciated!
👋 1
👀 1
w
hey Florian!
so, unfortunately, the v1 caching model is not applied very thoroughly to python… you won’t be able to cache hit for many common things, which is why preserving the
.cache
directory might not be getting much.
f
the Pants CI has some code to purge caches in Travis which may be of interest as to what paths to focus on: https://github.com/pantsbuild/pants/blob/master/build-support/bin/prune_travis_cache.sh
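For readers who want to experiment with pruning, here is a minimal Python sketch of one age-based approach. It is not a reproduction of the linked script, and the function name `prune_older_than` and the age threshold are hypothetical. Whether pruning parts of a cache directory this way keeps Pants' invalidation intact is an assumption to verify on a copy first.

```python
import os
import time
from pathlib import Path


def prune_older_than(root, max_age_days, dry_run=True):
    """Delete regular files under `root` last modified more than `max_age_days` ago.

    Hypothetical helper, not part of Pants. Returns the files that were removed
    (or, when dry_run is True, the files that would be removed).
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # Skip symlinks and anything that is not a regular file.
                if path.is_symlink() or not path.is_file():
                    continue
                if path.stat().st_mtime < cutoff:
                    if not dry_run:
                        path.unlink()
                    removed.append(path)
            except OSError:
                # The file vanished or is unreadable; leave it alone.
                continue
    return removed
```

Calling `prune_older_than(".cache", 14)` only reports candidates; pass `dry_run=False` to actually delete them.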
w
@better-nail-12700: to be 100% clear though, you mean that you’re stashing the content of the
--cache-read-from="['<str>', '<str>', ...]" (default: [])
The URIs of artifact caches to read directly from. Each entry is a URL of a RESTful cache, a path of a filesystem cache, or a pipe-separated list of alternate caches to choose from. This list is also used as input to the resolver. When resolver is 'none' list is used as is.
directory?
@fast-nail-55400: some of that will only apply to v2
@better-nail-12700: also, which version of pants are you using?
b
Hi! Thanks for the quick reply. We are calling pants without the
--cache-read-from
param. However, if my understanding is correct, Pants will default to the
.cache
and
.pants.d
directory to store dependencies and intermediary build artifacts. We are manually persisting those two directories between pipeline runs. The
.cache
directory is actually not too big, however,
.pants.d
grows to very large sizes. We are using Pants version 1.26.0.
w
yea, the
--cache-read-from
option has a default value (although you cannot tell it from the docsite! oy). can check that you’re caching the right thing by running something like
./pants options | grep 'cache.*read.*from'
b
The largest two subdirectories of
.pants.d
are
python-setup
and
pyprep
if that is relevant information
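A rough way to reproduce this kind of measurement is sketched below. The helper names `dir_size` and `largest_subdirs` are hypothetical, and the only assumption about layout is that the directory to inspect (here `.pants.d`) sits in the working directory.

```python
import os
from pathlib import Path


def dir_size(path):
    """Total size in bytes of all regular files under `path` (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # The file disappeared or is unreadable; ignore it.
                pass
    return total


def largest_subdirs(root, top=5):
    """Return up to `top` (size_in_bytes, name) pairs for the direct subdirectories of `root`, largest first."""
    sizes = [
        (dir_size(entry), entry.name)
        for entry in Path(root).iterdir()
        if entry.is_dir() and not entry.is_symlink()
    ]
    return sorted(sizes, reverse=True)[:top]
```

For example, `largest_subdirs(".pants.d")` would list the biggest subdirectories first, which helps decide what is worth persisting between CI runs.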
w
yep.
b
Yes,
cache.read_from
points to the correct
.cache
directory
w
ok
not a whole lot is able to be cached for python in v1.
b
Do you by any chance know how Pants determines which projects need to be re-tested and re-built? As I understand it, Pants calculates hashes of the code, however, where are these stored?
w
but. we have been working for a while now on a new python backend, atop a new engine for pants.
b
Yeah, I actually just read this morning that v2 is about to be released. Super exciting!
👖 1
w
@better-nail-12700: in v1, they are stored in the
.pants.d/build_invalidator
directory
but the
.pants.d
directory isn’t really supposed to be a public api… so i can’t really recommend trying to cache portions of it
depending on your usecases though, you might already be able to use v2: https://pants.readme.io/docs/pants-v1-vs-v2
b
Oh, that is interesting. I will definitely look into that.
w
linting, formatting, testing, bundling, setup-py’ing, etc are working and being used by intrepid folks.
oh, and related to @fast-nail-55400’s comment (thanks!), how to cache for v2 is much better documented: https://pants.readme.io/docs/using-pants-in-ci#directories-to-cache
@better-nail-12700: does your team have any custom plugins for pants?
b
We are using
pantsbuild.pants.contrib.node
to build an Angular app inside our monorepo with pants, however, we could theoretically build that outside of Pants
w
ah… yea, unfortunately that would be necessary to use v2/2.0, although we’re looking at how to prioritize our second+ language
b
I'm actually trying to get v2 running right now. Per the new documentation, it looks like I would need v1.29.0 in order to run in v2 mode, however, v1.29.0 is not released yet. What am I missing?
w
the most recent stable version is
1.28.0
… it has reasonably good support for v2
1.29.0
will likely be released this week… it’s currently on rc3
does the docsite say that 1.29 is required somewhere?
b
Ah, I was using the
dynamic_ui
setting in pants.toml which seems to be not supported by 1.28.
w
mm: it was renamed between 1.28 and 1.29. sorry for the trouble
v2_ui
in 1.28, with a deprecation at 1.29.
b
I was looking at this page and just copied the settings: https://pants.readme.io/docs/pants-v1-vs-v2.
No worries 🙂
w
in general, we recommend upgrading one version at a time, because we ensure that things are deprecated for at least one stable version
@better-nail-12700: got it. that’s good feedback, because we probably shouldn’t have 1.29 be the default landing page until it has seen a stable release!
i’m going to grab lunch, but will be around afterward.
🔙 1
b
Ok, cool. I'll play around with v2 for a little bit and get back to you 🙂
h
I’m around too now if I can help with anything with upgrading to the v2 engine!
b
Ok, from the first look: it works and it's super fast, awesome! However, I'm having trouble with one of the projects. When I try to run it or the corresponding tests, it throws the following error:
ModuleNotFoundError: No module named 'pkg_resources'
💯 1
I guess it's an issue with setuptools
h
Yay! Hm, it likely is. Does the test or one of the transitive dependencies directly import
pkg_resources
? Do you have
setuptools
in your
requirements.txt
already?
h
Might be a missing dependency on setuptools? v2 is much stricter about isolating dependencies.
👍 1
h
(On a related note of dependencies, we added support in v2 Python for lockfiles. See https://pants.readme.io/docs/python-third-party-dependencies#using-a-lockfile-recommended. Once this setuptools issue gets resolved, we recommend doing this as a way to make your builds more stable and reproducible)
b
The error is actually thrown by a third party library trying to import
pkg_resources
. I've added setuptools to the requirements.txt, but that did not do the trick 🤔
The package is
trafilatura
that's causing the issue
👍 1
h
You’ll then want to add the requirements target to either the
python_tests
target or whichever target uses that problematic third party library. For example, if your requirements.txt is located at
3rdparty/python
, then add to the
dependencies
field
3rdparty/python:setuptools
It’s common for open source projects to leave off
setuptools
as a dependency, and almost always it’s an oversight. Fairly easy to fix. See https://github.com/pytest-dev/pytest-rerunfailures/pull/98 for an example if you’d be interested in making a contribution to the
trafilatura
project
b
That did the trick 👍
💯 1