#general

better-nail-12700

06/10/2020, 7:53 PM
Hi everyone! We've been using Pants for our Python-based machine learning/data science monorepo to test and build multiple libraries and PEX executables that share common dependencies. Now we are running the pants test and binary goals as part of our CI/CD pipeline (Gitlab CI). To take advantage of Pants' caching system, we persist the
.cache
and
.pants.d
directories from one CI pipeline to the next. However, we are now facing the issue that the
.pants.d
directory regularly grows to 15+GB, which is a lot to download/upload in each CI stage and we are actually running into disk space issues with shared Gitlab runners. Does someone know if there is a way to purge or clean the cache directories without losing the ability to only rebuild the changed projects within the monorepo? We've naively tried to only persist the
.cache
directory from one run to the next; however, that led to every project being rebuilt in every CI run. Any help or input would be appreciated!
👋 1
👀 1

witty-crayon-22786

06/10/2020, 7:55 PM
hey Florian!
so, unfortunately, the v1 caching model is not applied very thoroughly to python… you won’t be able to cache hit for many common things, which is why preserving the
.cache
directory might not be getting you much.

fast-nail-55400

06/10/2020, 7:57 PM
the Pants CI has some code to purge caches in Travis which may be of interest as to what paths to focus on: https://github.com/pantsbuild/pants/blob/master/build-support/bin/prune_travis_cache.sh

witty-crayon-22786

06/10/2020, 7:58 PM
@better-nail-12700: to be 100% clear though, you mean that you’re stashing the content of the
--cache-read-from="['<str>', '<str>', ...]" (default: [])
The URIs of artifact caches to read directly from. Each entry is a URL of a RESTful cache, a path of a filesystem cache, or a pipe-separated list of alternate caches to choose from. This list is also used as input to the resolver. When resolver is 'none' list is used as is.
directory?
@fast-nail-55400: some of that will only apply to v2
@better-nail-12700: also, which version of pants are you using?

better-nail-12700

06/10/2020, 8:05 PM
Hi! Thanks for the quick reply. We are calling pants without the
--cache-read-from
param. However, if my understanding is correct, Pants will default to the
.cache
and
.pants.d
directory to store dependencies and intermediary build artifacts. We are manually persisting those two directories between pipeline runs. The
.cache
directory is actually not too big, however,
.pants.d
grows to very large sizes. We are using Pants version 1.26.0.

witty-crayon-22786

06/10/2020, 8:06 PM
yea, the
--cache-read-from
option has a default value (although you cannot tell from the docsite! oy). you can check that you’re caching the right thing by running something like
./pants options | grep 'cache.*read.*from'

better-nail-12700

06/10/2020, 8:07 PM
The largest two subdirectories of
.pants.d
are
python-setup
and
pyprep
if that is relevant information
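
To see which subdirectories account for the growth of a work directory such as `.pants.d`, a small script along these lines can help. This is a minimal sketch using only the Python standard library; the function names `dir_size_bytes` and `largest_subdirs` are made up for illustration and are not part of Pants.

```python
import os
from pathlib import Path
from typing import List, Tuple


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def largest_subdirs(root: Path, top: int = 5) -> List[Tuple[str, int]]:
    """Return the `top` largest immediate subdirectories of root as (name, bytes)."""
    sizes = [
        (child.name, dir_size_bytes(child))
        for child in root.iterdir()
        if child.is_dir()
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # Assumption: the script is run from the repository root, where .pants.d lives.
    workdir = Path(".pants.d")
    if workdir.is_dir():
        for name, size in largest_subdirs(workdir):
            print(f"{size / 1e9:8.2f} GB  {name}")
```

Running it before and after a CI job shows which subdirectories grow between pipeline runs.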

witty-crayon-22786

06/10/2020, 8:07 PM
yep.

better-nail-12700

06/10/2020, 8:07 PM
Yes,
cache.read_from
points to the correct
.cache
directory

witty-crayon-22786

06/10/2020, 8:07 PM
ok
not a whole lot is able to be cached for python in v1.

better-nail-12700

06/10/2020, 8:09 PM
Do you by any chance know how Pants determines which projects need to be re-tested and re-built? As I understand it, Pants calculates hashes of the code; where are these stored?

witty-crayon-22786

06/10/2020, 8:09 PM
but. we have been working for a while now on a new python backend, atop a new engine for pants.

better-nail-12700

06/10/2020, 8:09 PM
Yeah, I actually just read this morning that v2 is about to be released. Super exciting!
👖 1

witty-crayon-22786

06/10/2020, 8:09 PM
@better-nail-12700: in v1, they are stored in the
.pants.d/build_invalidator
directory
but the
.pants.d
directory isn’t really supposed to be a public api… so i can’t really recommend trying to cache portions of it
depending on your usecases though, you might already be able to use v2: https://pants.readme.io/docs/pants-v1-vs-v2
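
The general idea behind hash-based invalidation, as asked about above, can be sketched as follows. This is a toy illustration and not Pants' actual implementation or on-disk format: the functions `fingerprint` and `changed_projects`, the JSON state file, and the project-to-files mapping are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Iterable, List


def fingerprint(paths: Iterable[Path]) -> str:
    """Hash file paths and contents in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(str(path).encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def changed_projects(projects: Dict[str, List[Path]], state_file: Path) -> List[str]:
    """Return projects whose fingerprint differs from the stored one, then save the new state."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {name: fingerprint(files) for name, files in projects.items()}
    # Persist the new fingerprints so the next run compares against them.
    state_file.write_text(json.dumps(current, indent=2))
    return sorted(name for name, key in current.items() if previous.get(name) != key)
```

In this sketch, losing the state file between CI runs makes every project look changed, which matches the symptom of everything being rebuilt when the stored keys are not persisted.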

better-nail-12700

06/10/2020, 8:13 PM
Oh, that is interesting. I will definitely look into that.

witty-crayon-22786

06/10/2020, 8:14 PM
linting, formatting, testing, bundling, setup-py’ing, etc are working and being used by intrepid folks.
oh, and related to @fast-nail-55400’s comment (thanks!), how to cache for v2 is much better documented: https://pants.readme.io/docs/using-pants-in-ci#directories-to-cache
@better-nail-12700: does your team have any custom plugins for pants?

better-nail-12700

06/10/2020, 8:20 PM
We are using
pantsbuild.pants.contrib.node
to build an Angular app inside our monorepo with pants, however, we could theoretically build that outside of Pants

witty-crayon-22786

06/10/2020, 8:22 PM
ah… yea, unfortunately that would be necessary to use v2/2.0, although we’re looking at how to prioritize our second+ language

better-nail-12700

06/10/2020, 8:27 PM
I'm actually trying to get v2 running right now. Per the new documentation, it looks like I would need v1.29.0 in order to run in v2 mode, however, v1.29.0 is not released yet. What am I missing?

witty-crayon-22786

06/10/2020, 8:28 PM
the most recent stable version is
1.28.0
… it has reasonably good support for v2
1.29.0
will likely be released this week… it’s currently on rc3
does the docsite say that 1.29 is required somewhere?

better-nail-12700

06/10/2020, 8:30 PM
Ah, I was using the
dynamic_ui
setting in pants.toml, which does not seem to be supported by 1.28.

witty-crayon-22786

06/10/2020, 8:31 PM
mm: it was renamed between 1.28 and 1.29. sorry for the trouble
v2_ui
is the name in 1.28, with a deprecation at 1.29.

better-nail-12700

06/10/2020, 8:31 PM
I was looking at this page and just copied the settings: https://pants.readme.io/docs/pants-v1-vs-v2.
No worries 🙂

witty-crayon-22786

06/10/2020, 8:31 PM
in general, we recommend upgrading one version at a time, because we ensure that things are deprecated for at least one stable version
@better-nail-12700: got it. that’s good feedback, because we probably shouldn’t have 1.29 be the default landing page until it has seen a stable release!
i’m going to grab lunch, but will be around afterward.
🔙 1

better-nail-12700

06/10/2020, 8:34 PM
Ok, cool. I'll play around with v2 for a little bit and get back to you 🙂

hundreds-father-404

06/10/2020, 9:06 PM
I’m around too now if I can help with anything with upgrading to the v2 engine!

better-nail-12700

06/10/2020, 9:14 PM
Ok, from the first look: it works and it's super fast, awesome! However, I'm having trouble with one of the projects. When I try to run it or the corresponding tests, it throws the following error:
ModuleNotFoundError: No module named 'pkg_resources'
💯 1
I guess it's an issue with setuptools

hundreds-father-404

06/10/2020, 9:16 PM
Yay! Hm, it likely is. Does the test or one of the transitive dependencies directly import
pkg_resources
? Do you have
setuptools
in your
requirements.txt
already?

happy-kitchen-89482

06/10/2020, 9:16 PM
Might be a missing dependency on setuptools? v2 is much stricter about isolating dependencies.
👍 1
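
One way to confirm whether `pkg_resources` is visible in a given environment is to ask the import system directly. This is a minimal sketch with a hypothetical helper name, `is_importable`; it only reports on the interpreter it runs in, so it would need to be executed inside the same environment as the failing test (for example, from within a test) to be meaningful.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("pkg_resources", "setuptools"):
        print(f"{name}: {'found' if is_importable(name) else 'missing'}")
```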

hundreds-father-404

06/10/2020, 9:17 PM
(On a related note of dependencies, we added support in v2 Python for lockfiles. See https://pants.readme.io/docs/python-third-party-dependencies#using-a-lockfile-recommended. Once this setuptools issue gets resolved, we recommend doing this as a way to make your builds more stable and reproducible)

better-nail-12700

06/10/2020, 9:40 PM
The error is actually thrown by a third party library trying to import
pkg_resources
. I've added setuptools to the requirements.txt, but that did not do the trick 🤔
The package is
trafilatura
that's causing the issue
👍 1

hundreds-father-404

06/10/2020, 9:41 PM
You’ll then want to add the requirements target to either the
python_tests
target or whichever target uses that problematic third party library. For example, if your requirements.txt is located at
3rdparty/python
, then add to the
dependencies
field
3rdparty/python:setuptools
It’s common for open source projects to leave off
setuptools
as a dependency, and almost always it’s an oversight. Fairly easy to fix. See https://github.com/pytest-dev/pytest-rerunfailures/pull/98 for an example if you’d be interested in making a contribution to the
trafilatura
project
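
The advice above could look roughly like this in BUILD files. This is a sketch under assumptions: it assumes the `3rdparty/python` directory uses the `python_requirements()` macro to generate one target per entry in `requirements.txt`, and the second BUILD file path and the target name `tests` are hypothetical.

```python
# 3rdparty/python/BUILD (location taken from the example in the thread).
# Assumption: this macro generates one target per line of requirements.txt,
# including 3rdparty/python:setuptools once setuptools is listed there.
python_requirements()

# path/to/project/BUILD (hypothetical location and target name).
python_tests(
    name="tests",
    dependencies=[
        # Explicit dependency so that pkg_resources is available at test time.
        "3rdparty/python:setuptools",
    ],
)
```

The same `dependencies` entry can instead go on whichever target uses the third-party library that imports `pkg_resources`.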

better-nail-12700

06/10/2020, 10:08 PM
That did the trick 👍
💯 1