https://pantsbuild.org/
#general

chilly-holiday-77415

01/11/2023, 10:49 PM
I’m trying to get my head around caching with GitLab CI rather than GitHub - I was hoping someone had an example I could crib from, since following the documentation makes me feel like I’m a bit hamstrung
it’s particularly cache invalidation I’m a bit thrown by in a GitLab sense, especially as the documentation feels a bit like it’s just funnelling people towards Toolchain and remote caching rather than the presumably much more typical “just cache your CI runner state” workflow

happy-kitchen-89482

01/12/2023, 2:33 AM
I wrote a post a while back on why CI provider caching doesn't work well. In a nutshell, you can't effectively "just cache your CI runner state", which is why the documentation talks about remote caching as a better solution.
We're personally more familiar with GitHub Actions, which is why the Pants docs do discuss that as an option, so those docs will at least tell you which directories you might want to cache, and how to key them.
If anyone has a GitLab example to share, that would be great.

ripe-kitchen-64238

01/12/2023, 12:27 PM
😂 It's nice when coming to ask a question and finding it has only just been answered already! (even if unfortunately in this case the answer is no, there isn't a GitLab example). I will have a read and see whether attempting to port the GitHub example directly to GitLab makes sense 🤔

chilly-holiday-77415

01/12/2023, 12:28 PM
I’m currently trying to make sense of doing it myself while quite new to GitLab CI - happy to share if I manage to succeed! The first step, which was last night’s endeavor, was redirecting the cache location to inside CI_PROJECT_DIR
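A minimal sketch of that first step, assuming Pants maps the environment variables PANTS_LOCAL_STORE_DIR and PANTS_NAMED_CACHES_DIR onto its global local_store_dir and named_caches_dir options (worth checking against the docs for your Pants version). The .pants-cache directory name is hypothetical.

```python
import os
import shlex
from pathlib import Path


def pants_cache_env(project_dir: str) -> dict[str, str]:
    """Env vars that relocate Pants' local caches under project_dir.

    Assumes Pants maps PANTS_LOCAL_STORE_DIR / PANTS_NAMED_CACHES_DIR onto its
    global local_store_dir / named_caches_dir options. ".pants-cache" is a
    hypothetical directory name.
    """
    cache_root = Path(project_dir) / ".pants-cache"
    return {
        "PANTS_LOCAL_STORE_DIR": str(cache_root / "lmdb_store"),
        "PANTS_NAMED_CACHES_DIR": str(cache_root / "named_caches"),
    }


if __name__ == "__main__":
    # GitLab CI sets CI_PROJECT_DIR; fall back to the current directory locally.
    project_dir = os.environ.get("CI_PROJECT_DIR", os.getcwd())
    for name, value in pants_cache_env(project_dir).items():
        # Emit shell exports so a job script can eval the output.
        print(f"export {name}={shlex.quote(value)}")
```

With this setup, .pants-cache/ would then be listed under cache:paths in .gitlab-ci.yml, because GitLab only caches paths inside the project directory.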
I think this is the tricky bit:
If you're not using a fine-grained remote caching service, then you may also want to preserve the local Pants cache at $HOME/.cache/pants/lmdb_store. This has to be invalidated on any file that can affect any process, e.g., hashFiles('**/*') on GitHub Actions.
since GitLab doesn’t have a concept of globbing when hashing files for a cache key, just a rather limited-feeling helper: https://docs.gitlab.com/ee/ci/yaml/index.html#cachekeyfiles

ripe-kitchen-64238

01/12/2023, 12:31 PM
For some other projects, I found the only practical way was running my own GitLab runners and caching locally 🤔 [this was for caching in general, trying to store venvs and so on rather than Pants]. Otherwise it all took a crazy amount of time and burnt through minutes far too quickly to work longer term. If I've understood correctly, I can see that happening here too, given that Pants cache sizes grow over time 🤔
[I'm very new to Pants in general btw, so it'll be a while before I get to a "useful" point 🙂 ]

chilly-holiday-77415

01/12/2023, 12:32 PM
yeah, GitLab instances being so inconsistent isn’t going to help make this easy either - I’m lucky enough to already have a fleet of private runners and a distributed cache configured

ripe-kitchen-64238

01/12/2023, 12:33 PM
Much to read!

happy-kitchen-89482

01/12/2023, 12:49 PM
Yeah, see that post for why caching $HOME/.cache/pants/lmdb_store may not be worth it - you'd have to invalidate it on any change to any file, so it's not fine-grained enough to be useful. Pretty quickly, the increasing time it takes to restore that growing dir will exceed the diminishing benefit you get from restoring it from a generic restore key that has drifted ever further away from your current state...
remote caching is very fine-grained - it caches every process result separately, against a key hashed from exactly the inputs to that process.
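To make the contrast concrete, here is a toy sketch of a per-process cache key. It is not how Pants implements this (Pants' remote caching speaks the Remote Execution API); it only illustrates the idea that the key covers the command line, environment and the contents of the files that one process reads, so editing an unrelated file leaves the key unchanged, whereas a whole-repo key like hashFiles('**/*') would change.

```python
import hashlib
import json


def process_cache_key(argv: list[str], env: dict[str, str], inputs: dict[str, bytes]) -> str:
    """Toy cache key for one process: depends only on that process's own inputs."""
    h = hashlib.sha256()
    # Serialize argv and env deterministically (env sorted by variable name).
    h.update(json.dumps({"argv": argv, "env": sorted(env.items())}).encode())
    # Mix in each input file's path and content digest, in sorted path order.
    for path in sorted(inputs):
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


if __name__ == "__main__":
    repo = {"src/app.py": b"print('hi')\n", "docs/README.md": b"# docs\n"}
    argv = ["python", "src/app.py"]
    # The process only reads src/app.py, so only that file is part of its key.
    before = process_cache_key(argv, {}, {"src/app.py": repo["src/app.py"]})
    repo["docs/README.md"] = b"# docs, edited\n"  # unrelated edit
    after = process_cache_key(argv, {}, {"src/app.py": repo["src/app.py"]})
    print("cache hit" if before == after else "cache miss")  # prints "cache hit"
```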

chilly-holiday-77415

01/12/2023, 2:07 PM
so the reason GitLab probably doesn’t have many great examples is that GitLab CI really isn’t very good at this 😬 I’m persevering to at least try and provide an example for the future, but I miss GitHub

ripe-kitchen-64238

01/12/2023, 2:10 PM
🤔 as in for the recursive hashFiles, for example? Maybe it's time I switched anyway; their more recent pricing models were very off-putting 😞
[I haven't read through enough so may have totally missed the point so far on this one but...] could you work around that by implementing a hashFiles -> file step as part of a job, and then using that file as the key for the cache?
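The hashFiles -> file idea could look roughly like this sketch, run as an early step in a job. It combines SHA-256 digests of every file under the project directory, loosely mirroring GitHub's hashFiles('**/*') rather than reproducing its exact algorithm, and writes the result to .pants-cache-key, a hypothetical file name. Whether GitLab can key a cache on a file generated at job time, given that cache:key:files works from commit SHAs, is an assumption to verify against the GitLab docs.

```python
import hashlib
import sys
from pathlib import Path


def hash_files(root: Path, pattern: str = "**/*", exclude: tuple[str, ...] = (".git",)) -> str:
    """Combine SHA-256 digests of all files matching pattern under root.

    Loosely modelled on GitHub's hashFiles(); the exact algorithm differs.
    """
    combined = hashlib.sha256()
    # Sort for a deterministic order regardless of filesystem iteration order.
    for path in sorted(root.glob(pattern)):
        rel = path.relative_to(root)
        # Skip directories and anything under an excluded top-level entry.
        if not path.is_file() or rel.parts[0] in exclude:
            continue
        combined.update(rel.as_posix().encode() + b"\0")
        combined.update(hashlib.sha256(path.read_bytes()).digest())
    return combined.hexdigest()


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    key_file = ".pants-cache-key"  # hypothetical name
    # Exclude the key file itself so re-running doesn't change the digest.
    digest = hash_files(root, exclude=(".git", key_file))
    (root / key_file).write_text(digest + "\n")
```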

chilly-holiday-77415

01/12/2023, 2:14 PM
the “file as key” function actually creates a hash from the commit SHA rather than an md5 of the file contents, so I’m doing exactly that
well - I seem to have functional cache invalidation in GitLab, but it’s also not doing anything useful now! I think lmdb_store was hitting this metric, but invalidating that in GitLab is a job for another day. More concerningly, local_cache_total_time_saved_ms is 0: jobs are just rebuilding PEXes anyway, so my guess is that for some reason GitLab isn’t allowing PEXes to be reproducible between builds
I can see the PEX’s __main__.py has some pretty unique-looking stuff in the shebang, so my best guess is that something per-invocation in the GitLab runner environment is being consumed and isn’t consistent between executions
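One way to chase that down, sketched under the assumption that you can grab the same PEX from two CI runs (the paths passed on the command line are up to you): a PEX is a zip archive with a shebang line prepended, which Python's zipfile can still open, so the shebang and the __main__.py entry can be compared directly.

```python
import difflib
import sys
import zipfile


def shebang(pex_path: str) -> str:
    """Return the first line of the file, i.e. the prepended shebang."""
    with open(pex_path, "rb") as f:
        return f.readline().decode("utf-8", errors="replace").rstrip("\n")


def main_py_diff(pex_a: str, pex_b: str) -> list[str]:
    """Unified diff of the __main__.py entries in two PEX (zip) files."""
    texts = []
    for pex in (pex_a, pex_b):
        # zipfile tolerates the data prepended before the zip content.
        with zipfile.ZipFile(pex) as zf:
            texts.append(zf.read("__main__.py").decode("utf-8").splitlines())
    return list(difflib.unified_diff(texts[0], texts[1], pex_a, pex_b, lineterm=""))


if __name__ == "__main__":
    a, b = sys.argv[1], sys.argv[2]
    if shebang(a) != shebang(b):
        print(f"shebang differs:\n  {shebang(a)}\n  {shebang(b)}")
    print("\n".join(main_py_diff(a, b)) or "__main__.py identical")
```

Any line that differs between two runs with identical sources points at the per-invocation input worth pinning down.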