# general
c
I’m trying to get my head around caching with GitLab CI rather than GitHub - I was hoping someone had an example I could crib from, since following the documentation makes me feel a bit hamstrung
it’s particularly cache invalidation I’m thrown by in a GitLab sense, especially as the documentation feels like it’s funnelling people towards toolchain and remote caching rather than the presumably much more typical “just cache your CI runner state” workflow
h
I wrote a post a while back on why CI provider caching doesn't work well. In a nutshell, you can't effectively "just cache your CI runner state", which is why the documentation talks about remote caching as a better solution.
We're personally more familiar with Github Actions, which is why the Pants docs do discuss that as an option, so those docs will at least tell you which directories you might want to cache, and how to key them
If anyone has a Gitlab example to share, that would be great
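[For reference, a hedged sketch of the GitHub Actions setup the Pants docs describe - the paths and key scheme are my understanding and worth checking against the current docs:]

```yaml
# Sketch only: cache Pants' setup and named caches on a coarse key, and
# (optionally) lmdb_store keyed on every file, per the quoted docs.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Cache Pants setup and named caches
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/pants/setup
            ~/.cache/pants/named_caches
          key: pants-${{ runner.os }}-${{ hashFiles('pants.toml', '**/*.lock') }}
      - name: Cache lmdb_store (must invalidate on any file change)
        uses: actions/cache@v4
        with:
          path: ~/.cache/pants/lmdb_store
          key: lmdb-${{ runner.os }}-${{ hashFiles('**/*') }}
```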
r
😂 It's nice to come to ask a question and find it has only just been answered! (even if unfortunately in this case the answer is that no, there isn't a GitLab example). I will have a read and see if attempting to port the GitHub example to GitLab directly makes sense 🤔
c
I’m currently trying to make sense of doing it myself while quite new to gitlab ci - happy to share if I manage to succeed! First step was redirecting cache location to inside CI_PROJECT_DIR, which was last night’s endeavor
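[Sketch of what I mean by redirecting the cache - `PANTS_LOCAL_STORE_DIR` / `PANTS_NAMED_CACHES_DIR` are the option env vars I believe Pants supports, but double-check against the options reference:]

```yaml
# .gitlab-ci.yml sketch: GitLab runners can only cache paths under
# CI_PROJECT_DIR, so point Pants' caches at a directory inside it.
variables:
  PANTS_LOCAL_STORE_DIR: "$CI_PROJECT_DIR/.cache/pants/lmdb_store"
  PANTS_NAMED_CACHES_DIR: "$CI_PROJECT_DIR/.cache/pants/named_caches"

test:
  script:
    - pants test ::
  cache:
    paths:
      - .cache/pants/
```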
I think this is the tricky bit:
> If you're not using a fine-grained remote caching service, then you may also want to preserve the local Pants cache at `$HOME/.cache/pants/lmdb_store`. This has to be invalidated on any file that can affect any process, e.g., `hashFiles('**/*')` on GitHub Actions.
since GitLab doesn’t have a concept of globbing for hashes, just a limited-feeling helper - https://docs.gitlab.com/ee/ci/yaml/index.html#cachekeyfiles
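[For context: that helper accepts at most two files, and the key is derived from the most recent commit that changed them rather than from their contents - roughly this shape, with illustrative file names:]

```yaml
# GitLab's built-in helper: max two files, key comes from the latest
# commit that touched them, not a content hash.
cache:
  key:
    files:
      - pants.toml
      - python-default.lock
  paths:
    - .cache/pants/
```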
r
For some other projects, I found the only practical way was running my own GitLab runners and caching locally 🤔 [this was for caching in general... trying to store venvs and so on rather than pants], otherwise it all took a crazy amount of time and burnt through minutes far too quickly to work longer term. I can see that happening here too if I've understood correctly, re Pants cache sizes growing over time 🤔
[I'm very new to Pants in general btw, so it'll be a while before I get to a "useful" point 🙂 ]
c
yeah, GitLab instances being so inconsistent isn’t going to help make this easy either - I’m lucky enough to already have a fleet of private runners and a distributed cache configured
😅 1
r
Much to read!
h
Yeah, see that post for why caching `$HOME/.cache/pants/lmdb_store` may not be worth it - you'd have to invalidate it on any change to any file, so it's not fine-grained enough to be useful. Pretty quickly the increasing time it takes to restore that growing dir will exceed the diminishing benefit you get from restoring it from a generic restore key that has drifted ever further away from your current state...
👍 1
remote caching is very fine-grained - it caches every process result separately, against a key hashed from exactly the inputs to that process.
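[If it helps, pointing Pants at a remote cache is only a few options - a sketch with a placeholder address; check the remote caching docs for the options your Pants version supports:]

```toml
# pants.toml sketch (hypothetical server address)
[GLOBAL]
remote_cache_read = true
remote_cache_write = true
remote_store_address = "grpc://cache.example.internal:9092"
```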
c
so the reason GitLab probably doesn’t have many great examples is that GitLab CI really isn’t very good at this 😬 I’m persevering to at least try and provide an example for the future, but I miss GitHub
r
🤔 as in for the recursive hashFiles, for example? Maybe it's time I switched anyway, their more recent pricing models were very off-putting 😞
[I haven't read through enough so may have totally missed the point so far on this one but...] could you work around that by implementing hashFiles -> file as part of a job, and then use that file as the key for the cache?
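[A sketch of that idea - recursively hash file contents as a stand-in for hashFiles. The open question is feeding the result into `cache:key`, since GitLab cache keys can't use variables computed in the same job's script:]

```shell
# Hash every file's contents under a directory, then hash the sorted
# list of per-file hashes, giving one deterministic key for the tree.
hash_files() {
  find "$1" -type f -not -path '*/.git/*' -print0 \
    | sort -z \
    | xargs -0 sha256sum \
    | sha256sum \
    | cut -d' ' -f1
}

# Demo against a throwaway directory (illustrative only).
mkdir -p /tmp/hashdemo
printf 'hello' > /tmp/hashdemo/a.txt
hash_files /tmp/hashdemo
```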
c
the “file as key” helper actually creates the key from the commit SHA rather than a hash of the file contents, so I’m doing exactly that
well - I seem to have functional cache invalidation in GitLab, but it's also not doing anything useful now! I think `lmdb_store` was hitting this metric, but invalidating that in GitLab is a job for another day. More concerningly: `local_cache_total_time_saved_ms: 0`
- jobs are just rebuilding PEXes anyway, so my guess is that for some reason GitLab isn’t allowing the PEXes to be reproducible between builds
I can see the PEX’s `__main__.py` has some pretty unique-looking stuff in the shebang, so my best guess is that something per-invocation is being consumed in the GitLab runner environment and isn’t consistent between executions
t
I'm trying to solve the same problem without using remote caching! Were you able to make any progress with just basic GitLab caching?
c
I didn’t - it was more a side quest rather than a problem that was worth the time to solve so I gave up
👍 1
t
I see, thanks for sharing!