# general
c
I’m trying to get my head around caching with GitLab CI rather than GitHub - I was hoping someone had an example I could crib from, since following the documentation makes me feel a bit hamstrung
it’s particularly cache invalidation I’m thrown by in a GitLab sense, especially as the documentation feels like it’s funnelling people towards toolchain and remote caching rather than the presumably much more typical “just cache your CI runner state” workflow
h
I wrote a post a while back on why CI provider caching doesn't work well. In a nutshell, you can't effectively "just cache your CI runner state", which is why the documentation talks about remote caching as a better solution.
We're personally more familiar with Github Actions, which is why the Pants docs do discuss that as an option, so those docs will at least tell you which directories you might want to cache, and how to key them
If anyone has a Gitlab example to share, that would be great
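[For reference, a hedged sketch of the GitHub Actions setup the Pants docs describe - the paths and key scheme are my understanding and worth checking against the current docs:]

```yaml
# Sketch only: cache Pants' setup and named caches on a coarse key, and
# (optionally) lmdb_store keyed on every file, per the quoted docs.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Cache Pants setup and named caches
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/pants/setup
            ~/.cache/pants/named_caches
          key: pants-${{ runner.os }}-${{ hashFiles('pants.toml', '**/*.lock') }}
      - name: Cache lmdb_store (must invalidate on any file change)
        uses: actions/cache@v4
        with:
          path: ~/.cache/pants/lmdb_store
          key: lmdb-${{ runner.os }}-${{ hashFiles('**/*') }}
```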
r
😂 It's nice to come to ask a question and find it has only just been answered! (even if unfortunately in this case the answer is that no, there isn't a GitLab example). I will have a read and see if attempting to port the GitHub example to GitLab directly makes sense 🤔
c
I’m currently trying to make sense of doing it myself while quite new to gitlab ci - happy to share if I manage to succeed! First step was redirecting cache location to inside CI_PROJECT_DIR, which was last night’s endeavor
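[Sketch of what I mean by redirecting the cache - `PANTS_LOCAL_STORE_DIR` / `PANTS_NAMED_CACHES_DIR` are the option env vars I believe Pants supports, but double-check against the options reference:]

```yaml
# .gitlab-ci.yml sketch: GitLab runners can only cache paths under
# CI_PROJECT_DIR, so point Pants' caches at a directory inside it.
variables:
  PANTS_LOCAL_STORE_DIR: "$CI_PROJECT_DIR/.cache/pants/lmdb_store"
  PANTS_NAMED_CACHES_DIR: "$CI_PROJECT_DIR/.cache/pants/named_caches"

test:
  script:
    - pants test ::
  cache:
    paths:
      - .cache/pants/
```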
I think this is the tricky bit:
> If you're not using a fine-grained remote caching service, then you may also want to preserve the local Pants cache at `$HOME/.cache/pants/lmdb_store`. This has to be invalidated on any file that can affect any process, e.g., `hashFiles('**/*')` on GitHub Actions.
since GitLab doesn’t have a concept of globbing for hashes, just a limited-feeling helper - https://docs.gitlab.com/ee/ci/yaml/index.html#cachekeyfiles
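[For context: that helper accepts at most two files, and the key is derived from the most recent commit that changed them rather than from their contents - roughly this shape, with illustrative file names:]

```yaml
# GitLab's built-in helper: max two files, key comes from the latest
# commit that touched them, not a content hash.
cache:
  key:
    files:
      - pants.toml
      - python-default.lock
  paths:
    - .cache/pants/
```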
r
For some other projects, I found the only practical way was running my own GitLab runners and caching locally 🤔 [this was for caching in general... trying to store venvs and so on rather than pants], otherwise it all took a crazy amount of time and burnt through minutes far too quickly to work longer term. I can see that happening here too if I've understood correctly, re Pants cache sizes growing over time 🤔
[I'm very new to Pants in general btw, so it'll be a while before I get to a "useful" point 🙂 ]
c
yeah, GitLab instances being so inconsistent isn’t going to help make this easy either - I’m lucky enough to already have a fleet of private runners and a distributed cache configured
😅 1
r
Much to read!
h
Yeah, see that post for why caching `$HOME/.cache/pants/lmdb_store` may not be worth it - you'd have to invalidate it on any change to any file, so it's not fine-grained enough to be useful. Pretty quickly the increasing time it takes to restore that growing dir will exceed the diminishing benefit you get from restoring it from a generic restore key that has drifted ever further away from your current state...
👍 1
remote caching is very fine-grained - it caches every process result separately, against a key hashed from exactly the inputs to that process.
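[If it helps, pointing Pants at a remote cache is only a few options - a sketch with a placeholder address; check the remote caching docs for the options your Pants version supports:]

```toml
# pants.toml sketch (hypothetical server address)
[GLOBAL]
remote_cache_read = true
remote_cache_write = true
remote_store_address = "grpc://cache.example.internal:9092"
```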
c
so the reason GitLab probably doesn’t have many great examples is that GitLab CI really isn’t very good at this 😬 I’m persevering to at least try and provide an example for the future, but I miss GitHub
r
🤔 as in for the recursive hashFiles, for example? Maybe it's time I switched anyway, their more recent pricing models were very off-putting 😞
[I haven't read through enough so may have totally missed the point so far on this one but...] could you work around that by implementing hashFiles -> file as part of a job, and then use that file as the key for the cache?
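[A sketch of that idea - recursively hash file contents as a stand-in for hashFiles. The open question is feeding the result into `cache:key`, since GitLab cache keys can't use variables computed in the same job's script:]

```shell
# Hash every file's contents under a directory, then hash the sorted
# list of per-file hashes, giving one deterministic key for the tree.
hash_files() {
  find "$1" -type f -not -path '*/.git/*' -print0 \
    | sort -z \
    | xargs -0 sha256sum \
    | sha256sum \
    | cut -d' ' -f1
}

# Demo against a throwaway directory (illustrative only).
mkdir -p /tmp/hashdemo
printf 'hello' > /tmp/hashdemo/a.txt
hash_files /tmp/hashdemo
```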
c
the “file as key” helper actually creates the key from the commit SHA rather than a hash of the file contents, so I’m doing exactly that
well - I seem to have functional cache invalidation in GitLab, but it's also not doing anything useful now! I think `lmdb_store` was hitting this metric, but invalidating that in GitLab is a job for another day. More concerningly: `local_cache_total_time_saved_ms: 0`
- jobs are just rebuilding PEXes anyway, so my guess is that for some reason GitLab isn’t allowing the PEXes to be reproducible between builds
I can see the PEX’s `__main__.py` has some pretty unique-looking stuff in the shebang, so my best guess is that something per-invocation is being consumed in the GitLab runner environment and isn’t consistent between executions
t
I'm trying to solve the same problem without using remote caching! Were you able to make any progress with just basic GitLab caching?
c
I didn’t - it was more a side quest rather than a problem that was worth the time to solve so I gave up
👍 1
t
I see, thanks for sharing!