# development
c
@witty-crayon-22786 I’m looking into caching a bit (I should know better, I know 😛 ) and find this.. odd. 🧵
Running (repeatedly):
Copy code
╰─❯ ./pants -ltrace check src/python/pants/backend/visibility/rules.py
I observe (amongst the thousands of lines of log output) these two in particular:
Copy code
17:42:51.82 [DEBUG] Completed: Hit: Local cache lookup: Determine distributions found in mypy.pex
17:42:51.82 [DEBUG] Completed: Scheduling: Determine distributions found in mypy.pex
There’s nothing strange about that, a cache hit, go figure 🙂 Now, when I run the same command again, but at debug rather than trace level:
Copy code
╰─❯ ./pants -ldebug check src/python/pants/backend/visibility/rules.py
I don’t see those lines at all…
w
this is probably related to the difference between caching and memoization?
once a `@rule` is memoized (i.e. held in memory by `pantsd`), it doesn’t need to re-run
cache hits only come into play the first time a `@rule` runs in a particular process (i.e. before it is memoized)
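a minimal sketch of the two layers, purely illustrative (not Pants’ actual implementation; the names are made up):
Copy code
# toy model: "memo" stands in for pantsd's in-memory memoization,
# "disk_cache" for the local process cache that survives daemon restarts
memo = {}
disk_cache = {}

def run_rule(key):
    if key in memo:
        # memoized: nothing needs to be looked up, so no cache log lines appear
        return memo[key]
    if key in disk_cache:
        # first run in this daemon process: this is the "Hit: Local cache lookup" case
        memo[key] = disk_cache[key]
        return memo[key]
    # cache miss: the work actually runs, then both layers are populated
    result = f"computed({key})"
    disk_cache[key] = result
    memo[key] = result
    return result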
c
but why the difference between the log levels, then?
btw, I found a few stray processes; after killing them off, I got the Determine message in debug too (but without the cache hit), which makes more sense now at least…
Copy code
95153   ??  S      0:00.03 /Users/andreas.stenius/.cache/pants/named_caches/pex_root/venvs/s/3f67de6f/venv/bin/python3.9 -E -s -c from multiprocessing.resource_tracker import main;main(14)
95154   ??  S      0:00.40 /Users/andreas.stenius/.cache/pants/named_caches/pex_root/venvs/s/3f67de6f/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=38) --multiprocessing-fork
95194   ??  S      0:00.04 /Users/andreas.stenius/.cache/pants/named_caches/pex_root/venvs/s/3f67de6f/venv/bin/python3.9 -E -s -c from multiprocessing.resource_tracker import main;main(14)
95195   ??  S      0:00.11 /Users/andreas.stenius/.cache/pants/named_caches/pex_root/venvs/s/3f67de6f/venv/bin/python3.9 -E -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=18, pipe_handle=38) --multiprocessing-fork
^^ killed these
Copy code
17:57:23.93 [DEBUG] Completed: Scheduling: Determine distributions found in mypy.pex
was present now with `-ldebug`
maybe time to sign off for today 😛
w
yeaaa. i am not 100% certain that `pantsd` invalidates correctly when the log level changes. it really ought to restart in that case.
c
rrriight 😬
w
it’s marked `daemon=True`: https://github.com/pantsbuild/pants/blob/3bf22003b6ec2942077e0100e5b2453312229746/src/python/pants/option/global_options.py#L654-L669 … which should mean that `pantsd` restarts whenever it changes
c
ah, yep:
Copy code
18:04:03.55 [DEBUG] terminating pantsd
so that would rule out memoization, right?
w
on the first run after changing the log level, yes. not on the second
re-runs of your `-ltrace` session should be relatively quiet on the second run… you’re seeing that, right?
c
yep, I believe so. the number of lines logged is considerably fewer, at least 🙂
ah, and on the re-run with `-ltrace` I don’t see the `Hit:` lines either, guess that’s due to memoization
w
right
c
ok thanks. the goal is to wrap my head around enough of this to figure out why I see a lot of “building pex for requirements” when running lint/check/test goals; I would expect those to stay valid between runs when there are only source code changes (that don’t touch imports)
w
step one would be figuring out whether what you are missing is memoization, or caching
if you’re missing memoization, then you might still hit the cache (and not actually run a process: just `@rules`)
c
well, I’m pretty sure it’s not cached, as I’ll have to wait ~1min every time…
w
if you’re missing the cache, then you can run with `-ldebug` and look for a “spawning” line, which will contain the complete inputs to the process, which define whether it hits the cache.
yea. then run with `-ldebug`, and take a look at the process signature
🙏 1
and see https://github.com/pantsbuild/pants/issues/14195#issuecomment-1114087237 for more on facilities we might add to help with this
c
oh, and possibly relevant, this is in a repo with remote_cache enabled..
(I did see a cache miss log in there)
w
but yea. just diffing two of those `spawned local` lines should help
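for example, something like this should work to capture and compare them (the exact log phrasing may vary by Pants version, and the goal/paths here are just placeholders):
Copy code
# run the same goal twice at debug level, capturing the logs (stderr) to files
./pants -ldebug check src/python/pants/backend/visibility/rules.py 2> run1.log
./pants -ldebug check src/python/pants/backend/visibility/rules.py 2> run2.log
# then diff only the process-spawn lines to spot any differing input digests
diff <(grep 'spawned local' run1.log) <(grep 'spawned local' run2.log)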
c
just have to see if it’s cached the next run now, or not 🙂
cool, thanks!
w
the local/remote aspect shouldn’t be relevant… because it’s the exact same key used for both
c
gotcha
I’ll keep an eye out, right now it is caching properly, of course 😛
cont. this thread on caching peculiarities, as perhaps related.. I found this when running `./pants check some/src/file`, and the spawned processes for “building requirements_venv.pex” were identical apart from this little tidbit:
Copy code
input_files: DirectoryDigest { digest: Digest { hash: Fingerprint<f72844a672f0a780c2968ea0bacf9dd01cb90556316f66b0d01fdb55655b3f44>, size_bytes: 428 }, tree: "Some(..)" },
vs
Copy code
input_files: DirectoryDigest { digest: Digest { hash: Fingerprint<4cc4a1c0af01fa1f08a1b6636a9e07408bb6fd8d440184e897d85bfcb5ce52bf>, size_bytes: 428 }, tree: "Some(..)" },
where the inputs ought to have been identical in both cases (only some time and other runs came in between..) Not much to go on here, other than thinking about what may go into the hash calculation..?
w
you can use
./cargo run -p fs_util -- directory materialize 4cc4a1c0af01fa1f08a1b6636a9e07408bb6fd8d440184e897d85bfcb5ce52bf 428 $(pwd)/dest
to check a digest out into a directory
👍 1
it would be neat to expose @bitter-ability-32190’s digest diffing from https://github.com/pantsbuild/pants/pull/14872
💯 1
c
Oh wow, I’ve completely missed that one. So cool, thanks @bitter-ability-32190 🙌
b
I did it in 2.12 so we could unify fmt/lint processes. It might’ve been my first dive into Rust as well
c
bah, those entries are gone now.. I’ll try that next time I catch it, thanks Stu! (been cleaning caches in between then and now due to other issues (with mypy))
@witty-crayon-22786 OK, I’ve got a diff between two digests, but `fs_util` continues to insist those are not in my store.. tried with another digest just in case, but that’s missing too, so I get the feeling I’m doing something wrong.
tried to run it like this as well..
Copy code
fs_util --local-store-path /Users/andreas.stenius/.cache/pants/lmdb_store directory materialize fc33a6ab65993fa57d52edbff44d8a8a89faa2ec0669950d238b481da5caf15c 91 ex
`fs_util directories list` works though… 🤔 (i.e. gives me output)
and materializing one of those digests works fine..
does remote caching trip this logic up?
w
not remote caching, per se… but yea, a remote store would need to be configured if one is in use in pants
c
this is in a project using pants, not the pants project itself..
but it’s a single store used across all pants daemons by default, right? (the one in `~/.cache/pants/lmdb_store`)?
so don’t I have a copy of all digests locally too, after running a pants command?
w
With default settings, yes: `cache_content_behavior=fetch`
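for reference, that default corresponds to roughly this in pants.toml (option name as of the 2.15-era docs; double-check for your version):
Copy code
[GLOBAL]
# "fetch" pulls remote cache hits into the local store, so their content is also available locally
cache_content_behavior = "fetch"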
Mm: so: make sure that you use an absolute path with `materialize`: the cargo script changes directories
c
there seem to be options in `fs_util` for talking to a remote..? but I don’t know if I have the values to put in there 😛
oh, that could be a thing… but it worked with my test invocation.. I’ll make another try
ah, I did have the `$(pwd)` in there already…
Copy code
fs_util directory materialize 482dcf98a500c7695987f5ec1681b639b4a320e7fa3a9d96f889da0a4e7c4e1f 420 $(pwd)/dbg1
Could not walk unknown directory at "": Was not present in the local store: Digest { hash: Fingerprint<482dcf98a500c7695987f5ec1681b639b4a320e7fa3a9d96f889da0a4e7c4e1f>, size_bytes: 420 }
w
Where did you get that digest? Is it a directory or file digest?
c
from the debug log output
it’s a
Copy code
+  complete: DirectoryDigest { digest: Digest { hash: Fingerprint<bc7b8a195c792ed86f878fef6fabdd84fad38d4fc5d174a3c27d35caa32a49ed>, size_bytes: 420 }, tree: "Some(..)" },
w
...oh, shoot. I just realized that we stopped persisting all DirectoryDigests for performance reasons: they are only persisted when they go into cache entries/etc
c
oh.. 😬 so… I’m toast?
I got this diff:
Copy code
-  complete: DirectoryDigest { digest: Digest { hash: Fingerprint<482dcf98a500c7695987f5ec1681b639b4a320e7fa3a9d96f889da0a4e7c4e1f>, size_bytes: 420 }, tree: "Some(..)" },
+  complete: DirectoryDigest { digest: Digest { hash: Fingerprint<bc7b8a195c792ed86f878fef6fabdd84fad38d4fc5d174a3c27d35caa32a49ed>, size_bytes: 420 }, tree: "Some(..)" },
   nailgun: DirectoryDigest { digest: Digest { hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>, size_bytes: 0 }, tree: "Some(..)" },
-  input_files: DirectoryDigest { digest: Digest { hash: Fingerprint<482dcf98a500c7695987f5ec1681b639b4a320e7fa3a9d96f889da0a4e7c4e1f>, size_bytes: 420 }, tree: "Some(..)" },
+  input_files: DirectoryDigest { digest: Digest { hash: Fingerprint<bc7b8a195c792ed86f878fef6fabdd84fad38d4fc5d174a3c27d35caa32a49ed>, size_bytes: 420 }, tree: "Some(..)" },
For a process:
Copy code
"Building 51 requirements for src.python/testing.pex from the 3rdparty/python/lockfiles/default.txt resolve: Brotli==1.0.9, Django==[...] zenpy==1.1.12"
w
I think that we'd need to add a flag to optionally persist digests in more cases... essentially just for this use case
👍 1
c
so, I would think the inputs to both of these processes (which have the same description and all) would have the same input digest… but since they didn’t, that’s why it was rebuilt
w
Or to... conditionally persist things based on the log level =/
c
either way, but in order to find out what changes there are… we’d need to be able to peek inside I guess 🙂
we stopped persisting all DirectoryDigests
what does this imply for caching, though? are they rebuilt as needed from FileDigests, if that’s a thing?
w
Many (Directory)Digests exist only in memory as DigestTries: they can be used transparently without being on disk
Files are always persisted though
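a rough illustration of why `fs_util` couldn’t materialize that digest (purely a sketch; the names below are hypothetical, not Pants’ actual Rust types):
Copy code
# persisted_trees stands in for directory entries written to ~/.cache/pants/lmdb_store;
# many DirectoryDigests only ever exist as in-memory trees and never land there,
# while file *contents* are always persisted.
persisted_trees = {}

def materialize(fingerprint):
    """Mimics `fs_util directory materialize` for a digest taken from the debug log."""
    if fingerprint not in persisted_trees:
        raise LookupError(f"Was not present in the local store: {fingerprint}")
    return persisted_trees[fingerprint]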
c
Ah, I recognize that issue (as, I’ve read it at some point in time) 🙂
w
because process outputs are (generally) cached, they always end up persisted: but process inputs won’t necessarily be
so we would need to conditionally persist them (if enabled) near the log message that you were using
c
Thanks. I’ll take a stab at it this weekend if not sooner..
w
funnily enough, i think that i need this as well. taking a stab at it now.
😁 1
drafted and cherry-picked to 2.15.x here: https://github.com/pantsbuild/pants/commits/stuhood/2.15.x-with-persist-inputs
we’re off on Monday, but will try to clean it up on Tuesday
🙌 1
…so, rather than getting cleaner, it got dirtier today. will pursue cleanup tomorrow.
👍 1
c
this seems to be the case every time I see
Copy code
12:53:09.10 [INFO] Remote cache/execution options updated: reinitializing scheduler...
12:53:11.01 [INFO] Scheduler initialized.
after that pants rebuilds all my pexes…
w
it shouldn’t rebuild them… but it will need to re-memoize them. recreating the scheduler drops everything in memory, but it doesn’t invalidate any on-disk caches
c
it takes longer than if I just kill the daemon…
w
hm.
c
Copy code
12:54:34.87 [DEBUG] Completed: Remote cache lookup for: Building pytest_runner.pex
12:54:34.87 [DEBUG] Completed: Remote cache lookup for: Building pytest_runner.pex
12:54:34.87 [DEBUG] remote cache miss for: "Building pytest_runner.pex" digest=Digest { hash: Fingerprint<ae4679a4bdfaebb90dc0f9a213e4dddc52f2989a17c947778f19bfb494746e5b>, size_bytes: 142 }
12:54:34.87 [DEBUG] remote cache miss for: "Building pytest_runner.pex" digest=Digest { hash: Fingerprint<6c55ec4df6c146c95c2ef37248ea3fb6011b6b28b7f06859b0814e26af69aec7>, size_bytes: 142 }
so, it seems the remote options affect my pex fingerprint..
w
they really shouldn’t. do you have any plugins that might be generating something going into those pexes?
c
no plugins besides the toolchain one
did we get the directory stuff in the store now?
so I can look at the actual content going into those digests…
w
i’ll do that. but note that missing the remote cache doesn’t necessarily mean that something changed… the remote cache is still a cache
c
ok, so maybe that line was a colored fish, but I’m pretty sure the pexes were built rather than fetched from somewhere (as some of them take about a minute to build..)
w
to start with: were the digests different each time?
i’m rebasing the patch i mentioned above.
c
I’ll see if I can find out from today’s logs.
w
if you have buildsense links for them, i can get the full definitions
👍 1
c
want to make sure I look at the equiv pexes
w
🙏 1
c
ahh.. failed to run with PANTS_SOURCE because the project’s ICs (interpreter constraints) are 3.8..
would need to build a custom release and then get bootstrapped with that one somehow…. 😬
w
or rebase the patch to whatever release you’re using i think?
c
the issue is that the repo I see this in requires py 3.8, and pants on M1 requires py 3.9, so running pants from source is out.
can confirm that the previous message was a red herring.. something else must have coincided with that notice.. https://pantsbuild.slack.com/archives/C0D7TNJHL/p1678298073916689?thread_ts=1670021167.185789&cid=C0D7TNJHL
hmm.. I just tried another terminal, for other reasons, and it didn’t restart the daemon at all, but all memoizations etc were invalidated, so it seems there is something in my environment that is affecting my cache keys.. maybe? 🤷
w
could be. do you pass any environment variables to processes?
c
only
Copy code
[test]
extra_env_vars = ["DJANGO_TESTING=true"]
oh, wait, no, there are a few more; they’re all hardcoded though
w
hm. can you share the buildsense links for the two tabs?
c
didn’t run them in debug mode 😕 still helpful?
w
yes
c
ok
Hmm.. circling back to my initial observations, iirc I got triggered by the frequent building of “requirements_venv.pex” for seemingly no reason. As I observed it again with a really small change I was linting, it dawned on me..
Copy code
10:24:58.27 [INFO] Completed: Building 1 requirement for requirements.pex from the 3rdparty/python/lockfiles/default.txt resolve: boto3==1.20.8
… that it subsets only the required libs from the lockfile for each particular case, so depending on what sources I lint, it will need a different requirements_venv.pex based on which libs are used by each set of targets. With a large-ish lockfile this set tends to have many permutations, leading to frequent rebuilds.. it feels like it would be more efficient to just create one large pex with all requirements, as that is what’s going to be used in the end anyway. There shouldn’t be any issues besides it potentially being a little slower to set up, but that beats rebuilding it frequently. So I think https://www.pantsbuild.org/docs/reference-python#run_against_entire_lockfile is what I want to test with turned on..
not sure if this has been the case all along, but certainly a part of it..
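for reference, enabling that would look something like this in pants.toml (the option lives under the [python] scope; check the docs for your Pants version):
Copy code
[python]
# lint/check/test against a venv built from the entire lockfile instead of a per-run subset
run_against_entire_lockfile = true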
h
That is exactly what that option is for, to avoid all the subsetting
Although subsetting is basically as fast as it can be, it may not always be fast enough.
c
it did improve when I managed to get rid of the 500MB+ tensorflow lib from the default resolve… but I still don’t feel we need the subsetting.. and if there is any changed behaviour, as mentioned in the docs, we certainly want to use the whole thing, as that is what we’ll run with in production
e
Surely you don't run any binary with the whole lockfile as deps in prod?
c
ah, no there are a few overlaps between binaries, true.
e
So, even with a 500MB lib, things don’t make much sense. If subsetting is going from packed -> packed, the 500MB would already have been packed (zipped) once, and it’s just symlinks when subsetting.
c
but subsetting down to a pex with only a handful of requirements doesn’t seem as valuable either, from a functional perspective
e
Maybe we have bugs in lint code not using VenvPex internal_only?
c
well, the “Building requirements.pex” step could take >1min…
e
Yeah, I won't argue the worth or not of subsetting. It taking 1 min with no requirements.txt changes / not in a full clean cache scenario is totally unexpected.
c
yeah, I frequently observed very long subsetting times for the same lockfile/resolve, with only different subsets being used..
e
Yeah, that makes no sense and is worth drilling on. Any distribution already involved in a subset that is intersected by a new subset should be symlinked - basically 0 time.
c
ok, this points me in a slightly different direction than before.. will see what reproduction I can come up with
b
Not to derail, but I was just thinking `internal_only` should default to `True`. There are far, far more internal PEX codepaths than not.
e
I think the Pants console timings are poor data here. This probably requires adding probes.
👍 1
I won't comment on defaults, but it probably ought to be part of required kwargs at least. Force you to pretend you know the answer at the least instead of letting it slide by unset.
b
Yeah, that’s the current case
e
Ah, ok - that's enough for me. I count on people to read. But yeah, folks don't, including me, and may just copy pasta and not think.