# plugins
g
Are there any workarounds I can apply in a plugin to work around the 2GB limit of files in LMDB? I'm trying to build some huge image layers in my custom OCI container (for large pexes), but when building layers it fails on the below:
```
Failed to digest inputs: "Error storing Digest { hash: Fingerprint<30eeb2aa047b16db33b65652c344c6fa8c41c2e8f8d34741532b8b44f9d65227>, size_bytes: 2210641920 }: Input/output error"
```
The pex itself works fine using `layout=packed`, but an image layer has to be a .tar, which again breaks the digest. I've seen something called `named_caches` in the code? Would that avoid this? Or do I need to do some ugly hack with splitting the file?
b
`named_caches` is something plugins get to use, and not users, unfortunately. I'm curious if https://github.com/pantsbuild/pants/pull/18153 might work here. We now handle "large" files differently. Can you try out Pants `2.17.0.dev4` or later?
g
> `named_caches` is something plugins get to use, and not users, unfortunately.

Not sure I understand! This would be used by a plugin/backend.
Trying `2.17.0.dev4`
That works; but 2.17 is... a while out šŸ˜ž
b
If you're in a plugin... you may be able to use `named_caches`. But probably not šŸ˜•
Give me 15, and I'll write a TL;DR for ya
g
Awesome, thanks!
b
So `named_caches` is a performance hack, mostly. That part should probably be internalized. It isn't cached in terms of the Pants engine, which also means it doesn't participate in remote caching. Under the hood, sandboxes whose `Process` object has `append_only_caches` entries get a symlink into the named caches root (whatever the value of the global option is). And that's it. Enjoy the disk space and footgun. What this means, in practice, for plugin authors is:
ā€¢ You need to be completely sure the code you're running is 100% concurrency-proof. Usually this means writing to a tempfile and `rename`ing the temp file to the destination. (This also explains the name `append_only_caches`: it's a hint that you probably don't want to try to edit or remove anything, since treating it as append-only means you'll be concurrency-safe. In reality, do what you want, but be very sure.)
ā€¢ Likewise, ensure your code is kill-safe. Your process could be killed at any point, and can also be restarted (and in parallel). So at any point in time, you must leave the cache "valid".
With those two in mind, if your `Process` object is somehow creating this 2GB file (so it isn't coming from the `input_digest`), then you could `cp` it into an `append_only_cache` as a temporary file, then `rename` that file to the final destination. And then in every single `Process` that could want the file, you'll need to pass the `append_only_cache` and load the file from the symlink. Does that make sense?
...I should put this in the docs...
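[Editor's note: the tempfile-plus-rename pattern described above can be sketched in plain Python. This is a hypothetical illustration, not Pants code; the function and paths are made up. The key points are writing the temp file in the *same* directory as the destination and using an atomic rename, so concurrent or killed writers never leave a half-written entry visible.]

```python
import os
import tempfile

def put_in_cache(cache_dir: str, name: str, data: bytes) -> str:
    """Write `data` into an append-only cache atomically.

    Writers may race, but rename/replace is atomic on POSIX, so readers
    always observe either no file at all or a complete file.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Create the temp file in the SAME directory as the destination, so the
    # rename cannot cross filesystems (a cross-device rename is not atomic).
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        dest = os.path.join(cache_dir, name)
        os.replace(tmp, dest)  # atomic; safe even if we are killed mid-run
        return dest
    finally:
        # If we were interrupted before the replace, clean up the temp file.
        if os.path.exists(tmp):
            os.remove(tmp)
```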
g
It makes sense! So essentially, instead of returning the file via `output_dirs=...`, I'd write it to the named cache and pass around something that lets me retrieve it from the cache?
b
Yup, and that something is simply the entry in `append_only_caches`. Unfortunately, right now that means you have to control every `Process` that could want that value.
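[Editor's note: one way to keep that "every `Process` must declare the same entry" requirement manageable is a small helper that stamps the cache entry onto each process spec. This sketch uses a plain dict mirroring the relevant `Process` arguments rather than the real `pants.engine.process.Process` class; the cache name and path are invented for illustration.]

```python
# Hypothetical names: the cache name keys into the shared named-caches root,
# and the path is where the symlink appears inside each sandbox.
CACHE_NAME = "my_layer_cache"
CACHE_PATH = ".cache/my_layers"

def with_layer_cache(argv: list, description: str) -> dict:
    """Build a process spec that mounts the shared layer cache.

    Every process that reads or writes the big file must declare the same
    append_only_caches entry, or the symlink simply won't exist in its
    sandbox and the file will appear to be missing.
    """
    return {
        "argv": argv,
        "description": description,
        "append_only_caches": {CACHE_NAME: CACHE_PATH},
    }
```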
g
This would be fine; I think. I assume this also means I'd need to manually think about things like digests and locations etc? Or is that handled?
b
mm not sure what you mean. Put another way?
g
Hmm! Not sure what I'm trying to ask. If I'm working with a regular `Process`, I give it an input digest for the files I want to read, and I get a digest back for the files I wrote, based on `output_dirs` etc. It seems to me like by not getting the digest, I'm on my own for ensuring file references are valid, writing to the right place, etc.
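[Editor's note: since the engine isn't handing back a digest here, a plugin could track validity itself by content-addressing its cache entries. A minimal sketch, assuming only the standard library; the naming scheme is made up, loosely echoing the hash-plus-size shape of the `Fingerprint` in the error above.]

```python
import hashlib
import os

def fingerprint(path: str, bufsize: int = 1 << 20) -> str:
    """Stream a sha256 digest of a file, avoiding loading multi-GB
    files into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def cache_key(path: str) -> str:
    """Name a cache entry by content hash and size, so a reference is
    valid exactly when a file with that name exists (and entries are
    immutable once written)."""
    return f"{fingerprint(path)}-{os.path.getsize(path)}"
```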
b
Yup! very much so šŸ™‚
A nice, shiny footgun in many ways šŸ™‚
g
Hmm. I've been thinking about moving the build process out of Pants anyway and doing it more traditionally, somewhere like `~/.cache/containers`. Which seems to me like it'd be the same effort, just a different location.
It seems to me either I do that now; or wait until 2.17.... or split the tarballs.
@bitter-ability-32190 I started bumping my plugins to support 2.17.0 but I'm seeing a weird error which might be related to your PR...
```
E         	Engine traceback:
E         	  in `run` goal
E         	  in Build OCI image
E         	  in Resolve transitive targets
E         	  in Resolve direct dependencies of target - tmpc2qc5x8d/oci:example
E         	  in Inferring dependency from the pex_binary `entry_point` field
E         	  in Creating map of first party Python targets to Python modules
E         	  in Find all Python targets in project
E         	  in Find all targets in the project
E         	  in Finding files: **
E         	
E         	Exception: Failed to read link "/tmp/pants-sandbox-pSiPlO/.python-build-standalone": Absolute symlink: "/tmp/immutable_inputstGflKc/.tmpEYXnDy/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
```
Any thoughts? It seems like I can build images fine locally, but tests don't work... wondering whether it's something related to test/pants bootstrap in that situation.
b
Are you grabbing everything as the output from a sandbox?
New Python processes have an `append_only_cache` that uses that name
g
Not that I know of; and only the second-from-top frame is in my code.
b
> `in Finding files: **`

...How is this running? What is running here? It almost looks like running Pants inside a Pants sandbox?
b
You'll want to ignore that symlink somehow, either in a `pants.toml` or with a flag: https://www.pantsbuild.org/v2.17/docs/reference-global#pants_ignore
g
This seems like a bug; looks like all those things are picked up:
```
E         	Exception: Failed to read link "/tmp/pants-sandbox-jJqAyo/.cache/pex_root": Absolute symlink: "/home/runner/.cache/pants/named_caches/pex_root"
```
b
If you're running pants from within a pants sandbox, you gotta be sure you aren't having the inner pants try and look at pants-support files in the sandbox šŸ™ƒ
g
But this is the pants testutil... Surely it sets things up properly? :s
b
That.. I have no clue on. šŸ˜…
You only see this on upgrade to 2.17?
g
Hmm! Now I'm seeing it on code that definitely 100% worked on 2.14-2.16a0...
b
Yeah ok, see if you can isolate it and file a bug
g
Oof, a slightly-too-complex testing setup is throwing me for a loop. Since I do matrix testing with different host Pants versions vs. target Pants versions, I had the same bug show up in two different places: 2.14-2.16 break when testing 2.17 with test coverage enabled, but work fine otherwise; 2.17 breaks when testing any version, even without coverage.
Will see if I can summarize this next week... What a mess. šŸ˜„
@bitter-ability-32190 Are there any hacks for `package` + named caches? I just tried my "split tarballs" approach but it doesn't work, because completed layers end up over 2 GB anyway. So now I'm eyeing named caches; but if that still has a 2GB limit due to needing a digest output, I'm hard-blocked until 2.17 is stabilized. šŸ˜ž
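[Editor's note: for reference, the "split tarballs" idea amounts to plain byte-splitting so that each piece stays under the digest limit. A minimal sketch, assuming only the standard library; as noted above, this only helps if no single artifact that must be digested exceeds 2 GB, which is exactly where it broke down here.]

```python
import os

def split_file(path: str, chunk_size: int) -> list:
    """Split `path` into numbered parts of at most `chunk_size` bytes.

    The parts carry no structure of their own; they must be concatenated
    in order to reconstruct the original file.
    """
    parts = []
    with open(path, "rb") as src:
        i = 0
        while chunk := src.read(chunk_size):
            part = f"{path}.part{i:04d}"
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
            i += 1
    return parts

def join_files(parts: list, dest: str) -> None:
    """Concatenate the parts, in order, back into a single file."""
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                out.write(src.read())
```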