#development

bored-art-40741

03/28/2021, 9:36 PM
Does the engine provide a mechanism to get just the `FileDigest` from a file within a `Digest` or `Snapshot`? I see I can get `FileContent`, but I don't actually need the bytes of the file, just the fingerprint
The context here is that I already know the expected digest, and I want to validate that what was produced by a `Process` invocation (`coursier fetch ...`) actually matches the digest from the lockfile

hundreds-father-404

03/28/2021, 9:39 PM
No.
"The context here is that I already know the expected digest"
Could you use `await Get(Digest, CreateDigest([file_content]))` to get the expected `Digest`, then compare that way?

average-vr-56795

03/28/2021, 9:43 PM
Interestingly it kind of does...
👀 1

bored-art-40741

03/28/2021, 9:43 PM
Yeah, I can. I just figured that for performance reasons I'd rather avoid slurping all of the bytes into memory within the rule, and I couldn't think of any reason why the engine wouldn't expose something like `Get(FileDigest, ExtractFileDigest(digest, "some/file"))`
Yeah, what I'm doing is very similar to `DownloadFile`, but it looks like that's implemented in Rust
So that was a bit of a dead end for me

average-vr-56795

03/28/2021, 9:43 PM
If you make a `DownloadFile` which has a bogus URL, if the digest of the `DownloadFile` happens to be in the local cache it'll return you the `Digest` of the `Directory` containing exactly that file...
So you could do:
`Process` -> subset = digest of directory containing one file
Fake `DownloadFile` -> digest of directory containing one file (or error if it's not found)
and compare them
(I realise this is overly convoluted, but...)

bored-art-40741

03/28/2021, 9:44 PM
Ah but that's not what I need. I have the `Digest`, I need the `FileDigest`
Oh I see
ehh, yeah, too hacky though

fast-nail-55400

03/28/2021, 9:45 PM
use `Snapshot`

average-vr-56795

03/28/2021, 9:45 PM
Ooh, yes! I forgot `Snapshot` existed!

bored-art-40741

03/28/2021, 9:46 PM
How does `Snapshot` help?

fast-nail-55400

03/28/2021, 9:47 PM
it contains a list of files/dirs in a Digest and their individual digests
think of it as the “directory listing” of a Digest

bored-art-40741

03/28/2021, 9:47 PM
OK, that's what I want, but that's not the API I see on `Snapshot`

average-vr-56795

03/28/2021, 9:47 PM
Wait, I don't think it does on the Python side?
Only on the Rust side

bored-art-40741

03/28/2021, 9:47 PM
I see a `digest` of the entire snapshot, and a list of files and dirs, not their individual digests

fast-nail-55400

03/28/2021, 9:48 PM
it may have another name, let me find an example

average-vr-56795

03/28/2021, 9:49 PM
That said, exposing either the sha256 of files in `Snapshot`, or an intrinsic to go from a `Digest` or `Snapshot` + path to its sha256 doesn't seem unreasonable. (Right now we haven't because "we happen to know the sha256 of these files" is an implementation detail rather than part of the API, but it could reasonably be part of the API, even if a future change to the internals means it's less efficient to compute than it currently is, which is unlikely in the short/medium term)

bored-art-40741

03/28/2021, 9:50 PM
Yeah, I've been idly noting in my head that there are good reasons to keep `sha256` an implementation detail, but with a little design that can be maintained. It basically boils down to making `FileDigest` serializable, and for it to self-describe the underlying hash in a way that doesn't preclude adding or removing hash functions in the future
e.g. `Map[FingerprintAlg, Fingerprint]`, where both are really just aliases for `str`
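
A minimal sketch of that self-describing idea, using only Python's standard library. Nothing here is Pants API: `fingerprints` and `agree` are hypothetical helper names, and the rule "compare on whatever algorithms both sides share" is an assumption about how such a map might be used.

```python
import hashlib
from typing import Dict, Iterable

# Both aliases are plain strings, as suggested in the message above.
FingerprintAlg = str  # e.g. "sha256"
Fingerprint = str     # hex-encoded hash output


def fingerprints(
    data: bytes, algs: Iterable[FingerprintAlg] = ("sha256",)
) -> Dict[FingerprintAlg, Fingerprint]:
    """Hypothetical helper: hash `data` once per requested algorithm."""
    return {alg: hashlib.new(alg, data).hexdigest() for alg in algs}


def agree(
    a: Dict[FingerprintAlg, Fingerprint], b: Dict[FingerprintAlg, Fingerprint]
) -> bool:
    """Hypothetical helper: compare two maps on the algorithms they share."""
    shared = a.keys() & b.keys()
    if not shared:
        # With no algorithm in common, neither match nor mismatch can be claimed.
        raise ValueError("no hash algorithm in common")
    return all(a[alg] == b[alg] for alg in shared)
```

With this shape, a lockfile written with only `sha256` can still be checked against a map that later gains `sha512`, because the comparison only looks at the shared keys.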

bored-art-40741

03/28/2021, 9:52 PM
Yes, but there's no way to produce one without slurping in all of the bytes and ctor'ing it in memory

fast-nail-55400

03/28/2021, 9:53 PM
yeah 😞
`Snapshot` only produces the file and directory names
and asking the engine for `DigestContents` pulls in all of the bytes

average-vr-56795

03/28/2021, 9:54 PM
But would be trivial (both in code and runtime overhead) to extend to also include `FileDigest`s, I think?

fast-nail-55400

03/28/2021, 9:54 PM
so maybe we need a new primitive to just not fully hydrate a `Digest`

bored-art-40741

03/28/2021, 9:54 PM
Yeah, I can easily go `Digest -> FileContent -> FileDigest`, but the `FileContent.content` is unnecessary for my use-case, and likely has sufficient network overhead to make a meaningful performance difference
Yeah it seems like this could just be an extension to `Snapshot`
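
The in-memory workaround described above can be sketched roughly as follows. The `FileDigest` and `FileContent` classes are local stand-ins defined for illustration, not the Pants classes; their field names, and the helper names `file_digest_of` and `matches_lockfile`, are assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileDigest:
    """Local stand-in: a SHA-256 hex fingerprint plus the byte length."""
    fingerprint: str
    serialized_bytes_length: int


@dataclass(frozen=True)
class FileContent:
    """Local stand-in for a fully materialized file: its path and all of its bytes."""
    path: str
    content: bytes


def file_digest_of(file_content: FileContent) -> FileDigest:
    # The whole file is already in memory at this point, which is exactly
    # the cost the thread would like to avoid.
    return FileDigest(
        fingerprint=hashlib.sha256(file_content.content).hexdigest(),
        serialized_bytes_length=len(file_content.content),
    )


def matches_lockfile(file_content: FileContent, expected: FileDigest) -> bool:
    """Hypothetical check: does the fetched file match the lockfile entry?"""
    return file_digest_of(file_content) == expected
```

Comparing both the fingerprint and the length is a choice made for this sketch; the point is only that the comparison itself needs nothing beyond the digest once it exists.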

fast-nail-55400

03/28/2021, 9:55 PM
do you know the file name you want?
you can use `DigestSubset` to slice a Digest down to just the file you want

bored-art-40741

03/28/2021, 9:55 PM
`Snapshot.files` (or some new field for backcompat) could be a `Map[str, FileDigest]`
Yeah getting down to and validating that my `Digest` has a single file with a known path is not a problem
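
A rough sketch of what that proposed shape could look like, again with stand-in types rather than the real engine classes. `SnapshotListing` and `single_file_digest` are hypothetical names, and the path-to-digest mapping is the proposal from this thread, not an existing field.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FileDigest:
    """Local stand-in: a hex fingerprint plus the byte length."""
    fingerprint: str
    serialized_bytes_length: int


@dataclass(frozen=True)
class SnapshotListing:
    """Hypothetical listing: each file path mapped to its FileDigest, no bytes."""
    files: Mapping[str, FileDigest]


def single_file_digest(listing: SnapshotListing, expected_path: str) -> FileDigest:
    """Check the listing holds exactly one file at `expected_path`, then return its digest."""
    if set(listing.files) != {expected_path}:
        raise ValueError(
            f"expected exactly one file at {expected_path!r}, got {sorted(listing.files)}"
        )
    return listing.files[expected_path]
```

The lookup never touches file contents, so validating against a lockfile entry would reduce to comparing two small values.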

average-vr-56795

03/28/2021, 9:57 PM
I suspect we may want it to be a method rather than an eagerly hydrated `Map`, to allow for memory optimisations, but they're morally equivalent 🙂

bored-art-40741

03/28/2021, 9:59 PM
Yeah. I'd also be fine with my original suggestion: `Get(FileDigest, ExtractFileDigest(some_digest, "file/path"))`

fast-nail-55400

03/28/2021, 9:59 PM
open an issue?

bored-art-40741

03/28/2021, 10:00 PM
Sure

hundreds-father-404

03/28/2021, 10:17 PM
Thanks! How blocking is this? Sounds fun for me to do as a background task this week, but trying to prioritize

bored-art-40741

03/28/2021, 10:18 PM
Not blocking at all. The in-memory workaround is fine to keep iterating; the improved API would just be a drop-in optimization
👍 1