# development
b
Does the engine provide a mechanism to get just the `FileDigest` from a file within a `Digest` or `Snapshot`? I see I can get `FileContent`, but I don't actually need the bytes of the file, just the fingerprint
The context here is that I already know the expected digest, and I want to validate that what was produced by a `Process` invocation (`coursier fetch ...`) actually matches the digest from the lockfile
h
No.
The context here is that I already know the expected digest
Could you use `await Get(Digest, CreateDigest([file_content]))` to get the expected `Digest`, then compare that way?
a
Interestingly it kind of does...
👀 1
b
Yeah, I can. I just figured that for performance reasons I'd rather avoid slurping all of the bytes into memory within the rule, and I couldn't think of any reason why the engine wouldn't expose something like `Get(FileDigest, ExtractFileDigest(digest, "some/file"))`
Yeah, what I'm doing is very similar to `DownloadFile`, but it looks like that's implemented in Rust
So that was a bit of a dead end for me
a
If you make a `DownloadFile` with a bogus URL, and the digest of that `DownloadFile` happens to be in the local cache, it'll return you the `Digest` of the `Directory` containing exactly that file...
So you could do:
`Process` -> subset = digest of directory containing one file
Fake `DownloadFile` -> digest of directory containing one file (or error if it's not found)
and compare them
(I realise this is overly convoluted, but...)
b
Ah but that's not what I need. I have the `Digest`, I need the `FileDigest`
Oh I see
ehh, yeah, too hacky though
f
use Snapshot
a
Ooh, yes! I forgot `Snapshot` existed!
b
How does `Snapshot` help?
f
it contains a list of files/dirs in a Digest and their individual digests
think of it as the “directory listing” of a Digest
b
OK, that's what I want, but that's not the API I see on `Snapshot`
a
Wait, I don't think it does on the Python side?
Only on the Rust side
b
I see a `digest` of the entire snapshot, and a list of files and dirs, not their individual digests
f
it may have another name, let me find an example
a
That said, exposing either the sha256 of files in `Snapshot`, or an intrinsic to go from a `Digest` or `Snapshot` + path to its sha256 doesn't seem unreasonable. (Right now we haven't because "we happen to know the sha256 of these files" is an implementation detail rather than part of the API, but it could reasonably be part of the API, even if a future change to the internals means it's less efficient to compute than it currently is, which is unlikely in the short/medium term)
b
Yeah, I've been idly noting in my head that there are good reasons to keep `sha256` an implementation detail, but with a little design that can be maintained. It basically boils down to making `FileDigest` serializable, and for it to self-describe the underlying hash in a way that doesn't preclude adding or removing hash functions in the future
e.g. `Map[FingerprintAlg, Fingerprint]`, where both are really just aliases for `str`
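A minimal sketch of the self-describing idea above, in plain Python rather than engine code. `fingerprints_for`, `to_json`, `from_json`, and `matches` are hypothetical names, not part of the Pants API; the only point is that the algorithm name travels with the fingerprint, so hash functions can be added or removed later without changing the shape.

```python
import hashlib
import json
from typing import Dict, Iterable

# Both aliases are plain strings, as suggested in the thread.
FingerprintAlg = str
Fingerprint = str


def fingerprints_for(
    content: bytes, algs: Iterable[FingerprintAlg] = ("sha256",)
) -> Dict[FingerprintAlg, Fingerprint]:
    """Hash `content` once per named algorithm (hypothetical helper)."""
    return {alg: hashlib.new(alg, content).hexdigest() for alg in algs}


def to_json(fps: Dict[FingerprintAlg, Fingerprint]) -> str:
    """Serialize with sorted keys so the output is stable."""
    return json.dumps(fps, sort_keys=True)


def from_json(text: str) -> Dict[FingerprintAlg, Fingerprint]:
    return dict(json.loads(text))


def matches(
    expected: Dict[FingerprintAlg, Fingerprint],
    actual: Dict[FingerprintAlg, Fingerprint],
) -> bool:
    """True only if some algorithm is shared and every shared one agrees."""
    shared = expected.keys() & actual.keys()
    return bool(shared) and all(expected[a] == actual[a] for a in shared)
```

With this shape, a lockfile entry written with only `sha256` can still be compared against a value that also carries a second algorithm, since only the shared keys are checked.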
b
Yes, but there's no way to produce one without slurping in all of the bytes and ctor'ing it in memory
f
yeah 😞
`Snapshot` only produces the file and directory names
and asking the engine for `DigestContents` pulls in all of the bytes
a
But would be trivial (both in code and runtime overhead) to extend to also include `FileDigest`s, I think?
f
so maybe we need a new primitive to just not fully hydrate a `Digest`
b
Yeah, I can easily go `Digest -> FileContent -> FileDigest`, but the `FileContent.content` is unnecessary for my use-case, and likely has sufficient network overhead to make a meaningful performance difference
Yeah it seems like this could just be an extension to `Snapshot`
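For reference, the in-memory workaround described here boils down to roughly the following. This is a sketch in plain Python, not engine code: `FileDigestSketch` is a stand-in for the engine's `FileDigest` (its field names are an assumption), and `content` stands for the bytes that `FileContent.content` would hand back inside a rule.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileDigestSketch:
    """Stand-in for a file digest: hex sha256 fingerprint plus byte length."""

    fingerprint: str
    serialized_bytes_length: int


def file_digest_of(content: bytes) -> FileDigestSketch:
    # This is the step that needs the whole file in memory.
    return FileDigestSketch(hashlib.sha256(content).hexdigest(), len(content))


def verify(content: bytes, expected: FileDigestSketch, path: str) -> None:
    """Raise ValueError if the fetched bytes don't match the lockfile digest."""
    actual = file_digest_of(content)
    if actual != expected:
        raise ValueError(
            f"{path}: expected {expected.fingerprint} "
            f"({expected.serialized_bytes_length} bytes), "
            f"got {actual.fingerprint} ({actual.serialized_bytes_length} bytes)"
        )
```

The cost being complained about is the `content: bytes` argument itself: the comparison is cheap, but materializing the bytes is not.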
f
do you know the file name you want?
you can use `DigestSubset` to slice a Digest down to just the file you want
b
`Snapshot.files` (or some new field for backcompat) could be a `Map[str, FileDigest]`
Yeah getting down to and validating that my `Digest` has a single file with a known path is not a problem
a
I suspect we may want it to be a method rather than an eagerly hydrated `Map`, to allow for memory optimisations, but they're morally equivalent 🙂
b
Yeah. I'd also be fine with my original suggestion: `Get(FileDigest, ExtractFileDigest(some_digest, "file/path"))`
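To make the two proposed shapes concrete, here is a toy model of the "method rather than map" variant. Everything in it is hypothetical: `ListingSketch` and `file_digest` are illustrative names, not the real `Snapshot` API, and the digest is modeled as a plain `(fingerprint, length)` tuple.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple

# (hex fingerprint, byte length); a stand-in for a real file digest type.
DigestPair = Tuple[str, int]


@dataclass(frozen=True)
class ListingSketch:
    """Toy directory listing that remembers a digest per file path."""

    entries: Mapping[str, DigestPair]

    @property
    def files(self) -> Tuple[str, ...]:
        # Mirrors the existing "list of file names" view, sorted for stability.
        return tuple(sorted(self.entries))

    def file_digest(self, path: str) -> DigestPair:
        """Look up one file's digest by path without touching its bytes."""
        try:
            return self.entries[path]
        except KeyError:
            raise KeyError(f"no file at {path!r} in this listing") from None
```

A caller that has already narrowed the digest to one known path would call `listing.file_digest("file/path")` and compare the result with the lockfile value.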
f
open an issue?
b
Sure
h
Thanks! How blocking is this? Sounds fun for me to do as a background task this week, but trying to prioritize
b
Not blocking at all. The in-memory workaround is fine to keep iterating; the improved API would just be a drop-in optimization
👍 1