it looks like when we produce `Tree`s for the `output direct Pants #development

witty-crayon-22786

09/25/2021, 5:22 AM

it looks like when we produce `Tree`s for the `output_directories` of `ActionResult`s, we store them in the `files` database (we have `files` and `directories` labeled databases currently). this mostly works for local access (since they are looked up in the same place), but it fails to upload in `ensure_remote_has_recursive` for a remote store, because we don’t recognize the blob as a `Tree` (it’s labeled as a file)

witty-crayon-22786

09/25/2021, 5:22 AM

it seems like there are a few options here, but that regardless of implementation, we will likely be using more `Tree`s and fewer `Directory`s over time (and maybe eventually deprecating `Directory`s entirely…?)

witty-crayon-22786

09/25/2021, 5:22 AM

so, two options i see:

witty-crayon-22786

09/25/2021, 5:24 AM

1. replacing the `directories` database with a `protos` database, which would store an outer envelope proto containing oneof `Directory`, `Tree`, `Command`, `Action` (all the protos we store in the `files` database currently)
2. creating a `trees` database in addition to the `directories` and `files` databases, and calling it a day

witty-crayon-22786

09/25/2021, 5:28 AM

i started implementing option 1, and i think it likely makes more sense than continuing to grow the number of databases we have. it would also potentially allow `ensure_remote_has_recursive` to recursively upload those other types and remove some manual code around that

curved-television-6568

09/25/2021, 5:28 AM

Sounds like 1 is the correct approach, and 2 the easy one ;)

😅 1

witty-crayon-22786

09/25/2021, 5:29 AM

cc @fast-nail-55400, @average-vr-56795, @enough-analyst-54434

average-vr-56795

09/25/2021, 8:12 AM

With 1, are you thinking of tagging what the proto type is somehow?

average-vr-56795

09/25/2021, 8:13 AM

We originally separated them for two reasons:
1. To avoid needing to validate them every time we deserialise them
2. Because they have different garbage collection characteristics

average-vr-56795

09/25/2021, 8:15 AM

So I'm curious how we'd achieve 1 - we could validate more often, we could tag that they've been validated (either in the LMDB entry or somewhere else)...

witty-crayon-22786

09/25/2021, 3:08 PM

Yea, I was thinking that the envelope would be `message StoreTypes { oneof field { ... } }`. My thinking was that anytime we store a blob we know its type, so we can preserve that.
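
A rough sketch of what value tagging with such an envelope could look like, modelled as a Rust enum rather than a generated proto. Everything here is hypothetical and for illustration only: `ProtosDb`, `kind`, the string digests, and the raw-byte payloads are stand-ins, not the actual Pants store types.

```rust
use std::collections::HashMap;

/// Hypothetical envelope mirroring `message StoreTypes { oneof field { ... } }`.
/// Payloads are raw bytes here; real code would hold the decoded protos.
#[derive(Debug, Clone, PartialEq)]
pub enum StoreTypes {
    Directory(Vec<u8>),
    Tree(Vec<u8>),
    Command(Vec<u8>),
    Action(Vec<u8>),
}

/// Toy in-memory stand-in for the proposed `protos` database,
/// keyed by a digest string (an assumption made for brevity).
#[derive(Default)]
pub struct ProtosDb {
    entries: HashMap<String, StoreTypes>,
}

impl ProtosDb {
    /// The writer knows the type at store time, so it is preserved in the value.
    pub fn store(&mut self, digest: &str, value: StoreTypes) {
        self.entries.insert(digest.to_string(), value);
    }

    /// The type comes back with the value; the reader only needs the digest.
    pub fn load(&self, digest: &str) -> Option<&StoreTypes> {
        self.entries.get(digest)
    }
}

/// Report which variant a stored entry holds.
pub fn kind(value: &StoreTypes) -> &'static str {
    match value {
        StoreTypes::Directory(_) => "Directory",
        StoreTypes::Tree(_) => "Tree",
        StoreTypes::Command(_) => "Command",
        StoreTypes::Action(_) => "Action",
    }
}
```

With this shape, a reader that holds only a digest can still discover that the blob is a `Tree`, which is the information the upload path is missing today.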

witty-crayon-22786

09/25/2021, 3:09 PM

Knowing the type is a prereq for any solution though? Can't choose to put it in a hypothetical `trees` database without knowing its type already.

witty-crayon-22786

09/25/2021, 3:11 PM

But I do think that continuing to keep the `files` database separate and untagged probably makes sense.

average-vr-56795

09/25/2021, 3:43 PM

Yeah, either oneof, or... IIRC @hundreds-breakfast-49010 added a schema mixin to the key so we could just add a bitset to the key

witty-crayon-22786

09/25/2021, 4:09 PM

yea, true. key tagging vs value tagging is a bit different: you have to know the type when looking it up.
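
To make the difference concrete, here is a minimal sketch of key tagging, assuming the tag is simply part of the lookup key. `EntryType`, `KeyTaggedDb`, and the tuple key are hypothetical; the schema mixin mentioned above may encode the tag quite differently.

```rust
use std::collections::HashMap;

/// Hypothetical type tag that becomes part of the key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EntryType {
    Directory,
    Tree,
}

/// Toy in-memory database whose keys are (tag, digest) pairs.
#[derive(Default)]
pub struct KeyTaggedDb {
    entries: HashMap<(EntryType, String), Vec<u8>>,
}

impl KeyTaggedDb {
    pub fn store(&mut self, ty: EntryType, digest: &str, bytes: Vec<u8>) {
        self.entries.insert((ty, digest.to_string()), bytes);
    }

    /// The caller must already know the type: the same digest looked up
    /// under a different tag is simply a miss.
    pub fn load(&self, ty: EntryType, digest: &str) -> Option<&[u8]> {
        self.entries
            .get(&(ty, digest.to_string()))
            .map(|bytes| bytes.as_slice())
    }
}
```

A lookup with the wrong tag returns `None` even though the bytes are present, whereas a value-tagged entry is found by digest alone and reports its own type.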

witty-crayon-22786

09/25/2021, 4:10 PM

but honestly, i think that maybe `ensure_remote_has_recursive` doesn’t need to have the API it does currently, where it takes only `Digest`s… i’m pretty sure a caller will already know the types of the digests (because it got them out of an `Action` from a particular field, etc), which would avoid the need to tag them for that purpose, and we’d just tag for validation
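
One way to picture a typed variant of that API: the caller attaches the type based on the field each digest came from, so the store never has to guess. This is only a sketch under assumptions; `TypedDigest`, `ActionResultDigests`, and `typed_digests` are hypothetical names, and digests are plain strings for brevity.

```rust
/// Hypothetical typed digest that an upload function could accept
/// instead of a bare `Digest`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TypedDigest {
    File(String),
    Tree(String),
}

/// Hypothetical, trimmed-down view of the digests in an `ActionResult`.
pub struct ActionResultDigests {
    /// Digests taken from the output files.
    pub output_file_digests: Vec<String>,
    /// Tree digests taken from the output directories.
    pub output_directory_tree_digests: Vec<String>,
}

/// Tag each digest according to the field it was read from.
pub fn typed_digests(result: &ActionResultDigests) -> Vec<TypedDigest> {
    let files = result
        .output_file_digests
        .iter()
        .cloned()
        .map(TypedDigest::File);
    let trees = result
        .output_directory_tree_digests
        .iter()
        .cloned()
        .map(TypedDigest::Tree);
    files.chain(trees).collect()
}
```

An upload routine taking `Vec<TypedDigest>` could then expand `Tree` entries without consulting any stored tag, leaving tags in the database to serve validation only.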

witty-crayon-22786

09/25/2021, 4:11 PM

and at that point: yea, key tagging would work.
