< witty crayon 22786> Working through invalidation headers f Pants #development

<@U06A03HV1> Working through invalidation headers ...

ancient-vegetable-10556

01/07/2022, 8:24 PM

@witty-crayon-22786 Working through invalidation headers for JVM, in Python-land we prepend a comment which is ignored by tools that eventually read the lockfiles. Our JVM lockfile format is custom, but currently is fed straight into a JSON parser. To support invalidation headers, I could either start stripping lines that begin with

, or swap to a YAML parser (which will read JSON with comments). Preference here?

hundreds-father-404

01/07/2022, 8:26 PM

Any thoughts on TOML for lockfile format? I think it's worked quite well for Cargo

ancient-vegetable-10556

01/07/2022, 8:27 PM

Toml would be entirely fine, it’s just that we already have lockfiles out there that are JSON, so there’d need to be deprecation work

witty-crayon-22786

01/07/2022, 8:27 PM

what about just embedding a key in the JSON…?

ancient-vegetable-10556

01/07/2022, 8:27 PM

root object is a list

ancient-vegetable-10556

01/07/2022, 8:27 PM

sorry 🙂

witty-crayon-22786

01/07/2022, 8:27 PM

make it not a list?

ancient-vegetable-10556

01/07/2022, 8:28 PM

see above

witty-crayon-22786

01/07/2022, 8:28 PM

i don’t think that the deprecation work is avoidable? any of these are changes

fast-nail-55400

01/07/2022, 8:28 PM

maybe switch to JSON object and embed a

version

key? then can read in using

json.loads

, if a list then assume v1 lockfile, if an object assume v2+

witty-crayon-22786

01/07/2022, 8:29 PM

yea. regardless, it seems like the top level needs to be an object.

ancient-vegetable-10556

01/07/2022, 8:29 PM

Right

hundreds-father-404

01/07/2022, 8:29 PM

We're so early into the backend that I recommend we think about the ideal format for when we have 100 or 1000 users, rather than the few we have now. It'll be much harder to change in 3 months if we realize we need to

ancient-vegetable-10556

01/07/2022, 8:29 PM

@fast-nail-55400 your solution seems the least clunky for now

fast-nail-55400

01/07/2022, 8:30 PM

prior art is gradle’s lockfile format: https://docs.gradle.org/current/userguide/dependency_locking.html#lock_state_location_and_format

ancient-vegetable-10556

01/07/2022, 8:31 PM

yeah, Gradle’s is very similar to what we have for Python

fast-nail-55400

01/07/2022, 8:31 PM

not suggesting that we use it, but there might be useful attributes in there

fast-nail-55400

01/07/2022, 8:31 PM

also we need to switch to a JSON object any way since we may need to introduce concepts like “resolve parameters” for scala version

fast-nail-55400

01/07/2022, 8:31 PM

and would need to store them in the lockfile

ancient-vegetable-10556

01/07/2022, 8:32 PM

One thing that’s nice about having our invalidation headers as a comment is that we can ensure it lives at the top of the file and includes information on how to regenerate the files. That’s less controllable with a JSON object in the root

hundreds-father-404

01/07/2022, 8:32 PM

if we're switching to JSON object anyways, would changing to TOML be much more disruptive?

witty-crayon-22786

01/07/2022, 8:33 PM

@ancient-vegetable-10556: you can have a comment explaining how to regenerate, but the other bits of data can still be structured as keys

ancient-vegetable-10556

01/07/2022, 8:33 PM

@hundreds-father-404 With JSON objects, the parsing will always work with one line; for TOML or JSON, you’d need to pick one format to try first

ancient-vegetable-10556

01/07/2022, 8:33 PM

@witty-crayon-22786 That’s why I’m asking about how to implement comments, which are unsupported by JSON

hundreds-father-404

01/07/2022, 8:34 PM

With JSON objects, the parsing will always work with one line; for TOML or JSON, you’d need to pick one format to try first

I don't follow. What do you eman?

witty-crayon-22786

01/07/2022, 8:34 PM

in JSON, there is a sort-of convention for:

Copy code

{
  "#": "This is a comment!",
  "key": "value",
}

🤔 1

ancient-vegetable-10556

01/07/2022, 8:34 PM

OK, does the Python json serializer guarantee key order?

witty-crayon-22786

01/07/2022, 8:35 PM

sorting keys is optional… not sure if that means its guaranteed, but

ancient-vegetable-10556

01/07/2022, 8:35 PM

sort_keys is the opposite of what I want

fast-nail-55400

01/07/2022, 8:35 PM

Copy code

Note

This module's encoders and decoders preserve input and output order by default. Order is only lost if the underlying containers are unordered.

fast-nail-55400

01/07/2022, 8:35 PM

(from https://docs.python.org/3/library/json.html)

ancient-vegetable-10556

01/07/2022, 8:36 PM

thanks @fast-nail-55400

ancient-vegetable-10556

01/07/2022, 8:39 PM

@hundreds-father-404

json.load

will produce a list or a dict depending on what the contents of the serialized form is (and you can process based on the type of the deserialized object); for toml we’d need to determine whether the file is toml or json somehow (e.g. trying a toml deserializer and falling back to json)

witty-crayon-22786

01/07/2022, 8:40 PM

re: toml: I have no preference. The lockfile format isn't meant to be readable... although there is maybe some benefit to avoiding nesting if you're going to merge conflicts.

➕ 1

witty-crayon-22786

01/07/2022, 8:40 PM

But even with toml, I never assume that I have correctly merged a conflict... have to re-run to confirm.

ancient-vegetable-10556

01/07/2022, 8:41 PM

@witty-crayon-22786 the invalidation header is meant to be readable, so that people can get loudly reminded that it’s not meant to be hand-edited

➕ 1

hundreds-father-404

01/07/2022, 8:41 PM

for toml we’d need to determine whether the file is toml or json somehow (e.g. trying a toml deserializer and falling back to json)

But only for the sake of backwards compatibility if we want to keep the existing JSON lockfiles still working for now, right? That will go away

witty-crayon-22786

01/07/2022, 8:41 PM

@ancient-vegetable-10556: sure: my comment was re: toml vs json, not about whether there should be a comment.

ancient-vegetable-10556

01/07/2022, 8:43 PM

I’m certain that JSON objects are not the right tool for an invalidation header comment, which leads me back to the starting point here 🙂 swapping in a YAML (de)serialiser and making it use JSON formatting would be the lowest impact to me right now, followed by adding in a way to strip out the invalidation header; adding in a new parser format would be tedious for now, but potentially useful in the long run; maybe

witty-crayon-22786

01/07/2022, 8:45 PM

If not JSON, then toml please.

➕ 1

ancient-vegetable-10556

01/07/2022, 8:46 PM

hundreds-father-404

01/07/2022, 8:47 PM

big +1 to TOML over YAML. YAML has so many gotchas

ancient-vegetable-10556

01/11/2022, 10:16 PM

How strongly do we feel about the lockfile metadata header being readable along with the rest of the lockfile? It’s currently not readable in Python (we pass the lockfile contents directly to pip in Python land, the invalidation header is a comment and thus ignored), and it would be easiest if we did it that way for JVM, but there’s no reason why that has to be the case per se

witty-crayon-22786

01/11/2022, 10:17 PM

ideally there should be a readable comment containing the re-run instruction, but i don’t care if any of the rest of the metadata is readable.

ancient-vegetable-10556

01/11/2022, 10:19 PM

Cool.

witty-crayon-22786

01/11/2022, 10:19 PM

honestly, even the human readable “run this to regenerate” comment isn’t required though. the point of the metadata is to be able to automatically tell users when/what to run

witty-crayon-22786

01/11/2022, 10:19 PM

(imo)

ancient-vegetable-10556

01/11/2022, 10:19 PM

Yeah, the “run to regenerate” comment isn’t as important as “don’t touch this” tbqh

witty-crayon-22786

01/11/2022, 10:20 PM

mm, true. that’s nice to have.

ancient-vegetable-10556

01/11/2022, 10:21 PM

OK. To wrap up this whole thread, in the interests of maintaining the smallest possible scope of work, I’m going to maintain the current lockfile format, but prepend a header block that will be stripped out at read time, we can swap over to TOML in future work if (e.g.) Tom has a real need for it

witty-crayon-22786

01/11/2022, 10:41 PM

that would mean that it’s not actually JSON…?

ancient-vegetable-10556

01/11/2022, 10:41 PM

Right; does that matter?

witty-crayon-22786

01/11/2022, 10:42 PM

it doesn’t matter, but since you’re already changing what data is stored based on whether something is readable via a particular format or not, changing to another format at the same time seems trivial

witty-crayon-22786

01/11/2022, 10:42 PM

so, imo: make the TOML change at the same time.

witty-crayon-22786

01/11/2022, 10:44 PM

it seemingly just amounts to:

Copy code

try:
    content = toml.loads(..)
except:
    # Is either old, or corrupted: attempt to load as JSON, and assume v1
    content = json.loads(..)
    content['version'] = 'v1'

ancient-vegetable-10556

01/11/2022, 10:46 PM

fine

Open in Slack

Previous Next