<@U06A03HV1> Working through invalidation headers ...
# development
a
@witty-crayon-22786 Working through invalidation headers for JVM, in Python-land we prepend a comment which is ignored by tools that eventually read the lockfiles. Our JVM lockfile format is custom, but currently is fed straight into a JSON parser. To support invalidation headers, I could either start stripping lines that begin with
#
, or swap to a YAML parser (which will read JSON with comments). Preference here?
h
Any thoughts on TOML for lockfile format? I think it's worked quite well for Cargo
a
Toml would be entirely fine, it’s just that we already have lockfiles out there that are JSON, so there’d need to be deprecation work
w
what about just embedding a key in the JSON…?
a
root object is a list
sorry 🙂
w
make it not a list?
a
see above
w
i don’t think that the deprecation work is avoidable? any of these are changes
f
maybe switch to JSON object and embed a
version
key? then can read in using
json.loads
, if a list then assume v1 lockfile, if an object assume v2+
w
yea. regardless, it seems like the top level needs to be an object.
a
Right
h
We're so early into the backend that I recommend we think about the ideal format for when we have 100 or 1000 users, rather than the few we have now. It'll be much harder to change in 3 months if we realize we need to
a
@fast-nail-55400 your solution seems the least clunky for now
f
a
yeah, Gradle’s is very similar to what we have for Python
f
not suggesting that we use it, but there might be useful attributes in there
also we need to switch to a JSON object any way since we may need to introduce concepts like “resolve parameters” for scala version
and would need to store them in the lockfile
a
One thing that’s nice about having our invalidation headers as a comment is that we can ensure it lives at the top of the file and includes information on how to regenerate the files. That’s less controllable with a JSON object in the root
h
if we're switching to JSON object anyways, would changing to TOML be much more disruptive?
w
@ancient-vegetable-10556: you can have a comment explaining how to regenerate, but the other bits of data can still be structured as keys
a
@hundreds-father-404 With JSON objects, the parsing will always work with one line; for TOML or JSON, you’d need to pick one format to try first
@witty-crayon-22786 That’s why I’m asking about how to implement comments, which are unsupported by JSON
h
With JSON objects, the parsing will always work with one line; for TOML or JSON, you’d need to pick one format to try first
I don't follow. What do you eman?
w
in JSON, there is a sort-of convention for:
Copy code
{
  "#": "This is a comment!",
  "key": "value",
}
🤔 1
a
OK, does the Python json serializer guarantee key order?
w
sorting keys is optional… not sure if that means its guaranteed, but
a
sort_keys is the opposite of what I want
f
Copy code
Note

This module's encoders and decoders preserve input and output order by default. Order is only lost if the underlying containers are unordered.
a
thanks @fast-nail-55400
@hundreds-father-404
json.load
will produce a list or a dict depending on what the contents of the serialized form is (and you can process based on the type of the deserialized object); for toml we’d need to determine whether the file is toml or json somehow (e.g. trying a toml deserializer and falling back to json)
w
re: toml: I have no preference. The lockfile format isn't meant to be readable... although there is maybe some benefit to avoiding nesting if you're going to merge conflicts.
1
But even with toml, I never assume that I have correctly merged a conflict... have to re-run to confirm.
a
@witty-crayon-22786 the invalidation header is meant to be readable, so that people can get loudly reminded that it’s not meant to be hand-edited
1
h
for toml we’d need to determine whether the file is toml or json somehow (e.g. trying a toml deserializer and falling back to json)
But only for the sake of backwards compatibility if we want to keep the existing JSON lockfiles still working for now, right? That will go away
w
@ancient-vegetable-10556: sure: my comment was re: toml vs json, not about whether there should be a comment.
a
I’m certain that JSON objects are not the right tool for an invalidation header comment, which leads me back to the starting point here 🙂 swapping in a YAML (de)serialiser and making it use JSON formatting would be the lowest impact to me right now, followed by adding in a way to strip out the invalidation header; adding in a new parser format would be tedious for now, but potentially useful in the long run; maybe
w
If not JSON, then toml please.
1
a
ok
h
big +1 to TOML over YAML. YAML has so many gotchas
a
How strongly do we feel about the lockfile metadata header being readable along with the rest of the lockfile? It’s currently not readable in Python (we pass the lockfile contents directly to pip in Python land, the invalidation header is a comment and thus ignored), and it would be easiest if we did it that way for JVM, but there’s no reason why that has to be the case per se
w
ideally there should be a readable comment containing the re-run instruction, but i don’t care if any of the rest of the metadata is readable.
a
Cool.
w
honestly, even the human readable “run this to regenerate” comment isn’t required though. the point of the metadata is to be able to automatically tell users when/what to run
(imo)
a
Yeah, the “run to regenerate” comment isn’t as important as “don’t touch this” tbqh
w
mm, true. that’s nice to have.
a
OK. To wrap up this whole thread, in the interests of maintaining the smallest possible scope of work, I’m going to maintain the current lockfile format, but prepend a header block that will be stripped out at read time, we can swap over to TOML in future work if (e.g.) Tom has a real need for it
w
that would mean that it’s not actually JSON…?
a
Right; does that matter?
w
it doesn’t matter, but since you’re already changing what data is stored based on whether something is readable via a particular format or not, changing to another format at the same time seems trivial
so, imo: make the TOML change at the same time.
it seemingly just amounts to:
Copy code
try:
    content = toml.loads(..)
except:
    # Is either old, or corrupted: attempt to load as JSON, and assume v1
    content = json.loads(..)
    content['version'] = 'v1'
a
fine