# development
b
Hey all, I'm interested in fixing https://github.com/pantsbuild/pants/issues/16999. I have a few questions:
1. I've proposed a schema for the output in https://github.com/pantsbuild/pants/issues/16999#issuecomment-1260112083. Any thoughts about it? E.g. anything else to include? Is JSON the right format? Should it be versioned somehow?
2. How do I implement it? AIUI, one approach might be generating a file as part of the
build_docker_image
rule and feeding it through https://github.com/pantsbuild/pants/blob/c5e902e6a630495c17e234012c001ed4f97b4173/src/python/pants/backend/docker/goals/package_image.py#L335 to set
relpath
... but I'm not sure of the right way to do this.
3. Once there's a schema and I know how to create a file, what's the best way to actually create the JSON output? In our app, I'd define pydantic models and let pydantic handle the serialisation, rather than build dicts manually; is there something similar available in pants?
h
Thanks for this suggestion! I do think JSON is the right format. Versioning the format can’t hurt. Just stick a
version: 1
at the top level and no need to think about it until we need to change it…
👍 1
Re generating JSON, I’d start with just dicts maybe? Simplest to get going
Although the dataclasses are good documentation of the format, and they are easy to turn into dicts, so…
I’d defer to @curved-television-6568 on the best place to add this in the code, but it does seem like it should happen at the same time as image generation. However note that the actual writing out to
dist
needs to happen on every run (the thing that is written out can be cached) as it is a requested side effect
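To illustrate the split between cacheable content and the per-run write, here is a minimal plain-Python sketch. It does not use the Pants engine or any Pants API: `functools.lru_cache` stands in for rule caching, and `render_metadata`, `write_metadata`, and the JSON fields are hypothetical names chosen for illustration.

```python
import functools
import json
from pathlib import Path


@functools.lru_cache(maxsize=None)
def render_metadata(image_tag: str) -> str:
    """Cacheable step: compute the file content (hypothetical fields)."""
    return json.dumps({"version": 1, "image": image_tag}, sort_keys=True)


def write_metadata(dist_dir: Path, relpath: str, image_tag: str) -> Path:
    """Side effect: runs on every call, even when the content is cached."""
    out = dist_dir / relpath
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(render_metadata(image_tag))
    return out
```

Calling `write_metadata` twice with the same tag computes the content once but writes the file both times, so a deleted `dist` file reappears on the next run.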
c
I like the use of schemas in code. It serves both as documentation and typing. As we don’t have a dependency in Pants on any schema library (that I’m aware of), you could use dataclasses and then call
dataclasses.asdict()
on an instance of that.
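A minimal sketch of the dataclasses-to-JSON approach, using only the standard library. The class and field names (`ImageInfo`, `DockerMetadata`, `image_id`, `tags`) are hypothetical and are not the schema proposed in the issue; only the top-level `version` field comes from the discussion above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageInfo:
    # Hypothetical fields, for illustration only.
    image_id: str
    tags: Tuple[str, ...]


@dataclass(frozen=True)
class DockerMetadata:
    images: Tuple[ImageInfo, ...]
    # Top-level format version, as suggested in the thread.
    version: int = 1


def to_json(metadata: DockerMetadata) -> str:
    """Convert nested dataclasses to plain dicts, then serialise to JSON."""
    return json.dumps(asdict(metadata), indent=2, sort_keys=True)
```

`asdict` recurses into nested dataclasses and tuples, and `json.dumps` writes tuples as JSON arrays, so no hand-built dicts are needed.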
Oh, btw for
./pants publish
there is structured output available already: https://www.pantsbuild.org/docs/reference-publish#output
If structured data is required also for
package
goal, then perhaps make it in the same manner as for
publish
, so that each implementation can add their specific data to it. It also ensures that it gets written every time without invalidating the cache of the package/publish rule. https://github.com/pantsbuild/pants/blob/cdf21fc69954aec999ed57ed988bf67a31148bf9/src/python/pants/backend/docker/goals/publish.py#L42 https://github.com/pantsbuild/pants/blob/cdf21fc69954aec999ed57ed988bf67a31148bf9/src/python/pants/core/goals/publish.py#L138-L145 https://github.com/pantsbuild/pants/blob/cdf21fc69954aec999ed57ed988bf67a31148bf9/src/python/pants/core/goals/publish.py#L279-L284
It’s schema-less, though…
b
Thanks for the feedback. I've added a version field, and using
dataclasses
directly sounds good. In terms of implementation, my impression is that
publish --output
works well for something orchestrating pants (e.g. a CI script or interactively), but I think this output would be useful within pants, such as a test target wanting to use a docker image in
runtime_package_dependencies=[...]
, or feeding into a deploy template synthesised within pants. I think this means it'd be useful as part of
./pants package
output, and also that it can't just be
./pants package --output whatever.json
. Do I misunderstand?
c
If you have a rule that needs this information, it can access it from the
BuiltPackage.get_output_data()
(or whatever the method should be called), so it’s accessible to rules/plugins as well without extra effort.
b
Ah, I see, thanks. Is it accessible to non-custom rules/plugins, like a pex being available via
runtime_package_dependencies=[...]
?
c
Accessible yes, but I’m not exactly sure of what your question is asking for 😉
b
We (my company, using this feature) would like:
1. a
shunit2_test
or
python_test
to be able to access the docker image (to run it, and/or run some sort of introspection on it)
2. local development commands like starting a set of images via docker-compose (could be orchestrated via
experimental_run_shell_command
)
3. infrastructure-as-code able to access this to generate a CloudFormation template or similar (could be a
./pants run path/to/template_generator.py
, for instance)
When an executable is packaged as a pex, it seems to be easy enough to depend on it via existing pants mechanisms, and it'd be more than a bit annoying for executables that happen to be packaged via docker to require a custom plugin/rule. The original discussion talked about putting this info into
dist/
like any other packaged artefact (which is a mechanism we're already using for other things, like pex executables and lambda zip packages), whereas this new mechanism I don't understand yet, hence asking questions to try to double check it satisfies what I hope it might 😄
c
Ahh… ok, right. Gotcha. 🤔
Ok, yeah, then it makes sense to me that when you package a docker image you also produce some metadata in, for instance,
dist/..
so that it may be used as a file/resource input digest for dependees of that target.
👍 1
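On the consuming side, a test or template generator that depends on the packaged target could read the metadata file along these lines. This is a sketch under assumptions: the helper name `load_docker_metadata` and the file location are hypothetical, and the only field it relies on is the top-level `version` discussed earlier.

```python
import json
from pathlib import Path

# Only version 1 of the (hypothetical) metadata format is understood here.
SUPPORTED_VERSION = 1


def load_docker_metadata(path: Path) -> dict:
    """Read the metadata JSON and reject formats this code does not know."""
    data = json.loads(path.read_text())
    version = data.get("version")
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported metadata version: {version!r}")
    return data
```

Checking the version up front means a later format change fails loudly in consumers rather than being misread.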
b
I've found a bit of time to start implementing this docker metadata export, using JSON-via-dataclasses as suggested: https://github.com/pantsbuild/pants/pull/17299 It's incomplete: I've written up some questions there. Thanks for your tips so far.
👍 3
Thanks for the pointer on how to create files, @happy-kitchen-89482. I've updated the PR and it seems to be working 🎉