# general
b
Hello, I'm having issues chaining tasks. I've read https://www.pantsbuild.org/dev_tasks.html and I use
@classmethod
def product_types(cls):
    return ['expanded_yaml']
and
@classmethod
def prepare(cls, options, round_manager):
    super(MyGenerator, cls).prepare(options, round_manager)
    round_manager.require_data('expanded_yaml')
but then how do I pass the value I'm interested in? (i.e. the output directory, in my case) I tried
self.context.products.register_data('expanded_yaml', target_workdir)
for the producer and `self.context.products.get_data('expanded_yaml')` for the consumer. It seemed to work, but after cleaning all caches and invoking gen directly on the consumer target, the result of
self.context.products.get_data('expanded_yaml')
is None …
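The register_data / get_data handshake can be sketched with a tiny stand-in (the Products class below is a simplified illustration, not the real Pants implementation), which also reproduces the None symptom when nothing gets registered:

```python
# Minimal stand-in for Pants' string-keyed data products, to illustrate
# the register_data / get_data handshake. Simplified sketch, not the
# real Pants Products class.
class Products:
    def __init__(self):
        self._data = {}

    def register_data(self, key, value):
        self._data[key] = value

    def get_data(self, key):
        # Nothing registered under the key -> None, which is exactly
        # the symptom hit above after cleaning all caches.
        return self._data.get(key)

products = Products()
products.register_data('expanded_yaml', '/tmp/expanded_yaml_workdir')
print(products.get_data('expanded_yaml'))  # /tmp/expanded_yaml_workdir

# On a fully cached run the producer may never call register_data,
# so the consumer sees None:
cached_run = Products()
print(cached_run.get_data('expanded_yaml'))  # None
```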
e
As a sanity check, does
./pants --explain gen
show that your task - installed in the gen goal - depends on the task that calls
self.context.products.register_data('expanded_yaml', target_workdir)
?
b
not sure how to read the output:
./pants --explain gen
Goal Execution Order:

bootstrap -> imports -> unpack-jars -> deferred-sources -> gen

Goal [TaskRegistrar->Task] Order:

bootstrap [substitute-aliased-targets->SubstituteAliasedTargets_bootstrap_substitute_aliased_targets, jar-dependency-management->JarDependencyManagementSetup_bootstrap_jar_dependency_management, bootstrap-jvm-tools->BootstrapJvmTools_bootstrap_bootstrap_jvm_tools, provide-tools-jar->ProvideToolsJar_bootstrap_provide_tools_jar]
imports [ivy-imports->IvyImports_imports_ivy_imports]
unpack-jars [unpack-jars->UnpackJars_unpack_jars]
deferred-sources [deferred-sources->DeferredSourcesMapper_deferred_sources]
gen [antlr-java->AntlrJavaGen_gen_antlr_java, antlr-py->AntlrPyGen_gen_antlr_py, jaxb->JaxbGen_gen_jaxb, protoc->ProtobufGen_gen_protoc, ragel->RagelGen_gen_ragel, thrift-java->ApacheThriftJavaGen_gen_thrift_java, thrift-py->ApacheThriftPyGen_gen_thrift_py, wire->WireGen_gen_wire, yaml-template->YamlTemplateApply_gen_yaml_template, twirl->TwirlGen_gen_twirl, controllers->ControllersGen_gen_controllers, routes->RoutesGen_gen_routes]
e
What are the names of the task types you wrote, and in what order do you expect them to run?
b
the producer is
YamlTemplateApply_gen_yaml_template
and the consumer
ControllersGen_gen_controllers
e
ok - so they are both installed in the gen goal and in the right order, yaml before controller
b
yes, I was a bit surprised that I have to declare the
backend_packages
in the right order in the pants.ini to make it work.
e
It seemed to work but after cleaning all caches and invoking directly gen on the consumer target,
Does the consumer target express a dependency on a target pointing to the file the yaml task operates on?
(express a BUILD file dependency?)
b
yes
e
Yes, I was a bit surprised that I have to declare the
backend_packages
in the right order in the pants.ini to make it work.
Flagging this for a re-visit after figuring out main issue.
b
(ok ^_^)
e
OK - do the tasks use self.invalidated... guards?
b
hmm both are
SimpleCodegenTask
and I implement mainly
execute_codegen
e
ok
b
if you want to have a look, I'm at this point https://gist.github.com/lgirault/7a748a911fc6cff528ad9be5ebf6588a
e
b
yes ?
e
So, only when execute_codegen is called (which is only when the target has changed) is the product registered
It needs to be registered on every run, even if grabbing the result from the cache.
Let me see if SimpleCodegen has a hook for this...
It does not.
b
XD
damn
So I should do a pr on pants ?
e
I think all prior uses of SimpleCodegen in the pants codebase worked against some implicit products (java and scala and python sources)
You should, yes.
All the dark corners are yours @brief-engineer-67497
b
^_^
so I'm still figuring out Python. What's your advice for pointing my build at a local version of pants?
e
We have that documented - just a second
See 'Running from sources' at the top of this doc: https://www.pantsbuild.org/howto_develop.html
b
nice !
thank you very much
ok, I'll work on that
e
The API is probably something like
register_products(self, target, target_workdir)
. So same shape as
execute_codegen
with a default noop implementation and the main difference that it's called for every vt after execute_codegen runs for the invalid ones: https://github.com/pantsbuild/pants/blob/14bfee29bb2f44c3b10e4144c67a118805522c45/src/python/pants/task/simple_codegen_task.py#L237
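The semantics of that proposed hook can be sketched with hypothetical stand-in classes (SimpleCodegenSketch and the string "targets" below are not real Pants types): execute_codegen fires only for invalid targets, while register_products fires for every target on every run.

```python
# Simplified sketch of the proposed register_products hook. All classes
# here are stand-ins, not the real Pants SimpleCodegenTask machinery.
class SimpleCodegenSketch:
    def execute_codegen(self, target, target_workdir):
        raise NotImplementedError

    def register_products(self, target, target_workdir):
        pass  # default no-op, same shape as execute_codegen

    def execute(self, targets, invalid_targets, workdir_for):
        for target in targets:
            if target in invalid_targets:
                # Only invalid (changed) targets are (re)generated...
                self.execute_codegen(target, workdir_for(target))
            # ...but products are registered on every run, cached or not.
            self.register_products(target, workdir_for(target))

class YamlGen(SimpleCodegenSketch):
    def __init__(self):
        self.products = {}

    def execute_codegen(self, target, target_workdir):
        pass  # pretend to expand templates into target_workdir

    def register_products(self, target, target_workdir):
        self.products.setdefault('expanded_yaml', {})[target] = target_workdir

task = YamlGen()
# Fully cached run: no invalid targets, yet the product is still registered.
task.execute(['spec'], invalid_targets=[], workdir_for=lambda t: '/wd/' + t)
print(task.products['expanded_yaml'])  # {'spec': '/wd/spec'}
```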
But just a sec before diving in - let me look at your gist a bit
b
ok,
e
Other impls don't pass data through products, they generate synthetic targets to own the generated files and inject those targets in the graph
What does your yaml step generate?
b
(btw I know that the registered target should be a dict indexed by the producing target, this is a first draft)
e
The target that captures the generated stuff is
Resources
b
my yaml step applies some templating
e
What is the end result? CSS, HTML, something else?
b
a bit like macros in c
yaml to yaml
it's to handle "generics"
e
aha - ok
So, traditionally for codegen, things would work like this:
b
and inject those targets in the graph
that seems like what I'm trying to do
e
You'd introduce a
yaml_template
target that would own the templated yaml in user BUILD files. Your YamlTemplateApply would operate only on those targets and it would emit
yaml
targets as a result. The
ControllersGen
task would then only operate on
yaml
targets.
b
maybe not what I've done but conceptually
e
Here, better target names are likely
route_templates
and
routes
if the yamls are special routing yamls and not general yaml files.
b
well I'm using the templating for a specific purpose but the engine is really generic (no pun intended)
it could be used for other purposes
I'll try to make it open source
e
Let's work backwards. Is
ControllersGen
generic to any yaml file or should it only work on specific yaml files?
b
this one needs specific yaml files, ones that use openapi
actually even more specific: openapi allows extensions, and I use these fields to store information required by the code gen
Other impls don't pass data through products, they generate synthetic targets to own the generated files and inject those targets in the graph
yet the documentation speaks of
Products: How one Task consumes the output of another
. But I can see that some people want to remove them 😛
# TODO(John Sirois): Kill products and simply have users register ProductMapping subtypes
    # as data products.  Will require a class factory, like `ProductMapping.named(typename)`.
e
Yes, these are all a bit unrelated. It is true that codegen generally takes in a special target type (
protobuf
) and synthesizes to a 'normal' target type (
java-library
) instead of using products. It's also true that products currently has a split between products that are just string keys vs products that are full blown custom types (the comment you reference).
b
I was just being mischievous ^_^ but anyway I'm kind of in a frozen state. Should I go through build graph node injection to achieve my goal?
e
Sorry - paging between a few things.
So, the critical bit here is
ControllersGen
which looks broken in the gist. Presumably if you hand it random yaml it dies?
b
no problem, as usual, I'm very grateful for your time
yes
e
So it needs a special input target or field on an existing target from which to get special yaml
Right now you have a generic yaml templating task feeding it random yaml
b
e
Yeah
Ideas:
b
and from a general architecture point of view I totally agree with your point, but even if I make a generic yaml_template target and a specific mics_spec target which extends the first one, I have no way to ensure that it's a valid openapi spec
in a way, the compilation of the generated controller source is part of this validation
but I'm all ears
e
1. Since template yaml is not valid yaml (presumably), the template files should have a different extension, say
.tyml
. 2. The generic
YamlTemplateApply
task could operate on any target with
.tyml
sources and produce matching
.yaml
output sources. 3. The
ControllersGen
task could operate on
MicsSpec
targets only, retrieving the mapped
.yaml
sources.
If that makes sense at a high level I can point to a few more details.
b
it makes a lot of sense. just to nitpick, yaml is a very "open" format, so my template yaml is actually valid yaml (from a structural point of view)
e
OK. Would it be onerous or wrong to demand a special extension?
b
not at all
just to give you an example, in my template file I have
responses: $tref:DataResponse[<mics-type://myResource>]
and in my output file it would be
responses:
      '400':
        $ref: '#/components/responses/error'
      '401':
        $ref: '#/components/responses/error'
      '404':
        $ref: '#/components/responses/error'
      '403':
        $ref: '#/components/responses/error'
      '500':
        $ref: '#/components/responses/error'
      '200':
        description: Success !
        content:
          application/json:
            schema:
              type: object
              required:
              - status
              - data
              properties:
                status:
                  type: string
                  enum:
                  - ok
                data:
                  $ref: <mics-type://MyResource>
      '503':
        $ref: '#/components/responses/error'
e
OK. Then the only other details are: 1. In
YamlTemplateApply
synthetic_target_type
would return whatever the input target type was - basically it can't know or care where the
.tyml
is coming from or going to. 2. That's it actually!
This would allow
YamlTemplateApply
to remain generic and operate on
MicsSpec
, for example, without knowing it.
The product mapping goes away
And
ControllersGen
just operates on
MicsSpec
targets that own yaml files (the synthetic generated ones or otherwise hand-written ones that never ran through
YamlTemplateApply
).
Does that make sense?
(if so - no pants modifications needed)
b
it makes sense. The only point is about the special extension. Using yaml I get syntax highlighting out of the box without any configuration … I'll just check what it requires to get it back (because from a semantic point of view, you're right, a separate suffix would be preferable)
e
OK - cool.
b
ok the "config" is like a no-brainer
1. In
YamlTemplateApply
synthetic_target_type
would return whatever the input target type was - basically it can't know or care where the
.tyml
is coming from or going to.
hmm actually not sure what you mean by "return whatever the input target type was "
so it can be a YamlTemplate target or a MicsSpec target and it would "return" the same target ?
b
ha ok, so you mean it quite literally
XD
e
And
is_gentarget
would simply look to see if the target has sources matching
.tyml
.
So it is type-agnostic as well.
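A minimal sketch of both type-agnostic hooks, with a hypothetical FakeTarget stub standing in for a real Pants target (the real hooks would live on a SimpleCodegenTask subclass):

```python
# Stub standing in for a Pants target; only the `sources` field matters here.
class FakeTarget:
    def __init__(self, sources):
        self.sources = sources

def is_gentarget(target):
    # Select any target owning at least one .tyml source, whatever its type.
    return any(s.endswith('.tyml') for s in target.sources)

def synthetic_target_type(target):
    # Echo the input target's type: the task neither knows nor cares
    # where the .tyml came from or what target type owns it.
    return type(target)

tmpl = FakeTarget(['specs/service1/service1-api.tyml'])
plain = FakeTarget(['specs/service1/dictionary.yaml'])
print(is_gentarget(tmpl))                          # True
print(is_gentarget(plain))                         # False
print(synthetic_target_type(tmpl) is FakeTarget)   # True
```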
b
ok so that's for the
YamlTemplateApply
part but then what about the communication with
ControllersGen
? it would operate only on
MicsSpec
targets, but how do I operate on the one synthesized by YamlTemplateApply rather than the original one listed in the dependencies ?
e
This will mean, for each
a -> mics
in the graph you'll get a
a -> (mics, mics')
after the
YamlTemplateApply
task runs. So
ControllersGen
will need to be smart enough to scan both
mics
and skip it since it owns no yaml files, and
mics'
and use it since it owns generated yaml files.
In other words,
ControllersGen
now has two target selection criteria: 1. type(target) == MicsSpec 2. any(s.endswith('.yaml') for s in target.sources)
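Those two selection criteria can be sketched like this (MicsSpec and FakeTarget are hypothetical stubs, not the real target types):

```python
# Stubs standing in for real Pants targets.
class FakeTarget:
    def __init__(self, sources):
        self.sources = sources

class MicsSpec(FakeTarget):
    pass

def is_gentarget(target):
    # 1. right type, and 2. actually owns yaml sources - so the original
    # mics target (templates only) is skipped and mics' is used.
    return (isinstance(target, MicsSpec)
            and any(s.endswith('.yaml') for s in target.sources))

mics = MicsSpec(['service1.tyml'])        # original: owns no yaml files
mics_prime = MicsSpec(['service1.yaml'])  # synthetic: owns generated yaml
print(is_gentarget(mics))        # False
print(is_gentarget(mics_prime))  # True
```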
b
ok but back to the
for each
a -> mics
in the graph you'll get a
a -> (mics, mics')
how do I add a node/target to the graph ? I'm reading ProtobufGen but I'm not sure I'm finding it
e
SimpleCodegen does it for you
b
what a nice guy !
^_^
cool
e
Also, you'll find
mics'
has Target.derived_from ==
mics
Which I don't think is useful for your case, but is for some cases.
b
ok
ok well, as we say in french "y'a plus qu'à"
(i.e all it remains is to do it)
thank you very much once again.
e
You're welcome
b
sooo … one last question
I have my target def
TemplatedYamlLibrary
and my task def
YamlTemplateApply
so far so good
e
Why
TemplatedYamlLibrary
?
b
I have my target def
ControllersLibrary
and my task
ControllersGen
e
I thought the final idea was to be target agnostic in that task
b
it actually is but I still need to declare a target
and I have a dictionary in the payload
e
it actually is but I still need to declare a target
You don't. You can implement
is_gentarget(target)
and test the given target for
.tyml
sources.
b
e
Thanks - much easier to talk about something we can both look at.
b
yes, of course
b
yes it's a typo that it's still here
e
It's a super-confusing API, but you can either define
gentarget_type
once or else implement
is_gentarget
which gets to say yes or no per active target in the current pants run.
b
look at the
is_gentarget
e
Aha
b
yes my bad, I figured that out
e
You did both
do one
ok
Was that the issue then?
b
yes my bad, typo
well the issue is in the ControllersGen execution
e
ok
b
so back to my setup, I have a TemplatedYamlLibrary target instance, let's call it
spec
and a ControllersLibrary target instance, let's call it
controllers
e
So your
is_gentarget
disagrees with your
execute_codegen
assertion.
b
indeed
good catch
but actually the print in is_gentarget never shows
spec
even if
controllers
declares
spec
as a dependency
maybe I'm misusing pants, but my end goal is that for one
TemplatedYamlLibrary
I can output two artifacts: a controller lib and route files. So I thought I needed 3 targets
I'm not yet at the route files target/task, but it's to give you some context on my train of thought
b
yes ?
e
I think your extension is wrong.
b
hmm indeed
e
You want the codegen output extension type. You're telling SimpleCodegen what output files to collect into the synthetic target it generates
So, previously, you generated synthetic targets with 0 sources
Since the target workdir would presumably have had no tyaml files emitted into it, just yaml files.
The main theme today is that SimpleCodegen is anything but simple 🙂
b
yes !
^_^
ok where I'm troubled is that the more we go, the more I feel I have no need for ControllersLibrary
well it's confusing
I have a mental mapping target/task
so I set up a target to "store" the arguments I want to pass to my CLI
e
Your mental model is pretty much spot on.
b
but here my mental schema is broken because for the
ControllersGen
task I have on one hand my
TemplatedYamlLibrary
which contains the sources but on the other hand it's
ControllersLibrary
that contains all the other args for the CLI
e
Why then have 2 targets? If
YamlTemplateApply
can work on any target with
.tyaml
sources, why can't it operate directly on the target with the
target.payload.dictionary
ControllersGen
needs?
b
it has different concerns
a third output for the yaml template is a public openapi that we will publish to the client
e
My naming: 1 target
controller
. It has sources that can be
.yaml
or
.tyaml
as well as a
dictionary
. The
YamlTemplateApply
operates on these targets when they own
.tyaml
sources, the
ControllersGen
task always operates on these targets.
b
so in one directory I want to declare my target responsible for the
tyaml
sources, and I want to set up my controllers_library target next to my server code in another directory
ok …
I think I see where you're going
e
Is the main issue then that a controller's data is split in 2 dirs right now?
Source code in one tree, yaml in another?
But really they belong in the same dir?
b
one source directory, three targets pointing to the same input directory
well the spec and the "meat" of the code do not belong in the same directory, because with the same yaml spec we will also generate the client code
e
Will a controller always have a yaml spec? IE: does it make sense to have a
controller
target with 2 required fields:
dictionary
and
spec
, where spec is a target address that must point to a yaml (or templated yaml)?
b
well for me it was more a "pipeline" of tasks where the second one does not need to know how its input is produced: the dictionary is a general entity related to templating, but on the other hand, it's true that the content of this dictionary is very oriented
so it could make sense yes
e
Here's the problem I have - there is a ton of domain-specific important info. Is there any way you can present a full spec of what you need to do somewhere?
This is hard to puzzle out for me.
b
what I cannot do is show the code I'm building but I think there is no problem with the build/tooling
and I can try to explain in a few words, because I really think that what I'm trying to achieve is not domain dependent
e
At a minimum, a sample dir layout and what you need to be generated / compile given that dir layout would be extremely helpful
b
yes, give me a few minutes
e
OK - great. I really appreciate it.
b
ok so here is a simplified version of our mono-repo layout: As I go, I'm not sure any code sample would help understanding but please say if I'm mistaken
mono-repo/
   specs/
      service1/
         service1-api.tyaml
      service2/
         service2-api.tyaml

      dictionary.yaml   
   api/
      service1-models/
      service2-models/
   service1/
   service2/
actually we have ~ 40 services
dictionary contains the template definitions. Here's an example I gave earlier: in service1-api.tyaml
responses: $tref:DataResponse[<mics-type://myResource>]
, dictionary.yaml contains
Data[ResourceSchema]:
  schema:
    type: object
    required:
      - status
      - data
    properties:
      status:
        type: string
        enum:
          - ok
      data: ResourceSchema

DataResponse[Resource]:
  200:
    description: Success !
    content:
      'application/json': $tref:Data[| {"$ref":"Resource"} |]
  400:
    $ref: '#/components/responses/error'
  401:
    $ref: '#/components/responses/error'
  403:
    $ref: '#/components/responses/error'
  404:
    $ref: '#/components/responses/error'
  500:
    $ref: '#/components/responses/error'
  503:
    $ref: '#/components/responses/error'
and in my output file will be
responses:
      '400':
        $ref: '#/components/responses/error'
      '401':
        $ref: '#/components/responses/error'
      '404':
        $ref: '#/components/responses/error'
      '403':
        $ref: '#/components/responses/error'
      '500':
        $ref: '#/components/responses/error'
      '200':
        description: Success !
        content:
          application/json:
            schema:
              type: object
              required:
              - status
              - data
              properties:
                status:
                  type: string
                  enum:
                  - ok
                data:
                  $ref: <mics-type://MyResource>
      '503':
        $ref: '#/components/responses/error'
e
OK - can you enumerate how you need to use the dictionary.yaml + .tyaml for a given service?
b
image.png
that's an old slide ("Plugin" means sbt plugin)
so I need to apply dictionary.yaml once to a tyaml to get a "usable" spec
e
So, in brief, (tyaml -> yaml) -> spec -> (client + server)? Ie, 1 task to generate yaml (the reason for it being yaml-agnostic), then 2 tasks to take that yaml - which happens to be an openapi spec - and turn it into client and server?
b
the "usable" spec is used to build controllers stubs, play route files (i.e two artifacts with the same end goal, building a server)
yes !
e
OK - that is useful
b
cool !
I'm sorry, English is not my main language, so I may not convey the message as clearly as I hope
e
Understood and no worries. It is my main language and I have no clue how to use it effectively after 46 years.
So, drilling on 1 final point.
The templating task may be generic, but it needs 2 inputs: 1 - a dictionary with template values to plug in. 2 - one or more template files to plug 1 into. Correct?
b
yes
e
Ok, and - on the client/server generation side, only yaml is needed. Neither client nor server generator knows about templating, just the openapi yaml spec?
b
yes, and client and server are generated separately
actually for now the clients are still hand-written and we are in the process of migrating servers to the generated stubs, but that is the goal
e
ok
So, I think
TemplatedYamlLibrary
needs to remain as the target type
YamlTemplateApply
operates on since
YamlTemplateApply
needs more than just
.yaml
files - it needs the dictionary too. I think, in full,
TemplatedYamlLibrary
needs 3 fields: dictionary (required); sources - 0 or more .tyaml files; target_type - the type of target to generate, defaulting to
Resources
This will allow you to write the generic
YamlTemplateApply
task but yet still have it emit a specific
Spec
target that both
ControllerGen
and
ClientGen
can look for and use. That
Spec
target can be hand-written in a BUILD file and point to a fully formed yaml file hand-written by a user, or be generated.
Does that work?
b
yes
I think I'm beginning to understand why I had so much trouble grasping how pants works. I'm really used to a scheme where I have tasks that behave like functions and I can chain them together. Whereas with pants the "lifecycle" revolves around a target. And the more we go in this discussion, the more I picture a target as a blueprint, with tasks succeeding each other to flesh out the target
is this correct ?
so the
target_type
is actually
type(self)
so to speak
ok, the only drawback I see here is that for a given spec I will have, for example, two targets
ControllerLibrary
and
ClientLibrary
that will have two different outputs but the same first step (
YamlTemplateApply
) won't be shared. Or is some kind of hash used that makes the two targets able to share this first step ?
e
Not really, but I wouldn't get hung up on it either. A target is generally just files + metadata. Tasks operate on either products or else targets of interest (right file types, right metadata or a combo of both) or both, and produce new targets or else products or both. So tasks are functions with 2 possible sources of input, products or targets. They can likewise produce the same two types of output. Pants has a "new engine" that is in the process of development and it will eliminatye targets as a possible output type in favor of only products. As such, on the input side, targets will be turned into products intrinsically by the new engine.
ok, the only drawback I see here is that for a given spec I will have, for example, two targets
ControllerLibrary
and
ClientLibrary
that will have two different outputs but the same first step (
YamlTemplateApply
) won't be shared.
I don't understand: YamlTemplateApply(TemplatedYamlLibrary) -> Spec; ControllerLibrary(Spec) -> Controller; ClientLibrary(Spec) -> Client
Why can't the last 2 task signatures use the same Spec instance generated by the 1st?
b
Yes they absolutely can. Let me reread what you wrote above then ^_^
e
I proposed
TemplatedYamlLibrary(dictionary, sources, output_target_type)
b
so YamlTemplateApply is a task; TemplatedYamlLibrary and spec are targets
e
Concretely, in a BUILD:
templated_yaml_library(
  dictionary='dictionary.yaml',
  sources=['service.tyaml'],
  target_type=spec
)
And, by the way, the .tyaml extension could now go away.
b
I'm not clear on what the target_type is
e
b
is it a target (type), i.e. the synthetic target type of the build graph node output by TemplatedYamlLibrary ?
e
Yes
b
ok but actually the same spec is used to describe both the client and the server
e
Which is perfect - right?
You write 1 BUILD target like I showed above, it generates one target of type (alias)
spec
and both controller and client tasks operate only on
spec
targets
b
ok so Spec is a synthetic target that I never declare in a build ?
e
Correct
b
okaaay
e
You declare:
templated_yaml_library(
  dictionary='dictionary.yaml',
  sources=['service.tyaml'],
  target_type=spec
)
You have the client and server code depend on this
The gen code generates a spec
The client and server gen code operate on the spec - which is in the dep graph because both client and server code depend on the original:
templated_yaml_library(
  dictionary='dictionary.yaml',
  sources=['service.tyaml'],
  target_type=spec
)
Then, if the user executes pants with only the client code in scope, only the client stubs get generated
If both server and client targets are in scope, both, etc...
b
yes perfect
but so concretely I have in spec/service1/BUILD :
templated_yaml_library(
  dictionary='dictionary.yaml',
  sources=['service.tyaml'],
  target_type=spec
)
so in service1/BUILD:
controllers_library(
  dependencies = ['spec/service1']
)
e
Yup
Ditto client code (in the future)
b
yes
e
And, to underscore, no more need for
.tyaml
, just yaml will do.
b
so it's exactly what I was trying to achieve (modulo the target_type argument )
e
The
t
is now embedded in
templated_yaml_library
.
b
but I'm still not clear on the
ControllerGen
task
e
The
target_type
is the only way to make the
YamlTemplateApply
task generic, as far as I can tell.
ControllerGen operates on
Spec
targets
No matter who wrote them, human or
YamlTemplateApply
.
Is there any other complication?
b
ok, but so: on one hand I have my input files in
Spec
and on the other hand I have my CLI args in
ControllersLibrary
how do I unite both ?
e
ControllersLibrary has a mandatory
spec
argument that expects an address.
Just a sec for details on that...
Um, just a sec, more thought
b
when
execute_codegen
is called, I operate on one target
e
One question before I answer - what are all the inputs needed by a
ControllersLibrary
target?
b
the one that will be passed to the CLI:
args = [
    '--controllers_package', target.payload.controllers_package,
    '--effect', target.payload.effect,
    '--base_route', sources_root,
    '--output', target_workdir,
]
so
output
is given by
execute_codegen
,
sources_root
by
Spec
and I have two other arguments
e
OK - and 1 more - I'm not sure which one of those args communicates the spec yaml file, but is the expectation exactly 1, or 1 or more, spec yaml files as input?
b
1 or more yaml files
e
OK, so, from your example above:
controllers_library(
  dependencies = ['spec/service1']
)
b
controllers_library(
  package='controllers',
  effect='future',
  dependencies = ['spec/service1']
)
e
The SimpleCodegen of
YamlTemplateApply
is already expanding
dependencies = ['spec/service1']
to
dependencies = ['spec/service1', spec.service1.prime]
b
ok, so I search in my dependencies
e
Yes
Look at direct dependencies
b
ok
e
The synthetic one will be there and you ignore the rest that are not
Spec
It would be messier if the controller gen code expected exactly one. Then you'd need to fail late, at task runtime, when there were too many Specs / files.
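That dependency scan can be sketched like this (Spec and the Stub class are hypothetical stand-ins for real Pants targets; the `.prime` name just mimics the synthetic-target naming mentioned above):

```python
# Stub standing in for a Pants target with direct dependencies.
class Stub:
    def __init__(self, name, dependencies=()):
        self.name = name
        self.dependencies = list(dependencies)

class Spec(Stub):
    pass

def find_specs(target):
    # Keep the Spec dependencies (synthetic or hand-written); ignore the
    # rest. Only direct dependencies are scanned.
    return [d for d in target.dependencies if isinstance(d, Spec)]

original = Stub('spec/service1')                 # owns the template sources
synthetic = Spec('spec/service1.prime')          # injected by SimpleCodegen
controllers = Stub('service1:controllers',
                   dependencies=[original, synthetic])
print([s.name for s in find_specs(controllers)])  # ['spec/service1.prime']
```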
b
ok, thank you
e
I'm going offline here in ~10 minutes. Hopefully we've gotten this to a sensible state for the night.
b
yeah hope so.
it's a bit late here, I really should go home 😛
anyway thanks a lot !!