# general
f
Can anyone point me at the code that makes pants go "oh, hey, I can use protobuf to generate that module. let's run it!"?
Oh, is it `FirstPartyPythonMappingImpl`? How does one extend that for non-Python code?
h
Not quite, the mapping just supports dependency inference.
The protobuf codegen backends are under `src/python/pants/backend/codegen/protobuf`, mostly in the various `rules.py` files.
And you can see which code is shared and which is separate across the Python, Java, and Go implementations.
f
I spent some time reading exactly that code before asking, but couldn't find the magic. I could not figure out how to add a dependency on anything other than a Target listed in a BUILD file.
h
That gives a decent overview of how the pieces fit together
Basically, the glue is:
class GeneratePythonFromProtobufRequest(GenerateSourcesRequest):
    input = ProtobufSourceField
    output = PythonSourceField
Which says “I know how to generate Python sources from Protobuf sources, so invoke me if you see a dep from something that knows how to consume Python sources onto protobuf sources”
Then, for convenience, it’s best if that dep is inferred. But you can start out with an explicit dep just to see if the other moving parts do what you expect
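(For reference, roughly how the rest of that wiring looks. This is a sketch only: the rule body is elided, and the rule name is made up, but `GeneratedSources`, `GenerateSourcesRequest`, and the `UnionRule` registration are the standard codegen plugin API:)

from pants.engine.rules import collect_rules, rule
from pants.engine.target import GeneratedSources, GenerateSourcesRequest
from pants.engine.unions import UnionRule


@rule
async def generate_python_from_protobuf(
    request: GeneratePythonFromProtobufRequest,
) -> GeneratedSources:
    # Run protoc (or your generator) as a Process over
    # request.protocol_sources, capture the output digest,
    # and return it wrapped in a GeneratedSources.
    ...


def rules():
    return [
        *collect_rules(),
        # This registration is the "invoke me if you see a dep from a
        # Python consumer onto protobuf sources" part.
        UnionRule(GenerateSourcesRequest, GeneratePythonFromProtobufRequest),
    ]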
f
Okay, yes. I've got it working so that if I run `export-codegen` I get the files generated. Getting concrete: I'm successfully generating `foo.h`. I have a source file that #includes foo.h, but I can't find a way to add the generated foo.h into the inferred dependencies. It worked fine when I had it as a source file, but after switching to generated, it's not in AllTargets.
Rereading what you wrote... so when my inference logic finds the `#include foo.h` and determines that there isn't a foo.h target, it should look for possible upstream sources that could turn into a foo.h and infer the dependency on those?
Meaning my inference engine needs to understand my generation DAG in reverse?
Ah! I suspect my issue has to do with this comment in pants.engine.target:
For generated first-party addresses, use
`./` for the file path, e.g. `./main.py:tgt`; for all other generated targets,
use `:tgt#generated_name`.
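(So, e.g., a hypothetical BUILD entry using that second syntax, assuming a generator target named `my_generator` that generates a target named `foo.h`; `my_sources` is a made-up target type:)

my_sources(
    name="lib",
    # Explicit dep on a generated (non-file-backed) target, by generated name:
    dependencies=[":my_generator#foo.h"],
)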
oh, happy dance! I got the header generation working using that syntax.
But now the newly generated header isn't having its dependencies inferred. :(
h
Its dependencies on, e.g., non-generated code?
For context, sounds like you’re working on a custom C/C++ plugin of some kind?
Re your question above: Yes, exactly: your inference logic finds `#include foo.h`, determines that `foo.h` isn't provided by checked-in source but can be provided by the code generator, and adds a dep on `foo.proto` (or, more precisely, the relevant target containing it as a source).
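A minimal sketch of what that inference hook might look like (the field classes and the reverse-mapping step here are hypothetical; the `InferDependenciesRequest` machinery is real, though its exact shape varies across Pants versions):

from pants.engine.rules import collect_rules, rule
from pants.engine.target import (
    FieldSet,
    InferDependenciesRequest,
    InferredDependencies,
    SingleSourceField,
)
from pants.engine.unions import UnionRule


class MySourceField(SingleSourceField):
    pass


class MyInferenceFieldSet(FieldSet):
    required_fields = (MySourceField,)
    source: MySourceField


class InferMyDepsRequest(InferDependenciesRequest):
    infer_from = MyInferenceFieldSet


@rule
async def infer_my_deps(request: InferMyDepsRequest) -> InferredDependencies:
    # 1. Hydrate request.field_set.source and scan it for `#include "foo.h"`.
    # 2. If foo.h is checked-in source, dep on its owning target as usual.
    # 3. Otherwise, consult a reverse map from generatable file names to
    #    generator targets (foo.h -> the target owning foo.proto) and dep
    #    on that instead.
    return InferredDependencies([])  # placeholder


def rules():
    return [*collect_rules(), UnionRule(InferDependenciesRequest, InferMyDepsRequest)]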
f
Yes, something like that. It's not C, but it does use the C preprocessor.
And my generated header itself has a #include inside of it, but that included header (a primary source file, in this case) isn't showing up in the transitive dependencies, probably because foo.h didn't exist yet when they were computed.
h
Ah hmm, yes, that is tricky. You'd have to write dep inference logic that can look at a `.proto` and figure out which deps it generates.
To start with, just so you can make progress, does it work if you manually add the dep to the `dependencies=` field of the `protobuf_sources` target?
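(I.e., something like this hypothetical snippet, with made-up paths:)

protobuf_sources(
    name="protos",
    # Manually declare what the generated code will need, since
    # inference can't see into it yet:
    dependencies=["src/headers:base"],
)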
f
Yes. That works
Could I do something like get the transitive targets, hydrate the sources, and then iterate until the list stabilizes? Hopefully with caching it wouldn't be too painful?
Or I guess a proper graph traversal would make more sense, but same basic idea... generate more dependencies and then rescan for new ones.
Or does inference happen so early in Target construction time that there's not really a good way to repeat it?
h
Yes, IIRC it happens early, and it would take some non-trivial changes for the build graph to be constructed in an iterative way. Although that would be ideal.
I have a feeling that we don’t do anything similar today with python/jvm/go, because the deps of the generated code are conceptually known before generation happens. They are either A) upstream generated code, or B) proto API code.
Sounds like you have a third case
f
Yes. Without going into too much detail, we currently have a Makepp-based system that allows us to dynamically discover both new dependencies and new targets while the build is in flight and just splice them into the build graph. And since that capability is there, users have taken advantage of it in ways we never anticipated. Not ideal, but it's entrenched.
I'll have to think about it some more, but I think we can probably deal with only inferring dependencies from primary sources. The bigger problem is going to be the dynamic addition of "targets" (to use the make term) to the graph.
For example, we have one tool flow that builds up a database from lots of sources - more or less similar to the C data flow of source file -> object file -> library. But then there's a new set of header files generated from the database by another tool, and the names of those files depend on the names of the objects defined in the source code (which generally don't match the source file names).
I'm not even sure how I would express a dependency on a 2nd generation derived file in the Address syntax.
foo.c, bar.c -> foo.o, bar.o -> libfoobar.so
main.c -> main.o; main.o, libfoobar.so -> mainfoobar.exe
libfoobar.so depends on foo.c#foo.o, but main.c doesn't depend on anything, so how does libfoobar.so get pulled in for linking mainfoobar.exe?
(again, my flow isn't compiling C code, but it illustrates the example reasonably well. note: this isn't the database driven flow I mentioned above, that's even more complex so I want to understand this easier one first)
Another interesting dynamic target case we have... a tool will generate any file you ask for with the file name pattern aaa_x_y_z where the contents of the file are determined by the values of x, y, and z. And some extra "accessory" files get generated too, the names of which are effectively random, but deterministic. And the (x, y, z) tuple space is so large there's no way to predict what values will be needed prior to inferring a dependency on the aaa_x_y_z file.
h
In answer specifically to “libfoobar.so depends on foo.c#foo.o, but main.c doesn’t depend on anything, so how does libfoobar.so get pulled in for linking mainfoobar.exe?“: You’d have a `cc_binary()` target (or similar; I’m pretending that your code is C/C++ for simplicity) that has explicit `dependencies` on its entry point, and then dep inference can take over.
In your case, how can `main.c` not depend on anything? I would have assumed it must `#include` and invoke the code in the lib?
But if not, then the lib would need to be an explicit dep as well
f
Sure, but it's just a header file inclusion; building main.o doesn't need the .so object.
h
But zooming out to the general problem, your use case is really interesting, and kind of the opposite of the JVM use case. In JVM you need to compile the deps of foo.java before you can compile foo.java itself, because the classfiles of those deps must be on the compiler’s classpath when foo.java is compiled. This means that deps have to be inferred entirely from sources, as they must all be recursively known before any compiling can happen. But in your case it sounds like you have a preprocessor phase that is used for dynamic dep discovery. So you preprocess in reverse order, and then compile everything concurrently (since the preprocessor gives you independent translation units)?
f
Yes, good summary
h
Sure, each `.o` builds entirely independently after the preprocessor runs, since the result of the preprocessor is a single-file translation unit that can be compiled with no further inputs. You only bring everything together at link time.
C/C++ is radically different from JVM in this regard.
(and I prefer the preprocessor model to the JVM model, but that’s just my 2 cents)
So, this is definitely possible, but you may have to ignore some existing Pants mechanisms, and go one level lower in the APIs
Is this something you can put in a public repo for us to look at?
f
probably not the real stuff, but I imagine I can construct a synthetic C example that shows the case. Just a simple Makefile to use to demonstrate the flow?
h
So you need rules to implement something like “start from some given ‘root’ files, run the preprocessor on each, examine the output of that preprocessor, infer deps from it, and recurse on that process until you’ve built up a transitive closure of preprocessed translation units”
Then you compile all those translation units entirely independently (I’m assuming a C-like model here)
Then you link them all together
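In rule form, that fixpoint might look something like this (all the dataclasses and the assumed `PreprocessRequest -> PreprocessedUnit` rule are made up; `Get`, `MultiGet`, and `@rule` are the real engine APIs):

from dataclasses import dataclass

from pants.engine.rules import Get, MultiGet, rule


@dataclass(frozen=True)
class PreprocessRequest:
    path: str


@dataclass(frozen=True)
class PreprocessedUnit:
    path: str
    discovered_includes: tuple[str, ...]  # deps found in the preprocessed output


@dataclass(frozen=True)
class ClosureRequest:
    roots: tuple[str, ...]


@dataclass(frozen=True)
class TranslationClosure:
    units: tuple[PreprocessedUnit, ...]


@rule
async def build_translation_closure(request: ClosureRequest) -> TranslationClosure:
    seen: set[str] = set()
    frontier = list(request.roots)
    units: list[PreprocessedUnit] = []
    while frontier:
        # Preprocess the current wave concurrently; each Get is cached.
        # (Assumes a rule elsewhere that maps PreprocessRequest -> PreprocessedUnit
        # by running the preprocessor as a Process.)
        wave = [p for p in frontier if p not in seen]
        seen.update(wave)
        results = await MultiGet(
            Get(PreprocessedUnit, PreprocessRequest(p)) for p in wave
        )
        units.extend(results)
        # Recurse on anything newly discovered until we reach a fixpoint.
        frontier = [inc for r in results for inc in r.discovered_includes]
    return TranslationClosure(tuple(units))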
Does that sound right?
f
Yes, that's right
h
Ah, but protobuf complicates things
How does that fit in?
f
note that it's not actually protobuf, we don't have any python code in this particular flow
I was just using it as an example of a preprocessor
h
oh
In that case, I think you can achieve that quite straightforwardly by ignoring most of the existing Pants machinery relating to dep inference and targets and all that. That was designed with different use cases in mind, and is very heavyweight. I would suggest writing this more or less from first principles. You do need some input targets, but I’m not sure you need to model all the intermediate stuff as targets. Just create your own ad-hoc dataclasses, and keep it all as lightweight as possible. If you can do this in the open we can advise.
f
Would I still be able to hook into the graph traversal system by doing that?
h
What would be the need to do so?
But yes, you can still get Targets from Addresses as needed
f
I'd still want to be able to call a goal and have Pants build up the graph from primary sources through the conversion steps to the goal.
And especially take advantage of the caching and remote execution features.
h
So caching and remote execution are baked into the engine at a very low level. Any time you invoke a `Process`, they are applied to it.
You aren’t missing out on that by working at a lower level
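(e.g. any ad-hoc invocation along these lines gets caching and remoting for free; the argv, paths, and digest here are hypothetical, and this has to run inside an `@rule` body:)

from pants.engine.process import Process, ProcessResult
from pants.engine.rules import Get

result = await Get(
    ProcessResult,
    Process(
        argv=["/usr/bin/cpp", "-E", "src/foo.c"],  # hypothetical invocation
        input_digest=sources_digest,  # a Digest of the input files
        output_files=("src/foo.i",),
        description="Preprocess src/foo.c",
    ),
)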
And you can still hook into a goal
It’s more about how to model all the intermediate stuff
So for example, you may not need to generate `Target` instances for every source file your preprocessor creates. You could just track files directly.
That’s a bit handwavy, of course the devil is in the details. That’s why I’m wondering if you can work on a redacted version of this in the open.
f
At first glance, it looks to me like it would be easiest to just create a new IntermediateTarget class that has a GeneratedSources field instead of a SourcesField, where the Dependencies are not inferred until after the sources have been hydrated. But I realize that just because I can describe it that way doesn't mean anything is set up to allow it to be implemented.
I'll check in with my coworkers and see what we can cook up that can be worked publicly.
h
Well, I’m not sure you need a `Target` or `Field` class of any kind, is my point.
Start without and see how far you get
I mean, you do need a `cc_binary()`-like target type to represent the binary, an actual one in a BUILD file. And you need a `cc_sources()`-like target type to represent the sources on disk.
But I’m not sure you have to represent intermediate generated sources that way
Also I should mention that we are generally switching to calling rules by name: https://github.com/pantsbuild/pants/issues/19730
It is not properly documented yet
But there are issues with calling by name recursively, so you might need to continue using `Get` until we iron those out.
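(For comparison, the two styles side by side; the call-by-name spelling is new, and its exact import and signature may vary by version:)

from pants.engine.fs import CreateDigest, Digest, FileContent
from pants.engine.rules import Get

# Old style, inside an @rule body:
digest = await Get(Digest, CreateDigest([FileContent("hello.txt", b"hi")]))

# Call-by-name style (per the issue above):
from pants.engine.intrinsics import create_digest

digest = await create_digest(CreateDigest([FileContent("hello.txt", b"hi")]))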
f
ah. I did see some of that when reading the code and I was wondering what was up with that calling syntax.
h
I would recommend using call-by-name wherever you can, and if you get rule graph construction failures that switching to `Get` resolves, then that is likely the recursive issue.
Which obviously we will fix