Hello :wave: I'm currently exploring Pants for our...
# general
b
Hello 👋 I'm currently exploring Pants for our Python monorepo at work, and am running into some difficulties with the protobuf code generation part of things. I'm not sure if this is a bug, misconfiguration and misunderstanding on my part, or something else, so any help is appreciated. At a high level, we have some protobufs that we need to compile into Python code and into Rust code (compiling the Rust code is pretty straightforward and isn't part of what I'm looking for here; we're currently looking at Pants to help manage the Python side of our codebase.) Additionally, we'd like to package up the generated Python code into a distribution. Since the protobuf definitions are off on their own, and since we generate code for more than just Python, I don't want the generated code to be dropped into the same directory. I've looked at the documentation that suggests adding
python_source_root
to the
protobuf_library
target (https://www.pantsbuild.org/docs/protobuf#protobuf-and-source-roots). The documentation suggests that this will put the generated code in the specified source root; however, in practice it looks like it just creates that directory structure under
dist/codegen
. Additionally, dependency resolution for code that's using the generated
*_pb2
modules doesn't appear to be working (as judged by the output of
./pants dependencies path/to/the/file/that/uses/protobufs.py
Another wrinkle is that we'd like to create a Python distribution containing just the generated protobuf code, but I'm not even sure how to express that in Pants notation... particularly with the need to have
__init__.py
files created as well. There seem to be a lot of moving parts here, and I'm not sure if there's a bug or (more likely) some incomplete understanding on my part, possibly coupled with some inconsistent file layout choices in this repository. That's a lot of information, and I can provide more as needed... I just wanted to lay out my perception of the high-level issues as a way to hopefully start a discussion. Thanks in advance for any advice folks may have.
👋 1
h
Hi Chris, welcome to Pants! I'll try and sort through the issue here, but will also let @hundreds-father-404 chime in if I get anything wrong.
b
@happy-kitchen-89482 Thanks very much 👍
h
Hey Chris! Welcome. First, to confirm, are you using 2.2 or newer? That's when we added dep inference for Protobuf
b
Thanks @hundreds-father-404 👋
Yes, I'm using 2.2.0
h
however, in practice it looks like it just creates that directory structure under dist/codegen.
You're running
./pants export-codegen
to get this, right? That is expected. That goal is purely for you to be able to view what files Pants is generating and to consume it externally, e.g. for IDE autocompletion. You do not need to ever manually run that
export-codegen
goal for codegen to work. Pants runs things in chroots. It will include the generated files in those chroots for you, and it should always be up-to-date.
export-codegen
will show you what those final paths are, but with a
dist/codegen
prefix.
b
OK, then I think that bit should be OK.
If I've got a root directory called
foo/bar/baz
, and I set that as a root in
pants.toml
and as a
python_source_root
for my
protobuf_library()
, I should expect the generated code to ultimately reside in
foo/bar/baz
and be dependency-resolved accordingly? That is, if I've got that as a directory containing just an empty
__init__.py
file, would the generated code then get recognized as part of a proper Python package?
h
Re dep inference, for now, try to manually add the deps and see if we can get the runtime working. I suspect that something with source roots could be off, and that getting the manual thing working will get dep inference working 1. Choose a Python file that uses Protobuf, and some simple thing you can run with it like
./pants test
where it fails. 2. For that Python file's owning
python_library
target, add to the
dependencies
field the path to that Protobuf file's directory, e.g.
src/protos/models
. (This assumes that you have one BUILD file per directory, with empty
python_library()
and
protobuf_library()
definitions) When running something like
./pants test
, you can use
--no-process-execution-cleanup-local-dirs
so that you can preserver the tmpdir and look to see that the paths are showing up how you'd expect No need to get all imports of Protobuf working - we only need to get at least one working, so that we can isolate the problem.
I should expect the generated code to ultimately reside in foo/bar/baz
Yes. If you have
src/protos/models/f.proto
, and you set the
python_source_root
to
foo/bar/baz
, then Pants should be creating the file
foo/bar/baz/models/f_pb2.py
.
if I've got that as a directory containing just an empty init.py file
Hmm, it's very plausible issues with
__init__.py
are at play. Let's start with confirming the
_pb2.py
file shows up properly, then we can check if the
__init__.py
files are showing up
b
OK, I'll give that a shot and report back; thanks!
h
My work is done... 🙂
b
🤔 I got a test working after renaming a bunch of directories (our code layout is a bit muddled at the moment 😕) I have my directory containing just an
__init__.py
file, and the protobuf code generation drops the new files off in that directory, and I can use them. The automatic dependency resolution doesn't appear to work, though, and I have to manually add it. Additionally, I can't seem to build a distribution for code that uses the generated code (let alone a distribution for just the generated code) because Pants says that no
python_distribution
owns the proto file in question.
But if I move that distribution definition up to a top-level directory (that ultimately contains both the proto file and the generated python code, it all works
I suppose I can deal with that 🙂
I'm going to see if I can somehow use https://www.pantsbuild.org/docs/reference-relocated_files with this to prevent me having to move a bunch of stuff around right now (ideally, I'd like to get Pants running on the code as it exists now, and then use that as a kind of facade to keep things running smoothly while I rearrange the layout under the hood)
@hundreds-father-404 @happy-kitchen-89482 Thanks for your help!
h
I suppose I can deal with that 🙂
I'm glad there's a workaround, but that's not very satisfying and we'd love to figure this out. A major goal for Pants is that you do not need to reorganize your codebase (at least much) to get Pants working. And we have more work to do if that is not the case. I don't think
relocated_files
will work in the context of Protobuf; that won't trigger codegen, and it only works on files that you essentially want to have unmodified, outside of their path. Like JSON files or PNG files
Would it be possible to either share parts of your code, or create a minimal reproduction? The interesting things here are entirely file paths + imports + your pants.toml - not the actual logic/code.
b
Sure... it may take a little while to get things cleaned up and sort out exactly which parts were contributing to the behavior, but once that's sorted out, I can come up with a reproduction.
After digging a bit more, I think the crux of my issue was that our protobuf file structure on disk didn't really match the protobuf package structure, and so it wasn't really playing nicely with Pants' code generation, which was expecting "normal" protobufs. To "fix" this in Pants would appear to require being able to relocate individual generated Python files within a package after generation, rather than just using the
python_source_root
to direct where to put the entire package. That seems kind of niche, and a byproduct of us doing things incorrectly with protobufs. As such, I don't really view it as a problem with Pants. We need to clean up our protobufs for other reasons anyway; this merely gives us another reason to do it sooner 😅
h
Interestinggg! I didn't know that's a thing.
our protobuf file structure on disk didn't really match the protobuf package structure
What does this look like? My experience with Protobuf is limited, and I've only seen it where the
package
clause == the directory path (minus source roots)
b
That's basically what it should look like, and I think it's what most tooling in the ecosystem expects. I'm not quite sure how ours came to be how it currently is, but it had a package name that wasn't the same as the directory it was in, and we also wanted it to be nested within another directory. Kinda wacky.
👍 1
Looks like we had some other tooling that compensated for that, too. Now, I'm just going to fix it up, and then let Pants take care of the rest 🎉
💯 1
h
Awesome, thanks for the update! Let us know if you're still having issues with things, or more general feedback / feature requests
👍 1