# plugins
f
So I'm finally getting down to business and looking at a way to integrate Pants with an RPM-based system. I see several challenges to getting this to work, and I want to start laying them out here so I can understand the problem space before I start hacking away:
1. How can I tell Pants how Python modules can be provided by these system packages, as opposed to pip-installable packages?
2. How will I be able to run build steps with these (and only these) system packages installed? Containers? Chroot?
3. If I use one of these solutions for build steps, I imagine I'll need to run multiple commands to set up and clear the build environment. Is this compatible with the way Pants executes rules now?
f
FYI there is prior art in Pants v1. I wrote an RPM plugin for Pants while at Foursquare: https://github.com/foursquare/fsqio/tree/master/src/python/fsqio/pants/rpmbuild
🙏🏻 1
It’s been long enough that I don’t remember much about it, and obviously v2 != v1, but some of how that plugin handled RPM builds might point at answers for your questions.
the plugin used a docker container to run the RPM builds using a standard template: https://github.com/foursquare/fsqio/blob/master/src/python/fsqio/pants/rpmbuild/tasks/dockerfile_template.mustache
p
I think someone else was looking at doing a plugin for deb files. Implementation-wise, I think it would be good if we can leverage `fpm` so that you can at least build the RPMs on any platform, not just on RPM platforms. `fpm` can be leveraged for both deb and rpm. So, more shared stuff and less to maintain for each system package type we add.
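As a rough illustration of that sharing, here's what driving `fpm` for both formats might look like (a minimal sketch; the name, version, and paths are placeholders):

```python
import subprocess

def build_system_package(pkg_type: str, name: str, version: str, staged_dir: str) -> None:
    # `fpm -s dir -t {rpm,deb}` reads a staged directory tree and emits the
    # package; the `src=dst` mapping installs the staged contents under /opt/<name>.
    subprocess.run(
        ["fpm", "-s", "dir", "-t", pkg_type, "-n", name, "-v", version,
         f"{staged_dir}/=/opt/{name}/"],
        check=True,
    )

# Same inputs, two output formats.
for pkg_type in ("rpm", "deb"):
    build_system_package(pkg_type, "myapp", "1.0.0", "./dist")
```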
f
I think this could only be done at a really, really high level, or perhaps for a restricted set of use cases. There's quite a bit of difference between deb and rpm fundamentally, and then figuring out two dependency chains seems complicated as well
I mean, if your goal is simply to package apps into OS-level packages, I think FPM could work for that use case. But I'm looking to leverage the RPM ecosystem at every step in the build process (build, test, run, package) and to manage dependencies in terms of RPM repo streams. I don't think it's the same use case. I could be wrong though. What would you like to see in something like this?
p
Yes. A lot of the target/rule logic will differ, but they can share an FpmSubsystem.
Eventually, I want to build RPMs and debs. But, I want to package up a virtualenv (not a pex) in the system package. So, I guess I’m bypassing a lot of the issues integrating with system python.
For question number 2: I wonder how much control the remote execution API provides for system package management, or if there are any guarantees about which system is used to run your build. Locally, Pants can run on macOS, and so it might need Docker Desktop or other additional infra to be able to run such build steps. But remotely? I wonder about REAPI and how it will affect the design here.
f
remote execution currently requires the platform of the host and the remote executor to match. this is a known limitation in the Pants REAPI client (I’ve made a lot of the recent changes to it, so feel free to ask me about it)
p
I haven’t actually used the remote stuff yet. So, if I’m running pants locally on a darwin system, the remote executor has to be darwin as well? Not linux?
f
yes, although it’s unclear if that has ever been tried in practice, since I don’t believe any of the existing REAPI servers work on macOS, just Linux
p
oh. That’s going to be an interesting rabbit hole once I get there… But I think I’ve digressed from the OP topic.
f
it’s annoying since when I test the Pants REAPI client on my macOS laptop I have to start a Linux VM and run both the server and Pants client in the VM
🤦 1
p
Do the distros have to match as well? EL vs debian based?
f
They shouldn’t, at least for Python, since pex does its interpreter selection while running in the remote execution environment.
f
Yeah, the whole issue with system packaging, afaict, is that there's no way to make a system appear to have an arbitrary set of system packages without involving some kind of chroot mechanism (with containers being a much more complete chroot mechanism)
f
but once you build a pex with a native C module and try to use it on the other platform, boom
đź’Ą 1
we do mix the platform into the cache key, so that’s not an issue, but a pex would sometimes be “platform: none” and technically usable on both platforms
although I’m blanking on the particular form of how the platform incompatibilities showed up, so don’t take my word for it on the “native C module” reference. could have been something else.
p
Well, I am very interested in this topic… I really want to make a mess of Ruby go 💥. That mess of Ruby is used to marshal building debs and rpms in containers. It’s an awful mess.
f
but easy enough to replicate the incompatibility, merely by spinning up one of the servers in a docker container and pointing Pants at it.
my v1 RPM plugin ran `docker` directly. if the remote execution environment could run `docker`, then that approach probably could still work (since the Pants v2 rule would just be asking for a `Process` to be run)
but if the remote executor itself is running in a docker container, now you have docker-in-docker which is … fun
f
DinD can be avoided 99% of the time just by re-using the host's docker daemon: mount the docker binary and daemon socket into the container
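A hedged sketch of that socket-mount pattern (image names and paths are placeholders, and mounting the client binary assumes it finds compatible libraries inside the container):

```python
import subprocess

# The inner `docker build` talks to the host daemon via the mounted socket,
# so no nested daemon is needed.
subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "-v", "/usr/bin/docker:/usr/bin/docker:ro",
        "builder-image:latest",
        "docker", "build", "-t", "result:latest", "/workspace",
    ],
    check=True,
)
```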
Reading this (rpmbuild_task.py) now @fast-nail-55400. I don't really know how caching worked in v1. Is the workunit thing cache-related? Or were you just relying on the local docker daemon to cache builds itself?
đź‘€ 1
f
yeah looks like it relied on just orchestrating the docker invocation, so if you were iterating on a package, then it would be docker caching that would speed builds up
that would be the case even in v2 if all the rule did was write a Dockerfile and invoke docker
those layers wouldn’t end up in the REAPI CAS
(similar issue in CI systems with trying to cache Docker layers built by a CI job)
I wonder how much of the Pants v2 docker support would be usable for the effort
> those layers wouldn’t end up in the REAPI CAS
at least not without a `docker export` or some other equivalent (skopeo is actually a pretty good tool for stuff like that). by “REAPI CAS” I also mean Pants’ local cache
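For illustration, two hedged examples of getting image bits onto disk where a build action could then capture them (container and image names are placeholders):

```python
import subprocess

# `docker export` dumps a container's filesystem as a tar.
subprocess.run(["docker", "export", "my-container", "-o", "rootfs.tar"], check=True)

# skopeo can copy an image out of the local daemon into an on-disk OCI layout,
# without running a container at all.
subprocess.run(
    ["skopeo", "copy", "docker-daemon:my-image:latest", "oci:./image-dir:latest"],
    check=True,
)
```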
f
Yeah what I think would be interesting here is being able to export docker layers and metadata in a way the CAS can deal with
Kinda interested in buildah for this, in that it's kinda a throwback to a command-and-commit style of building container images
f
when I was researching a remote execution product for Toolchain, `umoci` (https://github.com/opencontainers/umoci) was very useful for image manipulation.
(and https://github.com/containers/skopeo was useful for image downloads and conversion). both deal with the image directly without involving having to spin up a docker container.
f
thanks, being able to make this work without an actual docker daemon is a requirement for me
f
looking at the Pants v2 docker subsystem, it looks like it still invokes docker
f
yeah that's not the wave for my use case (and I'll need to describe this use case better)
I'm actually less interested in building containers for consumption as artifacts than I am in using containers to create hermetic environments for build steps that depend on system-level packaging constructs
f
including in remote execution?
f
eventually yeah
as it stands, iiuc, pants does hermetic runs by creating a temp process execution dir, copying in whatever files need to be there, running the command, then copying out the result files it needs to pick up, and caching those
f
correct, at least locally. for remote execution, that will depend on what the particular server in use does and what can be configured.
but similar concept, temp directory that is wiped away once the outputs are captured to the CAS
f
well, I'll focus on local for a sec, for the sake of discussion
Let's say instead of a tmp dir, I used a new container as the basis of my execution environment; and when the process(es) completed, I could either commit that writable layer to a new image that I could export, or I could copy specific files out of it. Either way, I have some immutable result that could go into CAS
Does that make sense?
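A minimal sketch of that run-then-capture flow using the plain docker CLI, just to make the idea concrete (image names and paths are placeholders):

```python
import subprocess

def run_build_step(base_image: str, argv: list, result_image: str) -> None:
    # Run the build step to completion in a fresh container.
    cid = subprocess.run(
        ["docker", "run", "-d", base_image, *argv],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    subprocess.run(["docker", "wait", cid], check=True)
    # Option A: commit the writable layer as a new image and save it as a tar,
    # an immutable artifact that could go into the CAS.
    subprocess.run(["docker", "commit", cid, result_image], check=True)
    subprocess.run(["docker", "save", result_image, "-o", "image.tar"], check=True)
    # Option B (alternative): copy specific result files out instead.
    # subprocess.run(["docker", "cp", f"{cid}:/build/out", "./out"], check=True)
    subprocess.run(["docker", "rm", cid], check=True)
```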
f
yes. note though, at least in the Pants/REAPI execution model, you would still have a tmpdir for the build action, but it would run a bash script with the tool invocations needed to invoke the container and then ensure the (let’s say OCI format) image is available in the tmpdir for capture to the CAS
since CAS capture requires the output to be in the action’s tmpdir
(and future invocations can then load that output in the next action’s input root by referencing its digest)
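To make that concrete, a rough sketch of such a rule (not a working plugin: `BuildRpmImageRequest` and the script contents are hypothetical; the engine APIs shown are Pants 2.x):

```python
from dataclasses import dataclass

from pants.engine.fs import CreateDigest, Digest, FileContent
from pants.engine.process import Process, ProcessResult
from pants.engine.rules import Get, rule

# Stand-in for whatever tool invocations actually build the image.
BUILD_SCRIPT = b"""#!/bin/bash
set -euo pipefail
# ... invoke the container build here ...
# leave the result as an OCI layout under ./image/ so it gets captured
"""

@dataclass(frozen=True)
class BuildRpmImageRequest:  # hypothetical request type
    description: str

@rule
async def build_rpm_image(request: BuildRpmImageRequest) -> ProcessResult:
    # Materialize the script into the action's input root (the tmpdir).
    script_digest = await Get(
        Digest,
        CreateDigest([FileContent("build.sh", BUILD_SCRIPT, is_executable=True)]),
    )
    # Anything under image/ in the tmpdir is captured to the CAS afterwards.
    return await Get(
        ProcessResult,
        Process(
            argv=("./build.sh",),
            input_digest=script_digest,
            output_directories=("image",),
            description=request.description,
        ),
    )
```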
although that execution model is not that performant currently if the input root’s size is really large due to cost of writing to disk
we hit that with the Go plugin when we originally supported mounting the Go SDK into the input root
and container images conceivably will be really large
(although the “append-only” cache feature of the Pants execution model could help there, i.e. “named caches”)
f
there's probably a lot of things I could do for performance if it comes to that...
f
but I agree with your idea of manipulating the image as its own entity
it’s why I ended up using `umoci`, for example, for the Toolchain research. we could unpack an image to disk, modify it in unpacked form as just a filesystem layout (in OCI format), and repack it without any docker invocation.
👍🏻 1
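For reference, a hedged sketch of that unpack/modify/repack flow (paths and tags are placeholders):

```python
import subprocess

def edit_image(oci_dir: str, tag: str, bundle_dir: str) -> None:
    # Unpack the OCI image into a rootfs on disk (add --rootless if unprivileged).
    subprocess.run(["umoci", "unpack", "--image", f"{oci_dir}:{tag}", bundle_dir], check=True)
    # ... modify f"{bundle_dir}/rootfs" directly as plain files, e.g. drop in RPMs ...
    # Repack the modified rootfs as a new layer on the same image/tag.
    subprocess.run(["umoci", "repack", "--image", f"{oci_dir}:{tag}", bundle_dir], check=True)
```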
f
noted
so this at least answers question #3: dump a script into the process execution dir that invokes the steps you need and makes sure the output is there to be captured
and probably question #2 will be answered by playing with container tools to find something that has reasonable performance for this purpose
but for #1... I guess I should provide an example of what I mean...
f
re #1, would a concept similar to maven’s jar “scope” be relevant?
i.e., “provided” scope
f
So what if I want to tell Pants that the python `requests` module comes not from PyPI but from the Fedora `python3-requests` package? And how could I hook that notion into the dependency inference system?
I'm not that familiar with maven, but I'll google that
f
(for maven, a jar with “provided” scope is included in the Java classpath for compiling but is excluded from the “runtime” classpath as already provided by the deployment environment)
f
hmm, that's not quite the same thing, although that might be a concept to think about in this whole design process
f
for #1, can you clarify what code is trying to use `requests` and how is it being packaged into the RPM?
for example, you should be able to build a pex and embed the pex into the RPM without having to solve
> tell Pants that the python requests module comes not from PyPI but from the fedora python3-requests package
which implies that there is non-pex-packaged Python code in the RPM
f
I'm not looking to build pexes at all
f
okay but then re the question, what target would need that dependency inference?
f
My ultimate build artifacts will likely be sets of RPMs. Some of the inputs to building those will need to be knowledge of their dependencies
f
probably depends on the type of “source” being fed to RPM then?
f
So if I want to build an RPM for a python module that just repeatedly uses `requests` to ping https://httpstat.us/200, I'll need the python file that does that, plus the metadata that the RPM depends on `python3-requests`.
f
which would have been encoded in the .spec file. are you proposing that Pants write the RPM spec, instead of the spec still being written by the developer?
(for my Pants v1 plugin, the spec remained hand-written, and rpm’s own dependency scan still applied for generating dependencies)
f
Yes, the spec would be a template that would get filled in by build metadata determined by this Pants plugin
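A minimal sketch of that template-filling step (the template fields and helper names are invented for illustration):

```python
from string import Template

# Invented fields; a real template would carry the full spec boilerplate.
SPEC_TEMPLATE = Template("""\
Name:           $name
Version:        $version
Release:        1%{?dist}
Summary:        $summary
$requires

%description
$summary
""")

def render_spec(name: str, version: str, summary: str, rpm_requires: list) -> str:
    # One "Requires:" line per RPM dependency the plugin computed.
    requires = "\n".join(f"Requires:       {r}" for r in rpm_requires)
    return SPEC_TEMPLATE.substitute(
        name=name, version=version, summary=summary, requires=requires
    )
```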
f
maintain a static mapping of PyPI module name to RPM package name?
f
like a separate module_mapping?
f
yeah
or have a default rule and use a module_mapping table as an override
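A hedged sketch of that mapping-with-default-rule idea (the table entries and the fallback heuristic are illustrative, not authoritative):

```python
from typing import Optional

# Illustrative defaults; a real table would be maintained alongside the plugin.
DEFAULT_MODULE_MAPPING = {
    "requests": "python3-requests",
    "PyYAML": "python3-pyyaml",
}

def rpm_for_pypi(project_name: str, overrides: Optional[dict] = None) -> str:
    if overrides and project_name in overrides:
        return overrides[project_name]
    if project_name in DEFAULT_MODULE_MAPPING:
        return DEFAULT_MODULE_MAPPING[project_name]
    # Default rule (a guess, not authoritative): Fedora generally names
    # Python 3 library packages "python3-<lowercased project name>".
    return f"python3-{project_name.lower().replace('_', '-')}"
```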
f
Makes sense
f
also this really isn’t pants dep inference, since you aren’t injecting a dep on another target. This is just this plugin filling in the RPM spec.
(putting aside deps between RPM packages managed by Pants)
f
but the plugin would need to use Pants’ inferred deps as input to this mapping
I guess I just need the rule that captures that output...
like, is there a rule output type that captures the notion “these are the set of imports found in the first-party code targeted by this run”?
f
Yes. Use `DependenciesRequest` or `TransitiveTargetsRequest` and filter down to python requirement targets.
🙏🏻 1
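A sketch of that suggestion (Pants 2.x engine APIs; exact imports and fields vary by version, and this helper would need to be awaited from within a `@rule`):

```python
from pants.backend.python.target_types import PythonRequirementsField
from pants.engine.addresses import Address
from pants.engine.rules import Get
from pants.engine.target import Target, TransitiveTargets, TransitiveTargetsRequest

async def python_requirement_targets(address: Address) -> list:
    # Walk the whole transitive dependency closure of the given target...
    transitive = await Get(TransitiveTargets, TransitiveTargetsRequest([address]))
    # ...and keep only third-party requirement targets; these are the ones
    # the plugin would map to RPM "Requires:" entries.
    return [tgt for tgt in transitive.closure if tgt.has_field(PythonRequirementsField)]
```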
f
awesome... this is coming into shape
Thanks for helping, and for letting me bounce these ideas off you. I expect this to be a really ambitious but worthwhile undertaking. And I'm a lot more confident that, at my current company, this could become something we could open source if it works out.
👍 1
Maybe the RPM stuff isn't the most portable for everyone, but the ability to run build steps in containers and then use the images as output could be a game-changer for interacting with system packaging and other native code