# general
l
Hi folks! I'm working on a tool I'm calling a "binstub launcher" for Pants. It is designed to address two main challenges:
1. **Native terminal experience**: for example, `bin/mkdocs --help` vs. `pants run //:mkdocs -- --help`. The goal is to make the developer experience feel more seamless, enabling IDEs to discover executables on the PATH and aligning with official tool documentation.
2. **Performance improvements**: specifically, enabling quick, parallel execution of multiple Pants targets (related: #7654).

**How the tool works**

*Initial execution.* When running `bin/mkdocs`, the launcher will:
• Check the cache.
• If the binary isn't cached, build the target by running:
```
pants package //:mkdocs
```
• Copy the package into the cache:
```
LAUNCHER_CACHE=somewhere
CACHE_ID="Unsure how to generate this"
LAUNCHER_BINARY=$LAUNCHER_CACHE/$CACHE_ID/mkdocs
cp dist/mkdocs.pex $LAUNCHER_BINARY
```
• Finally, execute the binary:
```
exec $LAUNCHER_BINARY
```
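The initial-execution steps above could be sketched in Python (the target's entry point is a `launcher.py`, so Python seems natural; all names are placeholders, and how to compute `CACHE_ID` is exactly the open question):

```python
import os
import shutil
import subprocess
from pathlib import Path


def ensure_cached(cache_root: Path, cache_id: str, target: str, pex_name: str) -> Path:
    """Return the cached binary for `target`, building it with Pants on a cache miss."""
    cached = cache_root / cache_id / pex_name
    if not cached.exists():
        # Cache miss: ask Pants to build the target, then copy the result in.
        subprocess.run(["pants", "package", target], check=True)
        cached.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(Path("dist") / pex_name, cached)
    return cached


def launch(cache_root: Path, cache_id: str, target: str, pex_name: str, argv: list[str]) -> None:
    binary = ensure_cached(cache_root, cache_id, target, pex_name)
    # Hand the process over to the cached binary, like `exec` in shell.
    os.execv(str(binary), [str(binary), *argv])
```

On a cache hit, `ensure_cached` returns immediately without touching Pants, which is what makes the subsequent executions fast.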
*Subsequent executions.* The cache will be hit, serving the binary immediately and bypassing Pants. This is similar to tools like Hermit or dotslash, which handle pre-built executables efficiently.

**Leveraging Pants metadata**

Pants already provides a build graph and can introspect targets, which I'd like to leverage to generate a unique identifier for caching. For example, with Bazel I'd use genquery to obtain target metadata for this purpose. However, I'm unsure how to achieve something similar with Pants. Specifically, I'd like to avoid using the shasum of the executable (which requires building it first) and instead generate a cache key based on target metadata. I'm aware of the introspection methods, but I don't know how to access this information inside of Pants. For example:
```
python_sources(name = "lib")

# Non-existent custom macro / rule which generates an output containing metadata
# accessible within pants by other targets
pants_metadata(
  name = "metadata",
  targets = [
    "//:mkdocs",
  ]
)

pex_binary(
    name = "launcher",
    dependencies = [
        ":lib",
        ":metadata",
    ],
    entry_point = "launcher.py",
)
```
**Questions for the community**
• Is there a way to extract metadata from Pants targets to generate a unique identifier for this purpose?
• Are there any existing mechanisms in Pants that might help streamline this approach?
• Any feedback or suggestions to improve this design?
Looking forward to hearing your thoughts and ideas! 😊
c
I think this sounds cool and is an interesting avenue of exploration! I don't have well-formed thoughts yet, but:
• Would checking if the `CACHE_ID` changed require invoking Pants for the TBD digest query, complicating the parallel invocation use case?
• In the case of things like `mkdocs` (that maybe change rarely) but not internal things, maybe you could enforce consistency at CI time? Or 'rebuild' them all?
• I've read https://dotslash-cli.com/docs/limitations/#potential-version-skew-between-code-changes-and-dotslash-changes a few times and I'm not sure if the recommendation is "maybe use TTLs" or "you will shoot yourself in the foot with TTLs".
• We have some bash scripts along the lines of "invoke the pex from `dist` if it exists, otherwise build it", which obviously suffer from version skew on any change.
l
Thanks for sharing your thoughts! Thinking about it some more, I came up with the following flow. BUILD:
```
generate_binstubs(
  name = "generate_binstubs",
  targets = [
    "//:mkdocs",
  ]
)
```
This is a runnable rule, executed like:
```
pants run //:generate_binstubs
```
This will create symlinks in the `bin` directory using the pattern from Hermit:
```
# create launcher script
echo "..pants binstub launcher code.." > bin/pants-binstub-launcher

# create symlink bin/mkdocs
PACKAGE=mkdocs
PACKAGE_WITH_SHA=mkdocs-1102938asdasd

ln -s pants-binstub-launcher $PACKAGE_WITH_SHA
ln -s $PACKAGE_WITH_SHA $PACKAGE
```
The user will end up with the following files in the `bin` directory:
```
lrwxr-xr-x@  1 maarten  staff    20 14 Jan 07:28 mkdocs -> mkdocs-1102938asdasd
lrwxr-xr-x@  1 maarten  staff    22 14 Jan 07:28 mkdocs-1102938asdasd -> pants-binstub-launcher
-rw-r--r--@  1 maarten  staff     4 14 Jan 07:28 pants-binstub-launcher
```
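With that layout, the launcher can recover both the package name and the cache ID without talking to Pants at all, just by reading the symlink it was invoked through. A hypothetical helper, assuming the naming scheme shown above:

```python
import os
from pathlib import Path


def resolve_stub(invoked_as: str) -> tuple[str, str]:
    """Derive (package, cache_id) from a Hermit-style stub.

    Assumes the chain `bin/<pkg> -> <pkg>-<digest> -> pants-binstub-launcher`,
    so the cache id is embedded in the intermediate symlink's name.
    """
    stub = Path(invoked_as)
    package = stub.name
    intermediate = Path(os.readlink(stub)).name  # e.g. "mkdocs-1102938asdasd"
    cache_id = intermediate[len(package) + 1:]   # strip the "<pkg>-" prefix
    return package, cache_id
```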
> Would checking if the `CACHE_ID` changed require invoking Pants for the TBD digest query, complicating the parallel invocation use case?

I want the `pants run` step to already gather the information from Pants and write the cache IDs into the symlinks themselves. This avoids having to reach out to Pants at launch time and sidesteps the parallel-invocation issue.
> In the case of things like `mkdocs` (that maybe change rarely) but not internal things, maybe you could enforce consistency at CI time? Or 'rebuild' them all?

Great idea! The `generate_binstubs` rule could also add a test that validates whether the binstubs currently in the repo match the ones that would be generated. Like a diff check that fails if they're different!
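That CI diff check could be as simple as comparing the symlink targets of the checked-in `bin/` directory against a freshly regenerated one. A sketch (directory names and the error message are placeholders):

```python
import os
from pathlib import Path


def symlink_map(bin_dir: Path) -> dict[str, str]:
    """Map each symlink in a bin/ directory to its target."""
    return {
        p.name: os.readlink(p)
        for p in sorted(bin_dir.iterdir())
        if p.is_symlink()
    }


def check_binstubs(checked_in: Path, regenerated: Path) -> None:
    """Fail (e.g. in CI) when the checked-in binstubs have drifted."""
    if symlink_map(checked_in) != symlink_map(regenerated):
        raise SystemExit("binstubs are stale; re-run `pants run //:generate_binstubs`")
```

Because the cache IDs live in the symlink names, comparing link targets is enough to detect a stale stub.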
> I've read https://dotslash-cli.com/docs/limitations/#potential-version-skew-between-code-changes-and-dotslash-changes a few times and I'm not sure if the recommendation is "maybe use TTLs" or "you will shoot yourself in the foot with TTLs".

I think they are saying that you don't want to use TTLs for things that are built inside the repo 🤔. In the `mkdocs` case, the library is "built" inside the repo (the wheel is pulled down and converted into a pex). So if you put a TTL on it, you might get the wrong version due to a stale cache hit, even though the chance is low given the nature of the library.
> We have some bash scripts along the lines of "invoke the pex from `dist` if it exists, otherwise build it", which obviously suffer from version skew on any change.

Haha, yep, same! It has to be possible to make this smarter with the help of Pants!
c
> I want the `pants run` step to already gather the information from Pants and write the cache IDs into the symlinks themselves. This avoids having to reach out to Pants at launch time and sidesteps the parallel-invocation issue.

I think the part I'm missing is: how do you tell when the `cache_id` is invalidated?
l
@curved-manchester-66006 I'm trying to copy the implementation of Hermit. For example, take the following BUILD file:
```
# Define a `pex_binary` target for the MkDocs CLI.
pex_binary(
    name="mkdocs",
    entry_point="mkdocs",
)

# Generate peek output for the passed-in dependencies with associated target digest
pants_peek(
    name="mkdocs_peek",
    dependencies=[
        ":mkdocs",
    ],
)

# Generate binstubs based on the peek digest
generate_binstubs(
    name="generate_binstubs",
    dependencies=[
        ":mkdocs_peek",
    ],
)
```
If the mkdocs target updates, the peek output digest changes, resulting in a new symlink with a new cache ID when the `generate_binstubs` target is run. If `bin/mkdocs` is then called, the symlink will be different, resulting in a cache miss and busting the cache! Let me know if this makes any sense 😂, or if there is an easier way to go about this!
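One way the hypothetical `pants_peek` rule could turn peek output into a stable cache ID is to hash the normalized JSON. A sketch (which peek fields to include in the hash is still an open question):

```python
import hashlib
import json


def digest_from_peek_json(peek_json: str) -> str:
    """Collapse `pants peek`-style JSON output into a short, stable cache id."""
    # Normalize key order so cosmetic formatting changes don't bust the cache.
    normalized = json.dumps(json.loads(peek_json), sort_keys=True)
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]
```

The 12-character prefix matches the short-digest style of the `mkdocs-1102938asdasd` symlink names above.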
c
Possibly related: Pants can now (well, in 2.24) export bins of Pants-provided tools: https://www.pantsbuild.org/prerelease/reference/goals/export#bin. It puts everything in `./dist/bin`, so you can just add it to your PATH and invoke directly. I'm pretty sure it supports caching of the export with standard Pants machinery. Tapping into the machinery is documented in a walkthrough on making tools exportable (I think you can just follow the first half, about subclassing `ExportableTool`). I think you could have a target tap into the machinery by requesting `AllTargets`, filtering your target type by field set, and then generating the `ExportResult`. One difficulty with that approach is that you probably want to reuse the PEX machinery. It's possible to invoke it with a request to the rule graph. I think this function call invokes `package` (you can see the rest of the context around it, which is a plugin to wrap a `package`able target as `resources`). I have the feeling there's more to it than that, but there might not be!
As for getting the introspection data within Pants: usually there's a particular class you can use to request that information (e.g. `DependenciesRequest`). You can see that in the implementation of the peek goal; it's just making many requests to the rule graph.
Also, if you implemented it with the builtin export machinery, you could get a list of `ExportableTool`s by just asking for its union membership with `union_membership.get(ExportableTool)` (that's how we get the suggestions in "All valid exportable binaries: ['pex-cli', 'scc', 'shunit2']").
An alternative view for calculating a hash: you could have a look at `system_binary` and how it does fingerprinting. It's more for tools that are expected to be installed on the system and therefore have defined versions (instead of just "latest in the repo") which can be easily discovered (e.g. with `--version`).
l
Good morning! Thanks for the input @careful-address-89803, sorry I totally missed these messages 🤦
I've used the export bin function like:
```
[export]
bin = [
    'pex-cli',
]
```
This exports the `pex` binary to `dist/export/bins/pex`. But let's imagine the following scenario:
1. A new version of Pants is released with a new version of the pex binary (bumps from v1 to v2).
2. The Pants version is bumped in `pants.toml` and pushed into our work repo.
3. The new changes are pulled down by a developer.
4. The developer invokes `dist/export/bins/pex --version`.
Unless I'm misunderstanding, the version would be `v1`? And the developer would need to manually run `pants export` to get the latest version of the `pex` binary?
c
Yeah, once they're exported they aren't necessarily kept up to date with Pants. A couple of easy solutions:
• `pants export` is cached (I'm pretty sure), so you could just run it a lot.
• The comment about fingerprinting would allow you to ensure you had the correct version of the exported bin. For example, your launcher stubs could check that the version matched the expected one. So running a stub launcher `./my_mkdocs` would run `dist/export/bin/pex --version`, and if it didn't match the expected version it could run `pants export --bin=pex-cli` and then proceed.
I'm not sure how useful my comment about tapping into the export machinery actually is. In theory, you could use the export machinery to also generate your launcher stubs. You'd also have access to all of the information from peek, so you could do custom invalidation. But I still think you'd need some of the above points for keeping things up to date anyhow (at least if you wanted to avoid invoking Pants each time).
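The version-check-then-re-export idea could look roughly like this in the stub (a sketch; using `--version` output as the fingerprint follows the suggestion above, and the expected version would have to come from somewhere like the generated stub itself):

```python
import subprocess


def export_is_current(binary: str, expected_version: str) -> bool:
    """Check whether an exported binary reports the version we expect."""
    try:
        out = subprocess.run(
            [binary, "--version"], capture_output=True, text=True
        ).stdout.strip()
    except FileNotFoundError:
        # Not exported yet counts as out of date.
        return False
    return out == expected_version


def ensure_exported(binary: str, tool: str, expected_version: str) -> None:
    """Re-export the tool via Pants when the exported binary has drifted."""
    if not export_is_current(binary, expected_version):
        subprocess.run(["pants", "export", f"--bin={tool}"], check=True)
```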
l
> The comment about fingerprinting would allow you to ensure you had the correct version of the exported bin. For example, your launcher stubs could check that the version matched the expected one. So running a stub launcher `./my_mkdocs` would run `dist/export/bin/pex --version`, and if it didn't match the expected version it could run `pants export --bin=pex-cli` and then proceed.

This is what I'm currently doing! Currently I specify a couple of files which, if changed, cause a new invocation of `pants package ...`. Because Pants knows the entire dependency tree, I'm hoping I can generate this list of files from Pants, so I don't have to maintain it manually.
c
Does `pants dependencies --transitive --format=json path/to:target` do what you need?
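If it does, a stub could collapse that output into a single rebuild trigger instead of a hand-maintained file list. A sketch (I'm assuming the JSON is either a flat list of addresses or a mapping from target address to its dependency addresses; the exact shape should be checked against the goal's actual output):

```python
import hashlib
import json


def dependency_digest(dependencies_json: str) -> str:
    """Hash `pants dependencies --transitive --format=json` output into one trigger value."""
    data = json.loads(dependencies_json)
    # Accept either a flat list of addresses or a {target: [deps]} mapping,
    # and sort so ordering differences never bust the cache.
    groups = data.values() if isinstance(data, dict) else [data]
    addresses = sorted(dep for deps in groups for dep in deps)
    return hashlib.sha256("\n".join(addresses).encode()).hexdigest()
```

When the digest changes, the stub would know to re-run `pants package ...` before launching.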