# general
l
Hi folks! I'm working on a tool I'm calling a "binstub launcher" for Pants. It is designed to address two main challenges:
1. **Native terminal experience**: for example, `bin/mkdocs --help` vs. `pants run //:mkdocs -- --help`. The goal is to make the developer experience feel more seamless, enabling IDEs to discover executables on the PATH and aligning with official tool documentation.
2. **Performance improvements**: specifically, enabling quick, parallel execution of multiple Pants targets (related: #7654).

**How the tool works**

*Initial execution.* When running `bin/mkdocs`, the launcher will:
• Check the cache.
• If the binary isn't cached, build the target by running:
```
pants package //:mkdocs
```
• Copy the package into the cache:
```
LAUNCHER_CACHE=somewhere
CACHE_ID="Unsure how to generate this"
LAUNCHER_BINARY=$LAUNCHER_CACHE/$CACHE_ID/mkdocs
cp dist/mkdocs.pex $LAUNCHER_BINARY
```
• Finally, execute the binary:
```
exec $LAUNCHER_BINARY
```
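The initial-execution steps above could be sketched in Python (the target's entry point is a `launcher.py`, so Python seems natural; all names are placeholders, and how to compute `CACHE_ID` is exactly the open question):

```python
import os
import shutil
import subprocess
from pathlib import Path


def ensure_cached(cache_root: Path, cache_id: str, target: str, pex_name: str) -> Path:
    """Return the cached binary for `target`, building it with Pants on a cache miss."""
    cached = cache_root / cache_id / pex_name
    if not cached.exists():
        # Cache miss: ask Pants to build the target, then copy the result in.
        subprocess.run(["pants", "package", target], check=True)
        cached.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(Path("dist") / pex_name, cached)
    return cached


def launch(cache_root: Path, cache_id: str, target: str, pex_name: str, argv: list[str]) -> None:
    binary = ensure_cached(cache_root, cache_id, target, pex_name)
    # Hand the process over to the cached binary, like `exec` in shell.
    os.execv(str(binary), [str(binary), *argv])
```

On a cache hit, `ensure_cached` returns immediately without touching Pants, which is what makes the subsequent executions fast.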
*Subsequent executions.* The cache will be hit, serving the binary immediately and bypassing Pants. This is similar to tools like Hermit or dotslash, which handle pre-built executables efficiently.

**Leveraging Pants metadata**

Pants already provides a build graph and can introspect targets, which I'd like to leverage to generate a unique identifier for caching. For example, with Bazel I'd use genquery to obtain target metadata for this purpose. However, I'm unsure how to achieve something similar with Pants. Specifically, I'd like to avoid using the shasum of the executable (which requires building it first) and instead generate a cache key based on target metadata. I'm aware of the introspection methods, but I don't know how to access this information inside of Pants. For example:
```
python_sources(name = "lib")

# Non-existent custom macro / rule which generates an output containing metadata
# accessible within pants by other targets
pants_metadata(
  name = "metadata",
  targets = [
    "//:mkdocs",
  ]
)

pex_binary(
    name = "launcher",
    dependencies = [
        ":lib",
        ":metadata",
    ],
    entry_point = "launcher.py",
)
```
**Questions for the community**
• Is there a way to extract metadata from Pants targets to generate a unique identifier for this purpose?
• Are there any existing mechanisms in Pants that might help streamline this approach?
• Any feedback or suggestions to improve this design?
Looking forward to hearing your thoughts and ideas! 😊
c
I think this sounds cool and is an interesting avenue of exploration! I don't have well-formed thoughts yet, but:
• Would checking if the `CACHE_ID` changed require invoking Pants for the TBD digest query, complicating the parallel invocation use case?
• In the case of things like `mkdocs` (that maybe change rarely) but not internal things, maybe you could enforce consistency at CI time? Or 'rebuild' them all?
• I've read https://dotslash-cli.com/docs/limitations/#potential-version-skew-between-code-changes-and-dotslash-changes a few times and I'm not sure if the recommendation is "maybe use TTLs" or "you will shoot yourself in the foot with TTLs".
• We have some bash scripts along the lines of "invoke the pex from `dist` if it exists, otherwise build it", which obviously suffer from version skew on any change.
l
Thanks for sharing your thoughts! Thinking about it some more, I came up with the following flow. BUILD:
```
generate_binstubs(
  name = "generate_binstubs",
  targets = [
    "//:mkdocs",
  ]
)
```
This is a runnable rule, executed like:
```
pants run //:generate_binstubs
```
This will create symlinks in the `bin` directory using the pattern from Hermit:
```
# create launcher script
echo "..pants binstub launcher code.." > bin/pants-binstub-launcher

# create symlink bin/mkdocs
PACKAGE=mkdocs
PACKAGE_WITH_SHA=mkdocs-1102938asdasd

ln -s pants-binstub-launcher $PACKAGE_WITH_SHA
ln -s $PACKAGE_WITH_SHA $PACKAGE
```
The user will end up with the following files in the `bin` directory:
```
lrwxr-xr-x@  1 maarten  staff    20 14 Jan 07:28 mkdocs -> mkdocs-1102938asdasd
lrwxr-xr-x@  1 maarten  staff    22 14 Jan 07:28 mkdocs-1102938asdasd -> pants-binstub-launcher
-rw-r--r--@  1 maarten  staff     4 14 Jan 07:28 pants-binstub-launcher
```
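With that layout, the launcher can recover both the package name and the cache ID without talking to Pants at all, just by reading the symlink it was invoked through. A hypothetical helper, assuming the naming scheme shown above:

```python
import os
from pathlib import Path


def resolve_stub(invoked_as: str) -> tuple[str, str]:
    """Derive (package, cache_id) from a Hermit-style stub.

    Assumes the chain `bin/<pkg> -> <pkg>-<digest> -> pants-binstub-launcher`,
    so the cache id is embedded in the intermediate symlink's name.
    """
    stub = Path(invoked_as)
    package = stub.name
    intermediate = Path(os.readlink(stub)).name  # e.g. "mkdocs-1102938asdasd"
    cache_id = intermediate[len(package) + 1:]   # strip the "<pkg>-" prefix
    return package, cache_id
```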
> Would checking if the `CACHE_ID` changed require invoking Pants for the TBD digest query, complicating the parallel invocation use case?

I want the `pants run` step to already gather the information from Pants and write the cache IDs into the symlinks themselves. This avoids having to reach out to Pants at launch time and sidesteps the parallel-invocation issue.
> In the case of things like `mkdocs` (that maybe change rarely) but not internal things, maybe you could enforce consistency at CI time? Or 'rebuild' them all?

Great idea! The `generate_binstubs` rule could also add a test that validates whether the binstubs currently in the repo match the ones that would be generated. Like a diff check that fails if they're different!
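That CI diff check could be as simple as comparing the symlink targets of the checked-in `bin/` directory against a freshly regenerated one. A sketch (directory names and the error message are placeholders):

```python
import os
from pathlib import Path


def symlink_map(bin_dir: Path) -> dict[str, str]:
    """Map each symlink in a bin/ directory to its target."""
    return {
        p.name: os.readlink(p)
        for p in sorted(bin_dir.iterdir())
        if p.is_symlink()
    }


def check_binstubs(checked_in: Path, regenerated: Path) -> None:
    """Fail (e.g. in CI) when the checked-in binstubs have drifted."""
    if symlink_map(checked_in) != symlink_map(regenerated):
        raise SystemExit("binstubs are stale; re-run `pants run //:generate_binstubs`")
```

Because the cache IDs live in the symlink names, comparing link targets is enough to detect a stale stub.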
> I've read https://dotslash-cli.com/docs/limitations/#potential-version-skew-between-code-changes-and-dotslash-changes a few times and I'm not sure if the recommendation is "maybe use TTLs" or "you will shoot yourself in the foot with TTLs".

I think they are saying that you don't want to use TTLs for things that are built inside the repo 🤔. In the `mkdocs` case, the library is "built" inside the repo (the wheel is pulled down and converted into a pex). So if you put a TTL on it, you might get the wrong version due to a stale cache hit, even though the chance is low given the nature of the library.
> We have some bash scripts along the lines of "invoke the pex from `dist` if it exists, otherwise build it", which obviously suffer from version skew on any change.

Haha, yep, same! It has to be possible to make this smarter with the help of Pants!
c
> I want the `pants run` step to already gather the information from Pants and write the cache IDs into the symlinks themselves. This avoids having to reach out to Pants at launch time and sidesteps the parallel-invocation issue.

I think the part I'm missing is: how do you tell when the `cache_id` is invalidated?
l
@curved-manchester-66006 I'm trying to copy the implementation of Hermit. For example, take the following BUILD file:
```
# Define a `pex_binary` target for the MkDocs CLI.
pex_binary(
    name="mkdocs",
    entry_point="mkdocs",
)

# Generate peek output for the passed-in dependencies with associated target digest
pants_peek(
    name="mkdocs_peek",
    dependencies=[
        ":mkdocs",
    ],
)

# Generate binstubs based on the peek digest
generate_binstubs(
    name="generate_binstubs",
    dependencies=[
        ":mkdocs_peek",
    ],
)
```
If the mkdocs target updates, the peek output digest changes, resulting in a new symlink with a new cache ID when the `generate_binstubs` target is run. If `bin/mkdocs` is then called, the symlink will be different, resulting in a cache miss and busting the cache! Let me know if this makes any sense 😂, or if there is an easier way to go about this!
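One way the hypothetical `pants_peek` rule could turn peek output into a stable cache ID is to hash the normalized JSON. A sketch (which peek fields to include in the hash is still an open question):

```python
import hashlib
import json


def digest_from_peek_json(peek_json: str) -> str:
    """Collapse `pants peek`-style JSON output into a short, stable cache id."""
    # Normalize key order so cosmetic formatting changes don't bust the cache.
    normalized = json.dumps(json.loads(peek_json), sort_keys=True)
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]
```

The 12-character prefix matches the short-digest style of the `mkdocs-1102938asdasd` symlink names above.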
c
Possibly related: Pants can now (well, in 2.24) export bins of Pants-provided tools: https://www.pantsbuild.org/prerelease/reference/goals/export#bin. It puts everything in `./dist/bin`, so you can just add it to your PATH and invoke directly. I'm pretty sure it supports caching of the export with standard Pants machinery. Tapping into the machinery is documented in a walkthrough on making tools exportable (I think you can just follow the first half, about subclassing `ExportableTool`). I think you could have a target tap into the machinery by requesting `AllTargets`, filtering your target type by field set, and then generating the `ExportResult`. One difficulty with that approach is that you probably want to reuse the PEX machinery. It's possible to invoke it with a request to the rule graph. I think this function call invokes `package` (you can see the rest of the context around it, which is a plugin to wrap a `package`able target as `resources`). I have the feeling there's more to it than that, but there might not be!
As for getting the introspection data within Pants: usually there's a particular class you can use to request that information (e.g. `DependenciesRequest`). You can see that in the implementation of the peek goal; it's just making many requests to the rule graph.
Also, if you implemented it with the builtin export machinery, you could get a list of `ExportableTool`s by just asking for its union membership with `union_membership.get(ExportableTool)` (that's how we get the suggestions in "All valid exportable binaries: ['pex-cli', 'scc', 'shunit2']").
An alternative view for calculating a hash: you could have a look at `system_binary` and how it does fingerprinting. It's more for tools that are expected to be installed on the system and therefore have defined versions (instead of just "latest in the repo") which can be easily discovered (e.g. with `--version`).
l
Good morning! Thanks for the input @careful-address-89803, sorry I totally missed these messages 🤦
I've used the export bin function like:
```
[export]
bin = [
    'pex-cli',
]
```
This exports the `pex` binary to `dist/export/bins/pex`. But let's imagine the following scenario:
1. A new version of Pants is released with a new version of the pex binary (bumps from v1 to v2).
2. The Pants version is bumped in `pants.toml` and pushed into our work repo.
3. The new changes are pulled down by a developer.
4. The developer invokes `dist/export/bins/pex --version`.
Unless I'm misunderstanding, the version would be `v1`? And the developer would need to manually run `pants export` to get the latest version of the `pex` binary?
c
Yeah, once they're exported they aren't necessarily kept up to date with Pants. A couple of easy solutions:
• `pants export` is cached (I'm pretty sure), so you could just run it a lot.
• The comment about fingerprinting would allow you to ensure you had the correct version of the exported bin. For example, your launcher stubs could check that the version matched the expected one. So running a stub launcher `./my_mkdocs` would run `dist/export/bin/pex --version`, and if it didn't match the expected version it could run `pants export --bin=pex-cli` and then proceed.
I'm not sure how useful my comment about tapping into the export machinery actually is. In theory, you could use the export machinery to also generate your launcher stubs. You'd also have access to all of the information from peek, so you could do custom invalidation. But I still think you'd need some of the above points for keeping things up to date anyhow (at least if you wanted to avoid invoking Pants each time).
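The version-check-then-re-export idea could look roughly like this in the stub (a sketch; using `--version` output as the fingerprint follows the suggestion above, and the expected version would have to come from somewhere like the generated stub itself):

```python
import subprocess


def export_is_current(binary: str, expected_version: str) -> bool:
    """Check whether an exported binary reports the version we expect."""
    try:
        out = subprocess.run(
            [binary, "--version"], capture_output=True, text=True
        ).stdout.strip()
    except FileNotFoundError:
        # Not exported yet counts as out of date.
        return False
    return out == expected_version


def ensure_exported(binary: str, tool: str, expected_version: str) -> None:
    """Re-export the tool via Pants when the exported binary has drifted."""
    if not export_is_current(binary, expected_version):
        subprocess.run(["pants", "export", f"--bin={tool}"], check=True)
```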
l
> The comment about fingerprinting would allow you to ensure you had the correct version of the exported bin. For example, your launcher stubs could check that the version matched the expected one. So running a stub launcher `./my_mkdocs` would run `dist/export/bin/pex --version`, and if it didn't match the expected version it could run `pants export --bin=pex-cli` and then proceed.

This is what I'm currently doing! Currently I specify a couple of files which, if changed, cause a new invocation of `pants package ...`. Because Pants knows the entire dependency tree, I'm hoping I can generate this list of files from Pants, so I don't have to maintain it manually.
c
Does `pants dependencies --transitive --format=json path/to:target` do what you need?
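If it does, a stub could collapse that output into a single rebuild trigger instead of a hand-maintained file list. A sketch (I'm assuming the JSON is either a flat list of addresses or a mapping from target address to its dependency addresses; the exact shape should be checked against the goal's actual output):

```python
import hashlib
import json


def dependency_digest(dependencies_json: str) -> str:
    """Hash `pants dependencies --transitive --format=json` output into one trigger value."""
    data = json.loads(dependencies_json)
    # Accept either a flat list of addresses or a {target: [deps]} mapping,
    # and sort so ordering differences never bust the cache.
    groups = data.values() if isinstance(data, dict) else [data]
    addresses = sorted(dep for deps in groups for dep in deps)
    return hashlib.sha256("\n".join(addresses).encode()).hexdigest()
```

When the digest changes, the stub would know to re-run `pants package ...` before launching.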