limited-art-78990
01/13/2025, 2:04 PM
bin/mkdocs --help vs. pants run //:mkdocs -- --help. The goal is to make the developer experience feel more seamless, enabling IDEs to discover executables on the PATH and aligning with official tool documentation.
2. Performance improvements: Specifically, enabling quick, parallel execution of multiple Pants targets (related: #7654).
How the Tool Works
Initial Execution
When running bin/mkdocs, the launcher will:
• Check the cache.
• If the binary isn’t cached, it will build the target by running:
pants package //:mkdocs
• Copy the package into the cache:
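LAUNCHER_CACHE=somewhere
CACHE_ID="Unsure how to generate this"
LAUNCHER_BINARY="$LAUNCHER_CACHE/$CACHE_ID/mkdocs"
# create the per-ID cache directory before copying into it
mkdir -p "$(dirname "$LAUNCHER_BINARY")"
cp dist/mkdocs.pex "$LAUNCHER_BINARY"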
• Finally, it will execute the binary:
exec "$LAUNCHER_BINARY" "$@"
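The launcher steps above could look roughly like this in Python. This is a minimal sketch, assuming the cache layout `<cache>/<CACHE_ID>/mkdocs` and that `pants package //:mkdocs` writes `dist/mkdocs.pex`; the `LAUNCHER_CACHE`/`CACHE_ID` environment variables, the default cache directory, and the helper names are hypothetical, and how to derive `CACHE_ID` remains the open question.

```python
import os
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Callable


def cached_binary(cache_root: Path, cache_id: str, name: str) -> Path:
    """Cache layout from the steps above: <cache_root>/<cache_id>/<name>."""
    return cache_root / cache_id / name


def build_with_pants(target: str, built: Path) -> Path:
    """Build `target` with `pants package` and return the path it writes to."""
    subprocess.run(["pants", "package", target], check=True)
    return built


def ensure_cached(cache_root: Path, cache_id: str, name: str,
                  build: Callable[[], Path]) -> Path:
    """Return the cached binary, building and copying it in only on a miss."""
    binary = cached_binary(cache_root, cache_id, name)
    if not binary.exists():
        built = build()
        binary.parent.mkdir(parents=True, exist_ok=True)
        # Copy to a private temp name, then rename: concurrent launchers never
        # see a half-written file, and os.replace is atomic on POSIX.
        tmp = binary.with_name(f".{name}.{os.getpid()}.tmp")
        shutil.copy2(built, tmp)  # copy2 also preserves the executable bit
        os.replace(tmp, binary)
    return binary


def main() -> None:
    # Hypothetical configuration standing in for the shell placeholders above.
    cache_root = Path(os.environ.get(
        "LAUNCHER_CACHE", str(Path.home() / ".cache" / "pants-launcher")))
    cache_id = os.environ["CACHE_ID"]
    binary = ensure_cached(
        cache_root, cache_id, "mkdocs",
        build=lambda: build_with_pants("//:mkdocs", Path("dist/mkdocs.pex")),
    )
    # Replace this process with the binary, forwarding every CLI argument.
    os.execv(binary, [str(binary), *sys.argv[1:]])
```

A real launcher script would end with a call to `main()`. The temp-file-then-rename step matters for the parallel-execution goal: several launchers missing the cache at once each build, but none of them can exec a partially copied file.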
Subsequent Executions
The cache is hit and the binary is served immediately, bypassing Pants. This is similar to tools like Hermit or dotslash, which handle pre-built executables efficiently.
Leveraging Pants Metadata
Pants already provides a build graph and can introspect targets, which I’d like to leverage to generate a unique identifier for caching. For example, with Bazel, I’d use genquery to obtain target metadata for this purpose.
However, I'm unsure how to achieve something similar with Pants. Specifically, I’d like to avoid using the shasum of the executable (which requires building it first) and instead generate a cache key based on target metadata. I'm aware of the introspection methods, but I don't know how to access this information inside of Pants. For example:
python_sources(name = "lib")
# Non-existent custom macro / rule which generates an output containing metadata
# accessible within pants by other targets
pants_metadata(
name = "metadata",
targets = [
"//:mkdocs",
]
)
pex_binary(
name = "launcher",
dependencies = [
":lib",
":metadata",
],
entry_point = "launcher.py",
)
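One way to turn target metadata into a cache key, sketched under assumptions: run `pants peek` on the target (an existing goal that prints target metadata as JSON), canonicalise the JSON, and hash it. Whether peek's output changes every time the built binary would change (for example after source edits in dependencies or a lockfile update) is an assumption to verify, and the function names are hypothetical. Because this invokes Pants, it would belong in a stub-generation step rather than in every launch.

```python
import hashlib
import json
import subprocess


def peek_json(target: str) -> str:
    """Run `pants peek` for a target and return its JSON output."""
    result = subprocess.run(
        ["pants", "peek", target], check=True, capture_output=True, text=True
    )
    return result.stdout


def cache_id_from_metadata(metadata_json: str, length: int = 12) -> str:
    """Derive a short, stable cache ID from target metadata.

    The JSON is re-serialised with sorted keys and no whitespace, so formatting
    differences do not change the ID; only content changes do.
    """
    canonical = json.dumps(
        json.loads(metadata_json), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
```

For example, `cache_id_from_metadata(peek_json("//:mkdocs"))` would yield the suffix for a `mkdocs-<id>` entry.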
Questions for the Community
• Is there a way to extract metadata from Pants targets to generate a unique identifier for this purpose?
• Are there any existing mechanisms in Pants that might help streamline this approach?
• Any feedback or suggestions to improve this design?
Looking forward to hearing your thoughts and ideas! 😊
curved-manchester-66006
01/13/2025, 4:21 PM
• Would checking if the CACHE_ID changed require invoking Pants for the TBD digest query, complicating the parallel invocation use case?
• In the case of things like mkdocs (that maybe change rarely), but not internal things, maybe we could enforce consistency at CI time? Or 'rebuild' them all?
• I've read https://dotslash-cli.com/docs/limitations/#potential-version-skew-between-code-changes-and-dotslash-changes a few times and I'm not sure whether the recommendation is "maybe use TTLs" or "you will shoot yourself in the foot with TTLs"
• We have some bash scripts along the lines of "invoke the pex from dist if it exists otherwise build it", which obviously suffer from version skew on any change.
limited-art-78990
01/14/2025, 6:38 AM
generate_binstubs(
name = "generate_binstubs",
targets = [
"//:mkdocs",
],
)
This is a runnable rule, executed like:
pants run //:generate_binstubs
This will create symlinks in the bin directory using the pattern from Hermit:
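# create launcher script
echo "..pants binstub launcher code.." > bin/pants-binstub-launcher
# create symlinks bin/mkdocs -> mkdocs-<sha> -> pants-binstub-launcher
# (link targets are relative to bin/, so the links must be created inside bin/)
PACKAGE=mkdocs
PACKAGE_WITH_SHA=mkdocs-1102938asdasd
ln -s pants-binstub-launcher "bin/$PACKAGE_WITH_SHA"
ln -s "$PACKAGE_WITH_SHA" "bin/$PACKAGE"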
The user will end up with the following files in the bin directory:
lrwxr-xr-x@ 1 maarten staff 20 14 Jan 07:28 mkdocs -> mkdocs-1102938asdasd
lrwxr-xr-x@ 1 maarten staff 22 14 Jan 07:28 mkdocs-1102938asdasd -> pants-binstub-launcher
-rw-r--r--@ 1 maarten staff 4 14 Jan 07:28 pants-binstub-launcher
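The same Hermit-style layout can be produced from Python, which is roughly what a `generate_binstubs` rule might do when run. A minimal sketch: `write_binstubs` is a hypothetical helper, `stubs` is assumed to be the complete name-to-cache-ID mapping, and the launcher contents are supplied by the caller. It also marks the launcher executable and prunes versioned links left over from earlier cache IDs.

```python
import os
from pathlib import Path

LAUNCHER = "pants-binstub-launcher"  # shared launcher name from the layout above


def write_binstubs(bin_dir: Path, stubs: dict[str, str], launcher_code: str) -> None:
    """Create bin/<name> -> <name>-<cache_id> -> pants-binstub-launcher links.

    `stubs` is treated as the complete set, so links for old cache IDs are pruned.
    """
    bin_dir.mkdir(parents=True, exist_ok=True)
    launcher = bin_dir / LAUNCHER
    launcher.write_text(launcher_code)
    launcher.chmod(0o755)  # must be executable to run through the symlinks

    expected = set()
    for name, cache_id in stubs.items():
        versioned = bin_dir / f"{name}-{cache_id}"
        public = bin_dir / name
        expected.add(versioned.name)
        for link in (versioned, public):
            if link.is_symlink() or link.exists():
                link.unlink()
        # Relative link targets keep the stubs valid wherever the repo is cloned.
        versioned.symlink_to(LAUNCHER)
        public.symlink_to(versioned.name)

    # Prune versioned links from earlier runs, then any links left dangling.
    for entry in list(bin_dir.iterdir()):
        if (entry.is_symlink() and os.readlink(entry) == LAUNCHER
                and entry.name not in expected):
            entry.unlink()
    for entry in list(bin_dir.iterdir()):
        if entry.is_symlink() and not entry.exists():
            entry.unlink()
```

For example, `write_binstubs(Path("bin"), {"mkdocs": "1102938asdasd"}, launcher_code)` recreates the listing shown above.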
> Would checking if the CACHE_ID changed require invoking Pants for the TBD digest query, complicating the parallel invocation use case?
I want the pants run statement to already gather the information from Pants and write the cache IDs into the symlinks themselves. That way the launcher never has to reach out to Pants, which avoids the parallel-invocation issue.
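Encoding the cache ID in the symlink means the launcher can recover it with a single `readlink`, with no Pants call. A sketch, assuming the launcher is a Python script reached through `bin/<name> -> <name>-<cache_id>` and that `sys.argv[0]` is the unresolved `bin/<name>` path (which is what a shebang script invoked through a symlink normally receives); `resolve_stub` is a hypothetical name.

```python
import os
from pathlib import Path


def resolve_stub(invoked_as: str) -> tuple[str, str]:
    """Return (tool name, cache ID) for a binstub invoked as bin/<name>.

    bin/<name> is a symlink to <name>-<cache_id>, so reading that one link
    recovers the cache ID without calling Pants.
    """
    stub = Path(invoked_as)
    versioned = os.readlink(stub)  # e.g. "mkdocs-1102938asdasd"
    prefix = stub.name + "-"
    if not versioned.startswith(prefix):
        raise ValueError(
            f"{stub} does not point at a {prefix}<cache_id> link: {versioned!r}")
    return stub.name, versioned[len(prefix):]
```

A launcher would call `resolve_stub(sys.argv[0])` and use the returned ID as its cache key.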
> in the case of things like mkdocs (that maybe change rarely) but not internal things, maybe could enforce consistency at CI time? Or 'rebuild' them all?
Great idea! The generate_binstubs target could also come with a test that validates whether the binstubs currently in the repo match the ones that would be generated: a diff check that fails if they differ!
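The CI check could compare the committed symlinks against freshly computed IDs. A sketch; `binstub_drift` is hypothetical, and `expected` is assumed to come from whatever code `generate_binstubs` uses to compute cache IDs.

```python
import os
from pathlib import Path


def binstub_drift(bin_dir: Path, expected: dict[str, str]) -> list[str]:
    """List differences between committed binstubs and freshly computed cache IDs.

    `expected` maps tool name -> cache ID as generate_binstubs would write it.
    An empty list means the committed bin/ directory is up to date.
    """
    problems = []
    for name, cache_id in sorted(expected.items()):
        stub = bin_dir / name
        want = f"{name}-{cache_id}"
        if not stub.is_symlink():
            problems.append(f"{name}: missing")
        elif (have := os.readlink(stub)) != want:
            problems.append(f"{name}: points at {have}, expected {want}")
    return problems
```

In CI, the test would print the problems and exit non-zero whenever the list is non-empty.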
> I've read https://dotslash-cli.com/docs/limitations/#potential-version-skew-between-code-changes-and-dotslash-changes a few times and not sure if the recommendation is "maybe use TTLs" or "you will shoot yourself in the foot with TTLs"
I think they are saying that you don't want to use TTLs for things that are built inside the repo 🤔. In the mkdocs case, the tool is "built" inside the repo (the wheel is pulled down and converted into a PEX). So if you put a TTL on it, you might get the wrong version from a stale cache hit, even though the chance is low given the nature of the library.
> We have some bash scripts along the lines of "invoke the pex from dist if it exists otherwise build it", which obviously suffer from version skew on any change.
Haha yep, same! It has to be possible to make this smarter with the help of Pants!
curved-manchester-66006
01/14/2025, 5:50 PM
> I want the pants run statement to already gather information from pants and write the cache id's into the symlinks themselves. This to prevent having to reach out to pants and having the parallel issue.
I think the part I'm missing is: how do you tell when the cache_id is invalidated?
limited-art-78990
01/19/2025, 1:22 PM
# Define a `pex_binary` target for the MkDocs CLI.
pex_binary(
name="mkdocs",
entry_point="mkdocs",
)
# Generate peek output for the passed in dependencies with associated target digest
pants_peek(
name="mkdocs_peek",
dependencies=[
":mkdocs",
],
)
# generate binstubs based on peek digest
generate_binstubs(
name="generate_binstubs",
dependencies=[
":mkdocs_peek",
],
)
If the mkdocs target changes, the peek output digest changes with it, so running the generate_binstubs target produces a new symlink with a new CACHE_ID. The next time bin/mkdocs is called, it resolves through that new symlink, which results in a cache miss and busts the cache!
Let me know if this makes any sense 😂, or if there is an easier way to go about this!
careful-address-89803
01/23/2025, 3:19 AM
./dist/bin so you can just add it to your PATH and invoke it directly. I'm pretty sure it supports caching of the export with the standard Pants machinery.
Tapping into the machinery is documented in a walkthrough on making tools exportable (I think you can just follow the first half, about subclassing ExportableTool). I think you could have a target tap into the machinery by requesting AllTargets, filtering for your target type by field set, and then generating the ExportResult. One difficulty with that approach is that you probably want to reuse the PEX machinery. It's possible to invoke it with a request to the rule graph. I think this function call invokes package (you can see the rest of the context around it, which is a plugin to wrap a `package`able target as resources).
I have the feeling there's more to it than that, but maybe not?
careful-address-89803
01/23/2025, 3:22 AM
DependenciesRequest). You can see in the implementation of the peek goal that it's just making many requests to the rule graph.
careful-address-89803
01/23/2025, 3:25 AM
union_membership.get(ExportableTool)
(that's how we get the suggestions in "All valid exportable binaries: ['pex-cli', 'scc', 'shunit2']")
careful-address-89803
01/23/2025, 3:43 AM
limited-art-78990
04/09/2025, 8:38 AM
limited-art-78990
04/09/2025, 8:43 AM
[export]
bin = [
'pex-cli',
]
Which exports the pex binary to dist/export/bins/pex. But let's imagine the following scenario:
1. A new version of pants is released with a new version of the pex binary (bumps from v1 to v2)
2. The Pants version is bumped in pants.toml and pushed to our work repo
3. New changes are pulled down by a developer
4. The developer invokes dist/export/bins/pex --version
Unless I'm misunderstanding, the version would be v1? And the developer would need to manually run pants export to get the latest version of the pex binary?
careful-address-89803
04/10/2025, 12:52 AM
• pants export is cached (I'm pretty sure), so you could just run it a lot
• The comment about fingerprinting would allow you to ensure you had the correct version of the exported bin. For example, your launcher stubs could check that the version matched the expected one. So running a stub launcher ./my_mkdocs would run dist/export/bin/pex --version, and if it didn't match the expected version, it could run pants export --bin=pex-cli and then proceed.
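The version-check stub described above could be sketched as follows. Assumptions: the exported binary lives at the path used in this message (`dist/export/bin/pex`), the expected version string is pinned in the stub by hand or by a generator, and `ensure_exported` is a hypothetical helper. The command runner is injectable so the logic can be exercised without Pants.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run(cmd: Sequence[str]) -> str:
    """Run a command and return its stripped stdout."""
    result = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
    return result.stdout.strip()


def ensure_exported(binary: Path, expected_version: str, export_bin: str,
                    runner: Runner = run) -> Path:
    """Re-export `binary` with Pants only if its reported version is wrong."""
    try:
        current = runner([str(binary), "--version"])
    except (OSError, subprocess.CalledProcessError):
        current = None  # missing or broken export: treat it as a mismatch
    if current != expected_version:
        runner(["pants", "export", f"--bin={export_bin}"])
    return binary
```

A stub would call `ensure_exported(Path("dist/export/bin/pex"), "<expected version>", "pex-cli")` and then exec the returned path. On the happy path this costs one `--version` call and no Pants invocation.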
I'm not sure how useful my comment about tapping into the export machinery actually is. In theory, you could use the export machinery to also generate your launcher stubs. You'd also have access to all of the information from peek, so you could do custom invalidation. But I still think you'd need to use some of the above points for keeping it up to date anyhow (at least if you wanted to avoid invoking pants each time).
limited-art-78990
04/16/2025, 11:42 AM• the comment about fingerprinting would allow you to ensure you had the correct version of the exported bin. ex, your launcher stubs could check that the version matched the expected. So running a stub launcherThis is what I'm currently doing! Currently I specify a couple files which if changed, cause a new invocation ofwould run./my_mkdocs
, and if it didn't match the expected it could rundist/export/bin/pex --version
and then proceedpants export --bin=pex-cli
pants package ...
. Because Pants knows the entire dependency tree, I'm hoping I can generate this list of files from pants, so I don't have to maintain this list manually.careful-address-89803
04/21/2025, 5:00 PM
Would pants dependencies --transitive --format=json path/to:target do what you need?
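Combining the two ideas, the hand-maintained file list could be replaced by one derived from Pants and fingerprinted. A sketch under stated assumptions: `transitive_deps` assumes the JSON output maps each requested address to a list of dependency addresses (worth checking against your Pants version), and those are target addresses rather than file paths. Pants' `filedeps` goal (`pants filedeps --transitive path/to:target`) prints file paths directly and may be an easier input to `fingerprint`. All function names and the stamp file are hypothetical.

```python
import hashlib
import json
import subprocess
from pathlib import Path
from typing import Iterable


def transitive_deps(target: str) -> list[str]:
    """Ask Pants for the transitive dependencies of `target` (addresses, not files).

    Assumes the JSON output maps each requested address to a list of addresses.
    """
    out = subprocess.run(
        ["pants", "dependencies", "--transitive", "--format=json", target],
        check=True, capture_output=True, text=True,
    ).stdout
    data = json.loads(out)
    return sorted({dep for deps in data.values() for dep in deps})


def fingerprint(paths: Iterable[Path]) -> str:
    """Hash file names and contents in a stable order; any edit changes the result."""
    h = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        data = path.read_bytes()
        h.update(str(path).encode("utf-8") + b"\0")  # names cannot contain NUL
        h.update(len(data).to_bytes(8, "big"))  # length prefix avoids ambiguity
        h.update(data)
    return h.hexdigest()


def needs_rebuild(paths: Iterable[Path], stamp: Path) -> bool:
    """True if the files changed since the fingerprint recorded in `stamp`."""
    return not stamp.exists() or stamp.read_text().strip() != fingerprint(paths)


def write_stamp(paths: Iterable[Path], stamp: Path) -> None:
    """Record the current fingerprint after a successful `pants package`."""
    stamp.write_text(fingerprint(paths) + "\n")
```

The launcher would run `pants package` only when `needs_rebuild` is true, then call `write_stamp`; the file list itself still has to be refreshed by a Pants call whenever the dependency graph changes.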