# general
e
This is probably a dumb question, but here goes: I have a `pex_binary` target that, when run, reads some files from the local file system. I can't count on those files being present on the local file system, so I'd like to always run a script to populate those files before the `pex_binary` is run with `pants run :the_pex_binary`. I have a feeling this may be somewhat antithetical to Pants' design, or out of scope, but wanted to check.
I've been looking at `run_shell_command` and `adhoc_tool` but they don't seem quite right. For one, if I define:
```python
run_shell_command(
    name="script",
    command="echo hello"
)
```
and then make it a dependency of the `pex_binary`, it doesn't seem to run.
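For context (a hedged sketch, not a confirmed fix): per the Pants docs, `run_shell_command` only executes when invoked directly with `pants run`, and it produces no captured outputs, so depending on it from another target has no effect. `shell_command`, by contrast, runs in a sandbox and its declared outputs can flow to dependents. A minimal BUILD sketch contrasting the two (all names and the `fetch.sh` script are hypothetical):

```python
# Runs in a sandbox; declared outputs are captured and can be consumed
# by dependents (e.g. wrapped as resources for a pex_binary).
shell_command(
    name="populate-files",
    command="./fetch.sh",              # hypothetical downloader script
    output_files=["data/models.yaml"], # hypothetical output to capture
)

# Runs in the workspace, but ONLY when invoked directly:
#   pants run :populate-files-live
# Depending on it from a pex_binary does not trigger it.
run_shell_command(
    name="populate-files-live",
    command="./fetch.sh",
)
```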
w
This feels like something maybe the new workspace environment can help with eventually. But, if I’m reading this right, you want a side-effect to run before your pex_binary. Will the pex_binary contain the newly created files?
e
> you want a side-effect to run before your pex_binary.
that's right. The binary won't contain the newly created files. At least, it doesn't need to. But if that's a way to do this, I'm game to try it.
w
Well, I guess, if it's not necessarily in the pex - why not have a shim inside your code?
e
yeah I'm trying to avoid code changes since the context is a migration from a previous build system. But I could do that. what if I go the route of putting the files in the PEX? It's sort of unclear to me how those files are accessible through general file system APIs.
w
I mean, there might be a way to use the shell script or another thing, but it feels like cramming determinism into a non-deterministic system (or vice versa, I'm not sure). Like: I want reproducible builds, but I conditionally want to change some part of the filesystem, which, after I make my build - might then change again ... before my next build? It's a bit abstract, so I'm trying to make it more concrete. The shim inside a pex file assumes that you want these files created at runtime (which is what it sounded like), but it's also running through Pants. Not suggesting it can't be done, but I haven't done it using Pants. But, using either a macro or a plugin, the world's your oyster
I guess for me, what you're trying to achieve specifically might be more useful - so we're not XYing this problem
e
yeah, fair. So we have a previous build in Earthly. It's got some tasks like "run service X". The service, on startup, loads some BentoML models from the filesystem into memory. So in a dependent Earthly task, we made sure those BentoML models were present in the expected place by reading a YAML file and downloading them using the BentoML CLI (which fetches them from GCS). Then the service would run, see the files, and load them properly. Previously this was all happening in a Docker container, which I could probably do again, but am trying to avoid this time around.
I do have control over where we look for the BentoML model files. So if I can include them in the PEX and point the code at that spot, that's fine.
w
Ah interesting, any reason to move off Earthly? I've never used it. Re question: just need to think for a sec
e
many reasons:
• everything running in Docker is incredibly slow (you're always rebuilding images)
• it's so general as to be useless - you need to manually define all your tasks like linting/formatting/tests/run/etc. It's not really aware of much besides Docker.
• you still need something like Poetry to manage your deps, and once we hit about 8 pyproject.tomls in the repo it just became too much (it's a monorepo, but a small one)
(thanks for humouring me on this weird question, I appreciate it)
ā¤ļø 1
w
Oh wow, okay, I guess I really didn't know much about Earthly - I thought it was more than just Docker. Cool, thanks for letting me know! So, me personally, just speaking for myself - this feels like the kinda thing I would end up doing in Docker, or in pre-run bindings in a `scie`: https://github.com/a-scie/jump
I'm pretty sure this would all be do-able in a plugin. Natively though, I'm a little less sure. You've tried shell, which didn't work out (https://www.pantsbuild.org/2.21/docs/shell#testing-your-packaging-pipeline), and adhoc tool would be my next place to look. I'm curious if `archive` might be an option
As far as I know, files and resources are expected to be in place: https://www.pantsbuild.org/2.21/docs/using-pants/assets-and-archives
But yeah, Shell really feels like it would have been the move https://www.pantsbuild.org/2.21/reference/targets/shell_command
Adhoc tool (https://www.pantsbuild.org/2.21/reference/targets/adhoc_tool), being a slightly improved approach
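A hedged sketch of what that adhoc tool route might look like (all target names are hypothetical; `runnable`, `execution_dependencies`, and `output_directories` are real fields per the linked reference):

```python
# Hypothetical adhoc_tool pipeline: run a downloader target in a
# sandbox and capture what it writes back into the build graph.
adhoc_tool(
    name="fetch-models",
    runnable=":downloader",               # e.g. a python_source target
    execution_dependencies=[":manifest"], # extra files the tool reads
    output_directories=["models"],        # captured from the sandbox
)
```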
e
ah ok haven't looked at that (archive) yet, thanks for the pointer. It could be that I'm just doing shell wrong too. Docker does seem like the easy route, but I think the difficulty I would have is running the tests. They are integration tests that basically start the service and throw requests at it. So it does all this file loading then too.
w
Yep, fair, I personally avoid docker when I can - as there's a burden in using it, that I often don't need.
This was the thing I've been struggling to find as well: https://www.pantsbuild.org/2.21/reference/build-file-symbols/http_source Not sure if there is any value?
e
hmm what does it do exactly?
w
As I understand it, it would allow using a remote URL as an HTTP source, pulling that into the pipeline - but not re-downloading if it exists. So, not quite reading a yaml file, but 🤷
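A hedged sketch of what that looks like in a BUILD file (the URL and digest are placeholders; `http_source` requires the exact byte length and sha256 so the download is reproducible and cacheable):

```python
# Hypothetical: materialize a remote file as a `file` target.
file(
    name="model-weights",
    source=http_source(
        url="https://example.com/models/weights.bin",  # placeholder URL
        len=123456,            # exact byte length of the remote file
        sha256="deadbeef...",  # placeholder hex digest, verified on fetch
    ),
)
```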
I'm curious about this though, what you've described feels possible - but, as I've never done it - I'm just less knowledgeable as a result. I might give it a shot tonight when I'm home
Also, if you want a practical `adhoc_tool` example, this is one of me building a sveltekit project, so you can see the steps - all of which are technically side-effects, but some of those effects are pulled back into the sandboxes https://gist.github.com/sureshjoshi/98fb09f2a340f7c1dad270c4887865a0
e
I see. This archive/assets docs page is helping a lot actually. I've been thinking about this wrong, not surprisingly.
> A `file` target is for loose files that are copied into the chroot where Pants runs your code. You can then load these files through direct mechanisms like Python's `open()` or Java's `FileInputStream`
this sounds like what I want. I can basically put the BentoML model files into the repo dir in the expected structure, and have the code load from there. and as far as I understand it, `shell_command` essentially gives you `files`, is that right? So I could then use that to run the BentoML CLI to create the `files` instead of having to put them in manually.
thanks for that adhoc link, I may need it before long 😅
w
> and as far as I understand it, `shell_command` essentially gives you `files`, is that right? So I could then use that to run the BentoML CLI to create the `files` instead of having to put them in manually.
I don't know about BentoML - but if there is some way to have loose files in your repo, using files/resources is how you would collect them into the system
I'll be back this evening to see if I can get that side-effect concept working. I'm super curious now
e
hah ok thanks, really appreciate your help. I'm going to plug away at it some more, will keep you posted.
w
So, this ended up being a pretty cool little side track, using some targets I'd never needed to touch before
https://github.com/sureshjoshi/pants-shell-command-example I wrote an example with a few different ways to access local files. Easiest being a local resource, then downloading a resource (via `http_source`), and then the closest I could think to make an easy example of what you want to do. adhoc_tool might be a cleaner way to do it, with more reliable caching, but I used a shell command to call a script to read from a manifest.txt file, and then download an image, and stick it into the pex
In main, I just open each of the files and print something out from it
e
oh man this is super helpful, thanks a ton! I should be able to get something going.
w
If you have a CLI tool, then the shell command or adhoc tool are probably safe bets - and then you have to carefully pass dependencies from `output_xyz` to the next layer. That was also the first time I'd needed to use `experimental_wrap_as_resources` since otherwise, a `file` is generated, and it's kinda loose in the pex_binary target
e
yeah I think I'll need to use a `file` in the end, the code that loads the model is doing regular FS access
ok this is interesting. If I comment out the `experimental_wrap_as_resources` and have the pex depend directly on `:run-downloader`, I get this warning when running:
```
❯ pants run src:bin
22:00:53.21 [WARN] The target src:bin (`pex_binary`) transitively depends on the below `files` targets, but Pants will not include them in the built package. Filesystem APIs like `open()` may be not able to load files within the binary itself; instead, they read from the current working directory.

Instead, use `resources` targets. See <https://www.pantsbuild.org/resources>.

Files targets dependencies: ['src:run-downloader']
```
and the `run-downloader.sh` doesn't actually run! (I can tell because I added `exit 1` to the beginning). Is that expected? I get that the PEX won't contain the files, but shouldn't they still be available in the working dir?
w
That's interesting - I guess since the pex won't include it, it just doesn't bother?
Maybe have something else depend on run-downloader?
e
the warning goes away with this, but it still doesn't run. So confused...
```python
pex_binary(
    name="bin",
    dependencies=[
        ":lib",
        ":archive"
    ],
    entry_point="main.py"
)

archive(
  name="archive",
  format="zip",
  files=[":run-downloader"],
)

python_sources(
    name="lib",
    dependencies=[
        ":local-file",
        ":downloaded-image",
    ],
    sources=["**/*.py"],
)

# experimental_wrap_as_resources(
#     name="wrapped_downloader_output",
#     inputs=[":run-downloader"],
# )

shell_command(
    name="run-downloader",
    command="./downloader.sh",
    execution_dependencies=[":scripts", ":manifest"],
    output_files=["dilbert-rng.gif"], # This must match your expected file(s) (or use output_directory)
    tools=["curl", "head"],
)
```
w
and you're still running in my repo?
e
yep
if I do `pants package src:archive` it does run
w
I guess a pex can't contain an archive?
Never thought about that, never tried it
I know you can pass along a pex_binary to an archive, so maybe it's one-directional
And the reason you want it to run this way, is so that if you run `pants run src:bin` you want to ensure the shell command is run
Even if the output isn't put into the pex?
e
yeah exactly
I mean, maybe that doesn't make sense. But I was hoping when the PEX ran it would have access to the `file`, somewhere
if I turn off the `entry_point` to get a REPL, and then poke around with `os`, I'm in the source tree. So, maybe I guess I want to put the files into the source tree?
w
🤷 But as I mentioned, might be worth checking out adhoc tool and seeing if maybe that has some of the hooks you're interested in. It's a lot more powerful for making pipelines
e
fair enough, I'll keep looking. Thanks again!
seems like `adhoc_tool` has the same issues as `shell_command`. But, `run_shell_command` will just let me write files to the project tree which should probably be enough. The only sad part is I still can't make the `pex_binary` depend on the `run_shell_command` so I have to do `pants run src:the_run_shell_command; pants run src:bin`. Was hoping a `cli.alias` would help, but `pants run` only accepts one target. Would be nice if you could do `pants run target1 target2` and just have them run in sequence.