What's the simplest way to create a zip file conta...
# general
a
What's the simplest way to create a zip file containing dependencies in a directory? I want to build a lambda layer, it's just a zip with some packages installed. For example, if I want to add
requests
and
structlog
to a layer, I need to build a zip with the following structure:
python/
   requests/
   structlog/
Looks like the archive target only supports packages and files. Do I need to create a custom target for this? Is that something you'd like contributed back?
e
There is no simple way using Pants today. The way is to: 1.
pants package
a
pex_binary(..., include_sources=False, include_tools=True)
2. Run
PEX_TOOLS=1 my.pex venv create/it/here
3. Run
mv create/it/here/lib/python3.10/site-packages python
4. Now zip up that
python
dir.
Pants probably needs to add lambda layer support in the end or you need to write custom rules (a plugin) or use an adhoc plugin with the newish facilities there ... of which I know little.
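For reference, steps 2 to 4 above could be scripted roughly as follows. This is a minimal sketch, not an existing Pants feature: it assumes the PEX was built with `include_tools=True`, that the venv uses the POSIX `lib/python*/site-packages` layout, and the function names (`create_venv_from_pex`, `find_site_packages`, `zip_as_layer`) are made up for illustration.

```python
import os
import subprocess
import zipfile
from pathlib import Path


def create_venv_from_pex(pex_path: Path, venv_dir: Path) -> None:
    """Step 2: run `PEX_TOOLS=1 my.pex venv <dir>` to materialise a venv."""
    env = {**os.environ, "PEX_TOOLS": "1"}
    subprocess.run([str(pex_path), "venv", str(venv_dir)], env=env, check=True)


def find_site_packages(venv_dir: Path) -> Path:
    """Step 3: locate site-packages without hardcoding the Python version."""
    matches = sorted(venv_dir.glob("lib/python*/site-packages"))
    if not matches:
        raise FileNotFoundError(f"no site-packages directory under {venv_dir}")
    return matches[0]


def zip_as_layer(site_packages: Path, zip_path: Path) -> None:
    """Step 4: zip the packages so that every entry sits under `python/`."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(site_packages.rglob("*")):
            if path.is_file():
                arcname = Path("python") / path.relative_to(site_packages)
                zf.write(path, arcname.as_posix())
```

Usage would be something like `create_venv_from_pex(Path("dist/my.pex"), Path("venv"))` followed by `zip_as_layer(find_site_packages(Path("venv")), Path("layer.zip"))`, where the paths are placeholders. Writing entries under `python/` directly stands in for the `mv` in step 3.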
a
Okay. I might go kick the tyres on the custom rules. I think I'm going to need this later today
Ideally I'd like to be able to say
python_lambda_layer(
  dependencies=[
    "3rdparty/python#requests",
    "3rdparty/python#structlog"
  ])
and just package it
e
Yeah, needs work.
👍 1
a
I like work
❤️ 1
b
Our favorite kind of community member! Go for it, and feel free to ask for help if you get stuck.
❤️ 1
a
Okay, so assuming I'm dumb enough to jump into building a plugin to build a lambda layer, with a week's "experience" of pants, how do I go about this? I've gone through the tutorial written by a friendly community member, and I've got a plugin that supports the package rule. I have a fieldset that takes a bunch of dependencies. All I need to do is:
• Create a temporary directory
• Copy each dependency to that temporary directory
• Create a zip of the directory
I can see that there's a
dirutil
module that will create me a temp dir under the build root, but the docs are full of terrifying warnings about IO in rules. Is there something I need to be aware of here in order not to enrage the Cache Gods?
e
You probably should use the Pants reference as your core resource in all this with higher level guides like SJ's as gentler intros. In short, Python is only there for loops, conditionals and manipulation of data structures, no IO. The two major forms of IO are addressed here:
Filesystem: https://www.pantsbuild.org/docs/rules-api-file-system
Run subprocesses: https://www.pantsbuild.org/docs/rules-api-process
One step back you might want to take is to scrutinize heavily "I have a bunch of dependencies". Pants currently farms all Python dependency work out to Pex via subprocesses (which uses Pip). So it passes requirement strings to Pex subprocesses and it does the rest. The above links should help with executing subprocesses and then manipulating their filesystem outputs as a start.
That's just code though that wraps effectively a set of steps you'd otherwise perform in a shell if you were doing this by hand. You want to have those steps fully sussed out and working 1st. Then translate those shell steps to code using Pants Process invocations and FS manipulation primitives ("intrinsics"). I'm sure you'll have more questions, but that should give you some more reading to do and get you further down the line.
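One way to picture "no IO in Python, only data" is to write the shell steps down as plain immutable values before anything runs. The sketch below uses only the standard library; `StepSpec` and `layer_steps` are hypothetical names and not Pants types. They only illustrate the idea of declaring a subprocess (argv, env, expected outputs) and leaving execution and caching to an engine.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StepSpec:
    """Hypothetical stand-in for a declared subprocess; not a Pants type."""

    description: str
    argv: Tuple[str, ...]
    env: Tuple[Tuple[str, str], ...] = ()
    output_directories: Tuple[str, ...] = ()


def layer_steps(pex: str, venv_dir: str, site_packages: str) -> Tuple[StepSpec, ...]:
    """Describe the manual shell steps as data; nothing is executed here."""
    return (
        StepSpec(
            description="Create a venv from the PEX",
            argv=(pex, "venv", venv_dir),
            env=(("PEX_TOOLS", "1"),),
            output_directories=(venv_dir,),
        ),
        StepSpec(
            description="Move site-packages to python/",
            argv=("mv", site_packages, "python"),
            output_directories=("python",),
        ),
        StepSpec(
            description="Zip the python/ directory",
            argv=("zip", "-r", "layer.zip", "python"),
        ),
    )
```

Because every field is an immutable tuple or string, two identical descriptions compare equal and hash the same, which is the property that lets a build engine reuse the result of a step it has already run.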
a
Thanks, @enough-analyst-54434 - that's helpful. I'll start by writing a shell script that does the same work and go from there.
e
Great, that's the way to go. I will warn you you will get something working that way but it won't be optimal. Pants clearly already resolves dependencies via Pex and so ideally you'd reuse that cached work. That is a much more complex target, although, IIRC I sketched those ~CLI steps in another thread. Suffice to say, if you use, say, Pip directly, you'll be re-running resolves that have likely already been done, not utilizing the lock file and there is the problem of where you get Pip from in the 1st place.
a
So I could call out to pex to build me a loose packed binary, take the digest from that, and then ... if I run a process with that digest, I'll have the deps in my tree, right?
e
No. If you look at a PEX file it does not "have deps" in the way you think. It stores them in an internal format.
a
Sorry, I know my questions are extremely dumb, there's a whole bunch of conceptual knowledge that's a little hard to gain from reading the docs.
e
They aren't dumb, but you have walked into a huge topic.
👍 1
a
That's interesting, because when I've unzipped a pex, they're present in the .deps directory - what am I missing?
e
The Python resolution system has been through a ton of optimization to deal with Pants local sandboxing and remoting both being too slow if you do things naively.
👍 1
Ok, do you know what you're looking at in the .deps directory? They aren't wheels or installed wheels for one.
a
So you helpfully enumerated a 4 step process previously, which I assumed was "here's how you'd do this in a shell script or Makefile". Is that how you would idiomatically do this in a custom target?
Ok, do you know what you're looking at in the .deps directory
1 week of "experience" - I am firmly in the realm of unconscious incompetence
e
Yes, but using Pex subprocesses. Let me find my maybe hallucinated suggestion from earlier.
a
It's at the top of this thread
e
Ah right, yeah - do that!
a
Roger. I'll have a go.
Thanks again for putting up with me.
e
Again, you're working on something that would give most Pants maintainers trouble. Absolutely no worries. I'd just recommend working on this in a way that's conducive to async communication. Weekends are tough as are timezones. I happen to be up early for US Pacific, for example, but that can't be counted on. Ideally a public repo or draft PR would gather your work for iteration. Words are pretty poor in general and Slack threads are hard to gather when working through something complex like this over the course of days.
a
I'll gladly open a draft PR - I'm not expecting people to respond in any given timeframe - just thought I'd take a look at this over the weekend so I'm not stuck during work time. The support is much appreciated, and not expected.
Okay, I have a working prototype for a lambda layer target: https://github.com/bobthemighty/pants-lambda-layer. There's a lot left to do, including not hardcoding
python3.9
into the middle of a path, and handling arm runtimes, but it works, and I can run the resulting layer. Outstanding questions: Is there anything here that is grotesquely stupid? I've cobbled this together by reading rules in the existing pants codebase, mostly the export and package utils. I'm pretty sure I don't need to take a
PexPex
in my target, because I should just be able to run the requirements.pex, right? A quick test failed there because I didn't have
PATH
set up in the env when running the process. Is there a better way than this of getting a digest of installed requirements that I've requested in my target? The bit where we strip out the pex tools is a bit fugly and liable to break. Idiomatically, would you break this down into multiple rules? It seems like I probably want a result type for LambdaVenvPex, and a result type for ZippedLambdaLayer, rather than a single massive script.
e
There is nothing major off - that's definitely the gist. As for breaking things up into rules and rule helpers, opinions on style are a dime a dozen. I appreciate you asking, but I think adding tests will dictate how you must break things up to be testable and anything beyond that is not super relevant. The issues are all just mostly the details you've called out:
+ You probably don't want to request a
Pex
, but a
VenvPex
instead since a Pex will yield a zipped up PEX which you immediately unzip - wasted work that can be significant for large ML deps. A
VenvPex
will re-use the packed deps Pants has almost certainly already peddled in.
+ You should probably get the site-packages dir via something like a Process that runs ~
venv/bin/python -c 'import site; print(site.getsitepackages())'
+ You add
--pip
only to turn around and strip out the tools that adding it brings in - you can probably just not add Pip.
🙌 2
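The site-packages suggestion can be tried outside Pants with a plain subprocess. This is a rough sketch under assumptions: `site_packages_dirs` is a made-up helper name, and it prints JSON rather than the raw list so the output is easy to parse. Inside a Pants rule the same command would be run through a Process rather than `subprocess`.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import List


def site_packages_dirs(python: str) -> List[Path]:
    """Ask the given interpreter where its site-packages directories live."""
    code = "import json, site; print(json.dumps(site.getsitepackages()))"
    result = subprocess.run(
        [python, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return [Path(p) for p in json.loads(result.stdout)]
```

Calling `site_packages_dirs("venv/bin/python")` (a placeholder path) avoids hardcoding something like `python3.9` into the middle of a path; `site_packages_dirs(sys.executable)` queries the current interpreter instead.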