Is there a simple way to add an auxiliary file to ...
# general
q
Is there a simple way to add an auxiliary file to a pex distribution after it's been bundled? For context, we have a pickle file that is produced by a step outside our main build (i.e. training a model in our cluster and saving it as a pickle), and we'd like to add it to a pex distribution that contains the relevant modules to load the pickle. The best I've been able to achieve so far is unzipping the pex into a tmp dir, explicitly copying the file(s), and running the `pex` CLI again to produce the same pex w/ the extra file. This seems brittle, so I was curious if there's any functionality in `PexBuilder` that could make this cleaner.
The main issue here is that the build system (either Bazel or Pants, but in our case Bazel) isn't aware of the pickle at build time, so in essence we need to inject a file non-hermetically into the pex
e
A PEX is a zip by default (there are 2 other layouts you can use), so add files using your favorite API or zip tool.
The `PEX` env var will be available to the running PEX code to locate the original zip file (it gets unpacked and spread at runtime). I'm not sure if the existing unpacking will unpack your added files or not, so you may need this trick to unpack the file / get its stream yourself.
q
Will give that a try, thanks @enough-analyst-54434!
Also haven't forgotten about the enhancement to allow pants-jupyter-plugin to use an existing pex executable, am hoping to get to that next week 🙂
e
Are you talking about the new magic `__pex__` import handling in modern Pex? I have lost track of various threads and can't remember if I discussed this with you or someone else.
q
We discussed that as well (& we tried that out and it seems to work great!), I was referring to the previous thread where we tried to use pants-jupyter-plugin in a locked-down environment behind a firewall and it failed to fetch the `pex` executable from github
e
Ah, right - yeah.
q
The zipfile trick should get us in a much better spot where we no longer need to deal with pickle breaks (and will save us needing to convert everything to a platform-independent serialization format like ONNX, which is a bear)
e
Yeah, works fine:
```
$ cat foo.py
with open("extra") as fp:
    print(fp.read())

$ pex --exe foo.py -o foo.pex
$ echo "Remora-like" > extra
$ zip foo.pex extra
  adding: extra (stored 0%)
$ ./foo.pex
Remora-like
```
Ah, wait a sec ... not quite.
Ok, yeah - you need the `PEX` env var trick:
```
$ cat foo.py
import contextlib
import os
import zipfile


with contextlib.closing(zipfile.ZipFile(os.environ["PEX"])) as zf:
    print(zf.read("extra"))

$ pex --exe foo.py -o foo.pex
$ echo "Remora-like" > extra
$ zip foo.pex extra
  adding: extra (stored 0%)
$ rm extra
$ ./foo.pex
b'Remora-like\n'
```
I forgot to remove the local `extra` file the 1st time after adding it to the zip and the PEX code was just reading that loose file.
q
This worked great; going to write a small wrapper around `zipfile` that also fixes the permissions so we can accomplish this w/ a tool in our repo, thank you again for the pointers
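A wrapper along those lines might look like the sketch below. It is not the tool that was actually written. The function name `add_file_to_pex`, its parameters, and the guess at what "fixes the permissions" means (a `0o644` mode on the added entry plus an executable bit on the output file) are all assumptions.

```python
import os
import shutil
import stat
import zipfile


def add_file_to_pex(pex_path, src_path, arcname, out_path):
    """Copy `pex_path` to `out_path` and append `src_path` to it as `arcname`.

    Hypothetical helper; assumes the PEX uses the default zip layout.
    """
    # copy2 keeps the original mode bits, so the input PEX is left untouched.
    shutil.copy2(pex_path, out_path)

    info = zipfile.ZipInfo(arcname)  # default fixed timestamp (1980-01-01)
    info.compress_type = zipfile.ZIP_DEFLATED
    # The upper 16 bits of external_attr carry the Unix mode of the entry:
    # here a regular file with rw-r--r-- (an assumed choice).
    info.external_attr = (stat.S_IFREG | 0o644) << 16

    with open(src_path, "rb") as src, zipfile.ZipFile(out_path, "a") as zf:
        if arcname in zf.namelist():
            raise ValueError(f"{arcname!r} already exists in {out_path}")
        zf.writestr(info, src.read())

    # Make sure the resulting file is still executable.
    mode = os.stat(out_path).st_mode
    os.chmod(out_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return out_path
```

Usage would be something like `add_file_to_pex("foo.pex", "extra", "extra", "foo-with-extra.pex")`. Writing to a separate output path is a design choice that keeps the build-produced PEX unmodified.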