In PEX I can ship arbitrary directories using `-D`...
# general
a
In PEX I can ship arbitrary directories using `-D`; these end up in the target `site-packages`. Is there a way to install data files outside `site-packages` as well, without using setup.py or Python build mechanisms? If not, I can repurpose `-D` to ship data files into, say, `site-packages/some-data`. However, is there an easy way to discover the path of the venv when running the pex file? We build with pex tools but do not unpack the pex file manually.
e
At the moment there is no way to add data files other than through the Python build mechanism of your choice. The venv path is exported in `VIRTUAL_ENV`, per tradition. With the original virtualenv tool:
```
$ pex virtualenv -c virtualenv -- test.virtualenv
created virtual environment CPython3.10.6.final.0-64 in 225ms
  creator CPython3Posix(dest=/home/jsirois/dev/BruceEckel/tmp-pants-bug/test.virtualenv, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/jsirois/.local/share/virtualenv)
    added seed packages: pip==22.3.1, setuptools==65.6.3, wheel==0.38.4
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
$ source test.virtualenv/bin/activate
(test.virtualenv) $ echo $VIRTUAL_ENV
/home/jsirois/dev/BruceEckel/tmp-pants-bug/test.virtualenv
(test.virtualenv) $ deactivate
$
```
With Python's built-in `venv` module:
```
$ python3.7 -mvenv test.venv
$ source test.venv/bin/activate
(test.venv) $ echo $VIRTUAL_ENV
/home/jsirois/dev/BruceEckel/tmp-pants-bug/test.venv
(test.venv) $ deactivate
$
```
With Pex:
```
$ pex --venv -- -c 'import os; print(os.environ["VIRTUAL_ENV"])'
/home/jsirois/.pex/venvs/d6ac065eb87770b57b44ec865fdbae963f56be3c/5985ed09b49a653d6596b0e14d134c5456cf1a9f
```
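A minimal Python sketch of the same lookup from inside application code, assuming the PEX runs in venv mode (`--venv`) so that `VIRTUAL_ENV` is set as in the transcript. It falls back to `sys.prefix`, which points at the venv root whenever the interpreter runs inside a venv, and it assumes the standard venv layout where `sysconfig`'s `purelib` path is the venv's `site-packages`. `some-data` is the hypothetical directory name from the question, shipped into `site-packages` via `-D`.

```python
import os
import sys
import sysconfig
from pathlib import Path


def venv_root() -> Path:
    """Return the active venv directory.

    Prefers VIRTUAL_ENV (exported by Pex in --venv mode, per the transcript);
    otherwise falls back to sys.prefix, which is the venv root inside a venv.
    """
    return Path(os.environ.get("VIRTUAL_ENV", sys.prefix))


def site_packages_dir() -> Path:
    # "purelib" is site-packages for the running interpreter's install scheme;
    # inside a venv it lives under the venv root.
    return Path(sysconfig.get_paths()["purelib"])


def shipped_data_dir(name: str = "some-data") -> Path:
    # Hypothetical: a directory copied into site-packages with `pex -D`.
    return site_packages_dir() / name


if __name__ == "__main__":
    print("venv:", venv_root())
    print("data:", shipped_data_dir())
```

Going through `site-packages` rather than hard-coding the hashed `~/.pex/venvs/...` path keeps the lookup stable across rebuilds, since that hash changes with the PEX contents.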
Alternatively, one way to handle this would be to use `pkgutil` or `importlib` to load the data files as resources. Another would be to add a utility module alongside the data files, with a function that returns the paths to the files using `__file__`:
```
mypackage/data_package/
  __init__.py
  data_file1
  ...
  data_fileN
```
Where `mypackage/data_package/__init__.py` has roughly these contents:
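```python
from pathlib import Path
from typing import List

def data_files() -> List[str]:
    me = Path(__file__).resolve()  # Path, not PurePath: PurePath has no iterdir()
    return [str(f) for f in me.parent.iterdir() if f.is_file() and f != me]
```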
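For the resource-loading alternative mentioned first, a minimal standard-library sketch using `importlib.resources.files()` (Python 3.9+) and the older `pkgutil.get_data()`. The package name (e.g. `mypackage.data_package`) and file name (e.g. `data_file1`) are the hypothetical ones from the layout above; the package must be importable, i.e. it needs its `__init__.py`.

```python
import pkgutil
from importlib import resources
from typing import List


def read_resource(package: str, name: str) -> bytes:
    """Read a data file shipped inside an importable package (Python 3.9+)."""
    return resources.files(package).joinpath(name).read_bytes()


def read_resource_legacy(package: str, name: str) -> bytes:
    """Same lookup via pkgutil, which works on older Pythons too."""
    data = pkgutil.get_data(package, name)
    # get_data returns None when the package or its loader cannot supply data.
    if data is None:
        raise FileNotFoundError(f"{name} not found in {package}")
    return data


def list_resources(package: str) -> List[str]:
    """Names of the non-.py files directly inside the package directory."""
    return sorted(
        entry.name
        for entry in resources.files(package).iterdir()
        if entry.is_file() and not entry.name.endswith(".py")
    )
```

For example, `read_resource("mypackage.data_package", "data_file1")` would return the file's bytes without the caller ever knowing where the venv lives.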
@abundant-autumn-67998 all that said, I have never really understood data files. From my work on Pex I've seen lots of distributions through bug reports, and from what I can tell, where data files end up seems semi-random from dist to dist. Logically, I get that data files are distinct from `sys.path` contents, but unless the data files include `.py` files you never want imported by mistake, I have never understood why, in a post-egg world, placing them anywhere other than inside your installed dist tree in `site-packages` became a thing. If you know this world and there is some standard for data files, I'd be happy to have Pex grow a `--data-files` option similar to the existing `-D` for convenience.
a
Ah, we could use `VIRTUAL_ENV`. I said data files, but really this is for code. We support a lightweight way to deploy code outside any Python package by just pointing at a directory containing a `.py` file. I don't think any special data file support is needed, since we could put this in a special directory in `site-packages`. Data files within a package are easily supported by `package_data`.
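One possible way to load such code, sketched under assumptions: the standalone `.py` files are shipped with `-D` into a dedicated directory under `site-packages` (the directory name `user-code` and any module file name are hypothetical). The sketch locates that directory via `sysconfig` and imports a file by path with `importlib.util`, so the directory never needs to be an importable package.

```python
import importlib.util
import sys
import sysconfig
from pathlib import Path
from types import ModuleType


def user_code_dir(name: str = "user-code") -> Path:
    # Hypothetical directory shipped into site-packages with `pex -D`.
    return Path(sysconfig.get_paths()["purelib"]) / name


def load_module_from_file(path: Path, module_name: str) -> ModuleType:
    """Import a standalone .py file by its filesystem path."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so code that looks the module up in
    # sys.modules (e.g. pickle) can find it.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

Usage might look like `load_module_from_file(user_code_dir() / "entry.py", "user_entry")`, where `entry.py` is again a placeholder name.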