# general
b
This might be a silly question, but the usual approaches weren't working. I have a folder `x` in the project root directory and I want to access files in it from subdirectories with `with open('file')` and the like.
h
What happens if you make the path relative to the directory where you have the `./pants` script?
b
It works in those cases.
h
Okay, good. I would always use paths relative to your buildroot (i.e., where your `./pants` script is located). That is how Pants sets up your PYTHONPATH and how it resolves things like imports.
b
If my structure is `lib/x` and I use `.load('lib/x')`, it works when I run the goals from the root dir.
But the problem is that when I build them as a pex and run it, it doesn't work.
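For context on why the buildroot-relative path works when running from the root dir: a relative path given to `open()` is resolved against the process's current working directory, not against the module doing the reading. The sketch below assumes `.load()` ultimately hands its path to the filesystem the way `open()` does; the temp "buildroot" and its `lib/x` file are hypothetical and exist only so the example runs on its own.

```python
import os
import tempfile


def read_relative(rel_path: str) -> str:
    # open() resolves a relative path against the current working directory,
    # not against the location of the file containing this code.
    with open(rel_path) as fp:
        return fp.read()


def demo() -> tuple:
    # Hypothetical buildroot containing lib/x, mirroring the layout above.
    buildroot = tempfile.mkdtemp()
    os.makedirs(os.path.join(buildroot, "lib"))
    with open(os.path.join(buildroot, "lib", "x"), "w") as fp:
        fp.write("toy")

    original_cwd = os.getcwd()
    try:
        os.chdir(buildroot)
        from_root = read_relative("lib/x")  # resolves to <buildroot>/lib/x
        os.chdir(os.path.join(buildroot, "lib"))
        try:
            read_relative("lib/x")  # resolves to <buildroot>/lib/lib/x
            from_subdir = "found"
        except FileNotFoundError:
            from_subdir = "missing"
    finally:
        os.chdir(original_cwd)
    return from_root, from_subdir


if __name__ == "__main__":
    print(demo())  # ('toy', 'missing')
```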
g
Would defining a resource help here? https://www.pantsbuild.org/resources.html
What’s the error you get when running the pex?
b
It says it can't find the path.
g
Chatted a bit with @brave-policeman-49804 outside of the thread here: the use case is NLP models shared by different Python projects. The issue could be pex-related (it seems similar to https://github.com/pantsbuild/pex/issues/340) or related to how spaCy loads things; there's some more investigation happening there to debug this.
h
The more robust thing to do is to wrap those files in a `resources()` target and then load them using `pkg_resources` (or `pkgutil` on Python 3) instead of via direct filesystem operations. Then they will load no matter how they end up on the PYTHONPATH (as loose files or wrapped in an archive).
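A minimal stdlib-only sketch of the `pkgutil` approach. To exercise the "wrapped in an archive" case, it builds a throwaway zip containing a hypothetical package `shared_models` with a `data/model` file, puts the zip on `sys.path`, and reads the file with `pkgutil.get_data`, which asks the package's loader for the bytes instead of touching the filesystem directly. The Pants `resources()` target itself is not shown here.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def load_model_bytes(package: str, resource: str) -> bytes:
    # pkgutil.get_data delegates to the package's loader, so the same call
    # works for a plain directory on sys.path and for a zip archive.
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise FileNotFoundError(f"{package}/{resource} not found")
    return data


def build_demo_archive() -> str:
    # Hypothetical layout for illustration: shared_models/data/model in a zip.
    archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("shared_models/__init__.py", "")
        zf.writestr("shared_models/data/model", "toy")
    sys.path.insert(0, archive)
    return archive


if __name__ == "__main__":
    build_demo_archive()
    print(load_model_bytes("shared_models", "data/model").decode())
```

The resource name is always `/`-separated and relative to the package, regardless of platform.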
πŸ‘ 1
But this only works if you control the loader and can replace those `open(...)` statements with `pkgutil` calls.
I think you can mark a pex as not zip-safe, in which case it'll expand itself when it first executes, and then you should be able to access the files from the filesystem, either via a relative path or with some trickery using your root module's `__path__` to find where your source root got expanded to, and then using the full path to that.
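A minimal sketch of the `__path__` trick, assuming the pex has already been expanded onto disk (non-zip-safe) and the data sits under a regular, non-namespace package. The package name `srcroot_demo` and its `data/model` file are hypothetical and are created in a temp directory only so the example is self-contained.

```python
import importlib
import os
import sys
import tempfile
from types import ModuleType


def source_root_of(package: ModuleType) -> str:
    # For a regular package, __path__[0] is the package's own directory;
    # its parent is the source root the package was imported from.
    return os.path.dirname(os.path.abspath(list(package.__path__)[0]))


def make_demo_tree() -> str:
    # Hypothetical layout: <root>/srcroot_demo/{__init__.py, data/model}
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "srcroot_demo")
    os.makedirs(os.path.join(pkg_dir, "data"))
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "data", "model"), "w") as fp:
        fp.write("toy")
    return root


if __name__ == "__main__":
    sys.path.insert(0, make_demo_tree())
    importlib.invalidate_caches()
    pkg = importlib.import_module("srcroot_demo")
    model_path = os.path.join(source_root_of(pkg), "srcroot_demo", "data", "model")
    with open(model_path) as fp:
        print(fp.read())
```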
βœ”οΈ 1
b
Totally get your point. In my case, I don't have control over the `load()` statement. But, as you mentioned, is it possible to get the root module path?
e
Yes. Probably the most straightforward way in a non-zip-safe pex or non-pex context is to just use the `__file__` attribute of the module calling `load(...)` to calculate a relative path from. I.e., if that module is at `x/y/loader.py` and the file to load is `x/data/model`, you could write something like:
```
$ tree /tmp/example/
/tmp/example/
└── x
    ├── data
    │   └── model
    └── y
        └── loader.py

3 directories, 2 files
$ cat /tmp/example/x/data/model
toy
$ cat /tmp/example/x/y/loader.py
import os


_DATA_DIR = os.path.join(os.path.dirname(__file__), '..', 'data')


def load():
  with open(os.path.join(_DATA_DIR, 'model')) as fp:
    return fp.read()


if __name__ == '__main__':
  print('Loaded model: {}'.format(load()))

$ PYTHONPATH=/tmp/example python3.7 -m x.y.loader
Loaded model: toy
```
b
Thanks for the response @enough-analyst-54434. This was a bit tricky for me to experiment with. It works on paper. As I understand it, the major issue again comes when the project is built and converted to a `.pex`: when loading the files, it searches for them outside the `.pex` file. One obvious workaround is to keep the executable separate from the data files and refer to non-module files by absolute data file paths. Have I got this right?
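To see why the lookup ends up "outside" the `.pex`, here is a small stdlib-only sketch. It assumes the pex imports its sources straight from the zip archive, which is the zip-safe behavior discussed in this thread. It zips a hypothetical package `zipped_demo`, imports it, and applies the same `__file__`-relative trick as `loader.py` above: the resulting path runs through the archive file itself, so ordinary filesystem calls cannot see it.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def data_path_from_module(module) -> str:
    # Same approach as loader.py: build the data path from the module's __file__.
    return os.path.join(os.path.dirname(module.__file__), "data", "model")


def import_from_zip(pkg_name: str):
    # Hypothetical package zipped the way a zip-safe pex keeps its sources.
    archive = os.path.join(tempfile.mkdtemp(), "app.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(f"{pkg_name}/__init__.py", "")
        zf.writestr(f"{pkg_name}/data/model", "toy")
    sys.path.insert(0, archive)
    importlib.invalidate_caches()
    return importlib.import_module(pkg_name), archive


if __name__ == "__main__":
    module, archive = import_from_zip("zipped_demo")
    path = data_path_from_module(module)
    print(path)  # a path that goes through app.zip
    print(os.path.exists(path))  # False: app.zip is a file, not a directory
```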
e
That's right, that's one option. Another is to mark the pex as not zip-safe by adding `zip_safe=False` to the associated `python_binary` target. That will cause the pex to unzip itself and run from the unzipped copy, which ensures filesystem-based access works.
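For reference, a sketch of how this might look in a BUILD file, using the v1-era syntax this thread refers to. The target names, paths, and entry point are hypothetical; `resources()` and `zip_safe=False` on `python_binary` come from the discussion above, but the other fields should be checked against the docs for your Pants version.

```python
# BUILD (hypothetical targets; v1-era Pants syntax)
resources(
  name='model_files',
  sources=['data/model'],
)

python_binary(
  name='loader',
  source='y/loader.py',
  dependencies=[
    ':model_files',
  ],
  # Have the pex unzip itself on first run so plain open() calls can find files.
  zip_safe=False,
)
```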