# general
e
A question about packaging resources for consumption in a `pex_binary`: The docs detail how to use `resources` along with the native package loader (`importlib.resources.read_text`) to read files from a binary target, vs `files` which are not packaged with a `pex_binary`. In my case, I need to make a file available for a library, so I have no control over their loading mechanism -- in this case the library is `alembic` and they load the file through the `file` API. Since I have no control over the library, I figure `resource` won't work and I'll need to expose a file. It seems from the documentation like `archive` is the way to go in that case, but I was surprised to find that `archive` does not package loose files that are dependencies of the `packages` argument and instead expects a `files` argument as input. Is there a better way to package up all the loose files that a binary transitively depends on? Imagine a dependency tree like:
3rdparty/python:alembic -> my_alembic_util.py -> ... -> main.py
It feels strange to me to need to list `files` when defining an archive next to `main.py` when it's really `my_alembic_util.py` that knows about the file dependency. Thank you!
e
If I'm reading your situation correctly, you can just specify on the `my_alembic_util.py` `python_source` target that it has a dependency on the `files` target generator which includes all the loose files. Then, since `main.py` depends on `my_alembic_util.py` (Pants will pick this up by dependency inference), the pex binary with `main.py` as an entrypoint should end up including those files already, because of transitive dependencies.
I commonly use an idiom like
BASE_DATA_DIR = Path(__file__).parent / "subdir/containing/my/files"
so that it's easy to get the full path to the files.
e
Oh interesting, I haven't actually inspected the pex, but I saw this warning when I built it:
19:33:04.46 [WARN] The target backend:mega_fastapi (`pex_binary`) transitively depends on the below `files` targets, but Pants will not include them in the built package.
e
oh that's interesting. I haven't actually seen an error like that before
e
Any idea who might know more?
b
Ooh, I know more. Alembic + pex is a bit of fun. `resources` is definitely required here, so it's a matter of making that work. What we do is:
1. recursively copy all the relevant resources to a temporary directory
2. switch to that directory to run alembic
import contextlib
import importlib.abc
import importlib.resources
import logging
import os
from pathlib import Path
import tempfile
from typing import Callable
from typing import Iterable
from typing import Iterator

import alembic.config
import alembic.script

logger = logging.getLogger(__name__)


def _copy_resources_to_path(
    dirname: Path, files: Iterable[importlib.abc.Traversable]
) -> None:
    for f in files:
        name = dirname / f.name
        if f.is_file():
            with name.open("wb") as output:
                output.write(f.read_bytes())
        else:
            assert f.is_dir()
            name.mkdir()
            _copy_resources_to_path(name, f.iterdir())


@contextlib.contextmanager
def switch_to_directory_with_migrations(package: str) -> Iterator[None]:
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as dirname:
        files = (
            f
            for f in importlib.resources.files(package).iterdir()
            if f.name in ("alembic.ini", "migration")
        )
        _copy_resources_to_path(Path(dirname), files)

        os.chdir(dirname)
        try:
            yield
        finally:
            os.chdir(original)


@contextlib.contextmanager
def alembic_runner(package: str) -> Iterator[Callable[[list[str]], None]]:
    with switch_to_directory_with_migrations(package):
        yield lambda args: alembic.config.main(argv=args)
This was written quite a while ago, and there may be better versions now, e.g.:
• I don't remember why we used the alembic Python API vs. `subprocess.run(...)`-ing the CLI
• I think alembic might now have some CLI options for telling it where to find the config, that didn't exist when we wrote that (i.e. maybe the `os.chdir` is now unnecessary)
• Pexes are almost always run with files as normal files on disk (e.g. zips are unpacked to venvs), so the `importlib.resources` isn't required if you aren't doing anything fancy with zipapps, and thus finding the files in the venv/relative to the source code can also work (we don't do this because we support running `alembic revision ...` via that code above, to add new migrations, and mutating the venv isn't good)
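A rough sketch of the CLI-option idea: alembic's command line accepts `-c`/`--config` to point at a specific `alembic.ini`, and the same arguments can be given to `alembic.config.main(argv=...)`. The helper name `alembic_argv` and the example path are hypothetical. Whether this makes `os.chdir` unnecessary is an assumption that depends on how `script_location` is written in the ini file (for example, anchored with `%(here)s` rather than relying on the working directory).

```python
from pathlib import Path


def alembic_argv(config_path: Path, command: list[str]) -> list[str]:
    """Prefix an alembic command with an explicit config location.

    Hypothetical helper: it only builds the argument list and does not
    import or run alembic itself.
    """
    return ["-c", str(config_path), *command]


# Hypothetical location of the copied config; substitute the real directory.
argv = alembic_argv(Path("migrations_tmp") / "alembic.ini", ["upgrade", "head"])
print(argv)
```

The resulting list could then be passed to `alembic.config.main(argv=argv)` or to `subprocess.run(["alembic", *argv])`.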
e
Thanks Huon! What I'm hearing you say is that if our code does an `abspath(__file__)` and finds the path that way, opening the file as usual should work. But if it's trying to load from a relative path, it might hit snags, in which case we can do this copying trick. Do you think it's worth including a build rule that does this resources -> file switcheroo?
Or maybe I misunderstood and even if we're doing `abspath` we still need to use `resources` and not `files` 😅
b
yes, there's two layers:
1. getting the files into the built artefact: this requires `resources` (with `files`, the required files won't be in the PEX at all. You can explore this by building the artifacts with `pants package ...` and introspecting the result in `dist/`)
2. when the artifact runs (either `pants run ...` or running the result of a package), accessing those files: potentially `importlib.resources`, but `open`-ing files relative to `__file__` will also work in many cases.
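To make the second layer concrete, here is a minimal sketch that reads the same data file both ways: relative to `__file__` and through `importlib.resources.files`. The package name `demo_pkg` and its contents are hypothetical stand-ins created in a temporary directory. It assumes the package ends up as ordinary files on disk, which is the common case described above but not guaranteed for every way of running a PEX.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def make_demo_package(root: Path, package: str = "demo_pkg") -> None:
    """Create a throwaway package holding one data file next to its code."""
    pkg_dir = root / package
    pkg_dir.mkdir()
    # The module records its own directory, like the BASE_DATA_DIR idiom.
    (pkg_dir / "__init__.py").write_text(
        "from pathlib import Path\nDATA_DIR = Path(__file__).parent\n"
    )
    (pkg_dir / "alembic.ini").write_text("[alembic]\nscript_location = migration\n")


def read_both_ways(package: str, filename: str) -> tuple[str, str]:
    """Read a packaged file via a __file__-relative path and via importlib.resources."""
    module = importlib.import_module(package)
    via_file = (module.DATA_DIR / filename).read_text()
    via_resources = (importlib.resources.files(package) / filename).read_text()
    return via_file, via_resources


with tempfile.TemporaryDirectory() as tmp:
    make_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    try:
        via_file, via_resources = read_both_ways("demo_pkg", "alembic.ini")
    finally:
        # Undo the import-path changes so the demo leaves no trace.
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg", None)

print(via_file == via_resources)
```

When the files are on disk, both approaches return the same content. The `importlib.resources` route is the one that keeps working if the package is loaded from somewhere other than a plain directory.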
e
Is it fair to say that assuming we use `resources` and not `files`, unit tests should execute in a similar environment and so should work for us to stress test the setup?
b
reasonably similar, yeah. Another option is to validate packaged artifacts specifically (e.g. run them) using https://www.pantsbuild.org/stable/reference/targets/python_tests#runtime_package_dependencies : `python_tests(..., runtime_package_dependencies=["path/to:the_target"])` and then the artifact will be available for the test to execute. Might not be appropriate for your situation
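A test that exercises a packaged artifact could look roughly like the sketch below. The helper `run_packaged_binary` is hypothetical, and so is the artifact's path inside the test sandbox, which is why the current Python interpreter stands in for the built PEX so the sketch runs anywhere.

```python
import subprocess
import sys
from pathlib import Path


def run_packaged_binary(binary: Path, args: list[str]) -> subprocess.CompletedProcess:
    """Run a built artifact and capture its output so a test can assert on it."""
    return subprocess.run(
        [str(binary), *args], capture_output=True, text=True, check=False
    )


# Stand-in: the current interpreter plays the role of the packaged binary.
# In a real test, `binary` would be the path of the artifact made available
# through runtime_package_dependencies (that path is an assumption here).
result = run_packaged_binary(Path(sys.executable), ["-c", "print('migrations ok')"])
print(result.returncode, result.stdout.strip())
```

Asserting on the exit code and captured output checks that the files the binary needs really made it into the package, rather than only being present in the source tree.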