early-businessperson-60137
01/14/2025, 7:52 PM
`resources` along with the native package loader (`importlib.resources.read_text`) to read files from a binary target, vs `files`, which are not packaged with a `pex_binary`. In my case, I need to make a file available for a library, so I have no control over its loading mechanism; in this case the library is alembic, and it loads the file through the file API.
Since I have no control over the library, I figure `resources` won't work and I'll need to expose a file. The documentation suggests `archive` is the way to go in that case, but I was surprised to find that `archive` does not package loose files that are dependencies of the `packages` argument. It expects a `files` argument as input instead. Is there a better way to package up all the loose files that a binary transitively depends on? Imagine a dependency tree like:
3rdparty/python:alembic
-> my_alembic_util.py
-> ...
-> main.py
It feels strange to me to need to list `files` when defining an `archive` next to `main.py` when it's really `my_alembic_util.py` that knows about the file dependency.
Thank you!
elegant-florist-94385
01/14/2025, 8:12 PM
`files` target generator which includes all the loose files.
Then, since `main.py` depends on `my_alembic_util.py` (Pants will pick this up by dependency inference), the PEX binary with `main.py` as an entrypoint should already include those files, because they are transitive dependencies.
elegant-florist-94385
01/14/2025, 8:14 PM
`BASE_DATA_DIR = Path(__file__).parent / "subdir/containing/my/files"`
so that it's easy to get the full path to the files.
early-businessperson-60137
01/14/2025, 8:29 PM
19:33:04.46 [WARN] The target backend:mega_fastapi (`pex_binary`) transitively depends on the below `files` targets, but Pants will not include them in the built package.
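This is a minimal BUILD sketch of the `resources`-based setup that the rest of the thread recommends instead of `files`, which is the target type that triggers this warning. The directory layout, the target names `migrations` and `lib`, and the glob are hypothetical. Only `backend:mega_fastapi` comes from the warning itself:

```python
# BUILD next to my_alembic_util.py (hypothetical layout: alembic.ini and
# migration/ sit beside it, inside a package with an __init__.py).

# Unlike `files`, `resources` are packaged into any pex_binary that depends
# on them, including transitively.
resources(
    name="migrations",
    sources=["alembic.ini", "migration/**/*"],
)

python_sources(
    name="lib",
    # Dependency inference may not see files that are only opened at runtime,
    # so the edge to the resources is declared explicitly here.
    dependencies=[":migrations"],
)

# backend/BUILD (hypothetical): main.py -> my_alembic_util.py is inferred.
pex_binary(
    name="mega_fastapi",
    entry_point="main.py",
)
```

With this layout, `main.py` depends on `my_alembic_util.py`, which depends on `:migrations`. The files therefore end up in the PEX without being listed next to `main.py`.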
elegant-florist-94385
01/14/2025, 9:21 PM
early-businessperson-60137
01/14/2025, 11:27 PM
broad-processor-92400
01/14/2025, 11:54 PM
`resources` is definitely required here, so it's a matter of making that work. What we do is:
1. recursively copy all the relevant resources to a temporary directory
2. switch to that directory to run alembic
```python
import contextlib
import importlib.abc
import importlib.resources
import logging
import os
from pathlib import Path
import tempfile
from typing import Callable
from typing import Iterable
from typing import Iterator

import alembic.config
import alembic.script

logger = logging.getLogger(__name__)


def _copy_resources_to_path(
    dirname: Path, files: Iterable[importlib.abc.Traversable]
) -> None:
    # Recursively materialize resource Traversables (which may live inside a
    # zip) as real files and directories under `dirname`.
    # Newer Pythons (3.11+) also expose this type as
    # importlib.resources.abc.Traversable.
    for f in files:
        name = dirname / f.name
        if f.is_file():
            with name.open("wb") as output:
                output.write(f.read_bytes())
        else:
            assert f.is_dir()
            name.mkdir()
            _copy_resources_to_path(name, f.iterdir())


@contextlib.contextmanager
def switch_to_directory_with_migrations(package: str) -> Iterator[None]:
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as dirname:
        # Only alembic.ini and the migration/ directory are copied out.
        files = (
            f
            for f in importlib.resources.files(package).iterdir()
            if f.name in ("alembic.ini", "migration")
        )
        _copy_resources_to_path(Path(dirname), files)
        os.chdir(dirname)
        try:
            yield
        finally:
            # Restore the cwd before the temporary directory is deleted.
            os.chdir(original)


@contextlib.contextmanager
def alembic_runner(package: str) -> Iterator[Callable[[list[str]], None]]:
    # Yields a function that runs the alembic CLI in-process,
    # e.g. run(["upgrade", "head"]), with the copied migrations as the cwd.
    with switch_to_directory_with_migrations(package):
        yield lambda args: alembic.config.main(argv=args)
```
This was written quite a while ago, and there may be better versions now, e.g.:
• I don't remember why we used the alembic Python API vs. `subprocess.run(...)`-ing the CLI
• I think alembic might now have some CLI options for telling it where to find the config that didn't exist when we wrote that (i.e. maybe the `os.chdir` is now unnecessary)
• Pexes are almost always run with files as normal files on disk (e.g. zips are unpacked to venvs). That means `importlib.resources` isn't required unless you're doing something fancy with zipapps, and finding the files in the venv or relative to the source code can also work. We don't do this because we support running `alembic revision ...` via that code above to add new migrations, and mutating the venv isn't good.
early-businessperson-60137
01/15/2025, 12:49 AM
`abspath(__file__)` and finds the path that way, opening the file as usual should work. But if it's trying to load from a relative path, it might hit snags, in which case we can do this copying trick.
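The distinction drawn here is between a path computed from `__file__` and a bare relative path. Two hypothetical helpers can illustrate it. The names `resolve_from_module` and `resolve_from_cwd` are illustrative only and are not part of alembic or Pants:

```python
import os
from pathlib import Path


def resolve_from_module(module_file: str, relative: str) -> Path:
    """Resolve `relative` against the directory containing `module_file`.

    This is the Path(__file__).parent / ... pattern: the result does not
    depend on the current working directory.
    """
    return Path(os.path.abspath(module_file)).parent / relative


def resolve_from_cwd(relative: str) -> Path:
    """Resolve `relative` the way a bare open("alembic.ini") would.

    Relative paths are looked up against the current working directory, so
    this only finds the file if the process runs from (or chdir()s into) the
    right place.
    """
    return Path.cwd() / relative
```

Inside a library module, the first form would be called as `resolve_from_module(__file__, "alembic.ini")`. The second form is the case where the copy-and-chdir trick becomes necessary.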
Do you think it's worth including a build rule that does this resources -> file switcharoo?
early-businessperson-60137
01/15/2025, 12:51 AM
`abspath` we still need to use `resources` and not `files` 😅
broad-processor-92400
01/15/2025, 12:55 AM
`resources` (with `files`, the required files won't be in the PEX at all. You can explore this by building the artifacts with `pants package ...` and introspecting the result in `dist/`)
2. when the artifact runs (either `pants run ...` or running the result of a package), accessing those files: potentially `importlib.resources`, but `open`-ing files relative to `__file__` will also work in many cases.
early-businessperson-60137
01/15/2025, 1:07 AM
`resources` and not `files`, unit tests should execute in a similar environment and so should work for us to stress test the setup?
broad-processor-92400
01/15/2025, 1:10 AM
`python_tests(..., runtime_package_dependencies=["path/to:the_target"])` and then the artifact will be available for the test to execute. Might not be appropriate for your situation.
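A test using `runtime_package_dependencies` might look roughly like the sketch below. It rests on several assumptions. `path/to:the_target` is the placeholder from the message above. `PEX_PATH` assumes the default `pex_binary` `output_path` naming (`path.to/the_target.pex`, relative to the sandbox root); check `dist/` after `pants package` to confirm it. The arguments passed to the binary are hypothetical:

```python
import subprocess
import sys
from pathlib import Path

# Assumption: default output_path naming for "path/to:the_target".
PEX_PATH = Path("path.to/the_target.pex")


def run_pex(pex: Path, *args: str) -> subprocess.CompletedProcess[str]:
    """Run a built PEX with the current interpreter and capture its output."""
    # A PEX is a zipapp, so `python the_target.pex ...` executes it without
    # relying on the file's executable bit. check=True turns a non-zero exit
    # into CalledProcessError, which fails the calling test.
    return subprocess.run(
        [sys.executable, str(pex), *args],
        check=True,
        capture_output=True,
        text=True,
    )


def test_packaged_binary_runs() -> None:
    # Hypothetical arguments: assumes the entry point accepts them and exits
    # (e.g. after running migrations) instead of starting a long-lived server.
    run_pex(PEX_PATH, "upgrade", "head")
```

Running the built artifact, rather than importing from the source tree, is what exercises the packaged layout. For example, it would catch `files` that were silently left out of the PEX.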