# general
p
Hi all, is there an example of how to package a Python library properly? Here is roughly what my code structure looks like:
common_library
    BUILD
    dir1
        BUILD 
        file1.py
        file2.py
    dir2
        file1.py
    dir3
        ...
project1
    (python sources, some depending on common_library)
project2
    (python sources, some depending on common_library)
3rdparty
    BUILD
    requirements.txt
pants.toml
I also want to be able to package just the common_library and distribute it as a wheel/sdist to be used in Jupyter notebooks. I ended up with the common_library BUILD file containing something like this:
python_distribution(
    name="common_library",
    dependencies=[
        # here, in setup.py, we had find_packages() to get all src files from dir1..dirN...
    ],
    provides=python_artifact(
        name="common",
        #...
    ),
)
The thing is, I don't know how to specify that the common library depends on all subpackages within the common_library package. I don't want to list all files manually, of course, as the real structure is much more complex. I also like the fact that BUILD files are provided for each subdir of the common lib, and I don't want to have a python_sources(sources=["**/*.py"]) at the top-level BUILD of the lib instead. Is there any other way to approach this?
If it were an executable binary, I would just depend on the main.py module and Pants would figure out the required content for me, I guess, but I'm not sure how to handle the library case...
s
I'm far from a Pants expert, but I ran into the same situation you're describing when I was getting going. As far as I can tell, the two options you described are the only two ways to go about it. I ended up just referencing each target I wanted to include explicitly. Do note that you don't have to list every single file, just the folders that contain them (at least if you're using the python_sources target). So my distribution ended up looking like
python_distribution(
    name="etxdagster",
    dependencies=[
        "src/py/lib/etxdagster/etxdagster/common:src",
        "src/py/lib/etxdagster/etxdagster/infrastructure/aws:src",
        "src/py/lib/etxdagster/etxdagster/infrastructure/common:src",
        "src/py/lib/etxdagster/etxdagster/infrastructure/databricks:src",
        "src/py/lib/etxdagster/etxdagster/infrastructure/dnax:src",
        "src/py/lib/etxdagster/etxdagster/infrastructure/ecs:src",
        "src/py/lib/etxdagster/etxdagster/infrastructure/graphql:src",
        "src/py/lib/etxdagster/etxdagster/infrastructure/spark:src",
        "src/py/lib/etxcommon/etxcommon:src",
        "src/py/lib/etxdagster/etxdagster/scripts:scripts",
    ],
    provides=python_artifact(
        name="etxdagster",
        version="0.1.0"
    )
)
where each of the lines in dependencies is a bottom-level (leaf) folder.
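If maintaining that list by hand gets tedious, a small script could generate it. This is only a minimal sketch: list_build_dirs is a hypothetical helper (not part of Pants), and it assumes every subpackage has a file named exactly BUILD and that the targets in those files either use the default name or share one name such as "src" as in the example above.

```python
from pathlib import Path


def list_build_dirs(lib_root, repo_root, target_name=None):
    """Return one address string per directory under lib_root that holds a BUILD file.

    Hypothetical helper, not a Pants API. The library's own top-level directory
    is skipped so the distribution target does not end up depending on itself.
    """
    lib_root = Path(lib_root).resolve()
    repo_root = Path(repo_root).resolve()
    addresses = []
    for build_file in sorted(lib_root.rglob("BUILD")):
        # Ignore directories that happen to be called BUILD, and the top-level BUILD.
        if not build_file.is_file() or build_file.parent == lib_root:
            continue
        rel_dir = build_file.parent.relative_to(repo_root).as_posix()
        # Assumption: with no target_name, the bare directory path is used as the address.
        addresses.append(f"{rel_dir}:{target_name}" if target_name else rel_dir)
    return addresses
```

Run from the repo root, something like list_build_dirs("common_library", ".") would give the strings to paste into dependencies. Note that it lists every directory with a BUILD file, not only leaf folders, so the output may need trimming.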
h
Pants's dep inference will work on python_distribution as well. You do need one explicit dep from the python_distribution to some entry point, but everything that entry point transitively depends on will be pulled in.
So it is enough to explicitly depend on a small set of deps whose own deps span everything you want to pull in.
But if there is no such small set of deps, then...
You could write your own setup.py, of course, but then you're giving up Pants's ability to generate one for you (including its 3rdparty requirements).
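For the layout in the original question, the "small set of explicit deps" approach might look roughly like the BUILD sketch below. It is untested and rests on two assumptions: that common_library/dir1/BUILD declares a python_sources() target with the default name, and that the modules in dir1 import the rest of the library directly or transitively. For a loose collection of utils the second assumption may well not hold, in which case more entries are needed.

```python
# common_library/BUILD (sketch under the assumptions stated above)
python_distribution(
    name="common_library",
    dependencies=[
        # One explicit dependency. Anything its modules import is expected
        # to be pulled in through dependency inference.
        "common_library/dir1",
    ],
    provides=python_artifact(
        name="common",
        # ...
    ),
)
```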
p
Thank you for clarifying, I thought I might have missed something in the docs... I will probably follow @swift-river-73520's suggestion for now and see how tedious it is to maintain 🙂 @happy-kitchen-89482, as you wrote, I want to use Pants's ability to figure out requirements for me, that's a great thing. The only problem is that my lib doesn't really have an entry point; it is just a bunch of various utils people later use in notebooks for ML experiments... But I understand the limitations now and will try to work around them, thanks!