# general
l
Hello folks. I'm working on trying to add Pants to an existing repository (still fairly small) so that I can add support for building out multiple packages from the same repo. I think I've got some bits of it working, but I'm still a bit in the dark on what I need to do to get everything working properly. The project and branch that I'm working on is at https://github.com/mitodl/ol-data-pipelines/tree/putting_on_pants
h
Hello! You’ll need to teach Pants about your third-party requirements. I see you have a `poetry.lock`; we’re working on adding first-class support for consuming that file. In the meantime, the simplest approach is to run `poetry export` to generate a `requirements.txt` file in the root of your project (the “build root”). Then add a `BUILD` file at the build root containing `python_requirements()`, a macro that converts each entry in `requirements.txt` into a Pants target: https://www.pantsbuild.org/docs/python-third-party-dependencies
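A minimal sketch of that setup, assuming `requirements.txt` sits at the build root. The export step runs in the shell, so it appears here only as a comment; `-f requirements.txt` and `--output` are Poetry `export` options, but check `poetry export --help` for your Poetry version.

```python
# BUILD  (at the build root, next to requirements.txt)
#
# Generate requirements.txt first, e.g.:
#   poetry export -f requirements.txt --output requirements.txt
#
# Each requirement becomes its own target at the build root,
# e.g. //:dagster and //:httpx.
python_requirements()
```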
l
Yeah, I generated the `requirements.txt` locally but haven't committed it yet. I'll try adding the `BUILD` file at the repo root. Another question: how can I list all of the potential targets in the repo (if that's even possible)?
h
`./pants list ::`
Check out https://www.pantsbuild.org/docs/project-introspection for some similar commands, for example how to filter down to only `python_tests` targets.
l
Awesome, thanks
Cool, now it's showing some third-party requirements, but I'm still not seeing the dependencies on the `lib` or `resource` targets. For example: https://github.com/mitodl/ol-data-pipelines/blob/putting_on_pants/src/ol_data_pipelines/edx/solids.py#L25-L33
Based on that, I would expect the output to show modules from both of those targets, but this is what I'm seeing:
```
./pants dependencies --dependencies-type=source-and-3rdparty src/ol_data_pipelines/edx/
//:dagster
//:httpx
src/ol_data_pipelines/edx/__init__.py
src/ol_data_pipelines/edx/api_client.py
src/ol_data_pipelines/edx/repositories.py
src/ol_data_pipelines/edx/schedule.py
src/ol_data_pipelines/edx/solids.py
dagster==0.9.14
httpx==0.16.1
```
h
Ah, I see the likely issue. You have `src/BUILD` and `src/ol_data_pipelines/{edx,lib,resources}/BUILD`. All of those have `python_library()` targets with the same files in the `sources` field. This means dependency inference won’t work for imports of those files, as Pants can’t disambiguate which of the two targets you want to use for metadata.
It’s valid to have only `src/BUILD` with the `**/*.py` sources. Alternatively, we generally recommend having one BUILD file in each directory with nothing more than this line:
```
python_library()
```
That uses the default `sources` field of `["*.py", "!*_test.py", "!test_*.py"]`. https://www.pantsbuild.org/docs/targets#target-granularity
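To make those default globs concrete, here is a simplified model of include/exclude matching for a single directory. It is not Pants's actual globbing code, and apart from `solids.py` and `__init__.py` the file names are hypothetical.

```python
from fnmatch import fnmatch
from typing import Sequence

# The default `sources` globs quoted above; a leading "!" marks an exclude.
DEFAULT_SOURCES = ("*.py", "!*_test.py", "!test_*.py")


def matches_sources(filename: str, globs: Sequence[str] = DEFAULT_SOURCES) -> bool:
    """True if `filename` matches some include glob and no exclude glob.

    Simplified: only bare file names in one directory are checked, which is
    the scope of a per-directory python_library().
    """
    includes = [g for g in globs if not g.startswith("!")]
    excludes = [g[1:] for g in globs if g.startswith("!")]
    included = any(fnmatch(filename, g) for g in includes)
    excluded = any(fnmatch(filename, g) for g in excludes)
    return included and not excluded


files = ["__init__.py", "solids.py", "solids_test.py", "test_api_client.py", "README.md"]
print([f for f in files if matches_sources(f)])  # ['__init__.py', 'solids.py']
```

Test files are left out so that a separate `python_tests` target can own them.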
You can see which targets “own” a file by running `./pants list path/to/file.py`. Generally, you want there to be only one owner.
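A toy model of why the original layout produced two owners: `src/BUILD` claims every `.py` file below it, and each per-directory `BUILD` claims the files in its own directory. This only approximates what `./pants list path/to/file.py` reports and is not how Pants computes ownership internally; the `recursive` flag is a hypothetical simplification of the two globs.

```python
from pathlib import PurePosixPath
from typing import List

# Hypothetical model of the layout described above: BUILD directory -> whether
# its python_library() sources glob is recursive (**/*.py) or directory-only (*.py).
TARGETS = {
    "src": True,
    "src/ol_data_pipelines/edx": False,
    "src/ol_data_pipelines/lib": False,
    "src/ol_data_pipelines/resources": False,
}


def owners(path: str) -> List[str]:
    """Return every target directory whose sources glob claims `path`."""
    p = PurePosixPath(path)
    result = []
    for target_dir, recursive in TARGETS.items():
        try:
            rel = p.relative_to(target_dir)
        except ValueError:
            continue  # not under this BUILD file's directory
        in_scope = recursive or len(rel.parts) == 1
        if in_scope and rel.suffix == ".py":
            result.append(target_dir)
    return result


print(owners("src/ol_data_pipelines/edx/solids.py"))
# ['src', 'src/ol_data_pipelines/edx']: two owners, so inference is ambiguous
```

Deleting either `src/BUILD` or the per-directory files brings every path back to a single owner.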
l
Great, thanks 🙂
h
Nvm, looks like it’s already good. Oh, also: your source root should be `src`, as you aren’t importing `src.ol_data_pipelines` but `ol_data_pipelines`. (The default for that option includes `src`, so you could also leave it undefined.)
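A simplified sketch of why the source root matters: mapping a file to its import name means stripping the source root from the path, so the root has to match how the code is actually imported. This is an illustration, not Pants's implementation.

```python
from pathlib import PurePosixPath


def module_name(path: str, source_root: str) -> str:
    """Map a .py file path to the module name it is imported as.

    Simplified: strip the source root and the .py suffix; a package's
    __init__.py maps to the package itself.
    """
    p = PurePosixPath(path)
    rel = p if source_root in ("", ".") else p.relative_to(source_root)
    parts = rel.with_suffix("").parts
    if parts[-1] == "__init__":
        parts = parts[:-1]
    return ".".join(parts)


print(module_name("src/ol_data_pipelines/edx/solids.py", "src"))
# ol_data_pipelines.edx.solids  (matches `import ol_data_pipelines...`)
print(module_name("src/ol_data_pipelines/edx/solids.py", "."))
# src.ol_data_pipelines.edx.solids  (what a build-root source root would imply)
```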
l
So, for the case where I want the `edx` package to be a target that I package as an independently deployable unit, does that mean it should have a `BUILD` definition in that directory, rather than just the one definition at the root of `src`?
Ok, I've just updated the repo with your recommendations. For some reason, though, it's still not showing the `lib` and `resource` modules in the dependency list.
(I've pushed all of the changes to the branch on GitHub as well.)
w
Note that `./pants dependencies` is not transitive by default: if you’d like to see everything depended on transitively, you need to pass `--transitive`.
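To illustrate the difference between a one-hop listing and the `--transitive` view, here is a toy graph walk. Only the `edx` file names come from the thread; the `lib` and `resources` files and all the edges are hypothetical, and this is not how Pants builds its graph.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical direct-dependency edges (file -> files it imports).
DEPS: Dict[str, List[str]] = {
    "edx/solids.py": ["edx/api_client.py", "lib/helpers.py"],
    "edx/api_client.py": [],
    "lib/helpers.py": ["resources/db.py"],
    "resources/db.py": ["lib/helpers.py"],  # cycles are handled by `seen`
}


def direct(target: str) -> List[str]:
    """One hop only, analogous to the default output."""
    return sorted(DEPS[target])


def transitive(target: str) -> List[str]:
    """Everything reachable from `target`, via breadth-first search."""
    seen: Set[str] = set()
    queue = deque(DEPS[target])
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        queue.extend(DEPS[dep])
    return sorted(seen)


print(direct("edx/solids.py"))      # ['edx/api_client.py', 'lib/helpers.py']
print(transitive("edx/solids.py"))  # ['edx/api_client.py', 'lib/helpers.py', 'resources/db.py']
```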
Also, note that `resources` need to be depended on explicitly, because Pants cannot infer their usage.
So if the files owned by https://github.com/mitodl/ol-data-pipelines/blob/putting_on_pants/src/ol_data_pipelines/edx/BUILD need resources, you might need a `dependencies` list there.
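A sketch of what such an explicit dependency could look like, assuming some non-Python data files that imports can't reveal. The `data` directory, the `queries` target name, and the `*.sql` glob are all hypothetical. (In this repo the `resources` folder turns out to hold Python modules, which dependency inference already covers.)

```python
# src/ol_data_pipelines/edx/data/BUILD  (hypothetical)
# A `resources` target for files Pants cannot discover through imports.
resources(
    name="queries",
    sources=["*.sql"],
)

# src/ol_data_pipelines/edx/BUILD
python_library(
    # Dependency inference can't see non-Python files, so list them explicitly.
    dependencies=["src/ol_data_pipelines/edx/data:queries"],
)
```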
h
Stu, the `resources` folder is all Python files. I thought the same thing.
w
ah. ok, sorry, i’ll back off for a bit
good luck 😃
h
K, I found one issue (one I was fixing this morning, actually). It looks like you’re using the walrus operator, which (at the moment) requires running the Pants tool with Python 3.8 for the AST parsing in dependency inference to work, but the `./pants` bash script defaults to running with Py36, then 37, then 38.
To fix, run `curl -L -o ./pants https://pantsbuild.github.io/setup/pants` to pull in recent changes we made to the script, then change lines 140 and 141 to only use 3.8. We’re fixing this so Pants always does the right thing, regardless of which interpreter the tool is run with. (Black and MyPy have this same issue: the `typed-ast` library did not add support for Py38, so the only way to parse a Py38 AST is to run with 3.8.)
Things still aren’t working correctly; still digging. But that fixes one of the problems.
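A rough illustration of why the interpreter matters: inference has to parse each file into an AST before it can read the imports, and `ast.parse` on Python 3.6 or 3.7 rejects `:=` with a `SyntaxError`. This is a simplified stand-in, not Pants's actual inference code; the sample module is hypothetical.

```python
import ast
import sys
from typing import List, Optional

# Hypothetical module using the walrus operator (PEP 572, Python 3.8+).
SOURCE = """
import httpx
from dagster import solid

if (n := len([1, 2, 3])) > 2:
    pass
"""


def imported_modules(source: str) -> Optional[List[str]]:
    """Return absolute imports in `source`, or None if this interpreter can't parse it."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # e.g. walrus syntax under Python 3.6/3.7: nothing can be inferred
        return None
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.append(node.module)  # relative imports are skipped in this sketch
    return sorted(found)


print(sys.version_info[:2], imported_modules(SOURCE))
```

On 3.8 or newer this prints the imports `['dagster', 'httpx']`; on 3.6 or 3.7 it prints `None`, which is the "missing dependencies" symptom seen above.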
l
Awesome, thanks. Doing that now.
After making that fix, I'm seeing the expected resolution (after also removing the explicit dependencies in the BUILD file).
h
Ah cool! I was running on a bad test file 😄 (I was testing `./pants dependencies src/ol_data_pipelines/edx/api_client.py` and hadn’t double-checked that it actually made first-party imports 🤦). Sorry about the trouble with Py38; definitely a rough edge that will be fixed after this PR later today.
l
Not a problem, I appreciate the prompt help 🙂
h
> So, for the case where I want the `edx` package to be a target that I package as an independently deployable unit, does that mean that it should have a `BUILD` definition in that directory, rather than just having the one definition at the root of `src`?
Pardon the delay with this one. So, `python_library` is sort of a misnomer. It doesn’t imply anything about how you deploy your code; it’s nothing more than a way to set metadata on some Python source code. A more apt name would be `python_srcs`, for example.
The reason to use more granular vs. coarser `python_library` targets is how fine-grained you want your metadata to be. For example, you might want to override the Python interpreter constraints for only one file vs. an entire subtree. We generally recommend one BUILD file per directory because it tends to be the simplest heuristic and scales well whenever you do need to add metadata.
For your deployables, you explicitly say which dependencies you want in them; those and all their transitive deps will be pulled in. You can either depend on a whole `python_library` target, which is shorthand for depending on all the files in its `sources` field, or you can pick out certain files from it using “file addresses” to be more fine-grained. https://www.pantsbuild.org/docs/targets#dependencies-and-dependency-inference
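A toy model of the two ways to name a dependency: a target address expands to every file in that target's `sources`, while a file address selects one file that some target owns. The file list comes from the `./pants dependencies` output earlier in the thread; the data structure and the ownership check are a hypothetical simplification, not Pants internals.

```python
from typing import Dict, List

# Hypothetical view of the per-directory python_library() in the edx package:
# target address -> files in its `sources` field.
LIBRARIES: Dict[str, List[str]] = {
    "src/ol_data_pipelines/edx": [
        "src/ol_data_pipelines/edx/__init__.py",
        "src/ol_data_pipelines/edx/api_client.py",
        "src/ol_data_pipelines/edx/repositories.py",
        "src/ol_data_pipelines/edx/schedule.py",
        "src/ol_data_pipelines/edx/solids.py",
    ],
}


def expand(address: str) -> List[str]:
    """Resolve one entry of a deployable's `dependencies` to concrete files."""
    if address.endswith(".py"):
        # File address: just that one file, which some target must own.
        if not any(address in files for files in LIBRARIES.values()):
            raise ValueError(f"no python_library owns {address}")
        return [address]
    # Target address: shorthand for every file in that target's `sources`.
    return list(LIBRARIES[address])


print(len(expand("src/ol_data_pipelines/edx")))        # 5
print(expand("src/ol_data_pipelines/edx/solids.py"))  # ['src/ol_data_pipelines/edx/solids.py']
```

Either way, the transitive dependencies of the selected files are then pulled in as well.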
l
Awesome, thanks.
My next question is how to add package metadata that gets propagated to `setup.py`, etc., such as trove classifiers, authors, etc.
h
Check out https://www.pantsbuild.org/docs/python-distributions. That also links to a plugin hook if you find you want something more complex, such as reading a file for some of the metadata or dynamically computing the `version`.
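A minimal sketch of where that metadata goes, following the pattern in the docs linked above: keyword arguments to `setup_py(...)` end up in the generated `setup.py`. The distribution name, version, author, and the `edx_dist` target name are hypothetical. `:edx` assumes the default target name (the directory name) of the per-directory `python_library()`.

```python
# src/ol_data_pipelines/edx/BUILD  (sketch)
python_library()

python_distribution(
    name="edx_dist",
    # The python_library above; default target name is the directory name.
    dependencies=[":edx"],
    provides=setup_py(
        name="ol-data-pipelines-edx",
        version="0.1.0",
        author="Example Author",
        classifiers=[
            "Programming Language :: Python :: 3.8",
        ],
    ),
)
```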
l
Great, thanks