# general
r
Hi, I have a question regarding Pants and Poetry integration. We are building a monorepo that contains multiple Python projects generated with Poetry. From the documentation I understand that Pants uses the `pyproject.toml` generated by Poetry via the BUILD file. I am not able to generate a Python distribution for `internal-project1`. I think this is because there is no `main.py` file, and hence it doesn't show up in the target list.
```
repo
|
+-- src/
|   |
|   +-- internal-project1/
|   |   |
|   |   +-- pyproject.toml
|   |   +-- BUILD
|   |   +-- internal_project1/
|   |   |
|   +-- internal-project2/
|   |   |
|   |   +-- pyproject.toml
|   |   +-- BUILD
|   |   +-- internal_project2/
|   |   |
+-- pants
|
+-- pants.toml
|
+-- BUILD
|
+-- requirements.txt
```
This is what `pants.toml` looks like:
```toml
[GLOBAL]
pants_version = "2.8.0"
backend_packages = [
    "pants.backend.python",
    "pants.backend.python.lint.black",
    "pants.backend.python.lint.docformatter",
    "pants.backend.python.lint.flake8",
    "pants.backend.python.lint.pylint",
    "pants.backend.python.typecheck.mypy",
    "pants.backend.python.lint.isort",
]

[source]
marker_filenames = ["pyproject.toml"]

[anonymous-telemetry]
enabled = false
```
The BUILD file inside `internal-project1` has just:
```python
poetry_requirements()
```
e
You're likely missing some required target definitions. Have you tried `./pants tailor`? See https://www.pantsbuild.org/docs/create-initial-build-files
r
Yes, I ran it. It generated the BUILD files shown in the repo structure. I then modified the BUILD file inside `src/internal-project1` to use the Poetry-generated `pyproject.toml` by adding `poetry_requirements()`. Other than that, I'm not sure what definitions I need to add. I do see that it generates proper targets for those Python packages where it can find a main definition. `internal-project1` has no main definition anywhere. It's just a bunch of Python modules.
e
I'm not well versed in how tailor works but, in general, you'll need an empty `python_sources()` target in each directory with non-main Python sources.
h
Tailor should generate those
Can you post the contents of your BUILD files?
And also what command you're running to generate the distribution, and what error message you're seeing?
r
I modified the BUILD file inside `src/internal-project1`. I am following https://www.pantsbuild.org/docs/python-distributions#pep-517. This is the content of the BUILD file. In this case `ds-research` is `internal-project1`:
```python
resource(name="lib", source="pyproject.toml")
python_distribution(
    name="ds-research",
    dependencies=[":lib"],
    wheel=True,
    sdist=True,
    provides=setup_py(
        name="ds-research",
        version="0.1.0",
        description="DS research.",
    ),
    wheel_config_settings={"--global-option": ["--python-tag", "py37.py38.py39"]},
)
```
The repo looks like this
```
repo
|
+-- src/
|   |
|   +-- ds-research/
|   |   |
|   |   +-- pyproject.toml
|   |   +-- BUILD
|   |   +-- ds_research/
+-- pants
|
+-- pants.toml
|
+-- BUILD
|
+-- requirements.txt
```
Just to point out, `ds_research` contains all the necessary Python modules. I am running this command from the root:
```
./pants package src/ds-research:ds-research
```
The error stack is shown below. I guess it's not able to identify that `ds_research` has the necessary modules. Not sure how to fix that.
```
22:39:37.59 [ERROR] 1 Exception encountered:

  ProcessExecutionFailure: Process 'Run poetry.core.masonry.api for src/ds-research:ds-research' failed with exit code 1.
stdout:

stderr:
Traceback (most recent call last):
  File "/private/var/folders/9w/9_4n35r57d707zkrk86rvh9w0000gn/T/process-executionwrO07k/chroot/../.cache/pex_root/venvs/693b0be1ae76a2a6f062957b1ce86976628da9ae/5ce1b5a57a6013508c623affedc7f3171959254a/pex", line 151, in <module>
    exec(ast, globals_map, locals_map)
  File "backend_shim.py", line 20, in <module>
    wheel_path = backend.build_wheel(dist_dir, wheel_config_settings) if build_wheel else None
  File "/Users/developer/.cache/pants/named_caches/pex_root/venvs/s/e9759030/venv/lib/python3.7/site-packages/poetry/core/masonry/api.py", line 68, in build_wheel
    return unicode(WheelBuilder.make_in(poetry, Path(wheel_directory)))
  File "/Users/developer/.cache/pants/named_caches/pex_root/venvs/s/e9759030/venv/lib/python3.7/site-packages/poetry/core/masonry/builders/wheel.py", line 70, in make_in
    poetry, target_dir=directory, original=original, executable=executable
  File "/Users/developer/.cache/pants/named_caches/pex_root/venvs/s/e9759030/venv/lib/python3.7/site-packages/poetry/core/masonry/builders/wheel.py", line 57, in __init__
    super(WheelBuilder, self).__init__(poetry, executable=executable)
  File "/Users/developer/.cache/pants/named_caches/pex_root/venvs/s/e9759030/venv/lib/python3.7/site-packages/poetry/core/masonry/builders/builder.py", line 89, in __init__
    includes=includes,
  File "/Users/developer/.cache/pants/named_caches/pex_root/venvs/s/e9759030/venv/lib/python3.7/site-packages/poetry/core/masonry/utils/module.py", line 77, in __init__
    source=package.get("from"),
  File "/Users/developer/.cache/pants/named_caches/pex_root/venvs/s/e9759030/venv/lib/python3.7/site-packages/poetry/core/masonry/utils/package_include.py", line 22, in __init__
    self.check_elements()
  File "/Users/developer/.cache/pants/named_caches/pex_root/venvs/s/e9759030/venv/lib/python3.7/site-packages/poetry/core/masonry/utils/package_include.py", line 61, in check_elements
    "{} does not contain any element".format(self._base / self._include)
ValueError: /private/var/folders/9w/9_4n35r57d707zkrk86rvh9w0000gn/T/process-executionwrO07k/chroot/ds_research does not contain any element

Use --no-process-execution-local-cleanup to preserve process chroots for inspection.
```
If I run `poetry build` inside `src/ds-research`, it builds the distribution correctly.
The `pyproject.toml` looks like:
```toml
[tool.poetry]
name = "ds-research"
version = "0.1.0"
description = "data science projects"
authors = ["email-address"]
packages = [
    { include = "ds_research" },
    { include = "ds_research/**/*.py" },
]

[tool.poetry.dependencies]
python = "^3.8,<3.10"
pandas = "^1.3.0"
environs = "^9.3.2"
psycopg2 = "^2.9.1"
geopandas = "^0.9.0"
google-cloud = "^0.34.0"
pre-commit = "^2.13.0"
nbdime = "^3.1.0"
awswrangler = "^2.12.1"
pymsteams = "^0.1.16"
mygeotab = "^0.8.6"

[tool.poetry.dev-dependencies]
pytest = "^5.2"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```
So I was able to build the distribution with the BUILD file below. But the issue now is that I need to list every individual module that is part of `ds-research` as a dependency. Is there an easier way to provide the whole folder as a dependency?
```python
poetry_requirements()


python_distribution(
    name="ds_research",
    dependencies=["src/ds-research/ds_research/data:data"],
    wheel=True,
    sdist=True,
    provides=setup_py(
        name="ds_research",
        version="0.1.0",
        description="DS research.",
    ),
    wheel_config_settings={"--global-option": ["--python-tag", "py37.py38.py39"]},
)
```
This is the `setup.py` generated by Pants when I run `./pants package src/ds-research:ds_research`:
```python
# DO NOT EDIT THIS FILE -- AUTOGENERATED BY PANTS
# Target: src/ds-research:ds-research

from setuptools import setup

setup(**{
    'description': 'DS research.',
    'install_requires': (
    ),
    'name': 'ds-research',
    'namespace_packages': (
    ),
    'package_data': {
    },
    'packages': (
    ),
    'version': '0.1.0',
})
```
As you can see, it has no entry for `packages`.
h
Looking now
Ah, so you already have a non-setuptools build backend set up (`poetry.core.masonry.api`), and we can have Pants just invoke that. You shouldn't need a generated `setup.py` at all!
So I think you want this as your BUILD file, like you had before:
```python
resource(name="lib", source="pyproject.toml")
python_distribution(
    name="ds-research",
    dependencies=[":lib"],
    wheel=True,
    sdist=True,
    provides=setup_py(
        name="ds-research",
        version="0.1.0",
    ),
    wheel_config_settings={"--global-option": ["--python-tag", "py37.py38.py39"]},
)
```
Oh, but you need to either add `generate_setup = False` on that target, or you can turn it off globally for your entire repo, in `pants.toml`:
```toml
[setup-py-generation]
generate-setup-default = false
```
I'll fix the documentation example, which omits `generate_setup = False`.
With either of those, Pants should just invoke your existing `pyproject.toml` backend, and not attempt to generate a `setup.py` or run setuptools.
r
Thank you for having a look. I do see it's using poetry build now, but it still generates a `setup.py` with all the correct third-party dependencies defined in `pyproject.toml`.
```
repo
|
+-- src/
|   |
|   +-- ds-research/
|   |   |
|   |   +-- pyproject.toml
|   |   +-- BUILD
|   |   +-- ds_research/
|   |   |   |
|   |   |   +--- BUILD
|   |   |   +--- data/
+-- pants
|
+-- pants.toml
|
+-- BUILD
|
+-- requirements.txt
```
The other issue I pointed out was that it's not able to figure out all the local dependencies for `ds_research`, such as another package inside it called `data`, unless I explicitly add it to the BUILD file inside `ds_research`. This BUILD file with explicit dependencies looks like:
```python
python_sources(dependencies=["src/ds-research/ds_research/data:data"])
```
I feel like this can introduce bugs if we don't actively keep adding every new package created inside `ds_research` to the dependencies in the BUILD file. `poetry` figures out such sub-packages on its own.
Deactivating setup generation at the global level using `pants.toml` doesn't work. This is the error stack:
```
23:47:01.10 [ERROR] Invalid option 'generate-setup-default' under [setup-py-generation] in /Users/developer/shantanu/pyfleet/pants.toml
23:47:01.10 [ERROR] Invalid config entries detected. See log for details on which entries to update or remove.
(Specify --no-verify-config to disable this check.)


Use --print-stacktrace for more error details and/or -ldebug for more logs. 
See <https://www.pantsbuild.org/v2.8/docs/troubleshooting> for common issues.
Consider reaching out for help: <https://www.pantsbuild.org/v2.8/docs/getting-help>
```
h
I guess it should be `generate_setup_default` (underscores, not dashes), my mistake.
And you shouldn't have to add dependencies manually
I'm confused as to why it's generating a setup.py, how are you seeing that? Where is the setup.py being generated to?
Well, you shouldn't have to add dependencies manually if they can be inferred, i.e., if there is an `import` statement that reflects the dependency.
Is `ds_research/data` imported by some file in `ds_research`?
Poetry doesn't really figure out dependencies, it just takes all the subpackages, because of your:
```toml
packages = [
    { include = "ds_research" },
    { include = "ds_research/**/*.py" },
]
```
If those subpackage dependencies can't be inferred from import statements, then you either need explicit dependencies from the package onto the subpackage, or you can have a single target that globs over all the subpackages (`sources=["**/*.py"]`, for example).
I see there is no actual code in `ds_research/`? So there are no imports.
The `python_distribution` target does need an explicit dependency on some `python_sources`.
You could put a `python_sources(name="srcs", sources=["**/*.py"])` in the same directory as `python_distribution`, and have an explicit dep on `:srcs`.
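Put together, the suggestion above might look like the following BUILD file for `src/ds-research`. This is only a sketch assembled from snippets already posted in this thread: the target name `srcs` comes from the message above, `generate_setup=False` is the per-target switch mentioned earlier, and the `wheel_config_settings` field is left out for brevity. It also assumes the nested `ds_research/BUILD` is removed, so that the same files are not owned by two targets.

```python
# src/ds-research/BUILD (sketch, not a verified configuration)

# Exposes pyproject.toml to the build backend, as in the earlier BUILD file.
resource(name="lib", source="pyproject.toml")

# A single target that owns every Python file under this directory,
# including subpackages such as ds_research/data.
# Assumption: no other BUILD file below this directory claims these files.
python_sources(name="srcs", sources=["**/*.py"])

python_distribution(
    name="ds-research",
    # One explicit dependency on the globbing target replaces the
    # per-subpackage dependencies.
    dependencies=[":lib", ":srcs"],
    wheel=True,
    sdist=True,
    # Let Pants call the backend declared in pyproject.toml instead of
    # generating its own setup.py.
    generate_setup=False,
    provides=setup_py(
        name="ds-research",
        version="0.1.0",
    ),
)
```

With a layout like this, new subpackages added under `ds_research/` should be picked up by the glob without further BUILD edits.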
r
I see the `setup.py` when I extract the generated tar dist locally. Please see the attached screenshot.
Yes, currently there are no other sub-packages except `data` inside `ds_research`.
Actually, I manually added this statement to `pyproject.toml` when `pants` was failing to build the distribution and throwing an error that `ds_research` is empty or something. Even before that, `poetry` was able to package them all in the same dist. I suppose `poetry` internally has some default `packages` statement.
```toml
packages = [
    { include = "ds_research" },
    { include = "ds_research/**/*.py" },
]
```
h
Hmmm, it really shouldn't be generating a `setup.py` (and even if it is, it shouldn't be using it). You definitely set `generate_setup = False` on the `python_distribution` target?
When you run `./pants package`, does it log something like "Run poetry.core.masonry.api for ds_research"?
r
Hey, yeah, when I run `./pants package` I do see "Run poetry.core.masonry.api for ds_research".
I have tried both the global and the `python_distribution` options, and both times it creates `setup.py`.
By the way, I think it's expected. I just did a `poetry build` and poetry is also generating `setup.py`, which I guess is expected when you are building a dist or wheel.
I guess it's poetry which is at fault here, since you are just calling the poetry API. So Pants can't do much about it. So this `setup.py` is the one generated by poetry, not Pants.
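For anyone who wants to repeat this check without unpacking the archive by hand, here is a minimal sketch using only the Python standard library. The function name `sdist_has_setup_py` is made up for illustration, and it assumes the sdist is a gzip-compressed tarball with a single top-level directory, which is the usual sdist layout.

```python
import tarfile
from pathlib import PurePosixPath


def sdist_has_setup_py(sdist_path):
    """Return True if the sdist holds a setup.py directly under its top-level directory."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        for name in archive.getnames():
            parts = PurePosixPath(name).parts
            # Expected member shape: <project>-<version>/setup.py
            if len(parts) == 2 and parts[1] == "setup.py":
                return True
    return False
```

Running it against the archive from `./pants package` and against the one from a plain `poetry build` would show whether both contain the file, which is what the comparison above relies on.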
h
Aha! Mystery solved! So poetry is using setuptools under the covers, who knew?!
Ok so for your layout I'll reiterate my recommendation to have a single python_sources target in each project that globs over all sources in all subfolders, as I described above. Then you just need one explicit dep from the python_distribution target to that python_sources target. Let me know if that works!
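One way to confirm that the single globbing target really pulled every subpackage into the build is to list the packages inside the resulting wheel. The sketch below relies only on the fact that a wheel is a zip archive. The helper name `wheel_packages` is hypothetical, and "package" here loosely means any directory in the wheel that contains a `.py` file.

```python
import zipfile
from pathlib import PurePosixPath


def wheel_packages(wheel_path):
    """Return dotted names of the directories in a wheel that contain .py files."""
    packages = set()
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            path = PurePosixPath(name)
            # Skip top-level modules and non-Python files such as dist-info metadata.
            if path.suffix == ".py" and len(path.parts) > 1:
                packages.add(".".join(path.parent.parts))
    return packages
```

For the layout in this thread, the returned set would be expected to include both `ds_research` and `ds_research.data` once the wheel is built from the globbing target.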