# general
c
Good afternoon. I'm trying to get pants to generate a lockfile for the "databricks-connect" package. Here is the package as defined in my poetry pyproject.toml:
```toml
[tool.poetry.dependencies]
databricks-connect = "~=11.3"
```
`poetry update` handles this OK; `pants generate-lockfiles` does not... it returns an error as follows:
```
❯ pants generate-lockfiles
15:58:36.52 [INFO] Initialization options changed: reinitializing scheduler...
15:58:45.43 [INFO] Scheduler initialized.
16:06:31.66 [INFO] Completed: Generate lockfile for python-default
16:06:31.68 [ERROR] 1 Exception encountered:

Engine traceback:
  in `generate-lockfiles` goal

ProcessExecutionFailure: Process 'Generate lockfile for python-default' failed with exit code 1.
stdout:

stderr:
Expected one top-level project directory to be extracted from /private/var/folders/t1/vn8r4hys02n0q0cn7jjsg_lw0000gp/T/pants-sandbox-bGVGvP/.tmp/tmpo8sydw0u/usr.local.var.pyenv.versions.3.9.5.bin.python3.9/databricks-connect-11.3.14.tar.gz, found 11: delta, PKG-INFO, DBCONNECT_LICENSE.txt, pyspark, MANIFEST.in, README.md, setup.py, databricks_connect.egg-info, lib, deps, setup.cfg

Use `--keep-sandboxes=on_failure` to preserve the process chroot for inspection.
```
The pants version is:
```
❯ pants --version
2.16.0rc7
```
Is there a way to convince pants to use a package that has more than one top-level project directory?
I need content from the "delta" package and the "pyspark" package found in the list above.
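For reference, the layout can be confirmed by listing the sdist's top-level entries. A rough sketch, assuming `databricks-connect-11.3.14.tar.gz` (the filename from the error above) has been downloaded into the current directory, e.g. via `pip download --no-deps`:
```python
# Rough sketch: list the top-level entries of the sdist tarball to confirm
# it has many top-level items rather than a single project directory.
import tarfile

with tarfile.open("databricks-connect-11.3.14.tar.gz") as tf:
    top_level = sorted({name.split("/")[0] for name in tf.getnames()})

# Expect the 11 entries from the error message (delta, pyspark, setup.py, ...)
print(top_level)
```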
b
I don’t have a strong understanding of this aspect of pants, but I wonder if the package uses an unexpected/non-standard layout for its sdist (a 270MB sdist certainly makes me suspicious something is weird, especially because the 13.x wheels are 2MB!) and poetry happens to be slightly more permissive… that doesn’t help resolve the situation, though. I wonder if telling pants/pex to use a newer pip might help? In pants.toml (https://www.pantsbuild.org/docs/reference-python):
```toml
[python]
pip_version = "23.0.1"
```
If that doesn’t help, it might be worth experimenting with the commands being run by jumping into the sandbox (second heading in https://www.pantsbuild.org/docs/troubleshooting) and seeing if you can get to a minimal reproducer outside of pants (likely invoking the https://github.com/pantsbuild/pex CLI somehow).
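Something along these lines might serve as a starting point (a sketch only: `pex3 lock create` is the lock-generation entry point Pants drives under the hood, but the exact flags may need adjusting for your Pex version; check `pex3 lock create --help`):
```python
# Hypothetical minimal reproducer outside of Pants: drive the Pex lock CLI
# directly with the same requirement. Flags are illustrative and may vary
# between Pex versions; consult `pex3 lock create --help`.
import subprocess

subprocess.run(
    ["pex3", "lock", "create", "databricks-connect~=11.3", "-o", "lock.json"],
    check=True,  # should surface the same sdist-layout failure if it reproduces
)
```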
c
Updating the pip_version did not help. Going to examine the sandbox.
What did help, though, was commenting out the following:
```toml
[python]
# enable_resolves = true  # This one came from https://www.pantsbuild.org/docs/python-lockfiles
interpreter_constraints = ['==3.9.5']
pip_version = "23.0.1"
```
Never mind... yeah, apparently that just turned off generating the lockfile 😉
@enough-analyst-54434: I see that the error being raised comes from the `_prepare_project_directory` method in `resolve/lockfile/create.py`. It looks like you may have been the author of this module. Is there a way to get pants/pex to generate a lockfile when a distribution has more than one top-level item? In this case I'm trying to work with a databricks 11.3 series package that contains many items (of which I need "delta" and "pyspark").
e
I have no clue what databricks is doing there. Pex works fine building a setuptools wheel, which has multiple top-level packages (setuptools and pkg_resources). The issue is that an sdist must have a single top-level directory that everything else is contained within: https://packaging.python.org/en/latest/specifications/source-distribution-format/#source-distribution-file-format
So, @cool-account-59189 you are totally out of luck without a change to Pex to accommodate this strangeness.
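For illustration, the invariant being enforced is roughly this shape (a paraphrase only, not Pex's actual implementation):
```python
# Paraphrase of the invariant: after extracting an sdist, expect exactly one
# top-level project directory. Not Pex's actual code, just the shape of the check.
import os

def expect_one_project_directory(extract_dir: str) -> str:
    entries = os.listdir(extract_dir)
    if len(entries) != 1:
        raise ValueError(
            "Expected one top-level project directory to be extracted from "
            f"{extract_dir}, found {len(entries)}: {', '.join(entries)}"
        )
    return os.path.join(extract_dir, entries[0])
```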
c
I was able to work around this. It looks like the databricks package is essentially a composite of two other available packages:
```toml
[tool.poetry.dependencies]
# ...
# databricks-connect = "~=11.3"
delta-spark = "2.1.0"  # Not directly visible in the pip list of the databricks environment, but a comment there says this is what is running.
pyspark = "3.3.0"  # Pinned to the same version as our databricks environment.
# ...
```
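With those two pinned, the usual delta-spark quickstart pattern gives a Delta-enabled session; a minimal sketch (nothing here is databricks-connect specific):
```python
# Minimal sketch: delta-spark 2.1.0 with pyspark 3.3.0, following the
# standard delta-spark quickstart configuration.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()
spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta-demo")
```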
e
That itself is horrific. Re-packaging packages other projects own can only lead to tears. You fool resolvers into thinking they are different packages when they are not, and then PYTHONPATH chaos ensues.
c
Thank you for the time looking into this John. I hope you have a great day.