limited-ghost-8212
02/25/2024, 12:27 AM
├── BUILD
├── data_lake
│   ├── commons
│   │   ├── commons/
│   │   ├── poetry.lock
│   │   ├── poetry.toml
│   │   ├── pyproject.toml
│   │   ├── README.md
│   │   └── tests/
│   ├── data_asset_1
│   │   ├── BUILD
│   │   ├── data_asset_1/
│   │   ├── lockfile.txt
│   │   ├── pyproject.toml
│   │   ├── README.md
│   │   └── tests/
│   └── data_asset_2
│       ├── BUILD
│       ├── lockfile.txt
│       ├── poetry.lock
│       ├── pyproject.toml
│       ├── README.md
│       ├── data_asset_2/
│       └── tests/
├── dist
│   └── export
│       └── python
├── Justfile
├── lockfile.txt
├── pants.toml
├── pyproject.toml
└── README.md
The contents of pants.toml are as follows:
[GLOBAL]
# TODO: try out ruff lint + format
pants_version = "2.20.0.dev7"
backend_packages = [
    "pants.backend.python",
    "pants.backend.python.typecheck.mypy",
    "pants.backend.experimental.python",
    "pants.backend.experimental.python.lint.ruff",
]
[python]
enable_resolves = true
default_resolve = "default"
interpreter_constraints = [">=3.11,<3.12"]
[python-bootstrap]
search_path = ["<ASDF>"]
[python.resolves]
default = "lockfile.txt"
commons = "data_lake/commons/lockfile.txt"
data_asset_1 = "data_lake/data_asset_1/lockfile.txt"
data_asset_2 = "data_lake/data_asset_2/lockfile.txt"
[python-repos]
indexes = ["internal repository"]
[python-infer]
use_rust_parser = true
[pytest]
args = ["-vv"]
[export]
py_resolve_format = "mutable_virtualenv"
resolve = [
    "commons",
    "data_asset_1",
    "data_asset_2",
]
[source]
marker_filenames = ["pyproject.toml"]
The contents of data_lake/data_asset_1/BUILD are as follows:
poetry_requirements(
    name="poetry",
    source="pyproject.toml",
    resolve="data_asset_1",
)

files(
    name="data_asset_1_files",
    sources=["data_asset_1/**", "README.md"],
)

file(
    name="pyproject_toml",
    source="pyproject.toml",
)

python_distribution(
    name="data_asset_1",
    dependencies=[
        ":poetry",
        ":data_asset_1_files",
        ":pyproject_toml",
    ],
    provides=python_artifact(),
    generate_setup=False,
    repositories=[
        "internal repository",
    ],
)
fresh-continent-76371
02/25/2024, 10:01 PM
tags/path.to/thing/that/publishes/vX.Y.Z
fresh-continent-76371
02/25/2024, 10:30 PM
• We want to keep using Python virtual environments and Python editable installs for the development of a project
Yes, we do this.
• We want to use Poetry
Yes. On Windows, because Pants does not perform well (under WSL) and we have some custom tasks written in sh, we use Poetry.
• We want to keep using VS Code as our primary IDE
Yes, roughly 60% VS Code, 40% PyCharm (I use VS Code).
We also need help in setting up the “commons” library such that it is available as a dependency for all our projects and is part of each project's virtual environment. It should also be included in the bundled package for each project. How can this be achieved?
Pants does this naturally - that is:
• either the user consumes it “in repo”, so it's not versioned, but just available in a venv, from an install
• or Pants “publishes” the library, which means an outsider can just depend upon it (we have not exercised this need much yet)
One way to wire this up is sketched after this set of answers.
Currently our dependencies within a project have a high overlap with each other. What tradeoffs do we need to evaluate for per-project dependency management vs. global dependency management at the monorepo level?
We tried to use one Python resolve, but some projects needed a different Python (3.10, 3.11, 3.12), so that failed and we had to have two or three resolves. YMMV.
We currently see that we will have to bump project versions manually. Is there an already included feature to auto-update versions? And what is the recommendation when wanting to have semantic versioning? The current approach with conventional commits is not straightforward in the case of a monorepo, as semantic versioning libraries today are not monorepo aware…
As mentioned, we are using Conventional Commits and SemVer. I have a strong opinion that the version number itself does not matter, but having one is crucial: it is a communication and operational mechanism that saves time and enables clear planning. So a version is needed, but what it is does not matter. Handing that job to conventional commits, and thereby putting versioning in the hands of the developers (via their commits), is awesome.
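A minimal sketch of the “commons available in every project” wiring, under the resolves from the pants.toml above (a sketch only; the target names, the python_sources targets, and the use of parametrize are assumptions, not taken from this thread):

# data_lake/commons/commons/BUILD (hypothetical)
# Make the shared sources available in every resolve, so each project's
# exported venv and each project's package can depend on them.
python_sources(
    name="commons",
    resolve=parametrize("commons", "data_asset_1", "data_asset_2"),
)

# data_lake/data_asset_1/data_asset_1/BUILD (hypothetical)
python_sources(
    name="data_asset_1_sources",
    resolve="data_asset_1",
    # Dependency inference usually finds this from imports; spelled out it would be:
    dependencies=["data_lake/commons/commons:commons@resolve=data_asset_1"],
)

This covers the “available in each project's venv” half; bundling commons inside the data_asset_1 artifact is a separate packaging question (see the pex_binary suggestion later in the thread).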
limited-ghost-8212
02/25/2024, 11:24 PM
• We “multi version” all “publishable” packages using conventional commits and SemVer.
◦ This is complex, but the principle is:
▪︎ Everything that needs to be published with a version adds version_and_changelog() (a custom macro) to its BUILD.
▪︎ Behind the scenes this operates via “pants run //tgt/address:bump_version”, and then test/package etc. can run.
• Versioning is handled by conventional commits and convco (a Rust CLI which has support for monorepos).
• Had a quick read of the convco documentation and couldn't figure out how monorepo versioning is supported. Probably need to look again.
• It would be great if you could elaborate a bit more about your setup here. I am guessing you have some custom scripts to glue things together…
• When you say “add to their BUILD”, do you mean updating python_distribution.provides.python_artifact.version?
• When you say version_and_changelog(), is this something Pants offers?
Pants does this naturally - that is
• either - the user is using “in repo” so it's not versioned, but just available in a venv, from an install
The aim here is to have a shared library package that is published to our internal PyPI. Additionally, when running pants package data_lake/data_asset_1, I want the shared library to also get packaged along with the data_asset_1 source.
This is an optimisation I want to make to reduce the effort of submitting Python packages as artifacts to an Apache Spark cluster. Rather than providing data_asset_1.tar.gz and commons.tar.gz, I want to provide only data_asset_1.tar.gz with the latest commons already part of that tar.
• My struggle currently is getting the pants configuration right for this. 😞
• I was not able to make Pants include the shared library in all project venvs that get created as part of pants export, nor in all project packages that get created as part of pants package ::
• If you have made it work, maybe share how you are achieving it so far?
curved-television-6568
02/27/2024, 8:36 AM
The pex_binary target will create a distributable file that includes everything you need in a single file (except the Python interpreter). It is an executable archive, so you can inspect/unpack it as desired using your normal tools; pex also has tools for creating (installing) venvs on disk from a pex archive.
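A minimal sketch of such a pex_binary target (illustrative; the entry point and target name are assumptions, not taken from this thread):

# data_lake/data_asset_1/BUILD (hypothetical addition)
pex_binary(
    name="data_asset_1_pex",
    entry_point="data_asset_1.main",  # assumed module path
    # First-party sources (including commons) and third-party requirements are
    # pulled in via dependency inference / explicit dependencies.
)

pants package data_lake/data_asset_1:data_asset_1_pex would then produce a single .pex file under dist/ that bundles data_asset_1, commons, and their requirements together.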
fresh-continent-76371
02/27/2024, 9:21 PM
Had a quick read of the convco documentation and couldn't figure out how monorepo versioning is supported. Probably need to look again.
To understand how convco is used, first we need to understand the versioning “rules” I applied to the repository. I adopted the following repo tagging strategy:
- Every versionable thing is a package, in a directory
./apps/arc/VERSION
./apps/devolver/VERSION
./apps/log_manager/VERSION
./lib/finder_2000/VERSION
./lib/knohh/VERSION
./tools/containers/cicd-general/VERSION
./tools/python/acme_cicd/VERSION
Taking a Docker image as an example, it may be made up of three wheels and a Rust binary, all built from within the Pants monorepo.
These dependencies may be versioned or not versioned; it does not matter.
In the Dockerfile we see lines like
COPY /*.whl /tmp
RUN pip install /tmp/*.whl && rm /tmp/*.whl
- Every version gets a tag (at a point in its lifecycle): merge2main, MR/PR, maintenance branches.
The tags look like (borrowing the example above): apps/arc/v4.5.1, lib/knohh/v0.2.33.
This is what you see when you run git tag -l (amongst other tags).
- The existence of a custom pants goal (:bump_version) tells you what should be versioned
(you can't look for VERSION files because there may not be one yet).
My initial method of deciding “what should be versioned” was to look for pants “packages” and version them all; that turned out not to be the right approach,
and it is certainly tidier to “mark” the package-able thing as “version this please”.
- A tag represents the “bundled” package, including all its files and dependencies.
In pseudo code, this looks like:
if the directory (apps/arc) has any changes, then bump the version.
Practically, we ask pants this question, which in turn can include the transitive dependencies (neat!)
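For context, that “has anything under this directory, or anything it depends on, changed” check maps onto Pants' changed-target options. A rough Python sketch of the idea (a guess at the shape, not the actual script from this thread; flag names assume a recent Pants version):

# hypothetical helper: decide whether a package needs a version bump
import subprocess

def needs_bump(pkg_dir: str, last_tag: str) -> bool:
    # List every target changed since the last tag, plus everything that
    # transitively depends on those changes.
    out = subprocess.run(
        ["pants", f"--changed-since={last_tag}", "--changed-dependents=transitive", "list"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Bump if any of those targets live under the package directory.
    return any(line.startswith(pkg_dir) for line in out.splitlines())

if __name__ == "__main__":
    print(needs_bump("apps/arc", "apps/arc/v4.5.1"))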
- breathe - now we can look at how convco is operated.
convco supports two relevant flags, --prefix and --paths:
- --prefix
convco version --prefix apps/arc/v ...
tells convco to look for tags of that form.
- --paths
convco version --prefix apps/arc/v --paths apps/arc
tells convco to hunt the commits for conventional commits, but only those that included changes to files in that path (these paths).
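Putting --prefix and --paths together, the glue inside a script like versioning_ops.py can be quite small. A hypothetical sketch (assumes convco is on PATH and that its --bump flag is what computes the next version):

# hypothetical sketch of the convco call behind "version"
import subprocess

def next_version(pkg_path: str) -> str:
    # --prefix scopes the tag lookup to this package (e.g. apps/arc/v4.5.1),
    # --paths scopes the scanned conventional commits to files under pkg_path,
    # --bump asks convco for the next version implied by those commits.
    return subprocess.run(
        ["convco", "version", "--bump", "--prefix", f"{pkg_path}/v", "--paths", pkg_path],
        check=True, capture_output=True, text=True,
    ).stdout.strip()

if __name__ == "__main__":
    print(next_version("apps/arc"))  # e.g. "4.5.2"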
It would be great if you can elaborate a bit more about your setup here. I am guessing you have some custom scripts to glue things together…
Very much so, and my goal would be to publish this setup, via a contribution or an example, so people can see it working. There was once a statement that said you can't / should not version out of a monorepo (so I took that as a challenge), and so far that has proved not to be true.
When you say “add to their BUILD”, do you mean updating python_distribution.provides.python_artifact.version?
Yes - I have a plugin that does that.
When you say version_and_changelog(), is this something Pants offers?
No, this is a macro that adds two targets to the BUILD file: “:bump_version” and “:gen_changelog”.
def version_and_changelog():
    # This runs convco (inside a python script) and determines the next version of the package.
    # The PWD of this command is the directory of the BUILD file;
    # PANTS_BUILDROOT_OVERRIDE is the root of the workspace, so
    # ${PWD#$PANTS_BUILDROOT_OVERRIDE/} equals the relative path to the versionable thing, e.g.:
    #   python $PANTS_BUILDROOT_OVERRIDE/tools/python/acme_cicd/src/acme_cicd/versioning_ops.py version lib/jlab VERSION
    run_shell_command(
        name="bump_version",
        command="python $PANTS_BUILDROOT_OVERRIDE/tools/python/acme_cicd/src/acme_cicd/versioning_ops.py version ${PWD#$PANTS_BUILDROOT_OVERRIDE/} VERSION",
    )
    run_shell_command(
        name="gen_changelog",
        command="python $PANTS_BUILDROOT_OVERRIDE/tools/python/acme_cicd/src/acme_cicd/versioning_ops.py changelog ${PWD#$PANTS_BUILDROOT_OVERRIDE/} CHANGELOG.md",
    )
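For completeness, a sketch of how such a macro is typically consumed (assuming it is registered as a BUILD-file prelude via build_file_prelude_globs in pants.toml; the paths are illustrative):

# apps/arc/BUILD (illustrative)
version_and_changelog()  # adds the :bump_version and :gen_changelog targets for this directory

# then, from the workspace root (e.g. in CI):
#   pants run apps/arc:bump_version    # writes the next version to apps/arc/VERSION
#   pants run apps/arc:gen_changelog   # regenerates apps/arc/CHANGELOG.md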
either - the user is using “in repo” so it’s not versioned, but just available in a venv, from an install
The aim here is to have a shared library package that is published to our internal PyPi.
Additionally, when running pants package data_lake/data_asset_1, I want the shared library to also get packaged along with data_asset_1 source.
This is an optimisation I want to make to reduce the effort of submitting Python packages as artifacts to an Apache Spark cluster. Rather than providing data_asset_1.tar.gz and commons.tar.gz, I want to provide only data_asset_1.tar.gz with the latest commons already part of that tar.
This works - yes. We don't let developers push to the package repo (as usual - that is danger); everything goes through CI. But they can build a wheel, or an image, and it's versioned for them (e.g. 6.1.2-dev.cafe67677) just the same.
My struggle currently is getting the pants configuration right for this. 😞
It is not easy, but your query does push me to make a public repo example.
I was not able to make pants include the shared library in all project venvs that get created as part of pants export, nor in all project packages that get created as part of pants package ::
This reads to me like: the pipeline publishes and versions the shared library and all downstream libraries, together, as separate or the same versions. There are some complications - don't think it is all roses, it is not - but as we have worked through the issues, our team has simply become faster, and we have reduced the support burden because we have provenance, because we version all things :-D