# general
d
Hi all, my colleague @flaky-battery-655 and I wanted to reach out to the community about our use case for Pants at INGU, to see if there are any suggestions about how we currently use it and plan to use it. We are moving away from multiple Python repositories into a monorepo with multiple projects. The planned final structure is: - monorepo - apps - example-gui-tool.py - assets - src - python-api - math - models - plots - utils - tests Many of the tools in
apps
are used by non-developers in parts of the company that will depend on the algorithms written in
src
. This will also allow a more formal development process into
src
to be treated similarly to numpy and pandas for internal development. Everything up to this point has been built exclusively in a Windows environment. To use Pants for our dependencies, testing, and distribution, we have started working in WSL with Ubuntu 22.04. My understanding is that, in addition to letting us run Pants, this should help us remove any incurred dependencies on Windows, enabling easier distribution to other operating systems or servers. The questions we have at the moment are:
• What is the best way to distribute standalone Python apps to internal workers on Windows machines?
◦ We have been reviewing the docs, but the support seems to be centered on Pex (https://www.pantsbuild.org/docs/python-distributions)
• Some members of the development team work in PyCharm while others use VS Code. Our current plan is to leave most development flows unaltered in the previous Windows environments with PyCharm and handle testing/distribution separately in Ubuntu or using the WSL extension for VS Code. Does this raise any red flags?
In addition, we would very much appreciate any feedback on our proposed structure and plan thus far! Let me know if there are any other details that would help in understanding our questions or if you have any questions for us!
w
On the topic of using Pants in Windows (https://www.pantsbuild.org/v2.15/docs/prerequisites#microsoft-windows), there's one big gotcha depending on which directory in WSL you use: https://github.com/pantsbuild/pants/issues/16534 and https://github.com/pantsbuild/pants/pull/18000 In short, you can get stale test data, as the Pants daemon can't pick up changes to the shared filesystem. One (bad) method is to turn off the daemon/watcher, while the better solution is:
So a workaround is to work with a repo that is not under a network drive from WSL2's perspective. E.g., your homedir.
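A quick way to check whether a checkout is affected is to look at where it lives. The snippet below is only a minimal sketch: it assumes WSL's default automount root (`/mnt/<drive letter>`), which can be changed in `/etc/wsl.conf`, and the helper name `is_on_windows_mount` is made up for illustration.

```python
import os
import re
from pathlib import PurePosixPath

# Assumption: WSL's default automount root, /mnt/<drive-letter>.
# If your /etc/wsl.conf sets a different root, adjust this pattern.
_WINDOWS_MOUNT = re.compile(r"^/mnt/[a-z](/|$)")


def is_on_windows_mount(repo_path: str) -> bool:
    """Return True if the path looks like a Windows drive mounted into WSL."""
    return bool(_WINDOWS_MOUNT.match(str(PurePosixPath(repo_path))))


if __name__ == "__main__":
    cwd = os.getcwd()
    if is_on_windows_mount(cwd):
        print(f"{cwd} looks like a Windows mount; consider cloning under your WSL home directory")
    else:
        print(f"{cwd} does not look like a Windows mount")
```

Running it from the repo root inside WSL gives a hint about whether the checkout sits on the shared filesystem described in the issues above.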
There is native Windows support in the works, but without a specific ETA at the moment (https://pantsbuild.slack.com/archives/C046T6TA4/p1674400161736799?thread_ts=1674397620.573459&cid=C046T6TA4). Pex requires Windows support first, Pants follows. For standalone builds, there are several of the common Python installers (PyInstaller, Nuitka, etc), but a promising alternative is PyOxidizer (https://pyoxidizer.readthedocs.io/en/stable/) and I wrote a basic 80/20 Pants plugin for that. PyOx basically packages your source code along with a Python interpreter. It has certain limitations, but it's a good approach. There is also the
scie
approach, which is how Pants deploys to Mac/Linux (and WSL? I'm assuming so, but never tested) - https://github.com/pantsbuild/scie-pants - basically, that will "package" (more like transparently download at first run) a Python interpreter + a
pex
of your code (https://github.com/a-scie/jump). It's not limited to Python.
Some members of the development team work in PyCharm while others use VS Code. Our current plan is to leave most development flows unaltered in the previous Windows environments with PyCharm and handle testing/distribution separately in Ubuntu or using the WSL extension for VS Code. Does this raise any red flags?
Didn't fully comprehend this. If it's a Windows shop, where does Ubuntu fit in? Choice of IDE shouldn't really matter, but if you're referring to testing/distribution, could that be done in CI on GitHub or something? Alternatively, maybe using Pants' Docker support for a consistent build system? I'm grasping at straws on this one a bit, as I don't fully grok the dev workflow.
Lastly, as for the repo structure - Pants is happily ambivalent (mostly) to the structure of your repo. So whatever works for you. Visually, the structure you have only seems slightly odd by the concept of "apps" (which are source code) and then a nested "src/whatever" - but that's unrelated to Pants, and just me being me. And is your
tests
folder a shared set of utilities? Or are you replicating the structure of the rest of your repo in there for tests?
h
Hi! My 2 cents: • Pants is fairly agnostic to layout, but I think yours makes sense. Specifically, I like that you're putting all your library code in a single combined package hierarchy, under a single source root (
src/
). That means that you don't have to have a mental model of different logical "projects" and how they map onto packages (as you would pre-monorepo), you just have a package hierarchy, and you let Pants take care of dependency management. • Re distributing binaries to run on Windows machines, are they running under WSL2? If so then Pex seems like the right answer to me.
d
@wide-midnight-78598
Didn't fully comprehend this. If it's a Windows shop, where does Ubuntu fit in?
Choice of IDE shouldn't really matter, but if you're referring to testing/distribution, could that be done in CI on GitHub or something? Alternatively, maybe using Pants' Docker support for a consistent build system?
I'm grasping at straws on this one a bit - as I don't fully grok the dev workflow.
I think some additional background on the structure of the team will help clarify some of our choices so far. When I mentioned the differences in IDE, I did not explain well why this is a notable point. We originally started setting up Pants on Windows as we didn't know it wasn't supported (https://github.com/pantsbuild/pants/issues/17986). After receiving clarification, we set up Ubuntu on our machines within WSL2 so we can work with Pants and learn how it works and what breaks it. So Ubuntu fits in as our Linux distribution of choice at the moment. The choice of IDE becomes important with ongoing development: we are a team of data scientists from physics backgrounds with varying degrees of experience in proper development workflows. A significant portion of the team will mostly be performing "scientific development", creating scripts and algorithms that will use the python-api before being properly integrated. These members are also more familiar with PyCharm, which does not handle working from Windows within WSL as well as VS Code does (https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-wsl). As such, we want to maintain the ability to work purely within Windows regardless of IDE. We have documented how to set up PyCharm within WSL, but as we are pushing for change away from a "not broken" Windows-dependent code base, we are trying to avoid any strong-arming of "you now need to learn Linux". Those of us handling the dependencies and distribution are more than comfortable working with the Ubuntu CLI, and we will use the above extension for VS Code.
For standalone builds, there are several of the common Python installers (PyInstaller, Nuitka, etc), but a promising alternative is PyOxidizer (https://pyoxidizer.readthedocs.io/en/stable/) and I wrote a basic 80/20 Pants plugin for that. PyOx basically packages your source code along with a Python interpreter. It has certain limitations, but it's a good approach.
I will have to look into PyOxidizer in more detail. In addition to the development/data science team, there are various members of operations and manufacturing who use pieces of the Python code. These members do not do any coding, so a .py file should just run when clicked on. As you can imagine, it becomes difficult to handle and maintain Python dependencies across the company among those who do not work in Python.
Lastly, as for the repo structure - Pants is happily ambivalent (mostly) to the structure of your repo. So whatever works for you.
Visually, the structure you have only seems slightly odd by the concept of "apps" (which are source code) and then a nested "src/whatever" - but that's unrelated to Pants, and just me being me. And is your
tests
folder a shared set of utilities? Or are you replicating the structure of the rest of your repo in there for tests?
The rationale behind apps being outside of the source was to have "executable" pieces of the code living in one place. Ideally these would just import the API and then run the specified app, so "no" source code would actually be contained. Would it make more sense in your opinion to keep this alongside the python-api? So
src/apps
and
src/python-api
(worth noting we have internal names for the python-api and monorepo). We fundamentally followed this (https://www.pantsbuild.org/v2.15/docs/source-roots#multiple-top-level-projects); perhaps each app should become its own project? Tests are also somewhat of a foreign concept in the established workflow and another push for change, so we are very open to input as to where they should go.
@happy-kitchen-89482
Hi! My 2 cents:
• Pants is fairly agnostic to layout, but I think yours makes sense. Specifically, I like that you're putting all your library code in a single combined package hierarchy, under a single source root (
src/
). That means that you don't have to have a mental model of different logical "projects" and how they map onto packages (as you would pre-monorepo), you just have a package hierarchy, and you let Pants take care of dependency management.
• Re distributing binaries to run on Windows machines, are they running under WSL2? If so then Pex seems like the right answer to me.
Glad to hear our structure makes sense. Per @wide-midnight-78598's point above, maybe it does make sense to move the apps, since they will all fundamentally be Python, and if this repo grows to include projects in other languages then these apps should be nested in the Python part of the repository? Ideally, everybody would understand WSL and be able to set it up to work with the application, but most of the company will not. So we will want to distribute to machines with a vanilla installation of Python 3.8. Getting everything set up in Linux has been great because it highlights anything that is Windows-dependent, which is making the code base scale much better as the company grows. I have been testing everything alongside on my M1 ARM-based Mac to ensure it all works! Pex sounds like it would be the preference if we were working with Windows.
h
My ultimate recommendation would be to have a single source root:
src/python/ingu
and then under that
src/python/ingu/apps
perhaps, alongside
src/python/ingu/math
,
src/python/ingu/models
etc etc.
src/python
is your (only) python source root, and leaves you the option for
src/scala
or
src/js
etc. in the future.
ingu
serves as a top-level namespace to ensure you don't collide with any external packages. As for tests, you can have a
src/python/ingu/tests
(again, to ensure no import collisions). But note that having tests in a separate folder is usually imposed by tooling that expects it. Pants supports having tests live alongside the code they test. So
foo.py
has
foo_test.py
right next to it in the same dir. The advantage of this is that you can easily find the tests for each module (and you can see if no such tests exist...). Pants will do the right thing if you run
./pants test src/python/ingu/math::
for example. So that is my personal opinionated recommendation, but Pants itself doesn't require any of this.
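To illustrate the "you can see if no such tests exist" point, here is a rough sketch of a script that lists modules with no sibling test file. It assumes the `foo.py` / `foo_test.py` naming described above; the function name `modules_missing_tests` is hypothetical, and this is plain Python, not anything Pants provides.

```python
from pathlib import Path
from typing import List


def modules_missing_tests(root: Path) -> List[Path]:
    """List .py modules under root that have no sibling <name>_test.py file."""
    missing = []
    for module in sorted(root.rglob("*.py")):
        # Skip the test files themselves and package markers.
        if module.name.endswith("_test.py") or module.name == "__init__.py":
            continue
        sibling_test = module.with_name(f"{module.stem}_test.py")
        if not sibling_test.exists():
            missing.append(module.relative_to(root))
    return missing


if __name__ == "__main__":
    # Assumption: run from the repo root with the layout suggested above.
    for path in modules_missing_tests(Path("src/python/ingu")):
        print(f"no test next to {path}")
```

With tests colocated, a check like this stays a few lines long because the expected test path follows directly from the module path.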
d
Thanks for the feedback, I see your point about having the Python code all in one place within the src directory. I originally thought the tests directory was required for running the tests. I like having the tests live alongside the code, since I think it also helps to promote writing tests! Will report back with where we end up as we progress, in case this thread helps any others!
h
Yep, the separate tests directory is required by standard tooling. For example, so that the tests don't get packaged into the deployable artifact. But Pants does not require it.