# welcome
g
Hi all, I’m Moritz and I am heading ML and engineering at beyondwords.io. We offer text-to-speech (TTS) services and are aiming to consolidate our TTS stack into a mono-repo for various reasons:
• better manage cross-service dependencies and contract testing
• streamline development and delivery workflows
• improve code visibility and collaboration

We are looking into Pants to support this effort, specifically to:
• shorten local commit cycles
• shorten CI build times
• improve developer UX (currently using Make)

There is a vision piece, where we want to treat ML model training as build steps and manage ML workflows with the build system rather than relying on orchestration tools such as Airflow. I am currently investigating how execution and caching of remote tasks and artefacts would work. I would be interested to know if anyone has or is planning to use Pants in a similar way 👋
👋 4
b
We heavily leverage Pants in an AI/ML org. I'm also a Pants maintainer, so feel free to talk my ear off. Some features of note:
• Pants 2.16 now allows you to reference S3 artifacts in `BUILD` files and have them be downloaded ad hoc
  ◦ It actually allows any URL handler to inject itself into any URI. S3 is just already implemented because it's assumed to be common enough
• Check out the upcoming `adhoc_` targets in, I think, 2.16 (if not 2.17). They allow for very arbitrary process execution, with full Pants caching and isolation. It's kinda equivalent to `make` in a very rough sense.
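To make the two features above a bit more concrete, here is a rough sketch of what a `BUILD` file combining them might look like. It is an illustration under assumptions, not something taken from this thread: the target names, the bucket URL, the size, the hash, and the script and output file names are all placeholders, and the relevant backends would need to be enabled in `pants.toml` for the version in use.

```python
# BUILD (illustrative sketch; every name, URL, size and hash below is a placeholder)

# A model artifact that is downloaded on demand from S3.
file(
    name="pretrained_weights",
    source=http_source(
        "s3://example-bucket/models/weights.bin",  # hypothetical location
        len=123456,                                # expected size in bytes (placeholder)
        sha256="<sha256 of the file>",             # placeholder, must match the real file
    ),
)

# The training entry point (hypothetical script name).
python_source(name="train_script", source="train.py")

# Run the script as a sandboxed process whose declared outputs are cached.
adhoc_tool(
    name="train_model",
    runnable=":train_script",
    execution_dependencies=[":pretrained_weights"],
    output_files=["model.pkl"],  # hypothetical output written by train.py
)
```

If this matches the behaviour described above, the training step would only rerun when the script, its inferred dependencies, or the downloaded artifact change.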
❤️ 1
b
Hi Moritz! We have some similarities (although our company is much younger), cf. my post a bit earlier in this channel. Something I'm implementing for "ML model training as build steps" is to use DVC to define training pipelines (already in place), check in the CI pipeline whether the DVC locks are up to date (do the recorded dependency hashes match the current hashes?), and then either run the training or fail the build, depending on the case. So, although we've moved to Pants, we don't plan to handle this part with Pants in the near term. I had not heard about the new features Joshua mentions (we're still on 2.15). A big difference would be that, by default, Pants infers dependencies, while in DVC we add dependencies (both data and code) by hand.
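For readers unfamiliar with the "are the locks up to date" check, the idea can be sketched in a few lines of Python. This is a simplified illustration, not DVC's implementation: it assumes the recorded hashes have already been read from the lock file into a plain dict of relative path to MD5 digest, and it only handles single files (DVC treats directories differently). The function names are made up for this example; in practice the `dvc status` command does this job.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_dependencies(recorded: dict[str, str], root: Path) -> list[str]:
    """List dependencies whose current hash differs from the recorded one.

    `recorded` maps a path relative to `root` to the hash stored at lock time.
    A missing file counts as stale.
    """
    stale = []
    for rel_path, recorded_hash in sorted(recorded.items()):
        path = root / rel_path
        if not path.is_file() or md5_of_file(path) != recorded_hash:
            stale.append(rel_path)
    return stale
```

A CI step could then exit non-zero (or trigger the training) whenever `stale_dependencies(...)` returns a non-empty list.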
💯 1
g
thanks @bitter-ability-32190 and @boundless-ambulance-11161, great to hear that you are using Pants in ML projects already. Looking forward to exploring further!
@boundless-ambulance-11161 now that a few months have passed, how has your experience been with running Pants plus DVC? Would you still recommend it? Would be great to get your thoughts
b
We are still using both. We haven't done everything I wanted (we're still less than 1yo, so we do not have a lot of capacity, and there is a trade-off between improving our process and picking the many low-hanging fruits around us to improve the application). We are checking in CI that our DVC pipelines are up to date, but Pants plays no role in this yet.

With regard to our usage of Pants, one thing we improved since my last comment (but which is not directly related to DVC) is to use Pants to build and publish our Docker images. We deploy most of our models/services as Docker images. In previous companies, building Docker images which depend on multiple internal packages in the repo often entailed pushing/pulling the packages to a private PyPI or manually handling the build context (like temporarily copying files), which was a hassle. Pants detecting the dependencies, and including everything you need, is a game changer for me. This means that even our data scientists who don't understand Docker are able to add/maintain Docker images.

Plus, the dependencies thing made it easy to have a GitHub action doing "List the Docker images with the tag 'auto-deploy' whose transitive dependencies (code, data, version of 3rd-party packages) have changed since origin/main; build the images and push them to our Docker registry; update the YAML file of the corresponding service in the staging environment so that it now uses that image". We no longer have issues with staging being out of date with origin/main.

When thinking about which tools I want to add to our stack, I used to ask "what will integrate well with DVC?". Now, I ask "what will integrate well with Pants?". I still use and like DVC, but I could see myself switching to another tool. Pants, on the other hand, I see as the heart of our repo, and I have a hard time thinking of a scenario where I'd switch to another build tool. (I promise I am not paid to advertise it)
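The "images whose transitive dependencies have changed" selection is something Pants computes itself from its dependency graph (its `--changed-since` and `--tag` options are the relevant entry points), so nothing like the following needs to be written by hand. Purely to illustrate the logic, here is a small Python sketch over a toy graph; the function name, the target names, and the dict-based graph representation are all assumptions made for this example.

```python
from __future__ import annotations

from collections import deque


def affected_targets(
    dependencies: dict[str, set[str]],
    changed: set[str],
    tags: dict[str, set[str]],
    tag: str,
) -> list[str]:
    """Return targets carrying `tag` that transitively depend on a changed target.

    `dependencies` maps each target to its direct dependencies;
    `tags` maps a target to the set of tags attached to it.
    """
    # Invert the graph: for each target, who depends on it directly?
    dependents: dict[str, set[str]] = {}
    for target, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(target)

    # Breadth-first walk from the changed targets up through their dependents.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)

    return sorted(t for t in affected if tag in tags.get(t, set()))
```

With a graph where `images/api` depends on `src/api`, which in turn depends on `src/core`, a change to `src/core` selects `images/api` as long as it carries the `auto-deploy` tag; untagged images are skipped even if they are affected.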
❤️ 3
In case you're interested, to answer another question, I just posted the small script which deploys to our staging environment the images which changed. https://pantsbuild.slack.com/archives/C046T6T9U/p1692561726189359?thread_ts=1692540323.074429&cid=C046T6T9U It uses https://www.truefoundry.com/ but I'm sure you could adapt it to whatever platform you use to deploy your docker images (assuming you deploy containerized applications).
g
this is great feedback, thanks for sharing @boundless-ambulance-11161