04/14/2023, 7:29 PM
Hi, I'm the leader of the machine learning team at an early-stage startup (Aviva credito, offering loans in underserved areas of Mexico). I knew I wanted a monorepo (to avoid the pain of coordinated PRs with coordinated tests when making breaking changes), and we will only use Python for the foreseeable future. The choice of Pants comes from 3 constraints:
• Build time (especially in CI). I've worked at startups where any commit added to a PR would trigger a 40-minute build in CI, which made the PR review process extremely tedious. I knew I wanted a build system with fine-grained cache invalidation (the tools I had heard of for this were Pants and Bazel).
• I have utility packages (general ones that will probably be used in several projects) and project packages. I wanted a single lockfile for the third-party dependencies, to ensure all packages can work together. But I also want them to be separate packages, to avoid having to install the huge deep learning libraries when working on a project that does not require our deep learning utilities. I first looked into handling this central lockfile with Poetry, but it seemed surprisingly hard, so I was happy to see it was easy in Pants.
• Nobody on our team has experience with build systems and CI pipelines, so we wanted something simple to set up and maintain. That was another advantage of Pants over Bazel (which seems more complex).

We switched to Pants this week, but we still have some pain points:
• Because I generated the lockfile on my laptop, the torch version is "torch-1.13.1%2Bcu116-cp310-cp310-linux_x86_64.whl". That works fine in production (also on Ubuntu with the same CUDA version), but we discovered that the build fails on my coworker's laptop (macOS without a GPU). I think it's related to this issue. I'm looking into solutions, probably multiple resolves corresponding to the different environments. That might lead to different versions of the other packages, so it wouldn't be a perfect fix, but it would be better than some team members not being able to reuse the lockfile.
• I'm struggling with the installation of a Build-Farm build agent (problems unrelated to Pants). I'm considering Toolchain, but the price looks a bit steep coming from Mexico (salaries are lower than in the US/EU, so it makes more sense to use a cheaper solution, even if it requires more work to get it running). It's not clear to me what advantages Toolchain brings compared to self-hosting Build-Farm, apart from the ease of setting it up.
• [Edit: now solved, thanks to @enough-analyst-54434] While waiting for the Build-Farm build agent, we use the default build agent from Azure. But caching of "named_cache" only works sometimes. It seems to be due to the size of the cache (21G, because it seems to duplicate the libraries; the virtual environment is only 7G). I did not get an answer to my question, so I'll continue investigating.
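
To make the torch lockfile problem concrete, here is a minimal sketch (standard library only) that splits the pinned wheel filename into its parts and shows why it cannot be installed on macOS. The helper names `parse_wheel_filename` and `matches_os` are hypothetical, and the OS check is a deliberately simplified substring test; real installers compare full compatibility tags (for example via `packaging.tags`), so treat this as an illustration rather than how Pants or pip actually resolves wheels.

```python
from urllib.parse import unquote


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into name, version, and compatibility tags.

    Hypothetical helper: follows the usual
    name-version[-build]-python-abi-platform.whl layout.
    """
    decoded = unquote(filename)  # "%2B" in a URL-encoded name is "+"
    if not decoded.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = decoded[: -len(".whl")].split("-")
    if len(parts) == 6:
        # Drop the optional build tag.
        parts = parts[:2] + parts[3:]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_os(platform_tag: str, os_name: str) -> bool:
    """Simplified check: does any platform tag target the given OS?

    A wheel may carry several dot-separated platform tags; "any" means
    the wheel is platform independent.
    """
    return any(
        tag == "any" or os_name in tag for tag in platform_tag.split(".")
    )


wheel = parse_wheel_filename(
    "torch-1.13.1%2Bcu116-cp310-cp310-linux_x86_64.whl"
)
print(wheel["version"])   # 1.13.1+cu116 (a CUDA 11.6 local version)
print(wheel["platform"])  # linux_x86_64
print(matches_os(wheel["platform"], "linux"))   # True
print(matches_os(wheel["platform"], "macosx"))  # False
```

The lockfile pins a Linux-only, CUDA-specific build, which is why a separate resolve (and lockfile) per environment looks like the likely way out.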