# general
c
Hey everyone! Hope everyone is doing well :) My company is currently building a monorepo for training and serving Machine Learning models and, right now, every team is migrating their models to this monorepo. We want to lower the amount of work these teams need to do to migrate their models, but we're facing some problems with the way Pants (apparently) tracks changes to components.

Basically, every time someone adds a package to our requirements and updates the respective lockfiles, Pants detects changes to every component in the monorepo. From the docs and what people say here and there on the internet, we are almost 100% sure that Pants treats every component as changed if the lockfile is updated in any way, even if all transitive dependencies' versions are pinned. So, in this scenario, every time anyone makes a change that touches the lockfiles, we need to manually sanity-check every single component in our monorepo, and every one of them needs to be re-deployed.

So… Is this kind of scenario expected? Do we need to make sure everything works (manually, or with unit and integration tests) every time someone:
• adds a new dependency?
• changes versions of dependencies not related to their specific models?
• wants to pin the version of a package (for whatever reason), even if the pinned version is the same one the lockfile already specifies?

Then, if this is expected and we need to do these checks, is there any recommended way of doing dependency management that lowers the number of affected components with each change we need to make? Maybe something related to how we're creating/using our lockfiles?

For some extra info (I'll add more if needed):
• We're using Pants 2.16.0.
• We currently have one lockfile for the production side of things and others for our dev/testing/staging environments, and so on.
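In CI, what we run is roughly the sketch below (these are standard Pants change-detection options; `origin/main` is a placeholder for whatever base ref our pipeline actually diffs against), and after a lockfile update it reports essentially every target as changed:
```shell
# Sketch of a typical CI change-detection query; "origin/main" is a
# placeholder for whatever base ref the pipeline compares against.
./pants --changed-since=origin/main --changed-dependents=transitive list
```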
c
Hi! Seems you have a pretty good sense of the land here. Pants is currently not able to track the individual lines changed in a requirements file, so any change has to be treated as if all lines potentially changed. If this becomes too much, you could partition the repo into sections with resolves, so each one gets its own lockfile. May not be ideal, but it's worth mentioning that you can use multiple requirements files as inputs to a single resolve, and with that have a common requirements file for global stuff to avoid some duplication (though changing a dep in that shared file would have the same impact you're seeing now).
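To make that concrete, a minimal sketch of what the resolve partitioning could look like (resolve names, lockfile paths, and requirements file names here are all made up):
```toml
# pants.toml -- sketch; resolve names and lockfile paths are placeholders
[python]
enable_resolves = true
default_resolve = "team-a"

[python.resolves]
team-a = "3rdparty/python/team-a.lock"
team-b = "3rdparty/python/team-b.lock"
```
```python
# 3rdparty/python/BUILD -- sketch; requirements file names are placeholders
python_requirements(
    name="team-a",
    source="team-a-requirements.txt",
    resolve="team-a",
)

# A shared requirements file fed into both resolves via parametrize.
# Note: changing a dep here invalidates both lockfiles, as mentioned above.
python_requirements(
    name="common",
    source="common-requirements.txt",
    resolve=parametrize("team-a", "team-b"),
)
```
With a layout like that, touching team-a-requirements.txt means regenerating only team-a's lockfile (`./pants generate-lockfiles --resolve=team-a`), and components on team-b's resolve are left alone.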
Also, it sounds scary to test with a different lockfile from the one you use to deploy to production?
c
Wait, I think I got confused and added false information in that statement about different lockfiles. Gonna edit the text 🙂
😅 1
h
Yeah, if the lockfile changes then everything that uses it might be affected, so it is invalidated. But if the transitive deps of, say, a test weren't affected in practice, then you should get cache hits all the way through (after one cache miss to regenerate the lockfile subset used by that test).
Are you not seeing this?
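If it helps to check, Pants can print cache hit/miss counters at the end of a run (a sketch; I believe the stats option is available on 2.16):
```shell
# Sketch: enable run-stats logging to see local/remote cache hit and
# miss counts printed at the end of the run.
./pants --stats-log test ::
```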
c
Nice! Thanks for the responses! In my specific case, I'm seeing these problems (components I judge shouldn't be affected) while running my tests and such in a CI tool. In fact, this whole situation is only a problem in the CI environment, in the sense that I don't mind it happening locally. For now, we don't yet have the remote caching that would allow for the cache hits that would solve this issue for me. Locally, if these components are affected but things are cached, then yes, cache hits all the time. Maybe I could mitigate the problem by implementing remote caching already, which I think will be simpler/easier to do right now?
h
Ah, yes, remote caching is probably what you want here
❤️ 1
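A minimal sketch of what that could look like in pants.toml (the store address is a placeholder for whatever REAPI-compatible cache service you run, e.g. bazel-remote):
```toml
# pants.toml -- sketch; the store address is a placeholder for your own
# REAPI-compatible cache service (e.g. bazel-remote).
[GLOBAL]
remote_cache_read = true
remote_cache_write = true
remote_store_address = "grpc://build-cache.internal:9092"
```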