Is it possible to have separate requirements files... (Pants, #general)


brash-glass-61350

12/17/2024, 5:45 PM

Is it possible to have separate requirements files that are more or less independent for different parts of the code? E.g., have a `requirements-production.txt` and a `requirements-research.txt`. The production file should be a subset of the research file. I'm thinking about this to speed up builds and to not have to worry about packages someone needs to run one experiment in a notebook.

elegant-florist-94385

12/17/2024, 6:13 PM

Do you need a `requirements-production.txt` to represent explicitly removing some dependencies (compared to the `requirements-research.txt`)? I ask because my gut instinct is that you would be fine with just 1 requirements file. Anything Pants does (builds, tests, etc.) will only need/use those dependencies that are actually required. Typically, the main reasons to use multiple requirements files (or lockfiles) are:
1. You have projects that use the same 3rd-party dependency but cannot use the same version. You need a requirements file for each (and then specify which lockfile each of your source files is designed to work with).
2. You are in the "one lockfile" situation, but you need to actively prevent some projects from using certain dependencies. These projects need to be linked to their own lockfile that does not contain those dependencies.

So in your case, I don't think you need `requirements-production.txt`. If you don't want some particular experiment to use `numpy` (to choose a random example), then just don't import `numpy`, and then the build won't have `numpy`. If you want your build to fail because someone accidentally imported `numpy` (or imported some common file that imports something that eventually imports `numpy`), then you might need to consider having a separate `requirements-production.txt`.
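A minimal sketch of the single-requirements-file setup, assuming `requirements.txt` sits at the repository root and a hypothetical `src/prod/` directory holds the production code with a `main.py` entry point. No explicit `dependencies` are listed: Pants infers them from `import` statements, so the PEX built from the (hypothetical) `app` target contains only the third-party packages that code actually imports, directly or transitively.

```python
# --- BUILD at the repository root ---
# Generates one python_requirement target per entry in requirements.txt.
python_requirements(name="reqs", source="requirements.txt")

# --- BUILD in the hypothetical src/prod/ directory ---
# Dependency inference maps each import to the requirement that provides it,
# so notebook-only packages never become dependencies of these files.
python_sources(name="lib")

# The packaged app only pulls in what main.py (transitively) imports.
pex_binary(name="app", entry_point="main.py")
```

With this layout, `pants package src/prod:app` would build only that closure, regardless of how many extra packages the research side adds to `requirements.txt`.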

elegant-florist-94385

12/17/2024, 6:14 PM

In other words, if it's just a case of "speeding up builds" and "not worry[ing] about [unneeded] packages", then this should be the behavior you get by default with Pants.

brash-glass-61350

12/17/2024, 7:06 PM

I can give some further context. My pain point is that people, particularly those in the research org, keep adding dependencies to our single `requirements.txt` file. They need these dependencies to run experiments (that also interact with the production code). However, as they add more and more dependencies, it becomes harder to add/update dependencies in production, all because of some random package that's only used in a couple of Jupyter notebooks.

brash-glass-61350

12/17/2024, 7:07 PM

I'd like to have a set of requirements used in production, which I closely track, and a different set of requirements for single-use imports (maybe even stuff outside the lockfiles, if possible?).

elegant-florist-94385

12/17/2024, 8:04 PM

So the main goal is that you want to be able to add/upgrade production dependencies without dealing with some random one-off use that messes up the version constraints? You mentioned earlier that prod should be a subset of research. I don't know of any way to enforce that particular constraint, but otherwise, it sounds like a two-lockfile approach is what you'll need. (I don't have a ton of experience with this, but bear with me.) I think you'll want to:
• follow the "single lockfile" approach above, and follow the Pants docs to set it as your default lockfile
• create a second lockfile for prod (I don't think there's any way to subset or inherit or anything; you just have to copy)
• set the second lockfile as the resolve for all your production code (in `python_sources`)
• have any common library-type code that needs to work with both the prod code and the regular code use `resolve=parametrize(...)` to declare that it must work with both resolves. This makes Pants lint/test/mypy/etc. these files twice, to make sure they work with both resolves.
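A rough sketch of the two-resolve layout described in these steps, assuming `pants.toml` sets `enable_resolves = true` and `default_resolve = "research"` under `[python]`, and maps resolve names `prod` and `research` to lockfile paths under `[python.resolves]`. The directory and target names are hypothetical.

```python
# --- BUILD at the repository root ---
# The full (research) requirements, locked under the default resolve.
python_requirements(name="reqs-research", source="requirements-research.txt", resolve="research")
# A hand-copied subset for production, locked separately under "prod".
python_requirements(name="reqs-prod", source="requirements-production.txt", resolve="prod")

# --- BUILD in the hypothetical src/prod/ directory ---
# Production code opts in to the stricter resolve.
python_sources(resolve="prod")

# --- BUILD in the hypothetical src/common/ directory ---
# Shared code gets one target per resolve, so Pants checks it against both.
python_sources(resolve=parametrize("prod", "research"))
```

Because each resolve has its own `python_requirements` targets, dependency inference picks the requirement from the matching resolve, and an import in `src/prod/` of something only listed in the research file should fail to resolve instead of silently pulling it in.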

elegant-florist-94385

12/17/2024, 8:05 PM

There will probably be some more hiccups along the way, but I think that's the rough idea you'll need to follow
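On the point raised earlier that there's no obvious built-in way to enforce "prod is a subset of research": one option is a small CI check run outside Pants. This is a hypothetical sketch, not a Pants feature. It only compares project names (normalized per PEP 503) and ignores version specifiers, so it can't tell whether the two lockfiles ended up with compatible versions.

```python
import re
import sys
from pathlib import Path

# Distribution name at the start of a requirement line,
# e.g. "torch" in "torch==2.1; sys_platform == 'linux'".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(lines):
    """Return the normalized project names in requirements.txt-style lines."""
    names = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -r / --index-url
            continue
        match = _NAME.match(line)
        if match:
            # PEP 503 normalization: runs of "-", "_" and "." become "-", lowercased.
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def missing_from_research(prod_lines, research_lines):
    """Sorted names required by prod that the research file does not list."""
    return sorted(requirement_names(prod_lines) - requirement_names(research_lines))


def main(prod_path, research_path):
    missing = missing_from_research(
        Path(prod_path).read_text().splitlines(),
        Path(research_path).read_text().splitlines(),
    )
    for name in missing:
        print(f"not in research requirements: {name}")
    return 1 if missing else 0


# Only run the file comparison when both paths are passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Usage (file name hypothetical): `python check_subset.py requirements-production.txt requirements-research.txt`, which exits non-zero and lists the offending names if prod requires something research doesn't.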

brash-glass-61350

12/18/2024, 8:34 AM

Thanks @elegant-florist-94385! I'll give this a shot. Is there a reason not to make the production code the default resolve?

elegant-florist-94385

12/18/2024, 12:36 PM

Since production is the stricter one, you probably want it to be explicitly opt-in. But you could probably go either way, as long as it's clear to your team how to use each of them in the intended way.
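If production is opt-in, one way to avoid repeating `resolve="prod"` in every BUILD file is the `__defaults__` BUILD symbol, which sets field defaults for a directory and everything below it. A sketch, assuming a Pants version that supports `__defaults__`, `default_resolve = "research"` in `pants.toml`, and a hypothetical `src/prod/` tree:

```python
# src/prod/BUILD: every python_sources / python_tests target declared in this
# directory or its subdirectories now defaults to the "prod" resolve.
__defaults__({(python_sources, python_tests): dict(resolve="prod")})

# No resolve= needed here; it comes from the default above.
python_sources()
```

Opting in per directory tree keeps the choice visible in one place, which helps with the "clear to your team" concern.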

careful-address-89803

12/18/2024, 11:32 PM

You can parametrize the `python_requirements.resolve` field of the production requirements, and then have other `python_requirements` targets for the/each research resolve. I think something like this should work:

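```python
# Core requirements: one generated copy per resolve, so both lockfiles include them.
python_requirements(name="all", source="requirements.txt", resolve=parametrize("prod", "research"))
# Research-only extras: only the "research" resolve sees these.
python_requirements(name="research", source="requirements_research.txt", resolve="research")
```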

elegant-florist-94385

12/19/2024, 10:46 AM

@careful-address-89803 Good thought! That would mean that the `research` resolve gets all the standard requirements, and `requirements_research.txt` only needs to specify requirements above and beyond those, correct?

careful-address-89803

12/19/2024, 9:44 PM

Yep! I think that'll work. I'm not so sure whether that will bring much gain, though. You have to relock the `research` resolve every time someone adds something or you update one of the core packages, so I think you might spend the same amount of time locking things...

brash-glass-61350

12/20/2024, 6:31 PM

We're already using resolves because of torch (we need to install different wheels on Linux, macOS, and GPU machines). Can we have two resolves together? I.e., `-prod-linux`, `-prod-gpu`, ...?
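Assuming, as suggested earlier in the thread, that resolves can't inherit from or be combined with one another, one hedged option is to give every tier/platform combination its own resolve. A sketch, assuming `pants.toml` maps each of these resolve names to its own lockfile under `[python.resolves]` and that the platform-specific torch pins live in separate files. Every file and resolve name here is hypothetical, and macOS is left out for brevity.

```python
# Core requirements go into every resolve.
python_requirements(
    name="core",
    source="requirements.txt",
    resolve=parametrize("prod-linux", "prod-gpu", "research-linux", "research-gpu"),
)

# Research-only extras are added to the research resolves only.
python_requirements(
    name="research",
    source="requirements_research.txt",
    resolve=parametrize("research-linux", "research-gpu"),
)

# Platform-specific torch pins, one file per platform.
python_requirements(
    name="torch-linux",
    source="requirements-torch-linux.txt",
    resolve=parametrize("prod-linux", "research-linux"),
)
python_requirements(
    name="torch-gpu",
    source="requirements-torch-gpu.txt",
    resolve=parametrize("prod-gpu", "research-gpu"),
)
```

The trade-off is that the number of lockfiles grows as tiers × platforms, which compounds the relocking cost mentioned above.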
