Is it possible to have separate requirements files...
# general
b
Is it possible to have separate requirements files that are more or less independent for different parts of the code? E.g., have a requirements-production.txt and a requirements-research.txt. The production file should be a subset of the research file. Thinking about this to speed up builds and to not have to worry about packages someone needs to run one experiment in a notebook
e
Do you need a `requirements-production.txt` to represent explicitly removing some dependencies (compared to the `requirements-research.txt`)? I ask because my gut instinct is that you would be fine with just one requirements file. Anything pants does (builds, tests, etc.) will only need/use those dependencies that are actually required. Typically, the main reasons to use multiple requirements files (or lockfiles) are:
1. You have projects that use the same 3rd party dependency, but cannot use the same version. You need a requirements file for each (and then specify which lockfile each of your source files is designed to work with).
2. You are in the "one lockfile" situation, but you need to actively prevent some projects from using certain dependencies. These projects need to be linked to their own lockfile that does not contain those dependencies.
So in your case, I don't think you need `requirements-production.txt`. If you don't want some particular experiment to use `numpy` (to choose a random example), then just don't import `numpy`, and then the build won't have `numpy`.
If you want your build to fail because someone accidentally imported `numpy` (or imported some common file that imports something that eventually imports `numpy`), then you might need to consider having a separate `requirements-production.txt`.
In other words, if it's just a case of "speeding up builds" and "not worry[ing] about [unneeded] packages" then this should be the behavior you get by default with pants.
b
I can give some further context. My pain point is that people, particularly those in the research org, keep adding dependencies to our single `requirements.txt` file. They need these dependencies to run experiments (that also interact with the production code). However, as they add more and more dependencies, it becomes harder to add/update dependencies in production, all because some random package, used in a couple of Jupyter notebooks, depends on it.
I'd like to have a set of requirements used in production, which I closely track, and a different set of requirements for single use imports (maybe even stuff outside the lockfiles if possible?)
e
So the main goal is that you want to be able to add/upgrade production dependencies without dealing with some random one-off use that messes up the version constraints? You mentioned earlier that prod should be a subset of research. I don't know of any way to enforce that particular constraint, but otherwise, it sounds like a two lockfile approach is what you'll need. (I don't have a ton of experience with this, but bear with me.) I think you'll want to:
• Follow the "single lockfile" approach above and follow the pants docs to set it as your default lockfile
• Create a second lockfile for prod (I don't think there's any way to subset or inherit or anything. Just have to copy)
• All your production code needs to set the second lockfile as their resolve (in `python_sources`)
• Any common library type stuff that needs to work with both prod code and the regular code should use `resolve=parametrize(...)` to declare that it must work with both resolves. This will make pants lint/test/mypy/etc. these files twice, to make sure they work with both resolves
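A minimal BUILD-file sketch of the steps above, assuming two resolves named `research` (the default) and `prod` have already been declared under `[python.resolves]` in `pants.toml`, with `enable_resolves = true` and `default_resolve = "research"`. The directory paths, target names, and requirements file names are placeholders for illustration:

```python
# 3rdparty/python/BUILD (hypothetical path)
# One requirements target per resolve; each resolve gets its own lockfile.
python_requirements(
    name="research-reqs",
    source="requirements-research.txt",
    resolve="research",
)
python_requirements(
    name="prod-reqs",
    source="requirements-production.txt",
    resolve="prod",
)

# src/research/BUILD (hypothetical path)
# No resolve field, so these sources use the default resolve ("research").
python_sources()

# src/prod/BUILD (hypothetical path)
# Production code explicitly opts in to the stricter resolve.
python_sources(resolve="prod")

# src/common/BUILD (hypothetical path)
# Shared code is checked against both resolves, so lint/test/mypy run twice.
python_sources(resolve=parametrize("prod", "research"))
```

After editing either requirements file, the matching lockfile would be regenerated with `pants generate-lockfiles --resolve=prod` (or `--resolve=research`).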
There will probably be some more hiccups along the way, but I think that's the rough idea you'll need to follow
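Since Pants itself may not enforce the "prod is a subset of research" rule, one option is a small standalone check run in CI. The sketch below is an assumption-heavy illustration, not a Pants feature: the function names are made up, it only compares package names (not versions or markers), and it skips pip option lines such as `-r` or `-e`.

```python
from __future__ import annotations

import re

# Leading distribution name of a requirement line, e.g. "numpy" in "numpy>=1.26".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(text: str) -> set[str]:
    """Return normalized package names from requirements.txt-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _NAME_RE.match(line)
        if match:
            # PEP 503 style normalization: lowercase, runs of - _ . become "-".
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def missing_from_research(prod_text: str, research_text: str) -> set[str]:
    """Names required by prod that the research requirements do not list."""
    return requirement_names(prod_text) - requirement_names(research_text)
```

In practice the two arguments would be the contents of the production and research requirements files, and a non-empty result would fail the check.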
b
Thanks @elegant-florist-94385! I'll give this a shot. Is there a reason not to make the production code the default resolve?
e
Since production is the more strict one, you probably want it to be explicitly opt-in. But you could probably go either way, as long as it's clear to your team how to use each of them in the intended way
c
You can parametrise the `python_requirements.resolve` field for production, and then have other `python_requirements` for the/each research resolve. I think something like this should work:
python_requirements(name="all", source="requirements.txt", resolve=parametrize("prod", "research"))
python_requirements(name="research", source="requirements_research.txt", resolve="research")
e
@careful-address-89803 Good thought! That would mean that the `research` resolve gets all the standard requirements, and `requirements_research.txt` only needs to specify requirements above and beyond those, correct?
c
Yep! I think that'll work. I'm not so sure about whether that will bring much gain, though. You have to relock the `research` resolve every time someone adds something or you update one of the core packages, so I think you might spend the same amount of time locking things...
b
We're already using resolves because of torch (we need to install different wheels on linux, macos, and gpu machines). Can we combine the two sets of resolves? I.e., -prod-linux, -prod-gpu, ...?