# general
a
Hi all! I want to ask if you have any suggestions for resolving conflicting dependencies. It is a major pain point as our repo grows larger. Sometimes it's difficult to upgrade or add Python dependencies due to conflicts. One idea we've had is to try to pin only the direct dependencies. Also, the cross-platform situation (we develop on macOS but everything in the cloud is Linux) only makes it worse.
h
Are you referring to third-party dependencies?
1
As specified in requirements.txt for example?
a
yep!
h
"Correctly" managing dependencies is a rather subjective topic. I'm not sure what the consensus is for a monorepo (where you intentionally want to have segregated projects) because most tools expect to maintain a singular
requirements.txt
file.
1
There's some discussion on this problem in the pantsbuild docs already here
My company currently uses https://github.com/jazzband/pip-tools to generate constraints files that are then fed into Pants as a constraint. The idea is to not do dependency resolution yourself in
requirements.txt
.
1
h
Thanks for linking to that, Nathanael. I recommend reading the 2.10 docs instead of the Pants 2.9 docs; they were recently rewritten: https://www.pantsbuild.org/v2.10/docs/python-third-party-dependencies#lockfiles. Pants 2.10 now supports multiple lockfiles, which can be a solution when you have multiple conflicting versions of the same requirement, like some code needing to use Tensorflow 1.1 and other code using 1.8. It's often desirable to use a single lockfile for simplicity and consistency, but it can be helpful to have multiple lockfiles
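For concreteness, enabling multiple lockfiles in Pants 2.10+ is configured roughly like this in pants.toml (a sketch; the resolve names and lockfile paths are illustrative, not from this thread):

```toml
[python]
# Opt in to multiple-lockfile ("resolves") support.
enable_resolves = true
# The resolve used by targets that don't set the `resolve` field.
default_resolve = "data-science"

[python.resolves]
# Each resolve maps a name to a lockfile path.
data-science = "3rdparty/python/data_science.lock"
web = "3rdparty/python/web.lock"
```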
h
There are a couple of reasons for this: 1. It's a hard problem and you're likely to get it wrong. When you do get it wrong, pip usually won't care and will happily introduce violations. The challenge is that most dependencies have their own transitive dependencies, which can make it hard to know which ranges to put in the requirements file. 2. There's no guarantee that every developer will be using the same version of dependencies. The only way to guarantee that is to write
==
constraints which will lead you right back to problem 1.
1
Tools like
pip-tools
,
poetry
, etc. allow you to maintain a looser
requirements.txt
while buying back repeatability and consistency easily.
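As a sketch of that workflow (file names illustrative), you keep a loose requirements.in and let pip-compile emit the fully pinned file:

```shell
# Loose, human-maintained input: requirements.in
#   flask>=1.1,<2
#   celery>=5
#
# Compile it into a fully pinned requirements.txt, with all
# transitive dependencies resolved to exact versions.
python -m piptools compile --output-file=requirements.txt requirements.in
```

The pinned output is what CI and developers install, so everyone gets the same versions without hand-maintaining `==` pins.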
h
PS Eric means "the 2.10 docs instead of the 2.9 docs" not "instead of the pip-tools link" 😉
👍 1
l
@ambitious-student-81104 What's an example of a conflict you have? I ask because we're currently using pip with the legacy-resolver so that we can live with conflicting dependencies and trust our automated tests to find issues (obviously not 100% foolproof and still a bit risky). Our main conflict today is that we use Flask 1 and Celery 5 and DBT 1, and Flask 1 wants Click < 8 while the latter two want Click >= 8. Bumping Flask 1 up to Flask 2 introduced what seems like a worse conflict on Jinja2 between Flask and DBT. So we regrettably forked Flask 1 and made a branch in our fork for a Flask 1.1.4.1 that allows Click 8, and all seems fine. This makes me sad... but it does work around the problem (our risk now is that Click 8 somehow breaks Flask 1.1.4.1, but that so far has not proven to be the case)
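That Click squeeze is a nice concrete case of an empty intersection of ranges. A tiny illustration using the `packaging` library (commonly available in pip environments; assumed installed here):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Ranges from the thread: Flask 1 wants Click < 8,
# while Celery 5 and DBT 1 want Click >= 8.
flask_wants = SpecifierSet("<8")
celery_wants = SpecifierSet(">=8")

# No single pin can satisfy both ranges; check a candidate:
pin = Version("8.0.3")
print(pin in flask_wants)   # False
print(pin in celery_wants)  # True
```

Any pin below 8 fails Celery/DBT, and any pin at or above 8 fails Flask 1, which is why the fork (or a second resolve) was the escape hatch.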
a
Catching up on this. @hundreds-father-404 I've previously seen a solution in the Pants documentation saying to just leave the conflicting dependency out of the lockfile. What is the essential difference between that and having multiple lockfiles? And if I am moving into multiple-lockfile land, will the lockfile generation command refresh all the lockfiles I have at the same time? How do I prevent our repo's lockfiles from having unnecessary divergences? Also, we've settled internally on allowing applications to have conflicting dependencies, but libraries should not have conflicting dependencies; is there a way for us to tag targets in Pants to distinguish between libraries and applications?
h
I've previously seen a solution in pants documentation saying just leave the conflicting dependency out of the lock file. what is the essential difference between that and having multiple lockfiles?
Leaving it out of the lock (specifically constraints.txt) was a very, very imperfect workaround. It meant that a) not all your deps would be pinned, which is bad for consistency and reproducibility, and b) you would have to re-install things rather than reusing the prior resolve, so slower performance. Now that multiple lockfiles are supported, that is the blessed way to handle conflicting dependencies. FYI, this blog post gets a little more into the theory of multiple lockfiles: https://blog.pantsbuild.org/multiple-lockfiles-python/
will the lockfile generation command refresh all the lockfiles I have at the same time?
./pants generate-lockfiles
generates all, and you can do
./pants generate-lockfiles --resolve=my-resolve1
for example to instead only regenerate that "resolve" (aka name for a particular lockfile)
is there a way for us to tag targets in pants to distinguish between libraries and applications?
Maybe the
tags
field? But read that blog and the docs at https://www.pantsbuild.org/docs/python-third-party-dependencies#multiple-lockfiles for more on this. Note that when an "app" wants to use "library" code, that library code must share the same resolve (lockfile). So you'll sometimes set it up so that a particular library file (or files) can work with multiple resolves. The blog and docs mention how that works
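If tags end up being enough for your library/application split, a minimal sketch (the target name and tag value are made up for illustration):

```python
# BUILD file sketch: mark targets with a tag so tooling can filter on it.
python_sources(
    name="lib",
    tags=["library"],
)
```

You can then filter by tag on the command line, e.g. `./pants --tag=library list ::`.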
a
thanks a lot @hundreds-father-404! I see that 2.10 is now both
beta
and
stable
, that is a little confusing; is it stable stable now?
h
Ah, it's stable; we didn't update the docs properly. Good catch! Fixed
1
a
my
./pants generate-lockfiles
failed on an internal dependency we have 😞 are internal dependencies supported now?
h
It should be? How do you depend on it, is it a published wheel in an internal artifactory? And what is the error you're getting?
a
a published wheel in an internal artifactory?
yes
how do we let pants resolve know that some of our deps are from internal artifactory? the extra index url is already in the
requirements.txt
error is
Copy code
Resolving dependencies...
SolverProblemError
 Because pants-lockfile-generation depends on xxx(==0.2.0) which doesn't match any versions, version solving failed.
looks like this has long been fixed.. https://github.com/pantsbuild/pants/issues/8971
i'm still having this issue tho. slack-ing you in the coinbase slack @happy-kitchen-89482
h
Yeah, see https://www.pantsbuild.org/docs/python-third-party-dependencies#generate-lockfiles-goal-vs-manual-lockfile-generation. Unfortunately in 2.10 the
generate-lockfiles
goal does not work with
[python-repos]
. This is fixed in Pants 2.11 by using Pex to generate lockfiles. In the meantime, those docs mention some methods to manually generate the lockfiles, similar to what you had with
generate_constraints.txt
h
how do we let pants resolve know that some of our deps are from internal artifactory? the extra index url is already in the requirements.txt
Have you configured the extra index URL in
[python-repos].indexes
?
h
That wouldn't help with
generate-lockfiles
in 2.10
a
A few questions: 1. while we are on 2.10, can we still use multiple lockfiles + private artifactory dependencies? 2. how soon will 2.11 become stable? 3. would you recommend we just upgrade from 2.9 to 2.11 now?
h
1. Yes, you can, but the lockfiles will need to be manually generated. Note that
[python-repos]
works when installing the lockfiles, just not when generating them. 2. We already did 2.11.0rc0, which means we are now focused on getting out a stable release; I would expect probably at least 2-3 weeks for the stable release. Changing to Pex for lockfile generation has a ton of benefits, but it will require more testing because the Python ecosystem is so vast. 3. I'm tempted to say yes, because a) you are using
[python-repos]
and b) you have "overlapping resolves" like where
utils.py
should work with multiple resolves, which is much nicer in Pants 2.11. I think I'd recommend at least waiting for 2.11.0rc1, though; there were some bug reports that we've been fixing. Hopefully we'll cut that tomorrow
a
And - we are currently on 2.9, generating one
constraints.txt
file manually, and have internal artifactory dependencies. We want to move to support conflicting dependencies. What is the best course of action?
Note that
[python-repos]
works when installing the lockfiles, only not when generating them
trying to understand what this means for my users. currently since we generate the
constraints.txt
manually, people also install it manually with
pip install -r constraints.txt
. How will the new installation work?
h
How will the new installation work?
Folks can still do
pip install -r constraints.txt
same as before; note that that's not necessary for Pants, though. Pants installs the constraints.txt file in its own way
What is the best course of action?
I recommend you use 2.10.0, given that it's stable. Specifically, you'll use the script at https://www.pantsbuild.org/docs/python-third-party-dependencies#manual-lockfile-generation-techniques. Note the comments in the script about multiple resolves/lockfiles. Once you have multiple resolves set up, you will need to make a slight adjustment to the script so that it generates for the correct resolve. We will hopefully be able to stabilize 2.11 soon, so you'll be able to get rid of the manual lockfile generation
a
thanks. So do I still generate my constraints.txt file manually? But now I will have multiple lockfiles; how does that work?
ok i think the doc answers it
h
Yes, see the script I linked to in the message above. The script is written so that you run it one time per distinct resolve/lockfile. I recommend re-reading the whole "Lockfiles" section in those docs; it explains what options to enable and how to set this up
1
a
irrelevant question - why
pip install + freeze
rather than just using
python -m piptools compile
?
h
pip-compile
is valid too; the chart at https://www.pantsbuild.org/docs/python-third-party-dependencies#manual-lockfile-generation-techniques mentions it. Feel free to modify your version of the script
manually_generate_user_lockfile.sh
to use pip-compile instead; that's a good improvement. It's especially good that it would let you use
--generate-hashes
, which reduces supply chain attack risk
1
(if it's not clear, we really hate that manual lockfile generation is still a thing in Pants 2.10. We spent a lot of time considering whether to push off lockfile support until Pants 2.11 because Pex wasn't ready yet for its lockfile generation. We decided to go ahead—even with the manual lockfile generation issues—because it's forward progress)
a
so, if I use 2.11, do I still need to do manual lockfile generation? It turns out to be a bit more work than I can fit on my plate rn, especially if we can upgrade to 2.11 in a month or so
h
you should not need to do manual lockfile generation with 2.11, because you can use Pex to generate lockfiles, and Pex understands
[python-repos]
a
the doc of v2.11 still says
Does not support
[python-repos]
if you have a custom index or repository other than PyPI.
h
Ah I have not yet updated 2.11 docs
a
let me use 2.11 and try...
h
You will want to set
[python].lockfile_generator = 'pex'
-- though I can't guarantee the Pex lockfile generation will actually work, because there may be bugs we don't yet know about
a
thanks. I'm suddenly getting
Copy code
17:09:57.54 [ERROR] 1 Exception encountered:

  MappingError: Failed to parse ./BUILD:
Targets in root-level BUILD files must be named explicitly.
added a name and it seems happy
h
Ah, this is because
[GLOBAL].use_deprecated_python_macros
now defaults to
False
in 2.10. You can for now set it to
true
explicitly, at least temporarily while you're trying this out
i gtg to get an MRI, rip, but will check back this evening or tomorrow morning. Benjy might be around to help too
a
thanks Eric!
Hi Eric, I'm getting
Copy code
NoCompatibleResolveException: The resolve chosen for the root targets was airflow-dags, but some of their dependencies are not compatible with that resolve:
when I already set [python].
default_resolve = "pynest-default"
why is this the case?
h
Are you expecting the "root targets" listed in the message to be pynest-default or to be airflow-dags?
a
pynest-default
, since that is my
default_resolve
h
are you setting the field
resolve=
for the root targets mentioned in the error message?
a
no i'm not setting it
h
Hm, strange. To double check, can you please run
./pants peek path/to:bad_tgt
? Use the address of one of the root targets from the error message
a
i'm adding
resolve="pynest-default"
into the root target now, seems to give me a different error (moving on)
if it happens again i'll try
./pants peek path/to:bad_tgt
h
hm
[python].default_resolve
should definitely be kicking in, and that's a bug if it isn't. Could you please try running
./pants help-advanced python
and make sure the option is being set properly? It should print what the current value is
a
sure
seems like it's not happening again. maybe it's because i had some other dependency errors
now i'm getting
Copy code
ResolveError: The file or directory 'xxx' does not exist on disk in the workspace, so the address 'xxx' cannot be resolved.
how do I know which target is actually encountering this error?
h
Yeah, that's an unfortunate error in that it doesn't give more context. I recommend grepping for
xxx
in your BUILD files 🤷 Someone opened a ticket to improve it, which I agree we need to do
But to double check, is
[python].default_resolve
now working how you'd expect? Meaning that
./pants help-advanced python
shows the right thing, and you don't have to explicitly set the
resolve
field for that one target?
a
Copy code
--python-default-resolve=<str>
  PANTS_PYTHON_DEFAULT_RESOLVE
  default_resolve
      default: python-default
      current value: pynest-default (from pants.toml)
      The default value used for the `resolve` field.

      The name must be defined as a resolve in `[python].resolves`.
yes
👍 1
i'm at the end of my wits with
ResolveError
though
there are 81 matches in 20+ BUILD files
i checked every one and none of them looks suspicious to me
do i need to add
name=reqs
?
this is what the BUILD at
3rdparty/xxx
looks like rn:
Copy code
python_requirements(
    module_mapping={
        "future": ["concurrent"],
    },
    overrides={
        "omniduct": {
            "dependencies": [
                "//:setuptools",
                "xxx#snowflake-sqlalchemy",
            ]
        }
    },
)
h
oh is this a 3rd-party requirement coming from something like a
python_requirements
target generator? How did you end up approaching that from earlier in the day? What is
[GLOBAL].use_deprecated_python_macros
set to?
a
yes this is a 3rdparty requirement target
use_deprecated_python_macros
is now unset, i'm on 2.11
i decided to go with 2.11 and no longer use the deprecated macros, because they generate annoying warning messages, which would be bad for UX
is
python_requirements
deprecated as a whole? What should I be using instead?
h
Did you follow the deprecation instructions, like running
./pants update-build-files --fix-python-macros
already?
a
yep, I did
ok, something is suspicious, I set
use_deprecated_python_macros = true
, and ran
./pants update-build-files --fix-python-macros
. it gave me:
Copy code
MappingError: Failed to parse ./3rdparty/ml_models_common/statsmodels/BUILD:
__call__() got an unexpected keyword argument 'name'
but all there is in this BUILD is
python_requirements(name="reqs")
h
Yeah, when
use_deprecated_python_macros = true
is set to true, it uses the old macro system where it's an error to have the
name=
field. I imagine it's confusing doing the macro deprecation at the same time as setting up multiple resolves. It might help to take a step back and completely finish the macro change before touching multiple resolves. Specifically, something like: 1. git stash the work you've done 2. go back to the last thing that was working, like
main
3. upgrade to 2.10.0 and follow the instructions exactly for the deprecation of the old macro system, including running
update-build-files --fix-python-macros
& checking the logs if there are any manual changes you need to make 4. Run
./pants dependencies ::
to make sure that you are totally good to go, with no ResolveError. Commit this all. Then, go back to adding multiple resolves
1
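The numbered steps above, as a command sketch (the pants.toml version bump is a manual edit):

```shell
# 1-2: stash in-progress work and return to a known-good state
git stash
git checkout main
# 3: after setting pants_version = "2.10.0" in pants.toml,
#    apply the macro deprecation fixes and watch the logs for
#    any manual follow-ups
./pants update-build-files --fix-python-macros
# 4: verify resolution is clean before committing
./pants dependencies ::
```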
a
ok, on step 3 now
❤️ 1
Looks like the upgrade to 2.10 is successful; CI is passing. Will do 2.11 multi-lock tomorrow. thanks for your help!
❤️ 1
h
Great progress!
h
Btw, 2.10 does not require you to switch to multiple lockfiles; constraints files still work. So it might be a good idea to land the 2.10 upgrade first. It may make the 2.11 upgrade easier, e.g. if you have to revert the change
a
Btw 2.10 does not require you to switch to multiple lockfiles
we are hitting hard dependency conflicts, we need multi lock
We have just landed the 2.10 upgrade. I'm working rn to land 2.11 with multi locks.
i'm starting to see that we will have many resolves... each application can have its own resolve, and I'm concerned about that. For example, some of our airflow DAGs have dependencies that are incompatible with each other, so each DAG is going to become its own resolve 😞
Question: if a BUILD file's
python_sources()
has
resolve=xxx
, do all the BUILD files under it also have
resolve=xxx
automatically? Or do I need to manually add `resolve=xxx` to all the BUILD targets under a directory?
also
./pants peek
is not telling me which
resolve
this target is in
Is it OK to have resolve structures like this? (subdirectory indicates this resolve depends on the parent resolve)
Copy code
Repo
   - Lib 1
     - App 1-1
     - App 1-2
where
App 1-1
and
App 1-2
conflicts with each other?
oh, i realized that resolves do not have structure - they are just parallel universes. so we really should limit the number of resolves... btw how do I use
parametrize
to add a target to multiple resolves?
1
👍 1
h
If you're using 2.11, then you use
parametrize
like this:
Copy code
python_sources(
   resolve=parametrize("resolve-a", "resolve-b"),
)
If you're using Pants 2.10, then you have to manually create two targets for the same file, like this:
Copy code
python_sources(
   name="lib_resolve-a",
   resolve="resolve-a",
)

python_sources(
    name="lib_resolve-b",
    resolve="resolve-b",
)
Yeah, it's generally desirable to have fewer resolves
Re
peek
, you're on 2.11 right? Try doing
./pants peek path/to/file.py
- the
resolve
field only shows up on the generated
python_source
targets, not the
python_sources
target generators
a
Thanks! Yes I tried
resolve=parametrize("resolve-a", "resolve-b"),
and it worked; however, it only works on some targets, not all. What is the list of targets
resolve
works on? I'm 80% done experimenting with a migration into the 2.11 + multi-lock world. Here are my observations so far; I want to check with you whether they make sense, and hear what suggestions you have:
1. Multi-lock is not a solution to Python dep conflicts; it is one of the alternatives, and there are tradeoffs between the alternatives. None is a magical solution; hard conflicts are hard conflicts at the end of the day. They need to be either resolved, or kept separate so they don't interfere with each other. Even Google, who copies all deps into their repo, needs to leave some as target-specific deps when they have multiple conflicting versions.
2. Multi-lock is very costly, not just to migrate to, but to maintain on an ongoing basis, largely because resolves, as parallel universes (lmk if this analogy is not accurate), are intrinsically expensive to maintain. This means you really can't afford to have too many resolves going at the same time, so you need to put up guardrails and prevent people from just creating a dozen resolves. So at the end of the day, you're still going to need to resolve dep conflicts; multi-lock just eases/delays that, it doesn't magically solve it.
3. We are thinking that we can learn from Google but need to be a bit more flexible. Maybe still do a single lockfile, and do something like this when someone introduces a new dep that conflicts with the lockfile:
   a. First, try to resolve your dep conflict.
   b. If that's too difficult, you can copy the part of the 3rdparty code you need into our monorepo (note: unlike Google, we don't default to this way of doing things, we only resort to it); we can build it, push it to our artifactory, and treat it as an internal dep, and we can sort out its transitive deps much better.
   c. If b isn't feasible either, leave that dep out of the lockfile, create a local requirements/constraints.txt, and let the specific target that needs it depend on that requirement.
      i. Because this can cause repo divergence, we should detect these changes in CI and require repo-owner approval on such PRs.
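For option (c), one rough shape this could take in a BUILD file (a sketch; the target name and file name are hypothetical, with `source` pointing at the app-local requirements file):

```python
# BUILD sketch: requirements kept local to one application,
# deliberately left out of the repo-wide lockfile.
python_requirements(
    name="app-local-reqs",
    source="app_requirements.txt",
)
```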