hey guys, I'm trying out the v2 version of pants for python (Pants #general)

rapid-crayon-8232

04/15/2020, 3:02 PM

hey guys, I'm trying out the v2 version of pants for python, and I have a question (maybe stupid) about dependency resolution. I ran `./pants test packages/::` from a cold repo, and when downloading the requirements in 3rdParty I noticed this:

rapid-crayon-8232

04/15/2020, 3:03 PM

does it mean that it is using multiple processes to download the same thing?

hundreds-father-404

04/15/2020, 3:09 PM

Not a dumb question at all! What’s going on is that each test target gets run independently, unlike V1, which ran everything sequentially. This means that if you have enough cores (or enough remote workers), you could run your entire test suite fully in parallel. It also means that the caching works better; if you run `./pants test ::`, then change one single test target and rerun `./pants test ::`, only that target will need to rerun and everything else will use the cache.

The first step to running a test is to resolve all the 3rd party requirements used transitively by the test target. This is quite slow the first time you run it*, but it fortunately gets cached and will never need to be re-run unless you change the version of a used dependency or add new ones.

It looks like it’s resolving the same requirements, but really those are all different combinations of your universe of requirements. One test might depend on A and B, while another depends on A, B, and C. If test targets depend on the same combination, they’ll reuse it, but if that combination has never been encountered yet, then it gets re-resolved.

*We’re actively trying to speed up the performance of this first-time resolve.

rapid-crayon-8232

04/15/2020, 3:44 PM

thanks, makes a lot of sense, in my case I have a dependency (detectron2) that is quite resource-intensive, and so trying to install it multiple times can easily overflow and end up with a timeout error

👍 1

rapid-crayon-8232

04/15/2020, 3:46 PM

maybe having a standalone install goal that can be used when the cache is cold would be a good idea? in V1 I used to run `./pants pyprep ::` for that

hundreds-father-404

04/15/2020, 3:47 PM

Bummer that it’s timing out, sorry about that negative experience! The main optimization we’re working on is to have a better cache for the resolve step in particular so that even if the overall combination is different, we can have a common cache so that you never need to download `detectron2` more than once. We’ve been considering things like a `./pants pyprep ::` step too.

hundreds-father-404

04/15/2020, 3:49 PM

Are things working properly for any tests without `detectron2`? You’ll want to run something more specific than `packages::`.

One cool feature to try out: the V2 test implementation works with precise file arguments. If you say `./pants test foo_test.py`, Pants will only run tests for that specific file, even if the owning target has other files in it.

rapid-crayon-8232

04/15/2020, 3:54 PM

yes I've seen that and I was honestly very impressed, very cool feature. what I'm doing now is running a single test with detectron2 to warm the cache and retry the global test run, not ideal but I'm just testing for now 🙂 I'll also try installing detectron2 from wheel files to see if it can speedup stuff 🤞

💯 1

hundreds-father-404

04/15/2020, 3:57 PM

I’ll also try installing detectron2 from wheel files to see if it can speedup stuff

Wheels will definitely be faster. How are you getting this to happen? Under the hood, Pants is delegating to Pex to resolve dependencies, which then delegates to Pip. Pants will end up running something similar to `pip install 'detectron2>3.0' 'flask==2.8'`, etc., based on what Python requirements are in the transitive closure of your test target.
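
To make "transitive closure" concrete, here is a minimal sketch that walks a made-up dependency graph and collects the requirements a pip-style install would need. The target paths and graph layout are hypothetical, and this is not the actual Pants or Pex code path; the two requirement strings are the illustrative ones from the message above.

```python
# Hypothetical dependency graph: each target lists what it depends on directly.
# Entries that are not keys of the graph stand in for third-party requirements.
graph = {
    "tests/foo_test.py": ["src/foo"],
    "src/foo": ["src/util", "flask==2.8"],
    "src/util": ["detectron2>3.0"],
}


def transitive_requirements(target):
    """Walk the graph depth-first and collect every third-party requirement."""
    seen, stack, requirements = set(), [target], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in graph:
            stack.extend(graph[node])
        else:
            # Nodes that are not targets are treated as requirements.
            requirements.add(node)
    return sorted(requirements)


def pip_command(target):
    # An argument list avoids a shell treating `>` as a redirect.
    return ["pip", "install", *transitive_requirements(target)]


print(pip_command("tests/foo_test.py"))
```

The test file never mentions `detectron2` directly, but it still ends up in the command because `src/util` sits in the test's transitive closure.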

hundreds-father-404

04/15/2020, 3:59 PM

yes I’ve seen that and I was honestly very impressed, very cool feature.

Awesome, glad you find it helpful! I almost never use address args anymore and always use file args because tab autocomplete is so much nicer.

FYI, when using the V1 implementation, you can still use `./pants foo_test.py`; it will run over the entire target owning `foo_test.py`, rather than just that file (this used to be the option `--owner-of`).

rapid-crayon-8232

04/15/2020, 4:09 PM

Wheels will definitively be faster. How are you getting this to happen?

not sure yet, but I'll be testing installing directly from https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/index.html on linux to see how it goes. Also, the install of detectron2 took some time but it's done now, and I got a new error 😅 :

Failed to execute PEX file. Needed macosx_10_15_x86_64-cp-37-cp37m compatible dependencies for:
 1: immutables
    But this pex only contains:
      immutables-0.11-cp36-cp36m-macosx_10_13_x86_64.whl

error that I don't get with v1
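
For readers unfamiliar with the tags in that error, the following sketch pulls apart the wheel filename and the platform string quoted above and compares the interpreter and ABI tags. The helper names are made up, and the parsing is a simplification that only handles strings shaped like the two in the error message.

```python
def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    stem = filename[: -len(".whl")]
    # Wheel names end with -{python tag}-{abi tag}-{platform tag}.
    _, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    return python_tag, abi_tag, platform_tag


def needed_tags(pex_platform):
    """Split the platform string from the error into comparable tags."""
    platform_tag, impl, version, abi_tag = pex_platform.split("-")
    return impl + version, abi_tag, platform_tag


have = wheel_tags("immutables-0.11-cp36-cp36m-macosx_10_13_x86_64.whl")
need = needed_tags("macosx_10_15_x86_64-cp-37-cp37m")

print(have[:2], need[:2])
print(have[:2] == need[:2])
```

The wheel inside the PEX was built for CPython 3.6 (`cp36`), while the interpreter running it is CPython 3.7 (`cp37`), which is the mismatch the error is reporting.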

rapid-crayon-8232

04/15/2020, 4:10 PM

maybe I need to specify immutables as a requirement too?

hundreds-father-404

04/15/2020, 4:10 PM

Huh. Have you configured `--python-setup-interpreter-constraints` or set `compatibility` on any of the targets?

rapid-crayon-8232

04/15/2020, 4:10 PM

yep, configured globally to python3.7

👍 1

hundreds-father-404

04/15/2020, 4:12 PM

maybe I need to specify immutable as a requirement too ?

Even better, you can use a constraints file. You’d create a file like `3rdparty/python/constraints.txt` (can be anywhere you want), which has a list of requirements just like a requirements.txt file. Then, in `pants.toml`:

[python-setup]
requirement_constraints = "3rdparty/python/constraints.txt"

Constraint files are intended for when you don’t directly depend on the dep, but need to constrain it to get things working
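
The difference between a requirement and a constraint can be sketched as follows: a constraint only pins the version of something that is already being installed, and never adds a package on its own. The function name, the parsed-dict representation, and the pins below are assumptions made purely for illustration.

```python
def apply_constraints(resolved_names, constraints):
    """Pin only the packages that are already part of the resolve."""
    return {
        name: constraints.get(name, "any version")
        for name in sorted(resolved_names)
    }


# Hypothetical contents of 3rdparty/python/constraints.txt, parsed into a dict.
constraints = {"immutables": "==0.11", "unused-package": "==1.0"}

# Packages pulled in (directly or transitively) by some test target.
resolved = {"immutables", "flask"}

print(apply_constraints(resolved, constraints))
```

`unused-package` is constrained but never installed, because nothing depends on it; `immutables` is pinned because it was going to be installed anyway.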

hundreds-father-404

04/15/2020, 4:13 PM

yep, configured globaly to python3.7

Could you please copy the line you use in `pants.toml` (or `pants.ini`) to configure this?

hundreds-father-404

04/15/2020, 4:13 PM

Also, what Pants version are you on?

rapid-crayon-8232

04/15/2020, 4:18 PM

here is my full pants.toml:

[GLOBAL]
pants_version = "1.27.0.dev3"
v1 =  false  # Turn off the v1 execution engine.
v2 = true  # Enable the v2 execution engine.
v2_ui = true  # Enable the v2 execution engine's terminal-based UI.

backend_packages = []  # Deregister all v1 backends.

# List v2 backends here.
backend_packages2 = [
  'pants.backend.python',
  'pants.backend.python.lint.docformatter',
  'pants.backend.python.lint.black',
  'pants.backend.python.lint.flake8',
  'pants.backend.python.lint.isort',
]

# List v2 plugins here.
plugins2 = []

[source]
# The python source and test roots is the packages-python-pants root.
source_roots = """{
  'packages-python-pants': ('python',),
}"""
test_roots = """{
  'packages-python-pants': ('python',),
}"""


[python-setup]
# The default interpreter compatibility for code in this repo.
# Individual targets can override this with the `compatibility` field.
# For example, targets containing Python 3.7-only code can set `compatibility = "CPython==3.7"`,
# and targets containing code compatible with Python 2.7+ *and* Python 3.5+ can set
# `compatibility = CPython>=2.7,!=3.0,!=3.1,!=3.2,!=3.3,!=3.4`.
interpreter_constraints = "CPython>=3.7"

hundreds-father-404

04/15/2020, 4:21 PM

You’re running into https://github.com/pantsbuild/pants/issues/9509 😕 Change `interpreter_constraints` to be `["CPython>=3.7"]` so that it replaces the constraints, rather than appending to the default. It’s not great that our own example repo is doing the wrong thing, our bad. This reinforces that it was a misfeature, and we’ll prioritize removing it.
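
The append-versus-replace behaviour described in this message can be modelled roughly like this. The default value below is a placeholder chosen for illustration (the real Pants default may differ), and `effective_constraints` is a made-up helper rather than a Pants API.

```python
# Placeholder default; the real default constraints in Pants may differ.
DEFAULT_CONSTRAINTS = ["CPython>=3.6"]


def effective_constraints(configured):
    """Model the option behaviour described in the thread."""
    if isinstance(configured, str):
        # A bare string is appended to the defaults.
        return DEFAULT_CONSTRAINTS + [configured]
    # A list replaces the defaults entirely.
    return list(configured)


print(effective_constraints("CPython>=3.7"))
print(effective_constraints(["CPython>=3.7"]))
```

Under this model the string form leaves an older default constraint in play, which would be consistent with a `cp36` wheel ending up in the PEX even though the repo was meant to be 3.7-only.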

👍 1

rapid-crayon-8232

04/15/2020, 4:24 PM

yep, thank you it was exactly that

👍 1

rapid-crayon-8232

04/22/2020, 12:27 PM

hey @hundreds-father-404, sorry for coming back to this again. I have a question about resolving the transitive deps: if I run a first test that depends on deps A and B (so that combination is cached), and then run a test that depends on A, B and C, then, if I understood your answer above correctly, it will not reuse the already cached A and B but will re-resolve everything (A, B and C)?

Another question, about adding the `install`/`pyprep` goal: is it a short-term goal or more for the future? For us it is currently the only main thing blocking the move to V2. If it's more of a long-term goal, would it be possible for me to write a plugin for that in the meantime?

hundreds-father-404

04/22/2020, 3:16 PM

No need to apologize - happy to answer any questions about this or otherwise! Yes, your understanding is correct. We do that for a) better caching, meaning if your test doesn’t use `django` and you changed the `django` version, that won’t impact your test; and b) hermeticity, meaning that your test can’t start using symbols/libraries it doesn’t know about.

it will not reuse the deps A and B already cached but re-resolve everything

This is the biggest issue with the setup, and where our priorities are on fixing things. You’re correct that right now the resolve for A and B is not used to help resolve A, B, and C. Our priority is to start leveraging the Pex/Pip cache, but to do so in a safe way that still works with things like remote execution (running your tests in the cloud). Then, resolving A, B, and C would only need to redownload/rebuild wheels for the new dependencies from C.
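
The improvement described here, where resolving A, B, and C only has to fetch what is new for C, can be sketched as a cache keyed per requirement instead of per combination. This is a toy model of the idea under made-up names, not the planned Pants/Pex implementation.

```python
# Toy per-requirement cache; names and structure are hypothetical.
wheel_cache = {}


def resolve_incrementally(requirements):
    """Return the requirements that actually had to be built this time."""
    newly_built = []
    for req in sorted(requirements):
        if req not in wheel_cache:
            wheel_cache[req] = f"{req}.whl"  # stand-in for download/build
            newly_built.append(req)
    return newly_built


print(resolve_incrementally({"A", "B"}))
print(resolve_incrementally({"A", "B", "C"}))
```

The second call still produces a separate resolve for the new combination, but only `C` has to be downloaded or built.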

for adding the install/pyprep goal, is it a short term goal or more for the future ?

It is not currently a short term goal, but I like your thinking as an immediate workaround until we solve the above caching issue (which is tricky to get right and will be a few more weeks). At Toolchain, one of our engineers runs `./pants test ::` after upgrading any requirements so that he only pays the pain of resolves once; tests are normal and fast after that. This doesn’t work well if your tests themselves take a long time, though. Do you have any longer tests, like integration tests? Perhaps we should add a `_prep-tests` goal temporarily.

rapid-crayon-8232

04/22/2020, 8:36 PM

thanks for the explanation. My main worry is not having long tests (they are relatively short), but having to install one of my dependencies multiple times. This dep is used almost everywhere in the repo and it is installed directly from git, so it needs to be built and then installed, which takes some time. The dep is installed using PEP 517/518, so each time it will download nearly 1 GB of deps needed for its build; do this 10 times in parallel and it can easily explode. Looking back at it now, I seem to have the worst-case scenario for the current pants caching system 😅 😂

Is there also a way to increase the 900s timeout? Maybe we can brute-force it if we wait long enough.

These questions are mostly for our CI/CD pipeline. For V1 right now, we have two stages:
• `pyprep`: install all deps and cache `pants.d` and `.cache`
• `test`: pull cache + run tests

👍 1

happy-kitchen-89482

04/22/2020, 8:39 PM

Interesting, and good to know about this use-case.

happy-kitchen-89482

04/22/2020, 8:41 PM

We are looking at a few solutions.

happy-kitchen-89482

04/22/2020, 8:42 PM

One is to do incremental resolves that use the results of previous resolves. But that still means having many separate resolves (huge ones in your case), even if they are much faster.

👍 2

happy-kitchen-89482

04/22/2020, 8:43 PM

Another option, which might be the best one in your case, is to have a single big resolve shared by all the tests, even if some of them don't need everything in that resolve.

👍 1

happy-kitchen-89482

04/22/2020, 8:44 PM

For example, resolving your entire `requirements.txt` once and having everything use that. Every requirement any test needs will be in there, but so will a lot of other things not needed by a given test.
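
A small sketch of the trade-off in this proposal: with one shared resolve, every test is covered, but each test also sees requirements it never asked for. The requirement names and test names are hypothetical.

```python
# Hypothetical: everything listed in the single global requirements.txt.
global_resolve = {"A", "B", "C", "D"}

tests = {
    "test_one": {"A", "B"},
    "test_two": {"A", "B", "C"},
}


def unused_by(test_name):
    """Requirements in the shared resolve that this test never asked for."""
    return sorted(global_resolve - tests[test_name])


for name in tests:
    # Every test's needs must be covered by the one shared resolve.
    assert tests[name] <= global_resolve
    print(name, "extra:", unused_by(name))
```

Only one resolve is ever performed, at the cost of the per-test precision (and the caching and hermeticity benefits) discussed earlier in the thread.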

happy-kitchen-89482

04/22/2020, 8:44 PM

Does that make sense?

happy-kitchen-89482

04/22/2020, 8:44 PM

Do you have a single, global `requirements.txt`?

hundreds-father-404

04/22/2020, 8:51 PM

to have a single big resolve shared by all the tests,

This would be a bit tricky to implement. For example, Pants allows you to have inline Python requirements via the target type `python_requirement_library`. Sometimes, that’s needed to clarify that this specific test should use a custom version of dependency A, rather than the one in your `requirements.txt`. In this case, we couldn’t really have a single big resolve because you want to use your inlined version of A, not the global one. Another complication is multiple interpreter constraints, if some targets use Py2 and some Py3, for example.

But, I’m sure we can figure out a way to get this working for most cases, like if you only have a single `requirements.txt` and only use one Python version. We’re trying to determine a) whether this would actually be usable by some users or requires too many assumptions, and b) how much of a performance gain it would give relative to the complexity of it all.

rapid-crayon-8232

04/22/2020, 9:28 PM

Do you have a single, global `requirements.txt`?

yes we have a single global `requirements.txt`, and I don't see that changing for the foreseeable future

resolving your entire `requirements.txt` once and having everything use that

yes, that makes sense and could indeed speed up the dependency resolution. Maybe instead of just resolving a `requirements.txt`, it could resolve every instance of `python_requirement_library` under the install path given. For example, `./pants install packages/foo/::` would resolve all instances of `python_requirement_library` in `packages/foo/**/*`

👍 1

hundreds-father-404

04/22/2020, 9:37 PM

And do you only use one python version?

rapid-crayon-8232

04/22/2020, 9:58 PM

yep only one version

👍 1
