# pex
a
I’m profiling some pex executions… It looks like about 75% of the overhead of running a pex right now is importing things from `pex.third_party` - does this sound about right, and do people have thoughts on how to improve that?
👀 1
e
Not sure if that's right but could be. There are two interesting questions there:
1. Is the PEP-302 import hook stuff too slow? I.e., is the vendoring approach taken inherently slow as a result?
2. Is the set of things being imported too large? I.e., is `pkg_resources` use hurting us? It's widely observed that `pkg_resources`, via its global `working_set` variable, scans the full classpath and does a lot of often un-needed work that can be slow.
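To make that cost concrete, here is a rough timing sketch (assuming a stock `pkg_resources` is importable) that measures the import plus the sys.path scan it performs as a side effect when building the global working set:
```python
# Rough timing sketch: importing pkg_resources builds its global working_set
# as a side effect, scanning every entry on sys.path for installed distributions.
import time

start = time.perf_counter()
import pkg_resources  # module-level init builds pkg_resources.working_set

elapsed_ms = (time.perf_counter() - start) * 1000
print("import pkg_resources took %.1f ms" % elapsed_ms)
print("distributions scanned onto the working set: %d"
      % sum(1 for _ in pkg_resources.working_set))
```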
a
If the latter, how awful would it be to re-implement what we use from `pkg_resources`?
e
Not sure.
It's not light stuff; it goes into the core of PEXEnvironment.
I suspect not bad though.
Perhaps the quick experiment with the latter is to hack up the vendored copy of pkg_resources to not have the global `working_set` variable. Pex does not use it.
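A minimal sketch of that experiment, assuming we are free to edit the vendored copy (the `working_set()` accessor below is hypothetical, not a Pex or setuptools API): instead of building the master `WorkingSet` at module import time, expose it lazily so a pex that never touches it never pays for the sys.path scan.
```python
from pkg_resources import WorkingSet  # stand-in for the vendored copy's own class

_working_set = None


def working_set():
    """Build the master WorkingSet on first use instead of at import time."""
    global _working_set
    if _working_set is None:
        _working_set = WorkingSet()  # with no arguments this scans sys.path
    return _working_set
```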
👍 1
a
So far I’m at, replacing https://github.com/pantsbuild/pex/blob/b6681fbafe30b36b40349f4869bada4ff757f152/pex/pex_builder.py#L54-L55 with:
• printing a line and exiting: 144ms
• `import pkg_resources` and exiting: 214ms
• `import pex.third_party.pkg_resources` and exiting: 303ms
`pkg_resources` appears to do a bunch of expensive initialisation, e.g. compiling a bunch of large-ish regexes, on import, which we then don’t actually use when running a pex.
The vendored importer appears to also be pretty expensive
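As an aside, one low-effort way to see where that import time goes (assuming Python 3.7+) is CPython's built-in per-module import timing report:
```python
# Run the candidate import in a child interpreter; -X importtime makes CPython
# print a per-module breakdown of cumulative import time (including
# pkg_resources' module-level initialisation) on stderr.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import pkg_resources"],
    check=True,
)
```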
e
Have you compared the vendored importer to importing the vendored code via standard imports? That's the useful difference. We can't avoid the imports unless we jettison the code we import, but we can import it faster if the PEP-302 hook adds too much overhead over the standard import mechanism.
More broadly, at runtime pex has to do a lot of calculation to set up an isolated venv. If we hashed more things about the PythonInterpreter selected, a more generic warm-run win might be to cache the calculations against the cached interpreter. On a re-run with the same interpreter, the needed modifications to sys.path would be read off disk as already known.
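A minimal, purely illustrative sketch of that caching idea (every name below is hypothetical, not part of Pex): fingerprint the selected interpreter, store the computed sys.path modifications under the pex cache on a cold run, and read them straight off disk on warm runs with the same interpreter.
```python
import hashlib
import json
import os
import sys


def _interpreter_fingerprint():
    # Hash the interpreter properties that affect the computed sys.path.
    ident = json.dumps({"executable": sys.executable, "version": sys.version}, sort_keys=True)
    return hashlib.sha1(ident.encode("utf-8")).hexdigest()


def load_or_compute_sys_path_entries(pex_root, compute):
    """Return cached sys.path entries for this interpreter, computing them on a miss."""
    cache_file = os.path.join(pex_root, "sys_path_cache", _interpreter_fingerprint() + ".json")
    if os.path.exists(cache_file):
        with open(cache_file) as fp:
            return json.load(fp)
    entries = compute()  # the expensive pkg_resources-backed calculation, cold run only
    os.makedirs(os.path.dirname(cache_file), exist_ok=True)
    with open(cache_file, "w") as fp:
        json.dump(entries, fp)
    return entries
```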
a
It looks like importing the vendored code from disk is comparable to using the vendored importer
e
That's what I guessed. Python itself dogfoods the PEP-302 mechanism on modern Pythons.
a
Interestingly, it looks like it ends up getting imported twice… I wonder why…
e
I'd dig into the 2x proof a bit 1st.
a
I just added a print statement to the top of our vendored `pkg_resources/__init__.py` to verify the right thing was being imported, and when I run a pex file I get that line printed twice
That’s not expected, right?
Aah I’ll keep digging…
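One way to dig into that (a hedged suggestion, not necessarily what was done in the thread) is to print the import stack instead of a bare line, so each import of the vendored module shows who triggered it:
```python
# Drop this at the top of the vendored pkg_resources/__init__.py while debugging:
# every time the module is (re)imported, print the stack of the importing code.
import traceback

print("pkg_resources imported from:")
traceback.print_stack()
```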
e
a
If the no-op cost of running `pex` is of the order of 150ms, that feels like pretty expensive reuse…
e
I'd be interested in the no-op cost of 1.6
Iff it's significantly better, I agree. If not, I don't.
And it's not a wanton re-use. Both cases fix isolation bugs, so some careful substitute is needed.
a
I’m seeing about 120ms for 1.6.12, so comparable
e
And note 1 and 2 above use the pex cache. It's only a hit on run 1 of a pex on a machine (if the pex cache (PEX_ROOT) is not disabled or ephemeral).
a
To make sure my rough feel for what running a no-deps pexfile does, compared to just running a file in a python interpreter, is right: I think the only things it additionally does are (at a high level):
• Probe interpreters, select one
• Munge sys.path to isolate from system packages
Is that about right, or are there extra important things?
e
No, just those two broad steps.
1 is cached so it should only be a cold run hit. 2 is not cached at all.
2 uses pkg_resources.
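As a rough illustration of what step 2 ("munge sys.path to isolate from system packages") amounts to, and explicitly not Pex's actual implementation: drop site-packages style entries so only the stdlib plus the pex's own code and dependencies remain importable.
```python
import site
import sys


def scrub_sys_path():
    """Remove site-packages entries from sys.path, returning what was dropped."""
    site_dirs = set(site.getsitepackages()) if hasattr(site, "getsitepackages") else set()
    if hasattr(site, "getusersitepackages"):
        site_dirs.add(site.getusersitepackages())
    dropped = [entry for entry in sys.path if entry in site_dirs]
    sys.path[:] = [entry for entry in sys.path if entry not in site_dirs]
    return dropped
```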
a
Cool, thanks 🙂 I will keep poking