v2 `@rule` resolution appears to be causing a seri...
# development
h
v2
@rule
resolution appears to be causing a serious startup performance hit at import time. Any thoughts appreciated on https://github.com/pantsbuild/pants/issues/7442
a
Wow, that’s impressive and interesting and scary all at once
❗ 1
a
commented, "rule resolution" doesn't happen at import time and i believe this is an
ast
issue
@hundreds-father-404 if i'm seeing
snakeviz: error: the file /Users/dmcclanahan/tools/pants/whole.prof is not a valid profile
and/or
ValueError: unknown record type in log file
from
gprof2dot
is it immediately clear what i'm doing wrong? i tried switching
python
on my path to point to python 3 but that didn't change the output
if not i'll figure it out
w
wrong python, yea.
h
Apply the diff at the top of issue, and then make sure you
pip3 install tuna
rather than
pip
w
i have two
venvs
with the different pythons for snakeviz
a
i should just use pex
w
or that, yea.
a
and eric i'm trying to use the pex profiler to wrap the import line specifically to get a flamegraph for that call
h
Ah, that I have no idea how to do. Profiling in general is completely new to me as of last night hah
a
i just remember from trying to investigate the globs matching stuff when i was stuck in bed last year
pip3 install tuna
worked, pex was yelling about encodings
πŸ‘ 1
w
i do roughly:
virtualenv -p /usr/local/bin/python3 ~/venvs/snakeviz-py3
, then activate and
pip install snakeviz
. then, without the venv activated, you can do
~/venvs/snakeviz-py3/bin/snakeviz $prof
so it's a one-time thing to activate/deactivate
(via
source ~/venvs/snakeviz-py3/bin/activate
, and exiting the shell, respectively)
would recommend against actually pip installing things on your machine
πŸ‘ 1
a
i do not understand the output for some reason as it shows that import taking "30 seconds" when the entire run was 7 seconds
but the deepest call stack does go into the yacc parser
false alarm, that's in the py zipkin import (and that says it's 0.2%)
e
Did folks see my issue update - PYTHONPROFILEIMPORTTIME=1 is magic
πŸ”₯ 2
a
i will now
e
I think e have enough info to implicate @rule
πŸ‘ 1
a
i would definitely think that is the case since we do nontrivial ast nonsense at import time, but i would strongly suspect the performance difference is due to some specific operation we're doing there. if we need to lazily traverse the ast we can do that if that is the cause of the issue, but if the issue is that
ast.parse()
is slower that would be more annoying
w
Right. They've always been eagerly evaluated for that parsing/validation
Could probably microbenchmark it
a
also, the traversal can be optimized, i explicitly didn't do that when i added the
yield Get(...)
check and we can prune the branches a lot
the vast majority of the runtime is in
inspect.getmodule
and other methods which i think means we can optimize this if need be (and is a great place to look for a py3 perf difference) -- i will post a profile after this lunch
Screen Shot 2019-03-27 at 11.48.41 (2).png
Screen Shot 2019-03-27 at 11.47.01 (2).png
this is just with
Copy code
diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py
index 91a4ca1..7510e6e 100644
--- a/src/python/pants/bin/pants_loader.py
+++ b/src/python/pants/bin/pants_loader.py
@@ -66,7 +66,7 @@ class PantsLoader(object):
         Fix it by setting the LC_* and LANG environment settings. Example:
           LC_ALL=en_US.UTF-8
           LANG=en_US.UTF-8
-        Or, bypass it by setting the below environment variable.
+        Or, bypass it by setting the below environment variable.
           {}=1
         Note: we cannot guarantee consistent behavior with this bypass enabled.
         """.format(encoding, cls.ENCODING_IGNORE_ENV_VAR)
@@ -101,7 +101,9 @@ class PantsLoader(object):
   def load_and_execute(entrypoint):
     assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`'
     module_path, func_name = entrypoint.split(':', 1)
-    module = importlib.import_module(module_path)
+    from pants.util.contextutil import maybe_profiled
+    with maybe_profiled('wow.prof'):
+      module = importlib.import_module(module_path)
     entrypoint_main = getattr(module, func_name)
     assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint)
     entrypoint_main()
forget the first part of the diff i am rushing
inspect.stack()
might not be necessary
66% of runtime during import is in the
@rule
decorator (2.06 seconds in my run)
πŸ‘ 1
Screen Shot 2019-03-27 at 14.38.53 (2).png
and
inspect.stack()
seems to take all the runtime, it calls tons of other things, but calling
inspect.getsource()
from the decorator itself is only 0.87% (and we suspected that might be culprit). it might be cached by python since
inspect.stack()
is the first
inspect
method used in that decorator
e
In that case - dispensing with the stack and just ast-ing file may be a big win
πŸ‘ 2
a
you have dispensed with maybe 30 minutes of searching, working on it
e
Thanks for bringing this one home Danny.
❀️ 1
a
thank me once my backend doesn't slow the rest of the thing down
removing
inspect.stack()
shaved 3 seconds off and i cannot find where rule parsing contributes to import time in the profile i have set up of that import line
seeing if this passes tests now but we don't need the stack here and i think this solution is more correct anyway
❀️ 1
e
Excllent
a
everything mysteriously seems to work except the error message which is expected and i am fixing
πŸ’― 1
πŸ”₯ 1
h
Does this also improve performance for
./pants2 -V
?
a
i will include that number in the PR, i suspect so
πŸ‘ 1
getting the whole stack seems like an expensive operation in any language
h
Cool. My impression was this hurt us with both versions, but even more so in Py3 for some unknown reason
a
there is a negligible performance increase in py2
πŸ‘ 1
however this change also brings py3 performance up to py2
❀️ 1
so
inspect.stack()
is like very very slow in py3 like this is p clear at least with our access pattern (which wasn't too weird)
πŸ‘ 1
probably worth a python bug unless it's known
h
That’d be a very cool open source contribution @aloof-angle-91616! I’d be happy to help you edit a writeup if you choose to do so
a
i would love that
going to put it as a TODO in the PR description then make it
h
I wonder if the issue is related to the changed return type in 3.5? https://docs.python.org/3/library/inspect.html#inspect.stack
e
Copy code
$ for py in python{2.7,3.{4,5,6,7}}; do echo -n "$py: " && $py -c "import timeit; print(timeit.timeit('import inspect; inspect.stack()', number=10000))"; done
python2.7: 2.63740515709
python3.4: 3.8067204039980425
python3.5: 4.471806422996451
python3.6: 4.527370049007004
python3.7: 3.3335486440046225
πŸ‘ 1
w
probably proportional to depth
e
Yup
w
proportional/exponential, perhaps
a
i added this to the PR description but there is an answer here that may explain the issue: https://stackoverflow.com/a/17407257/2518889
πŸ‘ 1