# general
f
Upgraded from some old 1.3 beta to 1.7 recently, and discovered that we had to run `./pants lint` explicitly to run lint checks. But the checks seem to take a long time (about 27 minutes) to run. The output looks like:
02:20:31 00:07     [python-eval]
                   Invalidated 120 targets.........................................................................................................................
02:47:37 27:13     [pythonstyle]
                   Invalidated 160 targets.
h
Does it take that long if you run `pants lint` multiple times in a row? The first time should take longer than normal because it invalidates all targets (idk about 30 minutes tho)
f
you mean multiple times in a row without changing the source code?
h
Yes, exactly
f
I’m pretty sure in that case the cache will make it fast, but the ‘cache miss’ use-case here (“I’ve changed the code, does it still pass lint?”) is kind of the main one, no? or are you asking just to diagnose whether it’s in fact invalidation that’s taking a long time
h
Mainly asking to diagnose. For example, if this is the first time you ran the command since upgrading, I suspect every target would be invalidated so it would take a long time. Usually when you’re programming, you likely would be only changing a few files, so that wouldn’t take as long as everything being invalidated. (also disclosure I’m a contributor, not maintainer, so not 100% sure what’s normal and what isn’t)
f
oh, no, this definitely isn’t the first time since we’ve upgraded, it’s regularly this slow in our CI script
that said I’m not 100% sure that the cached state is being carried over from CI run to CI run
where would that state be cached?
h
Oh hm this is on CI? Can you reproduce locally? For most CI I’ve seen in projects state gets wiped every run. Not sure how yours is set up
f
we keep some state in between runs, not sure if wherever pants caches this info is getting picked up as part of that
(trying to reproduce locally but ran into another issue: https://pantsbuild.slack.com/archives/C046T6T9U/p1535064816000100)
(either way though, 30 minutes even from a fresh start seems way too long, and it seems like it’s mostly getting spent internally for pants to clear its cache, not doing the actual linting)
ok, finally able to get lints to run locally; a 2nd run w/o changing source code is instant
but I’m not sure what this is testing… I think pants is basically caching the fact that the lint was run and doing nothing else
e
Can you confirm the paths shown when running `./pants options --scope=python-setup` are not blown away between CI runs?
f
I’ll double-check, but yes, interpreter_cache_dir and resolver_cache_dir should be carried over in between runs
e
If so, and your requirements aren't changing constantly, this is decidedly odd.
f
worth mentioning: we also run all tests, which seems to run faster (we shard the tests 4 ways, but even so, most of the time is being spent running pytest, not doing pants pre-work). let me get some concrete numbers
so the time between the start of the `[test]` step and the `[test] [pytest] [run]` step is about 2 seconds. the time between the start of `[lint]` and the start of `[lint] [python-eval]` is about 1 second, then about 27 minutes go by between the start of `[lint] [python-eval]` and the start of `[lint] [pythonstyle]`, and then another 16 seconds from then to `[complete]`
e.g., these are both from the same `./pants` invocation
00:00:49 01:42   [test]
00:00:49 01:42     [test-jvm-prep-command]
00:00:49 01:42       [jvm_prep_command]
00:00:49 01:42     [test-prep-command]
00:00:49 01:42     [test]
00:00:49 01:42     [pytest-prep]
                   Invalidated 290 targets.
00:00:51 01:44     [pytest]
                   Invalidated 42 targets.
00:00:51 01:44       [run]
00:05:56 06:49   [lint]
00:05:56 06:49     [scalafix]
00:05:56 06:49     [scalafmt]
00:05:56 06:49     [scalastyle]
00:05:56 06:49     [checkstyle]
00:05:57 06:50     [javascriptstyle]
00:05:57 06:50     [python-eval]
                   Invalidated 120 targets.........................................................................................................................
00:33:21 34:14     [pythonstyle]
                   Invalidated 160 targets.
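The gaps quoted above can be read off the log mechanically. The following is only a sketch, assuming the first column is a wall-clock `HH:MM:SS` timestamp and the second is elapsed time; the helper name `stage_durations` and the regular expression are illustrative and not part of pants.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "00:05:57 06:50     [python-eval]".
# Assumption: column 1 is wall-clock time, column 2 is elapsed mm:ss.
STAGE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+\d+:\d{2}\s+\[([\w-]+)\]")


def stage_durations(log_text):
    """Return (stage, seconds until the next stage started) pairs."""
    starts = []
    for line in log_text.splitlines():
        match = STAGE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            starts.append((match.group(2), stamp))
    durations = []
    for (name, begin), (_, end) in zip(starts, starts[1:]):
        delta = end - begin
        if delta < timedelta(0):  # wall clock wrapped past midnight
            delta += timedelta(days=1)
        durations.append((name, int(delta.total_seconds())))
    return durations


# The [lint] excerpt from the message above (progress dots trimmed).
LOG = """\
00:05:56 06:49   [lint]
00:05:56 06:49     [scalafix]
00:05:56 06:49     [scalafmt]
00:05:56 06:49     [scalastyle]
00:05:56 06:49     [checkstyle]
00:05:57 06:50     [javascriptstyle]
00:05:57 06:50     [python-eval]
                   Invalidated 120 targets.
00:33:21 34:14     [pythonstyle]
                   Invalidated 160 targets.
"""

for stage, seconds in stage_durations(LOG):
    print(f"{stage:>16}: {seconds} s")
```

On this excerpt the only stage with a noticeable gap is `python-eval`, at 1644 seconds (27 min 24 s) before `pythonstyle` starts.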
e
The key data will be confirmation of the cache dirs. 120 chroots are created serially, and if there are python requirements that are expensive to resolve or build, things will be slow. 30 minutes is still surprising, but nailing down this key fact is a vital step.
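As a rough back-of-the-envelope check on the serial-chroot explanation, the numbers from the lint log quoted earlier can be divided out. This assumes, purely for illustration, that the whole `[python-eval]` stage is spent on per-target work and that it is spread evenly across targets.

```python
# Figures taken from the lint log quoted earlier in this thread.
python_eval_seconds = (33 * 60 + 21) - (5 * 60 + 57)  # 00:05:57 -> 00:33:21
targets = 120  # "Invalidated 120 targets"

# Assumption: time is spread evenly over serially processed targets.
seconds_per_target = python_eval_seconds / targets
print(f"{python_eval_seconds} s total, about {seconds_per_target:.1f} s per target")
# prints "1644 s total, about 13.7 s per target"
```

Roughly 14 seconds per target is in the range where building a fresh chroot and resolving requirements for each one could plausibly account for the total.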
f
oh, interesting. I’m confused though, why does pants need a chroot-per-target to lint but not to test?
e
It does. There are options for both tasks to run per-target or not iirc, and they may default oppositely or you may have them configured oppositely.
f
oh, interesting.
--[no-]test-pytest-chroot (default: False)
    Run tests in a chroot. Any loose files tests depend on via `files`
    dependencies will be copied to the chroot.
so yeah I guess it defaults to not using for pytest. I don’t see a similarly-named argument for linting
e
That option does not apply to this discussion - confusingly. It's --fast
f
ok, I’m fairly certain that the interpreter and requirements caches are being carried over. however, the pants workdir (repo_root/.pants.d) I think is not being cached, which is where the chroots actually go, right? is that what needs to be cached?
e
Correct and yes
h
Re why it needs a chroot-per-target - that particular “lint” check isn’t just checking for style nits, it’s actually checking that your python modules eval without error, using just their declared dependencies. Doing this in one shared chroot would somewhat subvert that purpose.
f
sigh… caching .pants.d might be untenable for us. it’s something like 5 GiB, which needs to be synced up and down every CI run (CircleCI)
e
So - how fast was this task in 1.3.0? It's super-surprising it regressed this much.
f
lint wasn’t its own separate goal in 1.3.0 AIUI, it happened as part of test and other goals, and testing was not that slow (I guess because it also didn’t use a separate chroot per target it was testing?)
e
Nope, should have been an isolated run regardless of what phase it ran in. So it took how long before?
f
ohhh, interesting. I’d assumed all the chrooting from our old 1.3 runs was for the tests, but I guess it was for the lints (at least also for the lints). the python-eval stage took ~15 minutes then it seems. still somewhat of a regression I guess, but not as much as I thought
oh, or maybe it actually took the same? I can’t tell if linting piggy-backed on the pytest sharding to also only lint 1/n of the targets
e
linting has never sharded
Here's the change: in the old eval task we had this option defaulting to False:
register('--closure', type=bool,
         help='Eval all targets in the closure individually instead of just the targets '
              'specified on the command line.')
We now unconditionally eval all targets in the closure.
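To illustrate why evaluating the closure touches more targets than evaluating only what was named on the command line, here is a minimal sketch. The dependency graph, target names, and the `closure` / `targets_to_eval` helpers are hypothetical and are not pants' actual implementation.

```python
from collections import deque

# Hypothetical dependency graph: target -> direct dependencies.
DEPS = {
    "src/python/app": ["src/python/lib", "src/python/util"],
    "src/python/lib": ["src/python/util"],
    "src/python/util": [],
}


def closure(roots, deps):
    """Return the roots plus everything reachable from them, in BFS order."""
    seen = []
    queue = deque(roots)
    while queue:
        target = queue.popleft()
        if target in seen:
            continue
        seen.append(target)
        queue.extend(deps.get(target, []))
    return seen


def targets_to_eval(roots, deps, use_closure):
    """Mimic the old --closure switch: roots only, or the full closure."""
    return closure(roots, deps) if use_closure else list(roots)


print(targets_to_eval(["src/python/app"], DEPS, use_closure=False))
print(targets_to_eval(["src/python/app"], DEPS, use_closure=True))
```

With one root target, the first call yields a single target to eval while the second yields all three, and each extra target means one more isolated eval.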
You might just want to turn the check off. You lose pre-emptive checking that imports match BUILD dependencies.
$ ./pants options --scope=lint.python-eval
lint.python-eval.fail_slow = False (from HARDCODED)
lint.python-eval.skip = True (from CONFIG in pants.ini)
So we turn it off in pantsbuild/pants and that looks like:
[lint.python-eval]
skip: True
f
interesting, ok. that might be what we end up with
thanks for all the help