is there a way to turn off the v2 fingerprinting a...
# general
s
is there a way to turn off the v2 fingerprinting and dependency resolver for all the BUILD files in a repo? We have a rather large monorepo that has 500+ BUILD files. Every time we run a pants 2.0 goal it takes 15-20 seconds of processing before it begins executing the actual goal. We are looking for speed optimizations at this point. We have just migrated to v2 and things seem much slower than v1. My deps and constraints are all cached so we are spending 0 time resolving them… it’s all pre-processing all the build files it seems.
w
is pantsd turned on?
(it’s on by default)
s
it’s off
this is all in my separate pants.ci.toml file where that is specifically off
w
pantsd memoizes the vast majority of work, so if you’re going to be running multiple commands in a row, it’s a good idea to have it on
CI is an interesting case… how many independent runs of pants do you do?
s
we use pants to start up a few running processes (services) then run tests with pants
so 5-10 depending on the job
w
ok. and the 5-10 runs of pants run sequentially?
s
we are pulling cache into the job so all of the time is used on what looks like fingerprinting and dependency inference
yes it is sequential.
even with pants.d on that doesn’t seem to help
w
ok. yea, turning on pantsd should amortize well in that case.
because you will pay the startup cost once
s
yeah. so what i did was in my base image … run a pants goal so it does all this work. then use that base image (docker) in all my jobs. Then i load in the pants cache from the 3 locations. so we skip a lot of overhead there.
with pants.d it’s still taking a while.
some of the “inferring python dependencies” process can take 10-15 seconds per build file, regardless of the cache or pants.d state
w
so, to be clear: when i say pantsd, i mean the daemon process, rather than the .pants.d directory.
h
One thing to clarify, do you mean the folder .pants.d, or the option [GLOBAL].pantsd aka --pantsd?
The folder .pants.d is nothing more than a temporary workdir - it’s almost all empty in v2, and caching it shouldn’t make much difference.
What matters is the --pantsd option
Oh coke hah
s
yes sorry that is my mistype
i’m referring to --pantsd
w
but yes, inference is only memoized in memory: it is not cached on disk in the current version. so you need to enable pantsd to keep it warm.
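A rough analogy for the in-memory memoization described above, not how Pants is actually implemented: the function name `infer_dependencies`, the BUILD paths, and the use of `lru_cache` are all hypothetical stand-ins. Keeping the process alive keeps the cache warm; starting a fresh process (modeled here by clearing the cache) redoes the work every time.

```python
from functools import lru_cache

calls = 0  # counts how many times the expensive step really runs


@lru_cache(maxsize=None)
def infer_dependencies(build_file: str) -> tuple:
    """Hypothetical stand-in for an expensive inference step."""
    global calls
    calls += 1
    return (f"{build_file}:dep",)


def run_goal(build_files):
    """One invocation: needs inference results for every BUILD file."""
    return [infer_dependencies(f) for f in build_files]


build_files = ["src/a/BUILD", "src/b/BUILD"]  # made-up paths

# Daemon kept alive: the in-memory cache survives between invocations.
run_goal(build_files)
run_goal(build_files)
warm_calls = calls  # the second invocation was served from memory

# No daemon: each invocation is a fresh process, modeled by clearing the cache.
calls = 0
infer_dependencies.cache_clear()
run_goal(build_files)
infer_dependencies.cache_clear()
run_goal(build_files)
cold_calls = calls  # every invocation redid the work
```

In this toy model the warm case runs the expensive step once per BUILD file in total, while the cold case runs it once per BUILD file per invocation.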
s
okay. thanks!
w
alternatively, it is possible to disable dependency inference, but the goal is for that to not be necessary if pantsd is kept alive for longer periods.
there are lots of benefits to inference… see https://blog.pantsbuild.org/dependency-inference/
s
yeah. we are using dependency inference for a few things but there are some we don’t need it for. so that’s nice to know we can shut it off for some ci tasks where we don’t use it to save time and money
w
well, not quite. if you have been using it and turn it off, many things will not build, because they will no longer have their dependencies. it changes what is necessary to put in BUILD files
s
yeah. our build files still list out dependencies because we never removed them during the migration from v1
would removing the dependencies speed things up with dependency inference? Or am i misunderstanding this
w
the “inferring python dependencies” item is dependency inference running: if you turn off dependency inference, you’ll have to make BUILD files larger rather than smaller
it’s not the dependency count that contributes to the performance… really just whether inference is on.
s
ah okay. i didn’t know if it was trying to parse the dependencies twice because i was defining them in the build files and using inference
h
It wouldn’t speed up the dependency inference step, but it would very likely result in finer-grained dependencies, which means more cache hits for things like running tests. Pants dep inference is at the file level, whereas all your v1 dependencies are at the target level. In the two repos we tested, this resulted in about 30% finer-grained cache keys, and your devs don’t have the burden of updating BUILD files every time they change an import
🔥 2
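A minimal sketch of why file-level dependencies give more cache hits than target-level ones. Everything here is assumed for illustration: the file names, the contents, and the `cache_key` helper are hypothetical, and Pants computes its real cache keys differently.

```python
import hashlib

# Hypothetical contents for one target that owns two source files.
sources = {"lib/a.py": "def a(): ...", "lib/b.py": "def b(): ..."}


def cache_key(files, contents):
    """Hash the path and contents of every file a test depends on."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(contents[path].encode())
    return h.hexdigest()


# Assume test_a.py only imports lib/a.py.
target_level_deps = ["lib/a.py", "lib/b.py"]  # depends on the whole target
file_level_deps = ["lib/a.py"]                # depends on the imported file only

before_target = cache_key(target_level_deps, sources)
before_file = cache_key(file_level_deps, sources)

# Someone edits lib/b.py, which test_a.py never imports.
edited = {**sources, "lib/b.py": "def b(): return 1"}

target_key_changed = cache_key(target_level_deps, edited) != before_target
file_key_changed = cache_key(file_level_deps, edited) != before_file
```

With target-level dependencies the unrelated edit changes the key, so the test would rerun; with file-level dependencies the key is unchanged and the cached result could be reused.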
s
YES that is what i was getting at @hundreds-father-404
💯 1
I will try this then. That should help as tests are the place things seem to underperform
w
oh, i see what you’re saying. if dependency inference is turned on, you can delete most dependencies. yes.
but also, do try enabling pantsd in CI, and see how that does. across 5-10 runs, you should see a 4-9 * 20 second reduction in runtime
👍 1
(because inference will only happen on the first run)
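The arithmetic behind that estimate, as a small sketch. It assumes the roughly 20 second startup cost mentioned earlier in the thread and that only the first of the sequential runs pays it; the helper name `estimated_savings` is made up for illustration.

```python
def estimated_savings(runs: int, startup_seconds: float) -> float:
    """Seconds saved if only the first of `runs` sequential invocations pays the startup cost."""
    return max(runs - 1, 0) * startup_seconds


low = estimated_savings(5, 20)    # 4 * 20 seconds
high = estimated_savings(10, 20)  # 9 * 20 seconds
```

So for a job with 5-10 sequential runs, the expected reduction is somewhere between 80 and 180 seconds, if the 20 second figure holds.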
h
Cool, we think dependency inference was a game changer and hope that your devs will be much happier with it
you can delete most dependencies.
In those two repos (pantsbuild/pants and Toolchain), we were able to delete ~90% of our BUILD file content
I hear you on the perf overhead though. We agree there’s lots of perf work to be done on Pants, including:
- reducing the frequency of Pantsd restarting
- speeding up resolving requirements
- reducing overhead of running a Pex, which should speed up things like ./pants test
s
yeah the running a pex is a big one. that was my second perf complaint / area for improvement
i’m very happy with the progress of pants. we have been using it since i believe 1.12 or so?
maybe even sooner… tough to remember that far back
v2 is definitely a game changer
❤️ 1
h
Definitely agreed, it slows down things like TDD
i’m very happy with the progress of pants
Glad to hear! It can sometimes be overwhelming for me personally knowing how much we still have to improve, that it’s not yet where I want the tool to be. But it’s a good reminder to celebrate the progress that has been made
s
oh i could go on and on about the improvements so pat yourselves on the back for that. That’s the beauty of being a software dev… nothing is ever “finished” 😉
💯 1
❤️ 2
h
Just to follow up, are you not seeing improvements in CI even with pantsd turned on? If a single CI run is running multiple consecutive pants processes then you should, unless something is awry.