# general
p
Hi all, I'm having problems porting our repository to use pantsbuild: builds seem to be really slow and use a lot of disk space. This is a data science repository with dependencies on rather big Python packages like tensorflow, pandas, etc. It also has around 170 test files containing unittests. Linting targets work fine; the problems start when running tests. The current approach runs tests in a Docker container, as we need to a) run on Linux and b) have some system libraries present. In short, we do `docker build` & `docker run pytest tests`. The whole process, before migrating to Pants, takes around 4 minutes: 3 minutes for building the image and 1 minute for running all tests with the pytest-xdist plugin.

With Pants, the initial attempt was really slow: `pants test` took around 25 minutes, including pex builds for each test and test execution, and with parallelism set to 1. Because of third-party binary dependencies, tests were running within a `docker_environment` based on an Ubuntu image that contains everything we need. Lots of tests take a while to start, because importing tensorflow takes some time.
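For context, the environment wiring looks roughly like this: a `docker_environment` target at the repo root, a name for it under `[environments-preview.names]` in pants.toml, and `environment=` set on the test targets. A sketch (target name and image are illustrative, not our exact config):

```python
# BUILD at the repository root -- illustrative sketch
docker_environment(
    name="ubuntu_env",
    image="ubuntu:22.04",  # the real image also preinstalls the system libraries we need
)
```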
So I enabled `batch_compatibility_tag` for all tests and ran them in a couple of batches. That makes using the cache a bit pointless, but at least build times got down to a really decent 5 minutes.
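Roughly like this (the tag value is arbitrary; any tests sharing it may be batched together):

```python
# BUILD -- tests with the same compatibility tag may share one pytest process
python_tests(
    name="tests",
    batch_compatibility_tag="tensorflow",
)
```

`[test].batch_size` in pants.toml then caps how many compatible test files go into a single batch.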
Next, I tried the `run_against_entire_lockfile = true` setting (I am using lockfiles for my third-party requirements) to reuse a single pex binary for the whole codebase, since the tensorflow installation was making every pex build pretty slow. That reduced build times a bit more, to around 3 minutes with a batch size of 64 tests (or are those test files?)... However, after a while I ran out of disk space and discovered that sandboxes are not removed properly. I use `keep_sandboxes=never` (the default), but it looks like when tests are run using a `docker_environment`, the sandboxes are not cleared. When I run tests on my host machine, they are properly removed from the /tmp directory. Is that expected? As each sandbox has a 0.5 GB tensorflow wheel installed, I'm running out of disk space pretty quickly when modifying and rerunning tests... Ideally I'd like to avoid batching, as our aim was to run only the changed tests, but it looks like that is not possible without reworking our dependencies? Are there any more tricks I can use here to speed up pex builds?
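For completeness, the lockfile-related bits of my pants.toml look roughly like this (resolve name and lockfile path are illustrative):

```toml
# pants.toml -- illustrative sketch
[python]
enable_resolves = true
run_against_entire_lockfile = true

[python.resolves]
default = "3rdparty/python/default.lock"
```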
h
cc @witty-crayon-22786 re the environments sandbox cleanup question
And I think you can still run only changed tests with `--changed-since=GITSPEC` and `--changed-dependents=transitive`? Then those will be batched.
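e.g. (the base ref is illustrative):

```
pants --changed-since=origin/main --changed-dependents=transitive test
```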
w
the sandboxes not being cleaned up is covered by https://github.com/pantsbuild/pants/issues/18329 … as a workaround, you should be able to change the default user in the image to something other than `root`
👀 1
> with parallelism set to 1.
why set the parallelism to 1?
p
The container does not have any user configured, so it runs as root. Thanks for pointing that out, I will try the workaround.
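i.e. adding something like this to the image (the username is illustrative):

```dockerfile
# Dockerfile -- default to a non-root user so sandbox files are not root-owned
RUN useradd --create-home pantsuser
USER pantsuser
```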
Regarding the parallelism: level 2 causes occasional freezes of my machine when running tests in the background, but that is acceptable; leaving it at the default value makes it impossible to work on code at the same time, since all cores and threads are used. I will play with that a bit more, but my main concern for now was pex build time and disk usage 🙂
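For reference, I'm capping it like this (the value is illustrative):

```toml
# pants.toml -- limit concurrent local processes so the machine stays responsive
[GLOBAL]
process_execution_local_parallelism = 2
```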
> And I think you can still run only changed tests with `--changed-since=GITSPEC` and `--changed-dependents=transitive`? Then those will be batched.
Ah, OK, I get this now. So it looks like for GitHub Actions it would be best to run with `--changed-since=GITSPEC` and `--changed-dependents=transitive` and batch the tests together to avoid the overhead of pytest startup time. I somehow thought the assignment of files to batches was permanent for a given codebase, to make use of the cache, but I guess it depends on how `pants test` is executed (with what target list), right?
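Something like this, I suppose (illustrative; `--changed-since` needs the base commit available locally, so a shallow checkout won't do):

```yaml
# CI workflow step -- sketch, not our exact pipeline
- uses: actions/checkout@v4
  with:
    fetch-depth: 0  # full history so the GITSPEC to diff against is available
- run: pants --changed-since=origin/main --changed-dependents=transitive test
```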
h
Exactly
we try and keep the partition stable on a given set of inputs, but not at the expense of running tests you didn't ask for
p
all clear, thank you!