# general
d
Hi folks. Wondering if anyone has tried to parallelize tests in buildkite? I'm getting weird coverage report failures. I'm planning the tests by filtering for all the test targets and `split -l $block_size`. I suspect that I need to toposort and dedupe test targets?
w
you shouldn’t need to unless you have test targets depending on other test targets (which is discouraged, but not banned, afaik)
assuming that we’re talking about Pants 2.x though, tests should be fully isolated and parallelized by the number of cores by default
have you already confirmed that the auto-detected https://www.pantsbuild.org/docs/reference-global#section-process-execution-local-parallelism value in this environment looks like it will saturate the box?
d
yes we are using pants 2.x. I'm getting coverage report failures that I can't repro locally
I think we are using autodetect. I can't find an option on buildkite to enable a bigger build instance
so I resorted to parallelizing the tests
w
what is the coverage failure?
d
I get a log line like
```
22:20:50.31 [WARN] Failed to generate coverage data for some/package/test/python/my/test.py:tests
```
but I can't repro it locally
w
interesting… do all targets report that warning, or only some of them? and to confirm, you’re not running with `test --debug`, are you?
d
just one of them. Deterministically. Let me add the `--debug` flag and run again
w
sorry: would not recommend adding `--debug`. just wanted to confirm that you were not using it in this environment. `--debug` causes things to run sequentially in the foreground, and it disables some portions of coverage capture.
d
oh no, I'm not adding --debug in CI
w
do you have a separate `toml` config for CI by any chance?
d
yes
w
does it set `[test] debug`?
d
no
```toml
[test]
use_coverage = true

[coverage-py]
report = ["json", "html"]
global_report = true
```
w
ok. since it’s just one target, i do suspect something fishy related to test targets depending on other test targets
or perhaps a target that is actually a library, but is marked as a test
d
is there a query I can run to find tests depending on tests? Bazel style? 😂
w
not exactly 😃
`./pants dependees $target` should tell you if that target is being pulled in elsewhere
can also use something like `./pants filter --target-type=python_tests :: | xargs ./pants dependees` to check all test targets at once
d
```
$ ./pants filter --target-type=python_tests :: | xargs ./pants dependees
16:39:26.19 [INFO] Initialization options changed: reinitializing scheduler...
16:39:26.89 [INFO] Scheduler initialized.
16:39:28.07 [INFO] Initialization options changed: reinitializing scheduler...
16:39:29.14 [INFO] Scheduler initialized.
```
No such targets found. One more interesting observation: the ordering of files in the test report is different in CI and locally, when I run the same command.
w
mm, that’s not ideal.
but possibly related to the other issue you’re experiencing with this problematic target.
d
Local is Mac, and CI is Ubuntu 18
Is there any way to get job logs? Seems like they are suppressed. Maybe there is a way to see what coverage.py is complaining about?
w
good idea. you can pass `test --output=all` (or set `[test] output = "all"`) to have the tests dump their output, which might include some more information about why the `.coverage` file wasn’t written
d
While I wait for CI, just wanna get a sanity check on my test script. Maybe I wrote something silly
```bash
./pants -l=error filter --filter-target-type=python_tests :: | sort > /tmp/all_tests

total_tests=$(cat /tmp/all_tests | wc -l)

total_blocks=$BUILDKITE_PARALLEL_JOB_COUNT
if test -z "$total_blocks"; then
    total_blocks=1
fi
current_block=$BUILDKITE_PARALLEL_JOB
if test -z "$current_block"; then
    current_block=0
fi

block_size=$(((total_tests/total_blocks)+1))

# Split the sorted target list into fixed-size blocks, one per Buildkite shard.
split -l $block_size -a 1 /tmp/all_tests test_target_set_

extra_space=$((block_size*total_blocks-total_tests))
empty_blocks=$((extra_space/block_size))

alphabet=({a..z})
block_index=${alphabet[current_block]}

# Skip empty blocks
if [ "$current_block" -ge "$((total_blocks-empty_blocks))" ]; then
    exit 0
fi

tests_to_run=$(cat test_target_set_$block_index | tr "\n" " ")

./pants test --output=all $tests_to_run
```
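fwiw, an equivalent way to pick this shard's slice, in case it's clearer (same Buildkite env vars as above; the rest is just an illustrative sketch, not what CI actually runs):
```python
import os
import subprocess

# Which shard am I? Buildkite sets these for parallel jobs.
job = int(os.environ.get("BUILDKITE_PARALLEL_JOB", 0))
job_count = int(os.environ.get("BUILDKITE_PARALLEL_JOB_COUNT", 1))

# Same target listing as the script above, sorted so every shard agrees on the order.
all_targets = sorted(
    subprocess.run(
        ["./pants", "-l=error", "filter", "--filter-target-type=python_tests", "::"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
)

# Stride through the sorted list: shard k takes targets k, k+n, k+2n, ...
shard = all_targets[job::job_count]
if shard:
    subprocess.run(["./pants", "test", "--output=all", *shard], check=True)
```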
I found the cause. The test crashed and no coverage was ever generated. I'm still trying to figure out why running the test alone vs. running groups of tests results in the test passing vs. crashing.
the test seems to crash without logs. No core dump, no stacktrace 🤦‍♂️
h
do you have the full Pants run for that?
fyi Rob shared a log over DM (for privacy). Pants reports that the test fails with exit code -9, which is SIGKILL and usually means the process was OOM-killed.
@dazzling-diamond-4749 you said local is macOS, which has no OOM killer. Do you expect the tests to be using lots of memory?
d
> Do you expect the tests to be using lots of memory?
Ohhhhhh, maybe. It's loading BERT, LOL
d
I see. I guess: is there a way to print "OOM killed" instead of just exiting with -9? I'd be happy to contribute code for this
h
Hmmm maybe, yeah. You can't get Pytest itself to do that, it gets killed abruptly without being able to clean up.
The exit code line comes from https://github.com/pantsbuild/pants/blob/824b8d70dbdfb86cc89b1bebfaea9711fca4a715/src/python/pants/core/goals/test.py#L127. Maybe we should augment it to detect exit codes of 9 and -9? Or in general, when running any Process, detect that exit code and always log a warning? That would catch things like resolving requirements, running a linter, etc., rather than just running the test.
I'm not sure, it's always tricky to know when to add special casing like this vs. when it's distracting, including the risk of leading people astray. What do you think?
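Roughly the sort of special-casing I have in mind, as a sketch only (the helper name and logger are made up, this is not actual Pants code):
```python
import signal

def maybe_warn_killed(exit_code: int, description: str, logger) -> None:
    """Hypothetical helper: warn when a process looks like it was OOM-killed.

    A child killed by signal N is surfaced as exit code -N, so SIGKILL shows
    up as -9; some runners report the raw 9 or the shell-style 137 instead.
    """
    sigkill = int(signal.SIGKILL)
    if exit_code in (-sigkill, sigkill, 128 + sigkill):
        logger.warning(
            f"{description} exited with {exit_code} (SIGKILL). On Linux this is "
            "often the OOM killer terminating the process; check memory usage."
        )
```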
d
Let me read a bit on the Linux OOM killer. Maybe there is some stable interface we can depend on to print "OOM killed"?
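One candidate seems to be cgroup v2's `memory.events` file, which exposes an `oom_kill` counter. A rough sketch of reading it (the paths assume a cgroup v2 box; this is just an idea, not something we run today):
```python
from pathlib import Path

def oom_kill_count() -> int:
    """Rough sketch: read the cgroup v2 oom_kill counter for the current cgroup.

    Assumes a cgroup v2 (unified) hierarchy; returns 0 where the file is absent
    (cgroup v1, macOS, etc.).
    """
    try:
        # On cgroup v2, /proc/self/cgroup is a single line like "0::/some/path".
        rel = Path("/proc/self/cgroup").read_text().strip().split("::", 1)[1]
        events = Path("/sys/fs/cgroup") / rel.lstrip("/") / "memory.events"
        for line in events.read_text().splitlines():
            key, _, value = line.partition(" ")
            if key == "oom_kill":
                return int(value)
    except (OSError, IndexError):
        pass
    return 0

# Compare the counter before and after the test run; an increase means
# something in this cgroup was OOM-killed.
print(oom_kill_count())
```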
w
Negative exit codes are always signals. So we could convert that into something like "received signal SIG$X"
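in Python terms the mapping would be roughly (just a sketch using the stdlib `signal` module):
```python
import signal

def describe_exit(code: int) -> str:
    # Negative codes mean "killed by that signal": -9 -> "received signal SIGKILL".
    if code < 0:
        return f"received signal {signal.Signals(-code).name}"
    return f"exited with code {code}"

print(describe_exit(-9))  # received signal SIGKILL
print(describe_exit(1))   # exited with code 1
```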
d
so `exit -x` is --> `kill -x`?
w
Yes, although it looks like this is a bit process-API specific. I believe it's always true in our usage of the Rust process APIs.
d
I see, I was just reading about `128 + SIG` as the standard convention
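both conventions are easy to see with a quick subprocess demo (assuming bash is available on the box):
```python
import subprocess

# Python's subprocess reports "killed by signal N" as a negative return code.
proc = subprocess.run(["bash", "-c", "kill -9 $$"])
print(proc.returncode)  # -9

# A shell observing the same death reports 128 + N instead, so SIGKILL becomes 137.
out = subprocess.run(
    ["bash", "-c", "bash -c 'kill -9 $$'; echo $?"],
    capture_output=True, text=True,
)
print(out.stdout.strip())  # 137
```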
d
nice! I'll open an issue and send a PR, plus a test that makes sure `SIGINT` is correctly caught and surfaced as `Neg(sig)` <-- does this suffice?
w
Yea, maybe... I just realized that it might not be ironclad, because when we remote execute a process, we won't be able to get the signal. So maybe the better thing to do would be to change the Rust code near where I linked above to append some information about the signal to stderr...
d
I see. so, something like this?
```rust
exit_status.signal().map(tee(print)).map(Neg::neg)
```
w
But do feel free to file the issue: if you're interested in fixing it I can add more info about how. If not, I can take a look.
d
I don't rust at all. Just guessing there has to be something like `tee`
h
To clarify my understanding of this entire thread, when you say "parallelize tests in buildkite" do you mean sharding across multiple buildkite machines? Or running tests in parallel on a single machine? Pants will happily take care of the latter