# general
c
Hi, we have been using Pants for a while as a build tool in GitHub Actions, but we are struggling a lot with running the tests. We have a large number of tests, and very often some of them keep running for a long time until the timeout is reached. The complexity of the test doesn't matter: it happens randomly with both big and small ones. I have tried different memory configurations and bigger runners, and the result is the same. Do you have any suggestions or ideas about why this could be happening? We have even tagged the tests into different groups, but even the simplest groups have issues if the number of tests is big enough. This is slowing down our CI a lot, and it is making us think about abandoning Pants.
l
Javier, very sorry to hear about it; that sounds frustrating. We may find together that this is one of:
• an issue intrinsic to how you are running tests in GitHub Actions, which would not go away if you switched away from Pants
• something that can be fixed by adjusting how you configure test runs in Pants
• an intrinsic problem in how Pants runs tests in GitHub Actions, for which there might already be an issue in https://github.com/pantsbuild/pants/issues, or if not, we could report one.
We are very keen on Pants working well for running tests in GitHub Actions. Where have you looked so far for guidance on running Pants, and specifically tests, in CI? Could you describe in a bit more detail what you are dealing with?
c
Exceeded timeout of 180.0 seconds when executing local process: Run Scalatest runner for *************pec.scala
✕ *******.scala failed in 180.09s.
This happens randomly for some tests, but there is nothing special about those tests, unless it is related to the fact that they are:
extends AsyncWordSpec
      with ScalatestRouteTest
      with AsyncMockFactory
They are async, but they work fine with sbt.
I can't give you more specific details because our code is confidential.
I suspect it may be a deadlock, but I have no idea why.
l
Does the failure also happen sporadically when you run the tests on your machine (with Pants)?
And does everything work fine with sbt in GitHub Actions?
h
Sorry for the frustration! Let's see if we can get this working for you.
One thing to try is to turn down the concurrency using this option.
Pants may be trying to run too many test processes concurrently, so to test this theory you could try setting --process-execution-local-parallelism=1 (or process_execution_local_parallelism = 1 in the [GLOBAL] section of pants.toml) so that Pants only runs one process at a time.
Obviously that is extreme, and it will slow things down, but it will be useful as a test to see if it helps.
Let us know what that does!
c
Locally it is not happening. I will try what you suggested, thanks, and I will let you know the results.
I have tried locally with only one process and I am having the same issue, even for very simple tests that are not async.
e
So, Pants uses nailgun by default for JVM processes, and Scala is a notorious memory hog. Maybe there are hangs when accessing the nailgun pool? See https://github.com/pantsbuild/pants/blob/d99977e6876ba7a9d391bd99556553ea4ca01219/src/rust/engine/src/context.rs#L223-L235 @careful-mechanic-89327 you might try turning nailgun use off (see https://www.pantsbuild.org/docs/reference-global#process_execution_local_enable_nailgun). That is not a good long-term solution (nailgun generally helps speed things up), but as a debug step it might shed light if the hangs go away.
c
I am trying to improve our tests in case we have some issues managing async tasks. Does nailgun use a pool of JVMs? Could it be that tests which don't handle threads properly generate deadlocks, or run out of threads because some threads are never released? I am just guessing. I will try what you suggest, but it could indeed be what you say: a hang when accessing the nailgun pool.
Thanks
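To make the thread-starvation guess concrete, here is a minimal plain-Scala sketch, not taken from the code under discussion. A single-thread executor (an assumption standing in for any bounded or serial execution context) runs a task that blocks on a second task queued behind it, so the wait can only end by timing out. ScalaTest's async styles such as AsyncWordSpec run on a serial execution context by default, so blocking calls like Await.result inside such tests are worth looking for. The object name StarvationDemo is hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future, TimeoutException}
import scala.concurrent.duration._

object StarvationDemo {
  /** Returns "completed" if the outer task finishes, "timed out" if it starves. */
  def run(): String = {
    // One thread stands in for any bounded or serial executor (assumption).
    val pool = Executors.newFixedThreadPool(1)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val outer = Future {
        // The inner task is queued behind `outer`, which occupies the only thread...
        val inner = Future(42)
        // ...so blocking here waits for work that cannot be scheduled until it gives up.
        Await.result(inner, 2.seconds)
      }
      Await.result(outer, 5.seconds)
      "completed"
    } catch {
      // The inner wait's TimeoutException fails `outer` and is rethrown here.
      case _: TimeoutException => "timed out"
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Giving the pool a second thread lets the inner task run and the outer one complete, which is one way to check whether a hang is starvation rather than a true lock cycle.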
e
@careful-mechanic-89327 it's always possible there is a deadlock in your test code only revealed when run in a certain way that Pants exposes but sbt does not. That's up to you to figure out if so, since you're the one looking at your code. My suggestion assumes that's not the case and your code is fine, in which case turning off nailgun in Pants is one idea for a debug step.
c
Thanks for the tips, but with nailgun disabled I still have the same issue: some tests reached the timeout again.
e
Ok. Well, if I were you at this point and had the fortitude to not give up and just use sbt, I'd be using jstack or similar tools to snap the thread stacks of the hung vm and see if there is any obvious deadlock inside the JVM.
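In CI it may be hard to attach jstack to the hung JVM before the 180-second timeout kills it. A hypothetical alternative is an in-process watchdog thread that prints every thread's stack via Thread.getAllStackTraces if the suite is still running after a delay. The sketch below assumes Scala 2.13+ (for scala.jdk.CollectionConverters); StackDumpWatchdog and its methods are illustrative names, not a Pants or ScalaTest API.

```scala
import scala.concurrent.duration._
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of Pants or ScalaTest.
object StackDumpWatchdog {
  /** Renders every live thread's name, state and stack frames, roughly like a jstack dump. */
  def dumpAllStacks(): String = {
    val sb = new StringBuilder
    Thread.getAllStackTraces.asScala.foreach { case (thread, frames) =>
      sb.append(s"${thread.getName} (${thread.getState})\n")
      frames.foreach(frame => sb.append(s"    at $frame\n"))
    }
    sb.toString
  }

  /** Starts a daemon thread that reports a dump after `delay` unless interrupted first. */
  def start(delay: FiniteDuration)(report: String => Unit): Thread = {
    val body: Runnable = () =>
      try {
        Thread.sleep(delay.toMillis)
        report(dumpAllStacks())
      } catch {
        case _: InterruptedException => () // tests finished in time; nothing to report
      }
    val t = new Thread(body, "stack-dump-watchdog")
    t.setDaemon(true) // never keeps the JVM alive on its own
    t.start()
    t
  }
}
```

One way to wire it up, again only a suggestion: in a suite that mixes in ScalaTest's BeforeAndAfterAll, call StackDumpWatchdog.start(170.seconds)(dump => System.err.println(dump)) in beforeAll and interrupt() the returned thread in afterAll. Whether the output of a timed-out process actually shows up in Pants' report is an assumption to verify; if it does not, the report function can write the dump somewhere else instead.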
It's been a while since I worked in a JVM, but as I recall, deadlocks are pretty easy to spot. There are monitor IDs, and you can look for a pattern of one thread holding ID1 and trying to acquire ID2 while another thread holds ID2 and tries to acquire ID1.
Assuming it's a simple deadlock like that, anyway. I guess you could have longer chains that may be harder to spot.
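The lock-ordering cycle described above (one thread holds ID1 and wants ID2, another holds ID2 and wants ID1) can be reproduced and detected in a few lines. This sketch, with hypothetical names, deliberately creates that cycle with two daemon threads and then asks the JVM's ThreadMXBean.findDeadlockedThreads() to report it, a programmatic counterpart to reading a jstack dump. Note that it only finds cycles on object monitors or ownable synchronizers, not thread starvation.

```scala
import java.lang.management.ManagementFactory
import java.util.concurrent.{CountDownLatch, TimeUnit}

object DeadlockDemo {
  // Starts a daemon thread that takes `first`, waits until the other thread holds
  // its own first lock too, then tries to take `second`.
  private def lockInOrder(first: AnyRef, second: AnyRef, bothHoldFirst: CountDownLatch, name: String): Unit = {
    val body: Runnable = () => first.synchronized {
      bothHoldFirst.countDown()
      bothHoldFirst.await()
      second.synchronized(())
    }
    val t = new Thread(body, name)
    t.setDaemon(true) // let the JVM exit even though these threads never finish
    t.start()
  }

  /** Creates an A->B / B->A cycle and returns how many threads the JVM reports as deadlocked. */
  def detect(): Int = {
    val lockA = new Object
    val lockB = new Object
    val latch = new CountDownLatch(2)
    lockInOrder(lockA, lockB, latch, "worker-1")
    lockInOrder(lockB, lockA, latch, "worker-2")
    latch.await()

    val mx = ManagementFactory.getThreadMXBean
    val deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5)
    var ids: Array[Long] = mx.findDeadlockedThreads() // null when no deadlock is found
    while (ids == null && System.nanoTime() < deadline) {
      Thread.sleep(50) // the workers may not have blocked on their second lock yet
      ids = mx.findDeadlockedThreads()
    }
    if (ids == null) 0
    else {
      mx.getThreadInfo(ids).foreach { info =>
        println(s"${info.getThreadName} is blocked on ${info.getLockName}, held by ${info.getLockOwnerName}")
      }
      ids.length
    }
  }

  def main(args: Array[String]): Unit = println(s"deadlocked threads: ${detect()}")
}
```

Running main should print one line per worker naming the lock it is blocked on and the thread that holds it, which is the same ID1/ID2 pattern you would look for by eye in a thread dump.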