I have a Python monorepo that uses pants. We have ...
# general
d
I have a Python monorepo that uses pants. We have tests registered with
python_tests
that run locally. We're planning on writing tests that run on a remote host. This remote host environment doesn't directly support
pex
files. We don't have the ability to use the remote execution currently built into pants as that requires us to install tools that are incompatible with the remote host (it's a long story). Our current plan is to extend the test task with a custom target in an internal plugin that looks something like this: • For each resolve ◦ Figure out which tests need to run based on Pants cache misses ◦ Figure out if there's a remote host running that already has the external python libraries we need. If not, start a remote host and install the libraries we need. ◦ Ship a
.pex
file to the remote host ◦ Unzip the
pex
and add the source files to the path ◦ From a
@rule
run
pytest
on the remote host and return a Pants compatible
TestResult
How I should go about doing this? We've made a pretty good dent in this with our current approach. But today, we tried using
register_plugin_field
as well as extending
PythonTestTarget
and in both cases we ran into issues where our tests ran twice--once for the built-in pytest runners and once for our remote pytest runner (which isn't what we want). Any tips on what we're potentially doing wrong here would also be appreciated
h
So that
.pex
file would only contain sources, no third-party deps?
The third-party deps would be provided by the runtime on the remote host?
In fact, it wouldn't even need to be a
.pex
file, it sounds like, just any old .zip file would do, but
.pex
happens to be convenient?
d
Yep, that's accurate
We could just use a
.zip
We've made a pretty good dent in this with our current approach. But today, we tried using
register_plugin_field
as well as extending
PythonTestTarget
and in both cases we ran into issues where our tests ran twice--once for the built-in pytest runners and once for our remote pytest runner (which isn't what we want). Any tips on what we're potentially doing wrong here would also be appreciated
Any input on this part? I'll try to clean things up into a tidy code snippet on Monday if that helps.
Copy code
from dataclasses import dataclass
from typing import Any
from pants.backend.python.goals.pytest_runner import PyTestRequest
from pants.backend.python.target_types import PythonTestTarget, PythonTestsGeneratorTarget, PythonTestsGeneratingSourcesField, PythonTestsOverrideField, PythonFilesGeneratorSettingsRequest, SkipPythonTestsField
from pants.backend.python.subsystems.pytest import PythonTestFieldSet, PyTest
from pants.backend.python.subsystems.setup import PythonSetup
from pants.backend.python.subsystems.python_tool_base import PythonToolBase
from pants.backend.python.target_types import PythonTestSourceField, PythonTestsDependenciesField, _PYTHON_TEST_MOVED_FIELDS
from pants.core.goals.test import Partitions, TestFieldSet, TestRequest, TestResult, TestSubsystem, ShowOutput, TestDebugAdapterRequest, TestDebugRequest
from pants.engine.fs import EMPTY_FILE_DIGEST
from pants.engine.rules import rule, collect_rules
from pants.engine.target import (
    COMMON_TARGET_FIELDS,
    Dependencies,
    BoolField,
    IntField,
    SingleSourceField,
    Target,
    TargetFilesGenerator
)
from pants.engine.unions import UnionRule

from pants.option.option_types import SkipOption
from pants.option.subsystem import Subsystem

# adding one field to differentiate our target from the python_test and python_tests target
class RunOnRemoteField(BoolField):
     alias = "run_on_remote"
     help = "Whether to run the test on remote."
     default = False
     required = True

# extending target instead of PythonTestTarget as a step to try and elminate issues
class RemotePythonTestTarget(Target):
    alias = "remote_python_test"
    help = "Tests run on remote."
    core_fields = (
        *COMMON_TARGET_FIELDS,
        *_PYTHON_TEST_MOVED_FIELDS,
        PythonTestsDependenciesField,
        PythonTestSourceField,
        RunOnRemoteField
    )

class RemotePythonTestsGeneratorTarget(TargetFilesGenerator):
    alias = "remote_python_tests"
    core_fields = (
        *COMMON_TARGET_FIELDS,
        PythonTestsGeneratingSourcesField,
        PythonTestsOverrideField,
        RunOnRemoteField
    )
    generated_target_cls = RemotePythonTestTarget
    copied_fields = (*COMMON_TARGET_FIELDS,RunOnRemoteField,)
    moved_fields = _PYTHON_TEST_MOVED_FIELDS
    settings_request_cls = PythonFilesGeneratorSettingsRequest
    help = "Generate a `remote_python_test` target for each file in the `sources` field."

# Our field set is the same as PythonTestFieldSet, but with one additional field.
@dataclass(frozen=True)
class RemotePythonTestFieldSet(PythonTestFieldSet):
    required_fields = (PythonTestSourceField, RunOnRemoteField)
    run_on_remote: RunOnRemoteField

    @classmethod
    def opt_out(cls, tgt: Target) -> bool:
        return tgt.get(SkipPythonTestsField).value or not tgt.get(RunOnRemoteField).value

# We're not going to be running pytest locally, so we extend the default Subsystem instead of the Pytest subsystem (although we tried that and it too didn't make a difference)
class RemotePythonTestSubsystem(Subsystem):
    name = "Remote Python Tests"
    options_scope = "example-test"
    help = "Tests run on remote."
    skip = SkipOption("skip_remote_tests")


# Define our test request
@dataclass(frozen=True)
class RemotePythonTestRequest(TestRequest):
    field_set_type = RemotePythonTestFieldSet
    tool_subsystem = RemotePythonTestSubsystem

# somehow, this gets called twice
# It gets called once with our valid RemotePythonTestFieldSet
# but it also gets called a second time with a PythonTestFieldSet entry. This shouldn't happen because RemotePythonTestFieldSet doesn't
# inherit from PythonTestFieldSet
@rule
async def run_remote_tests(
    batch: RemotePythonTestRequest.Batch[RemotePythonTestFieldSet, Any],
) -> TestResult:

    print(f"batch is: {batch}")

    return TestResult(
        exit_code=None,
        stdout="",
        stderr="",
        stdout_digest=EMPTY_FILE_DIGEST,
        stderr_digest=EMPTY_FILE_DIGEST,
        addresses=tuple(field_set.address for field_set in batch.elements),
        output_setting=ShowOutput.ALL,
        result_metadata=None,
        partition_description=batch.partition_metadata.description,
    )


def rules():
    return [
        # Originally, we tried just re-using the existing targets by adding a field, but abandoned that approach
        # when we saw run_remote_tests was getting called twice
        # PythonTestTarget.register_plugin_field(RunOnRemoteField),
        # PythonTestsGeneratorTarget.register_plugin_field(RunOnRemoteField),
        # UnionRule(TestFieldSet, RemotePythonTestFieldSet),
        *RemotePythonTestRequest.rules(),
        *collect_rules(),
    ]
That's
stuff.py
where we're defining our rules and targets.
register.py
looks like this:
Copy code
from remote_cluster_plugin.remote_integration_test import stuff


def rules():
    return [ *stuff.rules()]


def target_types():
    return [stuff.RemotePythonTestTarget, stuff.RemotePythonTestsGeneratorTarget]
We have one project with a
remote_python_tests
target in its
BUILD
. It has one test file in
integration_tests
, but somehow
run_remote_tests
is being called twice for that target. In one case, the batch has the aforementioned target as the sole element as type
RemotePythonTestFieldSet
. The second time the rule is called, it's the same address, but it's of type
PythonTestFieldSet
which shouldn't be possible
FWIW, we're using
2.16.0rc4
. Any ideas on what's causing this bug?
e
I think you can step away from all the details of Pants and just focus on the word set here. The fundamental problem is the builtin rules operate on a certain set. Your new stuff creates a superset. Unless you can come up with a disjoint set you're doomed without bigger surgery to how things work.
That's my guess anyhow.
Disjoint or subset. Basically if you subtract one field the builtin rules need, then they shouldn't run, just yours will. The problem may be that you really do need all those fields. Perhaps take a simple one and duplicate it with a new name. That would be enough I'd think.
d
That makes sense, I'll give that a shot as a work around. Out of curiosity though, why is my
run_remote_tests
rule being called with elements of type
PythonTestFieldSet
when it clearly requires
RemotePythonTestFieldSet
? Is the type hint ignored? (I seem to recall seeing something about that in the docs) I assumed that the type
RemotePythonTestRequest.Batch
would be sufficient to avoid this
e
That I have no clue about. I haven't been in the details here in a while.
h
Hmm that is strange, even if this were purely duck-typed, that
RunOnRemoteField
is not available on
PythonTestFieldSet
, so it shouldn't trigger
Do you know how to run Pants from sources for debugging?
d
Do you know how to run Pants from sources for debugging?
No, but I don't think that will be necessary.
PythonTestFieldSet
has one required field
PythonTestSourceField
(or something like that). We made our own
PythonSourceField
class which is just like
PythonTestSourceField
, but with a different name. Then we made our own sources target which uses our source field. This allows the
pants test
goal to still run our
@rule
as part of testing, but prevents the standard pytest rule from picking it up
Hmm that is strange, even if this were purely duck-typed, that
RunOnRemoteField
is not available on
PythonTestFieldSet
, so it shouldn't trigger
Yeah, I agree. I think this is a pants bug. We've worked around it for now. Now that I think I understand things better, this is what we should have done initially, but 1. that wasn't immediately clear 2. this bug made us think we were simply doing something wrong
h
Still, we'd like to fix the bug, if it is one!