b
I am using a Python package named "datamodel-code-generator" to generate a pydantic models module based on GraphQL types. I expect that file to be stored in `python/data/src/data/models.py`. Right now I use a quick and dirty Makefile:
```makefile
graphql-models:
	find ../../graphql -type f -exec cat {} \; \
		| datamodel-codegen  ...
```
I am trying to use Pants "the right way", so I tried with `shell_command`:
```python
python_requirement(
    name = "datamodel-codegenerator",
    requirements = ["datamodel-code-generator[graphql]>=0.25.5"],
)
shell_command(
    name = "generate_graphql_models",
    command = "bash ./scripts/generate-models.sh",
    execution_dependencies = [
        "graphql:schema",
        "./scripts:generate_graphql_models_source",
        ":datamodel-codegenerator",
    ],
    output_files = ["models.py"],
    tools = [
        "bash",
        "cat",
        "find",
        "python3",
    ],
)
```
However, when running `pants export-codegen python/data:generate_graphql_models`, the generated file is stored in `dist/codegen/...`. Is there a way to "move" it to `python/data/src/...`? I also tried using `adhoc_tool`, which looks better suited to what I am trying to achieve:
```python
python_requirement(
    name = "datamodel-codegenerator",
    requirements = ["datamodel-code-generator[graphql]>=0.25.5"],
)

system_binary(
    name = "bash",
    binary_name = "bash",
)

system_binary(
    name = "python3",
    binary_name = "python3",
)

system_binary(
    name = "cat",
    binary_name = "cat",
)

system_binary(
    name = "find",
    binary_name = "find",
)

adhoc_tool(
    name = "generate_graphql_models",
    args = ["./scripts/generate-models.sh"],
    execution_dependencies = [
        "graphql:schema",
        "./scripts:generate_graphql_models_source",
        ":datamodel-codegenerator",
    ],
    output_files = ["src/data/models.py"],
    runnable = ":bash",
    runnable_dependencies = [
        ":python3",
        ":cat",
        ":find",
    ],
)
```
But for some reason I can't get the Python dependency I need:
```text
$ pants export-codegen python/data:generate_graphql_models
23:18:32.26 [INFO] Completed: Running the `adhoc_tool` at python/data:generate_graphql_models
23:18:32.26 [ERROR] 1 Exception encountered:

Engine traceback:
  in `export-codegen` goal

ProcessExecutionFailure: Process 'the `adhoc_tool` at python/data:generate_graphql_models' failed with exit code 1.
stdout:

stderr:
/usr/bin/python3: No module named datamodel_code_generator
/usr/bin/find: 'cat' terminated by signal 13
/usr/bin/find: 'cat' terminated by signal 13
/usr/bin/find: 'cat' terminated by signal 13
/usr/bin/find: 'cat' terminated by signal 13
/usr/bin/find: 'cat' terminated by signal 13
/usr/bin/find: 'cat' terminated by signal 13

Use `--keep-sandboxes=on_failure` to preserve the process chroot for inspection.
```
What am I missing? Any help appreciated! 🙂
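For the first question above (getting the file out of `dist/codegen/` and into the source tree), one low-tech option is a small script run after `pants export-codegen` that copies the exported file into place. This is a hypothetical helper, not a Pants feature: the name `sync_generated_file` and the example paths are assumptions, and it assumes the export mirrors the target's directory under `dist/codegen/`, so check where the file actually lands in your `dist/` first.

```python
"""Hypothetical helper: copy a file exported by `pants export-codegen` into the source tree."""
import shutil
from pathlib import Path


def sync_generated_file(codegen_root: Path, relative_output: Path, destination: Path) -> Path:
    """Copy codegen_root / relative_output to destination, creating parent directories.

    ASSUMPTION: relative_output is wherever export-codegen placed the file under
    codegen_root; this is not derived from Pants, it must be checked by hand.
    """
    source = codegen_root / relative_output
    if not source.is_file():
        raise FileNotFoundError(f"no exported file at {source}; run `pants export-codegen` first")
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    return destination
```

For example, run from the build root: `sync_generated_file(Path("dist/codegen"), Path("python/data/models.py"), Path("python/data/src/data/models.py"))`, with the middle path adjusted to wherever the export actually put `models.py`.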
g
I'd use `--keep-sandboxes=on_failure` and then inspect the sandbox and debug it in place. Without having seen `generate-models.sh`, it does seem like an issue with that script and how it invokes `datamodel-codegenerator`. I'd imagine you need to build it as a `pex_binary` and then run that.
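Inside a kept sandbox, a small check run with the same `python3` the script uses would show which interpreter it is and whether it can import the package. The log already names `/usr/bin/python3`, i.e. the system interpreter found by the `:python3` `system_binary`, so a plausible reading (an assumption, not confirmed in this thread) is that the requirement listed in `execution_dependencies` never ends up on that interpreter's `sys.path`. The function name `describe_interpreter` is hypothetical.

```python
"""Hypothetical diagnostic to run inside a sandbox kept with --keep-sandboxes=on_failure."""
import importlib.util
import sys


def describe_interpreter(module_names: list[str]) -> dict[str, object]:
    """Report which interpreter is running and whether each top-level module is importable."""
    return {
        "executable": sys.executable,
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # find_spec returns None for a missing top-level module (dotted names whose
        # parent package is missing would raise instead, so pass top-level names only).
        "importable": {name: importlib.util.find_spec(name) is not None for name in module_names},
    }


if __name__ == "__main__":
    print(describe_interpreter(["datamodel_code_generator"]))
```

Saved under a name such as `check_interpreter.py` (hypothetical) in the sandbox directory and run with that same `python3`, it prints the interpreter path and a `True`/`False` for `datamodel_code_generator`.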
b
The script is quite simple:
```bash
#!/bin/bash -eux

PYTHON_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')

find ../../graphql -type f -exec cat {} \; |
    python3 -m datamodel_code_generator \
        --input-file-type graphql \
        --output models.py \
        --output-model-type pydantic_v2.BaseModel \
        --use-union-operator \
        --target-python-version "$PYTHON_VERSION" \
        --use-schema-description \
        --custom-file-header '# pyright: reportIncompatibleVariableOverride=false'
```
The issue occurs because, despite adding `:datamodel-codegenerator` to `execution_dependencies`, the `datamodel_code_generator` package is not made available to the Python interpreter. I'm just surprised I don't have that issue when using `shell_command`, as I run the exact same script successfully with it. BTW, what is the right way to generate my `models.py` file?
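The repeated `'cat' terminated by signal 13` lines in the log are most likely a side effect rather than a second problem: signal 13 is `SIGPIPE` on Linux, which a process receives when it writes to a pipe whose reader (here the failing `python3 -m datamodel_code_generator`) has already exited, and `find ... -exec cat {} \;` starts one `cat` per file, hence the repetition. The sketch below reproduces this with a reader that exits immediately; it assumes a POSIX system with the `yes` utility on `PATH`, and the function name is hypothetical.

```python
"""Reproduce a pipe writer being killed by SIGPIPE when its reader exits early."""
import signal
import subprocess
import sys


def writer_exit_status_when_reader_fails() -> int:
    """Pipe `yes` into a Python reader that exits with status 1 without reading.

    Returns the writer's return code; subprocess reports death by signal N as -N.
    """
    writer = subprocess.Popen(["yes"], stdout=subprocess.PIPE)
    reader = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(1)"],
        stdin=writer.stdout,
    )
    # Drop our copy of the read end so no reader is left once the child exits.
    writer.stdout.close()
    reader.wait()
    # Popen restores SIGPIPE to its default action in the child, so `yes` is killed
    # on its next write instead of ignoring the signal.
    return writer.wait()


if __name__ == "__main__":
    print(writer_exit_status_when_reader_fails(), -signal.SIGPIPE)
```

In the failing run `cat` plays the writer and `python3` the reader, so fixing the import error should make these messages go away as well.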
Thanks for the suggestion @gorgeous-winter-99296! I've got something that works:

BUILD:
```python
python_requirement(
    name="datamodel-code-generator",
    requirements=["datamodel-code-generator[graphql]>=0.25.5"],
)

python_source(
    name="generate_graphql_models_py",
    dependencies=[":datamodel-code-generator"],
    source="generate_graphql_models.py",
)

pex_binary(
    name="generate_graphql_models",
    dependencies=[
        ":generate_graphql_models_py",
        "graphql:schema",
    ],
    entry_point="generate_graphql_models.py",
)
```
generate_graphql_models.py:
```python
from pathlib import Path

from datamodel_code_generator import (
    DataModelType,
    InputFileType,
    PythonVersion,
    generate,
)


def read_schema(schema_dir: Path) -> str:
    if not schema_dir.is_dir():
        raise Exception(f"Invalid schema_dir: {schema_dir}")
    return "".join(file.read_text() for file in schema_dir.glob("**/*.graphql"))


def generate_pydantic_models(schema_str: str, models_path: Path) -> None:
    generate(
        input_=schema_str,
        input_file_type=InputFileType.GraphQL,
        output=models_path,
        output_model_type=DataModelType.PydanticV2BaseModel,
        use_union_operator=True,
        use_schema_description=True,
        target_python_version=PythonVersion.PY_312,
        custom_file_header="# pyright: reportIncompatibleVariableOverride=false",
    )


if __name__ == "__main__":
    schema = read_schema(Path())
    generate_pydantic_models(schema, Path("python/data/src/timequest_data/models.py"))
```
Now I can run `pants run python/data:generate_graphql_models` and it generates the `models.py` file in the right place. It still doesn't feel like the right way to do this, but I guess the right way would be to write a codegen plugin for `datamodel_code_generator`. So in the short term I'll probably settle for this solution.
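The two versions of the script in this thread hard-code paths relative to different working directories (`Path()` plus a build-root-relative output path in the `pex_binary` version, `../../../graphql` in the later `adhoc_tool` version). A possible refinement, sketched here as an assumption rather than something from the thread, is to take both paths from the command line so the same script works in either setup. `parse_paths` and the `--schema-dir`/`--output` flags are hypothetical names.

```python
"""Hypothetical command-line front end for the model generator script."""
from __future__ import annotations

import argparse
from pathlib import Path


def parse_paths(argv: list[str] | None = None) -> tuple[Path, Path]:
    """Return (schema_dir, output) parsed from argv (defaults to sys.argv[1:])."""
    parser = argparse.ArgumentParser(description="Generate pydantic models from GraphQL schema files.")
    parser.add_argument("--schema-dir", type=Path, required=True,
                        help="directory searched recursively for *.graphql files")
    parser.add_argument("--output", type=Path, required=True,
                        help="path of the generated models module")
    args = parser.parse_args(argv)
    return args.schema_dir, args.output
```

The script's `__main__` block would then call `schema_dir, output = parse_paths()` and pass the results to `read_schema` and `generate_pydantic_models`; the flags can be given after `--` with `pants run`, or through the `args` field of an `adhoc_tool`.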
Guess I found a much cleaner way to achieve it:

codegen.py:
```python
from pathlib import Path

from datamodel_code_generator import (
    DataModelType,
    InputFileType,
    PythonVersion,
    generate,
)


def read_schema(schema_dir: Path) -> str:
    if not schema_dir.is_dir():
        raise Exception(f"Invalid schema_dir: {schema_dir}")
    return "".join(file.read_text() for file in schema_dir.glob("**/*.graphql"))


def generate_pydantic_models(schema_str: str, models_path: Path) -> None:
    generate(
        input_=schema_str,
        input_file_type=InputFileType.GraphQL,
        output=models_path,
        output_model_type=DataModelType.PydanticV2BaseModel,
        use_union_operator=True,
        use_schema_description=True,
        target_python_version=PythonVersion.PY_312,
        custom_file_header="# pyright: reportIncompatibleVariableOverride=false",
    )


if __name__ == "__main__":
    schema = read_schema(Path("../../../graphql"))
    generate_pydantic_models(schema, Path("timequest_models.py"))
```
BUILD:
```python
python_requirement(
    name="datamodel-code-generator",
    requirements=["datamodel-code-generator[graphql]>=0.25.5"],
)

python_source(
    name="codegen_py",
    dependencies=[
        ":datamodel-code-generator",
        "graphql:schema",
    ],
    source="codegen.py",
)

adhoc_tool(
    name="codegen",
    output_files=["timequest_models.py"],
    runnable=":codegen_py",
)

experimental_wrap_as_python_sources(
    name="models",
    inputs=[":codegen"],
)
```

Then I can simply depend on the `models` target.
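Both versions of `read_schema` in this thread join the schema files with `""` in whatever order `Path.glob` yields them, which depends on the filesystem, and a file without a trailing newline would run straight into the next one. A hedged sketch of a variant (a suggestion, not what was used in the thread) sorts the files and joins them with a newline; swapping the bare `Exception` for `NotADirectoryError` is likewise an assumption about what reads better.

```python
"""Hypothetical variant of read_schema with deterministic order and explicit separators."""
from pathlib import Path


def read_schema(schema_dir: Path) -> str:
    if not schema_dir.is_dir():
        raise NotADirectoryError(f"Invalid schema_dir: {schema_dir}")
    # Sort so the concatenated schema (and thus the generated module) does not depend
    # on the order in which the filesystem returns directory entries.
    files = sorted(schema_dir.glob("**/*.graphql"))
    # Join with a newline so a file lacking a trailing newline cannot merge with the next.
    return "\n".join(file.read_text() for file in files)
```

Sorting is cheap at this scale and keeps a regenerated `timequest_models.py` identical across machines, which matters if the generated file is ever diffed or checked in.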