I am using a python package named datamodel code generator t Pants #general

bumpy-noon-80834

04/02/2024, 3:23 PM

I am using a python package named "datamodel-code-generator" to generate a pydantic models module based on graphql types. I expect that file to be stored in "python/data/src/data/models.py". Right now I use a quick and dirty Makefile:

graphql-models:
	find ../../graphql -type f -exec cat {} \; \
		| datamodel-codegen  ...

I am trying to use Pants "the right way", so I tried with `shell_command`:

python_requirement(
    name = "datamodel-codegenerator",
    requirements = ["datamodel-code-generator[graphql]>=0.25.5"],
)
shell_command(
    name = "generate_graphql_models",
    command = "bash ./scripts/generate-models.sh",
    execution_dependencies = [
        "graphql:schema",
        "./scripts:generate_graphql_models_source",
        ":datamodel-codegenerator",
    ],
    output_files = ["models.py"],
    tools = [
        "bash",
        "cat",
        "find",
        "python3",
    ],
)

However, when running `pants export-codegen python/data:generate_graphql_models`, the generated file is stored in `dist/codegen/...`. Is there a way to "move" it to "python/data/src/...."? I also tried using `adhoc_tool`, which looks better suited to what I am trying to achieve:

python_requirement(
    name = "datamodel-codegenerator",
    requirements = ["datamodel-code-generator[graphql]>=0.25.5"],
)

system_binary(
    name = "bash",
    binary_name = "bash",
)

system_binary(
    name = "python3",
    binary_name = "python3",
)

system_binary(
    name = "cat",
    binary_name = "cat",
)

system_binary(
    name = "find",
    binary_name = "find",
)

adhoc_tool(
    name = "generate_graphql_models",
    args = ["./scripts/generate-models.sh"],
    execution_dependencies = [
        "graphql:schema",
        "./scripts:generate_graphql_models_source",
        ":datamodel-codegenerator",
    ],
    output_files = ["src/data/models.py"],
    runnable = ":bash",
    runnable_dependencies = [
        ":python3",
        ":cat",
        ":find",
    ],
)

But for some reason I can't get the Python dependency I need:

$ pants export-codegen python/data:generate_graphql_models
23:18:32.26 [INFO] Completed: Running the `adhoc_tool` at python/data:generate_graphql_models
23:18:32.26 [ERROR] 1 Exception encountered:

Engine traceback:
  in `export-codegen` goal

ProcessExecutionFailure: Process 'the `adhoc_tool` at python/data:generate_graphql_models' failed with exit code 1.
stdout:

stderr:
/usr/bin/python3: No module named datamodel_code_generator
/usr/bin/find: 'cat' terminated by signal 13
/usr/bin/find: 'cat' terminated by signal 13
/usr/bin/find: 'cat' terminated by signal 13
/usr/bin/find: 'cat' terminated by signal 13
/usr/bin/find: 'cat' terminated by signal 13
/usr/bin/find: 'cat' terminated by signal 13



Use `--keep-sandboxes=on_failure` to preserve the process chroot for inspection.

What am I missing? Any help appreciated! 🙂

gorgeous-winter-99296

04/02/2024, 3:33 PM

I'd use `--keep-sandboxes=on_failure` and then inspect the sandbox and debug it in-place. Without seeing `generate-models.sh` it does seem like an issue with that script and how it invokes the `datamodel-codegenerator`. I'd imagine you need to build it as a `pex_binary` and then run that.
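
As a side note on the repeated `find` lines in that stderr: signal 13 is SIGPIPE on Linux and macOS, which is what `cat` receives when the process reading the pipe has already exited. That would make those lines a consequence of the `python3` failure rather than a separate problem. A minimal sketch to look the number up (`signal_name` is a hypothetical helper, and the mapping is platform-specific):

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name for a signal number on the current platform."""
    return signal.Signals(number).name


if __name__ == "__main__":
    # On Linux and macOS, 13 is the number reported in the find/cat errors.
    print(signal_name(13))
```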

bumpy-noon-80834

04/03/2024, 12:42 AM

The script is quite simple:

#!/bin/bash -eux

PYTHON_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')

find ../../graphql -type f -exec cat {} \; |
    python3 -m datamodel_code_generator \
        --input-file-type graphql \
        --output models.py \
        --output-model-type pydantic_v2.BaseModel \
        --use-union-operator \
        --target-python-version "$PYTHON_VERSION" \
        --use-schema-description \
        --custom-file-header '# pyright: reportIncompatibleVariableOverride=false'

The issue occurs because, despite adding ":datamodel-codegenerator" to the execution_dependencies, the datamodel_code_generator package is not made available to the Python interpreter. I'm just surprised I don't have that issue when using shell_command, as I run the exact same script successfully with it. BTW, what is the right way of generating my models.py file?
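
A quick way to confirm which interpreter the script actually runs under, and whether it can see the package, is a small check like the sketch below. `module_available` and `describe_environment` are hypothetical helper names, not part of Pants or datamodel-code-generator. Running this with the sandbox's `python3` in place of the `-m datamodel_code_generator` call should show whether the module is importable there.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


def describe_environment(module_name: str) -> str:
    """Summarise which interpreter is running and whether it can import the module."""
    status = "importable" if module_available(module_name) else "NOT importable"
    return f"{sys.executable}: {module_name} is {status}"


if __name__ == "__main__":
    print(describe_environment("datamodel_code_generator"))
```

If the output names `/usr/bin/python3` and reports the module as not importable, the script is using the system interpreter rather than an environment that contains the requirement.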

bumpy-noon-80834

04/03/2024, 2:42 AM

Thanks for the suggestion @gorgeous-winter-99296! I've got something that works.

BUILD:

python_requirement(
    name="datamodel-code-generator",
    requirements=["datamodel-code-generator[graphql]>=0.25.5"],
)

python_source(
    name="generate_graphql_models_py",
    dependencies=[":datamodel-code-generator"],
    source="generate_graphql_models.py",
)

pex_binary(
    name="generate_graphql_models",
    dependencies=[
        ":generate_graphql_models_py",
        "graphql:schema",
    ],
    entry_point="generate_graphql_models.py",
)

generate_graphql_models.py:

from pathlib import Path

from datamodel_code_generator import (
    DataModelType,
    InputFileType,
    PythonVersion,
    generate,
)


def read_schema(schema_dir: Path) -> str:
    if not schema_dir.is_dir():
        raise Exception(f"Invalid schema_dir: {schema_dir}")
    return "".join(file.read_text() for file in schema_dir.glob("**/*.graphql"))


def generate_pydantic_models(schema_str: str, models_path: Path) -> None:
    generate(
        input_=schema_str,
        input_file_type=InputFileType.GraphQL,
        output=models_path,
        output_model_type=DataModelType.PydanticV2BaseModel,
        use_union_operator=True,
        use_schema_description=True,
        target_python_version=PythonVersion.PY_312,
        custom_file_header="# pyright: reportIncompatibleVariableOverride=false",
    )


if __name__ == "__main__":
    schema = read_schema(Path())
    generate_pydantic_models(schema, Path("python/data/src/timequest_data/models.py"))

Now I can run `pants run python/data:generate_graphql_models` and it generates the `models.py` file in the right place. It still doesn't feel like the right way to do this, but I guess the right way would be to write a codegen plugin for `datamodel_code_generator`. So in the short term I'll probably settle for this solution.
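
One detail in `read_schema` from the script above: `Path.glob` does not guarantee any ordering, and joining with an empty string can glue the last line of one file to the first line of the next when a file lacks a trailing newline. A possible variant, offered only as a sketch (`read_schema_sorted` is a hypothetical name), sorts the files and joins them with newlines so the generated module is stable between runs:

```python
from pathlib import Path


def read_schema_sorted(schema_dir: Path) -> str:
    """Concatenate all *.graphql files under schema_dir in a deterministic order."""
    if not schema_dir.is_dir():
        raise NotADirectoryError(f"Invalid schema_dir: {schema_dir}")
    # Sort the paths so the combined schema does not depend on filesystem order.
    files = sorted(schema_dir.glob("**/*.graphql"))
    # Join with a newline so definitions from adjacent files never run together.
    return "\n".join(f.read_text() for f in files)
```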

bumpy-noon-80834

04/03/2024, 10:02 AM

Guess I found a much cleaner way to achieve it.

codegen.py:

from pathlib import Path

from datamodel_code_generator import (
    DataModelType,
    InputFileType,
    PythonVersion,
    generate,
)


def read_schema(schema_dir: Path) -> str:
    if not schema_dir.is_dir():
        raise Exception(f"Invalid schema_dir: {schema_dir}")
    return "".join(file.read_text() for file in schema_dir.glob("**/*.graphql"))


def generate_pydantic_models(schema_str: str, models_path: Path) -> None:
    generate(
        input_=schema_str,
        input_file_type=InputFileType.GraphQL,
        output=models_path,
        output_model_type=DataModelType.PydanticV2BaseModel,
        use_union_operator=True,
        use_schema_description=True,
        target_python_version=PythonVersion.PY_312,
        custom_file_header="# pyright: reportIncompatibleVariableOverride=false",
    )


if __name__ == "__main__":
    schema = read_schema(Path("../../../graphql"))
    generate_pydantic_models(schema, Path("timequest_models.py"))

BUILD:

python_requirement(
    name="datamodel-code-generator",
    requirements=["datamodel-code-generator[graphql]>=0.25.5"],
)

python_source(
    name="codegen_py",
    dependencies=[
        ":datamodel-code-generator",
        "graphql:schema",
    ],
    source="codegen.py",
)

adhoc_tool(
    name="codegen",
    output_files=["timequest_models.py"],
    runnable=":codegen_py",
)

experimental_wrap_as_python_sources(
    name="models",
    inputs=[":codegen"],
)

Then I can simply depend on the "models" target.
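
One difference from the original shell script: that script derived `--target-python-version` from the running interpreter, while `codegen.py` hardcodes `PythonVersion.PY_312`. A minimal sketch of the same derivation in Python (`current_python_version` is a hypothetical helper):

```python
import sys


def current_python_version() -> str:
    """Return the running interpreter's version as "major.minor"."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


if __name__ == "__main__":
    print(current_python_version())
```

Assuming `PythonVersion` is an enum whose values are these "major.minor" strings, `PythonVersion(current_python_version())` could stand in for the hardcoded member. That lookup is an assumption about the installed release and would raise for interpreter versions it does not list.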
