careful-controller-15791
10/15/2024, 9:35 AM
class DvcTarget(Target):
    alias = "dvc_files"
    core_fields = (
        *COMMON_TARGET_FIELDS,
        SingleSourceField,
    )

@dataclass(frozen=True)
class PutativeDvcTargetsRequest(PutativeTargetsRequest):
    pass

@rule(level=LogLevel.DEBUG, desc="Determine candidate dvc targets to create")
async def find_putative_targets(
    req: PutativeDvcTargetsRequest,
    all_owned_sources: AllOwnedSources,
    # python_setup: PythonSetup,
) -> PutativeTargets:
    pts: List[PutativeTarget] = []
    all_dvc_files_globs: PathGlobs = req.path_globs("*.dvc")
    all_dvc_files = await Get(Paths, PathGlobs, all_dvc_files_globs)
    new_dvc_files = set(all_dvc_files.files) - set(all_owned_sources)
    for file in new_dvc_files:
        logger.info(f"Found dvc file {file}")
        dirname = os.path.dirname(file)
        file_name = os.path.basename(file)
        file_base = os.path.splitext(file_name)[0]
        pts.append(
            PutativeTarget.for_target_type(
                DvcTarget,
                dirname,
                name=file_base,
                triggering_sources=[file],
                kwargs={"source": file_name},
            )
        )
    return PutativeTargets(pts)
I just don't know which classes I have to use to make it include the files referenced by the .dvc file. My goal is to be able to add these dvc targets as dependencies of Python scripts. This is my current attempt, but I think I'm using the wrong components:
class GenerateDvcFileRequest(GenerateSourcesRequest):
    input = SingleSourceField
    output = FileSourceField

@rule
async def generate_dvc_file(
    request: GenerateDvcFileRequest,
) -> GeneratedSources:
    sources_requests = await Get(
        DigestContents, Digest, request.protocol_sources.digest
    )
    assert len(sources_requests) == 1
    sources_request = sources_requests[0]
    source_yaml = yaml.load(sources_request.content, Loader=yaml.FullLoader)
    if not isinstance(source_yaml, dict):
        raise ValueError(f"Invalid yaml file")
    logger.info(f"source_yaml: {source_yaml}")
    wdir = source_yaml.get("wdir", ".")
    outs = source_yaml.get("outs", [])
    files = [o.get("path") for o in outs]
    file_digests = await MultiGet(
        Get(Digest, PathGlobs, PathGlobs([f for f in files])) for f in files
    )
    snapshot = await Get(Snapshot, Digest, request.protocol_sources.digest)
    return GeneratedSources(snapshot)
curved-television-6568
10/15/2024, 12:40 PM
The DvcTarget class needs to derive from ResourceTarget https://github.com/pantsbuild/pants/blob/d955f0b6c4914f367d54a50fe3a7270f39da84e8/src/python/pants/core/target_types.py#L552 in order for it to be included as a Python resource (that is, to behave as described at https://www.pantsbuild.org/stable/docs/using-pants/assets-and-archives#resources ). If you prefer the files to be treated as file targets, adjust the base class accordingly. 😉
careful-controller-15791
10/15/2024, 1:31 PM
class DvcTarget(FileTarget):
    alias = "dvc_files"
So the goal would then be to look at the contents of the .dvc file and include all the files it tracks. That means reading the YAML, looking at wdir and outs: path to find all the files and folders that belong to it, globbing them, and adding them to the Snapshot.
I just don't seem to be able to do this part. I'm completely new to Pants, and it's not clear to me which Request I should use for this. I tried the following, but it doesn't seem to be called even when I add the dvc_files target as an explicit dependency of another target and run it:
class DvcGeneratorTarget(FilesGeneratorTarget): ...

@rule
async def generate_dvc_file(
    request: FilesGeneratorTarget,
) -> GeneratedSources:
    raise NotImplementedError()
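For reference, the "read the YAML, look at wdir and outs: path, glob them" step described above can be sketched on its own, outside of any Pants rule. This is only a minimal sketch under stated assumptions: `dvc_output_globs` is a hypothetical helper name, the .dvc file is assumed to be already parsed into a dict (for example with `yaml.safe_load`), `wdir` is assumed to be relative to the directory of the .dvc file, and each `outs` path is assumed to be relative to `wdir`. Because an output may be a single file or a directory, the sketch emits both the path itself and a recursive `path/**` pattern.

```python
import posixpath
from typing import Any, List, Mapping


def dvc_output_globs(dvc_file: str, parsed: Mapping[str, Any]) -> List[str]:
    """Return glob patterns for the outputs listed in one parsed .dvc file.

    `dvc_file` is the path of the .dvc file relative to the build root.
    `parsed` is the already-parsed YAML content of that file.
    """
    base = posixpath.dirname(dvc_file)
    # Assumption: wdir is relative to the directory containing the .dvc file.
    wdir = parsed.get("wdir") or "."
    globs: List[str] = []
    for out in parsed.get("outs") or []:
        path = out.get("path")
        if not path:
            continue  # Skip entries without a path rather than guessing.
        resolved = posixpath.normpath(posixpath.join(base, wdir, path))
        globs.append(resolved)           # Matches the output if it is a file.
        globs.append(f"{resolved}/**")   # Matches its contents if it is a directory.
    return globs


example = {"wdir": ".", "outs": [{"path": "images"}]}
print(dvc_output_globs("data/images.dvc", example))
# ['data/images', 'data/images/**']
```

The resulting list of patterns is what one would presumably hand to a single `PathGlobs` inside the rule to obtain the Snapshot; that wiring is not shown here.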
gorgeous-winter-99296
10/15/2024, 7:20 PM
You need a GenerateSourcesRequest derivative, which declares the to/from source types of your rule.
https://github.com/tgolsson/pants-backends/blob/1c49594919b62ce418d864564d0a84e31c927dfb/pants-plugins/kustomize/pants_backend_kustomize/codegen.py#L49-L59
gorgeous-winter-99296
10/15/2024, 7:29 PM
[...] pants package foo:bar independently and your code generator is just "Build packages and forward". There's pants export-codegen which can also help with debugging.