flat-zoo-31952
11/05/2020, 4:20 PMfiles
-like dependencies...i get how to create the target type, but what kind of object do i need to return in my rule to make it available as a dependency of other targets? GeneratedSources
maybe?hundreds-father-404
11/05/2020, 4:46 PMSources
subclass set alias = "_sources"
, which causes the field to be ignored in ./pants help
.
See https://github.com/pantsbuild/pants/blob/72d0409e61a58d2390390405d3a87a771feea61b/src/python/pants/core/target_types.py#L62-L186 for how we do a very similar setup with the relocated_files
targethundreds-father-404
11/05/2020, 4:47 PMflat-zoo-31952
11/05/2020, 7:41 PMhundreds-father-404
11/05/2020, 7:43 PMflat-zoo-31952
11/05/2020, 7:46 PMhundreds-father-404
11/05/2020, 7:47 PMBinaryPaths
abstraction to safely locate binaries like tar
and unzip
on your machineflat-zoo-31952
11/05/2020, 8:17 PMExtractedFile
get:
Exception: WithDeps(Inner(InnerEntry { params: {DownloadFileRequest}, rule: Task(Task { product: GeneratedSources, can_modify_workunit: false, clause: [DownloadFileRequest], gets: [Get { output: PyDigest, input: DownloadFile }, Get { output: ExtractedArchive, input: PyDigest }, Get { output: Snapshot, input: PyDigest }], func: jumiopants.download_files:59:download_file(), cacheable: true, display_info: DisplayInfo { name: "jumiopants.download_files.download_file", desc: None, level: Trace } }) })) did not declare a dependency on JustGet(Get { output: Snapshot, input: ExtractedArchive })
hundreds-father-404
11/05/2020, 8:18 PMflat-zoo-31952
11/05/2020, 8:18 PMhundreds-father-404
11/05/2020, 8:19 PMGet
is different than what’s expected. If you have Get(Snapshot, ExtractedArchive, "not an extracted archive object")
for examplehundreds-father-404
11/05/2020, 8:20 PMawait Get(Snapshot, Digest, my_extracted_archive)
?flat-zoo-31952
11/05/2020, 8:20 PMflat-zoo-31952
11/05/2020, 8:20 PMhundreds-father-404
11/05/2020, 8:21 PMokay i kinda get that from what’s being saidIt’s a horrible horrible error message. 2.1.0.dev0 revamps it all
flat-zoo-31952
11/05/2020, 8:21 PMflat-zoo-31952
11/05/2020, 8:22 PMhundreds-father-404
11/05/2020, 8:24 PMflat-zoo-31952
11/05/2020, 8:53 PMrepl
goal?flat-zoo-31952
11/05/2020, 8:54 PMrun
goals, but for repl it's buried in .pants.d/
flat-zoo-31952
11/05/2020, 9:06 PMrun
goal file?hundreds-father-404
11/05/2020, 9:08 PM.pants.d
, and load the contents on PYTHONPATH
(via PEX_EXTRA_SYS_PATH
) iiuc. We use a tmpdir so that we don’t materialize things like codegen into your build root.
Unlike normally using a tmpdir in /tmp
, we use .pants.d
so that you can reach into files in your build root if you want, since both run
and repl
are never cached and we reason it’s safer to be less hermetichundreds-father-404
11/05/2020, 9:08 PM.pants.d/tmp139131/src/python/project/asset.zip
hundreds-father-404
11/05/2020, 9:09 PMwrite-codegen
goal I prototyped last night. It could be relevant here, but has a possible dangerous consequence if implemented the way I did
I would love a) if you’d like that goal to begin with, and b) which of those two solutions sound better to youflat-zoo-31952
11/05/2020, 9:30 PMhundreds-father-404
11/05/2020, 9:30 PMflat-zoo-31952
11/05/2020, 9:31 PMflat-zoo-31952
11/05/2020, 9:33 PMflat-zoo-31952
11/05/2020, 9:35 PMfiles
target into a resources
target?flat-zoo-31952
11/05/2020, 9:35 PMhundreds-father-404
11/05/2020, 9:36 PMResourcesSources
, rather than FilesSources
?flat-zoo-31952
11/05/2020, 9:47 PMflat-zoo-31952
11/05/2020, 9:48 PMhundreds-father-404
11/05/2020, 9:50 PMresources
rather than files
? They’re very similar, only that resources
will have the source root included in the PYTHONPATH. (https://www.pantsbuild.org/docs/resources describes this more)flat-zoo-31952
11/05/2020, 9:55 PMhundreds-father-404
11/05/2020, 9:56 PMpkgutil.get_data()
and not need to worry about .pants.d
, rather than using open()
, right?hundreds-father-404
11/05/2020, 9:57 PMrelocated_files
won’t work how we want with those two goals. I’ve never understood very well running repl
and run
in .pants.d
, and I wonder if it was a mistake and we should run in a clean tmpdir like normal)flat-zoo-31952
11/05/2020, 9:59 PMMODEL_PATH
to tell ML inference code where to find model artifacts. These models can be COPY'd into docker images, mounted in there, or loaded via other processes, depending on the deployment. Unfortunately, we don't currently have a good way of emulating this locally to experiment or test (run/repl/test goals, maybe raul's jupyter goal would be nice too).flat-zoo-31952
11/05/2020, 10:00 PMpython_library(name="model-loaded")
target that depends on both the inference code's python_library(name="lib")
target and the download_file(name="model-v1")
target i'm adding for this artifactflat-zoo-31952
11/05/2020, 10:04 PM./pants repl path/to/build:model-loaded
would download the model and insert it into some predictable path within the execution environment so that someone could load it. I think i'd actually rather this happen via open()
than pkgutil
, but I need some way for the execution process to have access to that informationflat-zoo-31952
11/05/2020, 10:05 PMflat-zoo-31952
11/05/2020, 10:07 PMPath('.pants.d').glob('**/my_file')
but that's clearly a hackhundreds-father-404
11/05/2020, 10:11 PMrun
and repl
run in the build root under a tmpdir isn’t worth it. There’s this issue you’re running into with code generated files
, and there’s also an issue that it’s confusing to have both .pants.d/tmp131/src/python/project/app.py
and normal src/python/project/app.py
. From the user’s perspective, how are those different?
We went with running in the build root under the idea that repl
is more experimental, so it’s fine to break hermiticity. But I don’t know if that is worth it. And we don’t use that logic with test --debug
cc @happy-kitchen-89482 what do you think? @hundreds-breakfast-49010 and @witty-crayon-22786 toowitty-crayon-22786
11/05/2020, 10:16 PMhundreds-father-404
11/05/2020, 10:17 PMrun
be in the build root now. We wanted scripts that write to the build root to work. They wouldn’t if they ran in a tmpdirflat-zoo-31952
11/05/2020, 10:24 PMdist
as a solid option, as it's 1) predictable 2) very likely to be in .gitignore 3) easy for users to clear when they don't need it anymore (a good rm -rf build dist
from time to time never hurt anyone)hundreds-father-404
11/05/2020, 10:26 PMwrite-codegen
goal is helpful for multiple reasons, and I appreciate your validation that being in dist/
is good.
But still, you pointed out relocated_files
isn’t going to work how we want with run
and repl
, which I want to figure out.witty-crayon-22786
11/05/2020, 10:27 PM.pants.d/tmp**
or dist
for the purposes of `repl`/`run`, so i don’t think the CWD of repl/run are relevant hereflat-zoo-31952
11/05/2020, 10:27 PMwitty-crayon-22786
11/05/2020, 10:27 PMdist
feels orthogonal to the usecase, if the usecase is `repl`/`run`flat-zoo-31952
11/05/2020, 10:28 PMdist
is better than .pants.d/tmp**
IMO because i don't have to glob or search to find my codewitty-crayon-22786
11/05/2020, 10:28 PMflat-zoo-31952
11/05/2020, 10:29 PMwitty-crayon-22786
11/05/2020, 10:29 PMhundreds-father-404
11/05/2020, 10:30 PMyou shouldn’t have to regardless if you use codegenFor things that you load via Python’s module system, agreed. You can use
import
and pkgutil.get_data()
to use things in .pants.d/tmp1323
without issue.
But we want to be able to use open()
instead. And it’s important imo that calling open()
on a relocated_files
target would work with repl
and run
, too. Right now, you need to prefix the path with .pants.d/tmp123
for generated files (not resources)
And the only reason you don’t need that prefix for non-generated files is that you’re reading from the actual build root, rather than the .pants.d
tmpdirwitty-crayon-22786
11/05/2020, 10:31 PMfiles
are loaded for run
is … an open question independent of codegen, right?flat-zoo-31952
11/05/2020, 10:31 PMopen()
hundreds-father-404
11/05/2020, 10:32 PMso with regard to FilesSources, how files are loaded for run is … an open question independent of codegen, right?Yep. It works for non codegen, but only because we’re leaking into the actual build root. It fails for any code genned files, including
relocated_files
targetswitty-crayon-22786
11/05/2020, 10:32 PMtest
or package
, you would get those loose files in a well-known relative pathwitty-crayon-22786
11/05/2020, 10:32 PMrun
.flat-zoo-31952
11/05/2020, 10:32 PMwitty-crayon-22786
11/05/2020, 10:33 PMrun
because of how parallel it is to package
… repl is more interesting, but probably still possible.flat-zoo-31952
11/05/2020, 10:38 PMrepl
is a really compelling usecase for anybody who works with binary data and needs a team to play with it somewhat regularly without clogging up a repo...you can have external programs, collections of files, neural network models, etc, all kinda fake version controlled by virtue of the checksum you have to store in-repo, but that don't need to be downloaded until you have the particular need to work with that targetflat-zoo-31952
11/05/2020, 10:39 PMflat-zoo-31952
11/05/2020, 10:39 PMwitty-crayon-22786
11/05/2020, 10:57 PMwitty-crayon-22786
11/05/2020, 11:03 PMhundreds-father-404
11/05/2020, 11:04 PMwitty-crayon-22786
11/05/2020, 11:46 PMwitty-crayon-22786
11/05/2020, 11:47 PMhundreds-father-404
11/06/2020, 10:38 PMhundreds-father-404
11/10/2020, 3:00 AMexport-codegen
goalflat-zoo-31952
11/10/2020, 3:04 AMhundreds-father-404
11/10/2020, 3:07 AMis that about exporting the results of code generation to filesystem for, say, inclusion in the repo?This, so that you can access files at
dist/codegen
.
The bigger problem still exists at https://github.com/pantsbuild/pants/issues/11109. For example, the above workaround has an issue that if you forget to manually regenerate the files, your `repl`/`run` will be using stale fileswitty-crayon-22786
11/10/2020, 3:38 AM