flat-zoo-31952
11/05/2020, 4:20 PMfiles-like dependencies...i get how to create the target type, but what kind of object do i need to return in my rule to make it available as a dependency of other targets? GeneratedSources maybe?hundreds-father-404
11/05/2020, 4:46 PMSources subclass set alias = "_sources", which causes the field to be ignored in ./pants help.
See https://github.com/pantsbuild/pants/blob/72d0409e61a58d2390390405d3a87a771feea61b/src/python/pants/core/target_types.py#L62-L186 for how we do a very similar setup with the relocated_files targethundreds-father-404
11/05/2020, 4:47 PMflat-zoo-31952
11/05/2020, 7:41 PMhundreds-father-404
11/05/2020, 7:43 PMflat-zoo-31952
11/05/2020, 7:46 PMhundreds-father-404
11/05/2020, 7:47 PMBinaryPaths abstraction to safely locate binaries like tar and unzip on your machineflat-zoo-31952
11/05/2020, 8:17 PMExtractedFile get:
Exception: WithDeps(Inner(InnerEntry { params: {DownloadFileRequest}, rule: Task(Task { product: GeneratedSources, can_modify_workunit: false, clause: [DownloadFileRequest], gets: [Get { output: PyDigest, input: DownloadFile }, Get { output: ExtractedArchive, input: PyDigest }, Get { output: Snapshot, input: PyDigest }], func: jumiopants.download_files:59:download_file(), cacheable: true, display_info: DisplayInfo { name: "jumiopants.download_files.download_file", desc: None, level: Trace } }) })) did not declare a dependency on JustGet(Get { output: Snapshot, input: ExtractedArchive })hundreds-father-404
11/05/2020, 8:18 PMflat-zoo-31952
11/05/2020, 8:18 PMhundreds-father-404
11/05/2020, 8:19 PMGet is different than what’s expected. If you have Get(Snapshot, ExtractedArchive, "not an extracted archive object") for examplehundreds-father-404
11/05/2020, 8:20 PMawait Get(Snapshot, Digest, my_extracted_archive)?flat-zoo-31952
11/05/2020, 8:20 PMflat-zoo-31952
11/05/2020, 8:20 PMhundreds-father-404
11/05/2020, 8:21 PMokay i kinda get that from what’s being saidIt’s a horrible horrible error message. 2.1.0.dev0 revamps it all
flat-zoo-31952
11/05/2020, 8:21 PMflat-zoo-31952
11/05/2020, 8:22 PMhundreds-father-404
11/05/2020, 8:24 PMflat-zoo-31952
11/05/2020, 8:53 PMrepl goal?flat-zoo-31952
11/05/2020, 8:54 PMrun goals, but for repl it's buried in .pants.d/flat-zoo-31952
11/05/2020, 9:06 PMrun goal file?hundreds-father-404
11/05/2020, 9:08 PM.pants.d, and load the contents on PYTHONPATH (via PEX_EXTRA_SYS_PATH) iiuc. We use a tmpdir so that we don’t materialize things like codegen into your build root.
Unlike normally using a tmpdir in /tmp, we use .pants.d so that you can reach into files in your build root if you want, since both run and repl are never cached and we reason it’s safer to be less hermetichundreds-father-404
11/05/2020, 9:08 PM.pants.d/tmp139131/src/python/project/asset.ziphundreds-father-404
11/05/2020, 9:09 PMwrite-codegen goal I prototyped last night. It could be relevant here, but has a possible dangerous consequence if implemented the way I did
I would love a) if you’d like that goal to begin with, and b) which of those two solutions sound better to youflat-zoo-31952
11/05/2020, 9:30 PMhundreds-father-404
11/05/2020, 9:30 PMflat-zoo-31952
11/05/2020, 9:31 PMflat-zoo-31952
11/05/2020, 9:33 PMflat-zoo-31952
11/05/2020, 9:35 PMfiles target into a resources target?flat-zoo-31952
11/05/2020, 9:35 PMhundreds-father-404
11/05/2020, 9:36 PMResourcesSources, rather than FilesSources?flat-zoo-31952
11/05/2020, 9:47 PMflat-zoo-31952
11/05/2020, 9:48 PMhundreds-father-404
11/05/2020, 9:50 PMresources rather than files? They’re very similar, only that resources will have the source root included in the PYTHONPATH. (https://www.pantsbuild.org/docs/resources describes this more)flat-zoo-31952
11/05/2020, 9:55 PMhundreds-father-404
11/05/2020, 9:56 PMpkgutil.get_data() and not need to worry about .pants.d, rather than using open(), right?hundreds-father-404
11/05/2020, 9:57 PMrelocated_files won’t work how we want with those two goals. I’ve never understood very well running repl and run in .pants.d, and I wonder if it was a mistake and we should run in a clean tmpdir like normal)flat-zoo-31952
11/05/2020, 9:59 PMMODEL_PATH to tell ML inference code where to find model artifacts. These models can be COPY'd into docker images, mounted in there, or loaded via other processes, depending on the deployment. Unfortunately, we don't currently have a good way of emulating this locally to experiment or test (run/repl/test goals, maybe raul's jupyter goal would be nice too).flat-zoo-31952
11/05/2020, 10:00 PMpython_library(name="model-loaded") target that depends on both the inference code's python_library(name="lib") target and the download_file(name="model-v1") target i'm adding for this artifactflat-zoo-31952
11/05/2020, 10:04 PM./pants repl path/to/build:model-loaded would download the model and insert it into some predictable path within the execution environment so that someone could load it. I think i'd actually rather this happen via open() than pkgutil, but I need some way for the execution process to have access to that informationflat-zoo-31952
11/05/2020, 10:05 PMflat-zoo-31952
11/05/2020, 10:07 PMPath('.pants.d').glob('**/my_file') but that's clearly a hackhundreds-father-404
11/05/2020, 10:11 PMrun and repl run in the build root under a tmpdir isn’t worth it. There’s this issue you’re running into with code generated files, and there’s also an issue that it’s confusing to have both .pants.d/tmp131/src/python/project/app.py and normal src/python/project/app.py. From the user’s perspective, how are those different?
We went with running in the build root under the idea that repl is more experimental, so it’s fine to break hermiticity. But I don’t know if that is worth it. And we don’t use that logic with test --debug
cc @happy-kitchen-89482 what do you think? @hundreds-breakfast-49010 and @witty-crayon-22786 toowitty-crayon-22786
11/05/2020, 10:16 PMhundreds-father-404
11/05/2020, 10:17 PMrun be in the build root now. We wanted scripts that write to the build root to work. They wouldn’t if they ran in a tmpdirflat-zoo-31952
11/05/2020, 10:24 PMdist as a solid option, as it's 1) predictable 2) very likely to be in .gitignore 3) easy for users to clear when they don't need it anymore (a good rm -rf build dist from time to time never hurt anyone)hundreds-father-404
11/05/2020, 10:26 PMwrite-codegen goal is helpful for multiple reasons, and I appreciate your validation that being in dist/ is good.
But still, you pointed out relocated_files isn’t going to work how we want with run and repl, which I want to figure out.witty-crayon-22786
11/05/2020, 10:27 PM.pants.d/tmp** or dist for the purposes of `repl`/`run`, so i don’t think the CWD of repl/run are relevant hereflat-zoo-31952
11/05/2020, 10:27 PMwitty-crayon-22786
11/05/2020, 10:27 PMdist feels orthogonal to the usecase, if the usecase is `repl`/`run`flat-zoo-31952
11/05/2020, 10:28 PMdist is better than .pants.d/tmp** IMO because i don't have to glob or search to find my codewitty-crayon-22786
11/05/2020, 10:28 PMflat-zoo-31952
11/05/2020, 10:29 PMwitty-crayon-22786
11/05/2020, 10:29 PMhundreds-father-404
11/05/2020, 10:30 PMyou shouldn’t have to regardless if you use codegenFor things that you load via Python’s module system, agreed. You can use
import and pkgutil.get_data() to use things in .pants.d/tmp1323 without issue.
But we want to be able to use open() instead. And it’s important imo that calling open() on a relocated_files target would work with repl and run, too. Right now, you need to prefix the path with .pants.d/tmp123 for generated files (not resources)
And the only reason you don’t need that prefix for non-generated files is that you’re reading from the actual build root, rather than the .pants.d tmpdirwitty-crayon-22786
11/05/2020, 10:31 PMfiles are loaded for run is … an open question independent of codegen, right?flat-zoo-31952
11/05/2020, 10:31 PMopen()hundreds-father-404
11/05/2020, 10:32 PMso with regard to FilesSources, how files are loaded for run is … an open question independent of codegen, right?Yep. It works for non codegen, but only because we’re leaking into the actual build root. It fails for any code genned files, including
relocated_files targetswitty-crayon-22786
11/05/2020, 10:32 PMtest or package, you would get those loose files in a well-known relative pathwitty-crayon-22786
11/05/2020, 10:32 PMrun.flat-zoo-31952
11/05/2020, 10:32 PMwitty-crayon-22786
11/05/2020, 10:33 PMrun because of how parallel it is to package … repl is more interesting, but probably still possible.flat-zoo-31952
11/05/2020, 10:38 PMrepl is a really compelling usecase for anybody who works with binary data and needs a team to play with it somewhat regularly without clogging up a repo...you can have external programs, collections of files, neural network models, etc, all kinda fake version controlled by virtue of the checksum you have to store in-repo, but that don't need to be downloaded until you have the particular need to work with that targetflat-zoo-31952
11/05/2020, 10:39 PMflat-zoo-31952
11/05/2020, 10:39 PMwitty-crayon-22786
11/05/2020, 10:57 PMwitty-crayon-22786
11/05/2020, 11:03 PMhundreds-father-404
11/05/2020, 11:04 PMwitty-crayon-22786
11/05/2020, 11:46 PMwitty-crayon-22786
11/05/2020, 11:47 PMhundreds-father-404
11/06/2020, 10:38 PMhundreds-father-404
11/10/2020, 3:00 AMexport-codegen goalflat-zoo-31952
11/10/2020, 3:04 AMhundreds-father-404
11/10/2020, 3:07 AMis that about exporting the results of code generation to filesystem for, say, inclusion in the repo?This, so that you can access files at
dist/codegen.
The bigger problem still exists at https://github.com/pantsbuild/pants/issues/11109. For example, the above workaround has an issue that if you forget to manually regenerate the files, your `repl`/`run` will be using stale fileswitty-crayon-22786
11/10/2020, 3:38 AM