Hi all! I have a pickle with files materialization...
# general
g
Hi all! I'm in a pickle with file materialization. My BUILD file looks like this:
python_sources(
    name="foo",
    dependencies=[
        ":model.safetensors",
    ],
)
file(
    name="model.safetensors",
    source=http_source(
        filename="model.safetensors",
        url="https://huggingface.co/iiiorg/piiranha-v1-detect-personal-information/resolve/main/model.safetensors",
        len=1112954872,
        sha256="3d503b6804dd734b8f5c9c98c7c38c431cba99778da53987a6945cf1503f3b82",
    ),
)
And when I run `pants repl app:foo`, the file `model.safetensors` is downloaded, but the repl runs inside the project instead of in a sandbox with `model.safetensors` present. The only way I found to materialize files and use them is to use `run_shell_command` and symlink the sandboxed files into the repo's root dir: `ln -s {{chroot}}/app/model.safetensors model.safetensors`. Neither ChatGPT nor Google had an answer to this question. Has anyone found a workaround?
So in the end I used environments-preview to map big files to symlinks (with shell-command), so they are not unnecessarily copied for every sandbox. The rest of the files (including symlinks) I simply copied (with a single run_shell_command for dev env preparation) from the sandbox to the project root and .gitignore'd them. So now I have everything I need in the project root and can use the repl easily.
In the future I may work on making Pants more DX-friendly, because figuring this out was a pain.
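For anyone wanting to reproduce the copy step described above, here is a minimal sketch in plain Python. It is not the script used in this thread and it does not call any Pants API: the function name `copy_into_workspace` and the directory arguments are hypothetical, and it assumes the sandbox contents are regular files plus symlinks that point at the large files. Symlinks are recreated as links rather than followed, so big files are not duplicated.

```python
import os
import shutil
from pathlib import Path


def copy_into_workspace(sandbox_dir: Path, workspace_dir: Path) -> list[Path]:
    """Mirror files from sandbox_dir into workspace_dir, keeping symlinks as symlinks.

    Hypothetical helper: names and layout are assumptions for illustration.
    Symlinks that point at directories are skipped (os.walk does not follow them).
    """
    copied: list[Path] = []
    for root, _dirnames, filenames in os.walk(sandbox_dir):
        for name in filenames:
            src = Path(root) / name
            dst = workspace_dir / src.relative_to(sandbox_dir)
            dst.parent.mkdir(parents=True, exist_ok=True)
            # Replace whatever an earlier run left behind so reruns do not fail.
            if dst.is_symlink() or dst.exists():
                dst.unlink()
            if src.is_symlink():
                # Recreate the link itself; the link target is copied verbatim,
                # so relative targets may dangle once moved out of the sandbox.
                os.symlink(os.readlink(src), dst)
            else:
                shutil.copy2(src, dst)
            copied.append(dst)
    return sorted(copied)
```

Called with the sandbox path as the first argument and the repo root as the second, this leaves the files where `repl` and `run` can see them. The copied paths still need to be added to `.gitignore`, as mentioned above.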
h
Glad you found a solution! As mentioned in another thread, `repl` (and `run`) have the workspace as their `cwd` by design. The typical use of `file()` is to provide input to another process, which is why the files get downloaded into a temporary sandbox (and are placed in the lmdb_store). Just to clarify my understanding: it sounds like you don’t actually need the files in a sandbox at all? You just want them in the workspace all along? Your solution sounds reasonable for that, with the caveat that I’m not sure what happens if you delete a symlink. Do things invalidate properly for you in that case?
Oh, and why symlink instead of mv? I’m surprised that works, since sandboxes typically get cleaned up at the end of a run.
g
@happy-kitchen-89482 I guess I need sandboxing for builds using those files, but having them easily accessible to other devs for the repl is also very useful, so they don't have to download them manually and we can just leverage the file() target. Ah yeah, I don't symlink to the file inside the sandbox; I copy the file to a local dir and then output the symlink as a file artifact.
Thankfully, Docker automatically resolves symlinks during builds, so all is good.
h
Ah, makes sense