Hey I'm getting `Filesystem changed during run: re...
# general
r
Hey I'm getting
Filesystem changed during run: retrying
when trying to call
workspace.write_digest
from a goal rule. Looking through
.pants.d/pants.log
but nothing is popping out at me as the issue. The digest I am trying to write is just an empty directory:
dagster_home_path = dist_dir.relpath / "dagster_home"
dagster_home_digest = await Get(Digest, CreateDigest([Directory(str(dagster_home_path))]))
logger.debug(f"Creating dagster home directory at {dagster_home_path}")
workspace.write_digest(dagster_home_digest)
The directory is being successfully created but then the goal rule retries due to filesystem change.
h
Hmm, Pants shouldn't be filewatching the dist_dir
What does
dagster_home_path
end up evaluating to?
r
Just
dist/dagster_home
-
/Users/nick.dellosa/Projects/data-platform/dist/dagster_home
is the absolute path
h
And are you overriding pants_ignore/pants_ignore_use_gitignore in your pants.toml?
r
dist is in my gitignore but I am not overriding pants_ignore at all
h
pants_ignore ignores dist by default anyway
so this is mysterious
r
This is the only thing that stands out to me in pants.log though
15:39:19.05 [INFO] pantsd 2.16.0rc0 running with PID: 8365
15:39:19.15 [INFO] handling request: `--print-stacktrace dagster ::`
15:39:33.76 [INFO] request completed: `--print-stacktrace dagster ::`
15:40:21.79 [INFO] notify invalidation: cleared 0 and dirtied 0 nodes for: {"pants-plugins/plugins/dagster/goals.py"}
15:40:21.79 [INFO] notify invalidation: cleared 1 and dirtied 2 nodes for: {"pants-plugins/plugins/dagster/goals.py"}
15:40:21.79 [ERROR] saw filesystem changes covered by invalidation globs: content changed (Digest('9b61bab483e5f51b9e1b551b38c18b6107fc67c36ef076955bca0bd855d60795', 258) fs Digest('7f68cd243874fee0d32a5ffe1850bfc5f3307f0a42efa7dcc779948ef995f40d', 258)). terminating the daemon.
15:40:22.67 [ERROR] service failure for <pants.pantsd.service.scheduler_service.SchedulerService object at 0x442574a910>.
I added more log statements and it looks like it's actually the interactive process I am trying to launch that is causing the change detection to trigger ('dagster process' is the last log statement I get before restart):
dagster_home_path = dist_dir.relpath / "dagster_home"
dagster_home_digest = await Get(Digest, CreateDigest([Directory(str(dagster_home_path))]))
logger.debug(f"Creating dagster home directory at {dagster_home_path}")
workspace.write_digest(dagster_home_digest)
logger.debug("dir created")
dagster_process = await Get(
    Process,
    VenvPexProcess(
        dagster_pex,
        argv=("dev", *process_opts),
        input_digest=await Get(Digest, MergeDigests([sources_digest, dagster_home_digest])),
        extra_env=FrozenDict({"DAGSTER_HOME": dagster_home_path.absolute()}),
        description="Run dagit",
    ),
)
logger.debug("dagster process")
result = await Effect(
    InteractiveProcessResult, InteractiveProcess, InteractiveProcess.from_process(dagster_process)
)
w
that’s likely a panic occurring in a background thread. do you see a useful error message with
--no-pantsd
?
r
Nope, same thing unfortunately.
Could it have something to do with the process I'm launching writing to an absolute path on the filesystem?
Yeah it looks like that's the issue, I was able to configure it to just use a temp directory and it worked. I think that's unintended though, because the absolute path I was passing was in my
dist
directory?
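For anyone trying the same workaround, here is a minimal sketch of the temp-directory idea outside of Pants. It uses plain `subprocess` and `tempfile` as stand-ins for the rule API, and the helper name `run_with_temp_home` is hypothetical. The child process only echoes the `DAGSTER_HOME` value it receives.

```python
import os
import subprocess
import sys
import tempfile


def run_with_temp_home(argv):
    """Run argv with DAGSTER_HOME set to a fresh temp directory (hypothetical helper)."""
    with tempfile.TemporaryDirectory(prefix="dagster_home_") as home:
        # Environment values must be strings, so convert the path explicitly.
        env = {**os.environ, "DAGSTER_HOME": str(home)}
        completed = subprocess.run(
            argv, env=env, capture_output=True, text=True, check=True
        )
        return home, completed.stdout.strip()


# Stand-in for the real process: a child interpreter that prints what it was given.
child = [sys.executable, "-c", "import os; print(os.environ['DAGSTER_HOME'])"]
home, seen = run_with_temp_home(child)
```

The directory is deleted as soon as the `with` block exits, so nothing the process writes there survives the run.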
I'm also passing a number of files to this process as an input digest, but it looks like my process can't access them. They are also not present in the sandbox. Do I need to explicitly move them to the sandbox so they are available under the same relative path there that they would be in my workspace? I was under the impression that that's what passing them as the input digest would do.
Also, is it possible to get the current sandbox path inside of a rule?
h
Yes, anything in the input digest should get automatically written to the equivalent relpath in the sandbox before the process executes. As you say, that's what the input digest is for.
How certain are you that the input digest contains the files you think it does? And are you sure they didn't get written into the sandbox, but in an unexpected location?
r
Ah that was the issue! Looks like the paths were off relative to the build root.
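A small sketch of the path relationship described above, assuming files in an input digest land in the sandbox at the same path they have relative to the build root. The helper `sandbox_relpath` and the example paths are hypothetical, and this is plain `pathlib`, not the Pants API.

```python
from pathlib import PurePosixPath


def sandbox_relpath(build_root: str, file_path: str) -> str:
    """Return the path a file would have inside the sandbox (hypothetical helper).

    Relative inputs are interpreted as relative to the build root.
    Raises ValueError if the file lies outside the build root.
    """
    root = PurePosixPath(build_root)
    path = PurePosixPath(file_path)
    if not path.is_absolute():
        path = root / path
    return str(path.relative_to(root))


# Hypothetical layout: the source root is src/python, the build root is /repo.
in_sandbox = sandbox_relpath("/repo", "/repo/src/python/proj/defs.py")
```

Here `in_sandbox` is `src/python/proj/defs.py`, so a process that looks for `proj/defs.py` at the sandbox root would not find it.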
h
Ah great. Does that also solve the
Filesystem changed during run
issue, or just the input digest issue?
r
Just the input digest issue. I can get around the filesystem changed issue by not providing dagster (the process I'm trying to run) with a home directory to store its files, which causes it to create a temp directory inside my sandbox. However, I'd like to preserve these files if possible, which is why I'm interested in getting the absolute path of the sandbox within my rule.
Following back up on this, I've actually got what I need working if I include a couple more things on my
PYTHONPATH
when I run the pex shim. However, providing
PYTHONPATH
as an
extra_env
variable to
VenvPexProcess
doesn't seem to work. I also see that there is a
PEX_EXTRA_SYS_PATH
option for building the pex, but I don't see a good way to use that in a
PexRequest
.
h
What are you adding to your PYTHONPATH? That seems like a hack that probably has a more straightforward solution
r
So dagster already has a built-in workspace refresh button, which is really convenient for local testing because I can edit my code and click the button and voila, my changes appear. My presumption is this would not work if my source files are built into a pex. So what I am trying to do is build my third party dependencies into a pex (namely, dagster itself) and then launch it in a way where it can find my source files. Which I realize may be a bit against the grain of how pants likes to do things. Ideally I'd also want to be able to preserve dagster's home directory too but every time I do that pants restarts the process due to filesystem change, even if I put it in an ignored directory like
dist
.
Oh my god...I just got it working but the way I got it working is weird. I was specifying my modules using the command line arguments (for dagster, not for Python, so the full command is
./pex_shim dagster dev -m module_name
)
-m module_name
and for whatever reason, that was causing it to recognize the module name as ' module_name' instead of just 'module_name'.
h
I see. Interestingly
pants run path/to/file.py
works like that - it builds a reusable third-party only pex, but takes sources directly from the source tree.
So you use
-mmodule_name
?
r
--module-name=mymodule
worked, but
-m mymodule
did not due to added whitespace.
-m
is shorthand for
--module-name
.
h
Yes, I think
-mmodule_name
would also work
without the space
Or maybe not because it would think all the letters were flags
either way,
--module-name
seems better anyway
and has the advantage of working...
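The whitespace behaviour can be reproduced with Python's `argparse`, used here only as a stand-in: Dagster's CLI has its own parser and may treat these forms differently. The sketch assumes `-m module_name` reached the process as a single argv element, as it would if passed as one string in an `argv` tuple.

```python
import argparse

# Stand-in parser with the same short/long option pair as in the thread.
parser = argparse.ArgumentParser(prog="dev")
parser.add_argument("-m", "--module-name")


def parse(argv):
    """Return the module name argparse extracts from the given argv list."""
    return parser.parse_args(argv).module_name


one_token = parse(["-m module_name"])            # flag and value in ONE argv element
two_tokens = parse(["-m", "module_name"])        # flag and value as separate elements
long_form = parse(["--module-name=module_name"])
attached = parse(["-mmodule_name"])              # value attached without a space
```

With `argparse`, only `one_token` keeps the leading space (`' module_name'`); the other three forms yield `'module_name'`. Splitting the flag and its value into separate argv elements, or using the `--module-name=` form, avoids the problem.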