Having a couple issues getting started with builds...
# general
n
Having a couple issues getting started with builds. Have a test project called casper w/
./src/casper/<pkg, mod, etc.>
structure and
"src/"
pattern for root. When using
./pants run <target>
located in
./src/casper/pkg/subpkg
, I get a module-not-found error on
import casper.pkg.subpkg
I would have assumed Pants would place
./src
in the
PYTHONPATH
(or perhaps install the package in the hermetic env to make it self-importable)? I don't know if it makes a difference, but the import was in
__init__.py
of
pkg
. The second issue is that when I make any changes to the offending file, it seems Pants does not pick those changes up -- build and run goals keep using the same depended on file.
e
1st issue: run
./pants roots
to see what Pants guesses for
PYTHONPATH
entries.
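For illustration, here is a minimal sketch of how a source root maps a file path to an importable module name. The helper `module_name_for` is hypothetical and is not part of Pants; it only shows why a root of `src` would make `casper.pkg.subpkg` importable, assuming the layout described in the question.

```python
from pathlib import PurePosixPath


def module_name_for(path, source_roots):
    """Return the dotted module name of `path` relative to the first
    matching source root, or None if no root contains the path."""
    p = PurePosixPath(path)
    for root in source_roots:
        try:
            rel = p.relative_to(root)
        except ValueError:
            continue  # path is not under this root
        if not rel.name:
            continue  # path is the root itself, not a file
        parts = list(rel.with_suffix("").parts)
        # A package's __init__.py is imported under the package name.
        if parts[-1] == "__init__":
            parts.pop()
        return ".".join(parts)
    return None


print(module_name_for("src/casper/pkg/subpkg/__init__.py", ["src", "tests"]))
```

With `src` as a root, the leading `src/` is stripped before the dotted name is formed, which is what placing `./src` on `PYTHONPATH` amounts to.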
h
but the import was in __init__.py of pkg
It might be this gotcha about
__init__.py
files: https://www.pantsbuild.org/v2.8/docs/python-backend Have you seen that yet?
n
@enough-analyst-54434 I get back
src
and
tests
and so
"./src"
should be on the path, thus
import casper.pkg.subpkg
possible? Curiously, when packaging, it gives me
./dist/src.casper.pkg.subpkg/target_module.pex
. Is the
src.
prefix an indication that it only put
./
on the path?
h
./dist/src.casper.pkg.subpkg/target_module.pex .  Is the src. prefix an indication that it only put ./  on the path?
That part is a red herring. See the
output_path
field https://www.pantsbuild.org/v2.8/docs/reference-pex_binary#codeoutput_pathcode
n
@hundreds-father-404 Gotcha. Per your suggestion, I enabled the first option, but the second option, python.tailor_ignore_solitary_init_files = false, appears to be available only on 2.8 (I'm using 2.7.1). Even with the first option enabled, and after recreating the BUILD files and blanking out the init file, the build/package goals still use the original init file with the bad import. Is Pants not invalidating the cache of that file when I make changes?
I attempted this again on a fresh tailor of ./src/casper/pkg2/subpkg2 with only vacuous init files, and I'm seeing the same import casper.xx issues. Pants also still uses the original files to build/package despite the changes I made.
So I tried again w/ a package that only imports stdlib pkgs, and that works. I tried one that imports a third-party package, pandas, and I'm having import issues. Maybe I am misunderstanding some things, so I'll go back and read over the documentation before posting more, but I thought that Pants would infer the dependencies, create an isolated venv with the required dependencies from that, and then perform the run goal (and package that venv in a pex file when disting). Admittedly I was probably being too optimistic here, and I see now there is some mention of having to teach Pants about your dependencies through a requirements.txt file. I have not begun to look at the target-specific dependency declarations yet.
Ah, so: https://www.pantsbuild.org/v2.8/docs/python-third-party-dependencies#requirementstxt. Dependency inference is only done within the universe you define up-front.
Is it a feature for Pants to not warn you about an import it cannot find in your declarations during the import inference stage? Is it meant to be caught only at run-time?
e
Yeah - we do not do magic - you have to declare 3rdparty dependencies via requirements or elsewise - we will not guess a version for you.
👍 2
As to the silence on missed imports - that was just made better 2 days ago: https://github.com/pantsbuild/pants/pull/13491
👍 1
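As a rough illustration of "inference only within the universe you define up-front", the sketch below parses a module's imports and reports any top-level name that is not in a declared set. The function `unknown_imports` and the `known_modules` set are hypothetical; this is not how Pants implements dependency inference, only a small model of the idea.

```python
import ast


def unknown_imports(source, known_modules):
    """Return the sorted top-level module names imported by `source`
    that do not appear in `known_modules`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) are first-party and skipped here.
            found.add(node.module.split(".")[0])
    return sorted(found - set(known_modules))


code = "import os\nimport pandas\nfrom casper.pkg import subpkg\n"
# Assumed universe: the stdlib module `os` and the first-party package `casper`.
print(unknown_imports(code, {"os", "casper"}))
```

In this model, `pandas` is reported because nothing (such as a requirements.txt entry) has declared it.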
h
Yeah, no plans to backport; we'll be doing a stable release of 2.8, likely on Monday, assuming no bugs are reported before then.
Taylor to check, are things working now? Anything we can help with?
n
I found the macro to use requirements.txt, but again stuck in the situation where Pants won't invalidate its caches when I make file changes. I wonder if this has anything to do with my files being checked out in my NFS home directory, but the .cache/ being on the local host in /var/tmp?
e
I don't think it has to do with separate locations. I'm almost positive it is exactly due to NFS.
You should be able to quickly test this, but we use inotify and the like and these do not see NFS edits (since they're remote) IIUC.
n
Is there any way to “wake it up” (like an event loop)?
f
A discussion on Stack Overflow points to the cause being that NFS has no protocol support for notification events: https://stackoverflow.com/questions/4231243/inotify-with-nfs
(since NFS is several decades old and predates even the existence of the Linux kernel)
n
Right, but can Pants be instructed to scan/recompute hashes for a given file, dir, project manually? Or maybe it needs a polling feature. For example, PyCharm complains about file syncing because it can't place file watchers on NFS mounts, but we still manage because it periodically polls the project files and you can force resync.
f
probably needs an opt-in polling feature
👍 1
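A minimal sketch of what polling could look like, assuming content hashes are compared between two scans of a directory tree. The names `snapshot` and `changed_paths` are hypothetical and do not correspond to anything in Pants; a real implementation would also need to decide on a scan interval and which paths to ignore.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each regular file under `root` (by relative path) to the
    SHA-256 digest of its contents."""
    digests = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            rel = str(path.relative_to(root))
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_paths(before, after):
    """Return paths added, removed, or modified between two snapshots."""
    all_paths = before.keys() | after.keys()
    return sorted(p for p in all_paths if before.get(p) != after.get(p))
```

Because this reads file contents directly instead of waiting for kernel events, it would notice edits made over NFS, at the cost of rescanning the tree on every poll.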
n
In the meantime, could I kill whatever server processes Pants gins up to listen for file changes, so it is forced to re-examine the files?
e
Yeah, which is more easily accomplished with
--no-pantsd
. Performance will not be good though.
n
Does that option disable caching? Is keeping the cache and killing the server preferable if I'm willing to trade longer init times for faster goal completion?
So I tried both ways: w/ --no-pantsd (working now!), and without setting the option but modifying the pants script to kill the server pid at exit. Since the proto project is trivial, I don't know if "poor performance" just means slow init time or if Pants is starting from scratch on each goal run?
Also, is the inotify logic in the rust or python codebase? Wondering how difficult architecturally polling would be to implement (or force resync) for the server.
h
It does not disable caching on-disk, only memoization in-memory. But that memoization is pretty important to avoid doing things like "Creating module mapping" for dependency inference.
Rather than killing the server, it'd be simpler to use
--no-pantsd
so you don't start the server in the first place.
It is implemented in Rust. I'm not familiar enough w/ that part of the code to know how tricky it would be, but definitely possible!
maybe open a feature request to support NFS by adding polling for file watching? Can link to https://github.com/pantsbuild/pants/issues/10842, where we talk about adding polling for paths outside of the "build root" (the repo)
n
@hundreds-father-404 Thanks! I'll add this request to my list of things I'm keeping as we delve into this over the weekend. I think local performance is really important (for developers to run goals before committing), but our CI jobs would always be starting from disk (at best), so the initialization lag is already going to be a fact of life there. They download the Git workspace and at some point run your custom build logic. The intention is to have Pants already bootstrapped on an NFS drive the build uid can access, so when a job starts running goals using the pants script, it will not need to re-bootstrap itself and will have access to the cache, which will hopefully be in a state based on the last successful release. Still need to sort out the details though. If you have any examples of people deploying Pants under similar constraints, that would be helpful!
e
I'm slightly confused by your scenario description since it sounds ~opposite of what we've been working through here, where the code repo is on NFS. Here you describe ~/.cache/pants on NFS IIUC. Pants has native support for a remote / shared cache: https://www.pantsbuild.org/docs/remote-caching-execution. The Pants OSS CI jobs dogfood this with a remote cache hosted / donated by Toolchain in fact.