With caching, is it roughly accurate to say - pan...
# general
n
With caching, is it roughly accurate to say • pantsd: invalidates at the file metadata level (e.g., from a
touch -a source.py
), and • pants: invalidates at the file content (SHA hash) level (e.g., from an
echo -e "\n" >>source.py
)? Besides scanning the logs with the
-ldebug
option, is there a way to see extra output regarding cache hits from both levels?
On closer experimentation, it does seem
pantsd
also only invalidates at the content level, as touching source files and re-running
package
results in the same execution time after the first uncached run. Only actually changing a file causes a rebuild. I think I was thrown off by
generate-lockfiles
immediately restarting if files were touched. Also, even with
pantsd
, it does not seem like this goal runs any faster between runs, even if nothing has changed. Is that intentional? Is there any way to force pants to consider a file changed even if its hash has not, besides poking it with an
echo -e "\n" >>source.py
?
w
To the first comment, while I've never used this - could
stats
be leveraged somehow? https://www.pantsbuild.org/docs/reference-stats
n
Yes, I think that gets the job done! Thank you. Just a couple observations: 1. When
pantsd
is able to satisfy the request, all the metrics are 0. 2. The output is pretty verbose (hard to compare between runs w/o lots of scrolling) because most rows are for remote cache metrics, even when a remote cache is not being utilized.
šŸ‘ 1
w
Interesting to see, I just discovered that subsystem, so I think I need to start using it too! 🙂
n
Playing a bit more, the cache summary indeed reflects only what
pants
is doing, but if
pantsd
determines it needs a refreshed request, those requests are logged. Running an experiment w/ adding/removing
\n
or
touch
of
source.py
when packaging a
python_distribution
: 1. Run
package ./path:dist
: See approx. 1200 cache requests, many uncached 2. Re-run:
pantsd
makes no cache requests 3. Add
\n
:
pantsd
stales two cache requests and those requests are reported uncached, as expected 4. Rerun:
pantsd
makes no cache requests as expected 5. Remove
\n
:
pantsd
stales two cache requests and those cache requests are reported cached, as expected 6. Repeating removing/adding
\n
and running the command causes
pantsd
to continually stale the requests related to modifying
source.py
, but they are reported as being cached by
pants
itself. If I re-run the above experiment with
touch
instead of modifying the file,
pantsd
never stales any of its requests -- so I guess this isn't enough to trigger
inotify
then? So my conclusion is now: 1.
pantsd
invalidates based on the file being changed from its current state -- that is, it's not enough to
touch
it, you have to touch && modify it (is it technically the condition under which
fsnotify
tells
pantsd
the file has changed? what is that precisely?) 2.
pantsd
invalidates at content level (hash), as expected
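For anyone wanting to reproduce the distinction outside of Pants, here is a minimal Python sketch of the touch-vs-modify experiment above. It only illustrates that bumping a file's timestamps leaves its content digest unchanged while appending bytes does not; it is not how `pantsd` is implemented, and the names `content_digest`, `snapshot`, and `run_experiment` are hypothetical helpers invented for this example. SHA-256 is assumed as the content hash.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def content_digest(path: Path) -> str:
    """SHA-256 of the file's bytes; independent of any timestamps."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(path: Path) -> tuple[int, str]:
    """Return (mtime in nanoseconds, content digest) for a file."""
    return path.stat().st_mtime_ns, content_digest(path)


def run_experiment() -> dict[str, bool]:
    """Compare a metadata-only change with a content change on a temp file."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source.py"
        source.write_text("print('hello')\n")
        mtime0, digest0 = snapshot(source)

        # Roughly what `touch source.py` does: bump timestamps, leave bytes alone.
        later = mtime0 + 10_000_000_000  # 10 seconds later, in nanoseconds
        os.utime(source, ns=(later, later))
        mtime1, digest1 = snapshot(source)

        # Roughly what `echo -e "\n" >>source.py` does: append bytes.
        with source.open("a") as f:
            f.write("\n")
        _, digest2 = snapshot(source)

        return {
            "touch_changed_mtime": mtime1 != mtime0,
            "touch_changed_digest": digest1 != digest0,
            "append_changed_digest": digest2 != digest0,
        }


if __name__ == "__main__":
    print(run_experiment())
```

A cache keyed on the digest would treat the touched file as unchanged, which is consistent with the observation that `touch` alone did not cause a rebuild.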
Note: Not trying to be overly pedantic, just want to know the mechanisms a bit better. Also, at the moment we are (temporarily!) customizing some BUILD files w/
open
operations (I know, bad), so want to know the easiest way to get
pants
/
pantsd
to consider those actually changed when the file they are opening to customize themselves is changed.
h
Re `generate-lockfiles`: Unlike most goals, that one is affected by the state of the world (e.g., the packages available on PyPI). So it reruns every time.