# general
h
On Pants 2.23.0, we had a developer commit a Python .pkl file that was collected under a files target for some unit test data. It ended up causing Pants to crash when using the --changed-since flag with the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 28463: invalid start byte
Since committing binary files isn’t best practice anyway, we got around it by finding an alternative way to track the data file. But I’m still curious why the pants tooling needs to decode a file in order to determine what has changed. Is this expected?
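For anyone wanting to see the error in isolation, here is a minimal sketch of why pickle data trips a UTF-8 decode. This is not Pants code, and the helper name utf8_decode_error is made up for illustration. It only relies on the fact that pickle protocol 2 and later begins the stream with byte 0x80, which is never a valid UTF-8 start byte. In our case the bad byte was at position 28463 rather than at the start, so the text being decoded was presumably larger than the .pkl file alone.

```python
import pickle
from typing import Optional


def utf8_decode_error(data: bytes) -> Optional[str]:
    """Return the UnicodeDecodeError message for data, or None if it decodes cleanly.

    Hypothetical helper for illustration only; not part of Pants.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Protocol 4 is chosen explicitly so the leading bytes are predictable.
payload = pickle.dumps({"rows": [1, 2, 3]}, protocol=4)

print(hex(payload[0]))                   # first byte of the pickle stream
print(utf8_decode_error(payload))        # error message from the failed decode
print(utf8_decode_error(b"plain text"))  # ASCII text decodes without error
```

Any code path that treats file contents (or a diff containing them) as text would hit the same exception, which is why reading such files as raw bytes avoids it.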
w
That doesn't feel right. Even if it was binary, it should be a bunch of who cares. Looks like it was decoded as utf-8. Do you have a stack trace of that?
Also, files not resources?
h
Would have to check again but was just guessing files since it was test data and not something you’d package up
Checking on permission for sharing the trace
w
👍 My guess, without any research, would be that it's getting hydrated for some reason, or it's run through the fs content path, although my assumption was that it should just get copied straight into a sandbox
h
Here are the full details the user had. It's pretty easy to recreate, so I can get more details if this doesn't have anything useful.
Untitled
Also sounds like this might be worthy of a bug ticket, so I can spin that up with some repro steps if that's useful.