I am having the weirdest problem that I just canno...
# random
g
I am having the weirdest problem that I just cannot figure out, and I can't reproduce outside Pants. I'm adding a pants-hosted go-toolchain. As part of that I'm invoking the go command line to get the environment. That prints out a JSON structure that I decode and look at. But one of the strings is... malformed? Bugged? I don't know.
Copy code
The string '.go-1.21.6' is 36 long
Hex escaped: .go-1.21.6
As a list: ['/', 't', 'm', 'p', '/', 'p', 'a', 'n', 't', 's', '-', 's', 'a', 'n', 'd', 'b', 'o', 'x', '-', 'z', '7', 'F', '5', 'n', '1', '/', '.', 'g', 'o', '-', '1', '.', '2', '1', '.', '6']
After formatting: .go-1.21.6, len=36
[0]: / (47)
[1]: t (116)
[2]: m (109)
[3]: p (112)
[4]: / (47)
[5]: p (112)
[6]: a (97)
[7]: n (110)
[8]: t (116)
[9]: s (115)
[10]: - (45)
[11]: s (115)
[12]: a (97)
[13]: n (110)
[14]: d (100)
[15]: b (98)
[16]: o (111)
[17]: x (120)
[18]: - (45)
[19]: z (122)
[20]: 7 (55)
[21]: F (70)
[22]: 5 (53)
[23]: n (110)
[24]: 1 (49)
[25]: / (47)
[26]: . (46)
[27]: g (103)
[28]: o (111)
[29]: - (45)
[30]: 1 (49)
[31]: . (46)
[32]: 2 (50)
[33]: 1 (49)
[34]: . (46)
[35]: 6 (54)
Compare the length of the string with the contents! Where are all these characters coming from?! No matter what I do they are never printed when part of a string, and even multiple formatting-roundtrips preserves them. They clearly exist... but aren't visible. But they seem to be regular characters? I've never observed anything like this before. 🤯
If I rebuild the original string...
Copy code
[0]: / (47) total='/'
[1]: t (116) total='/t'
[2]: m (109) total='/tm'
[3]: p (112) total='/tmp'
[4]: / (47) total='/tmp/'
[5]: p (112) total='/tmp/p'
[6]: a (97) total='/tmp/pa'
[7]: n (110) total='/tmp/pan'
[8]: t (116) total='/tmp/pant'
[9]: s (115) total='/tmp/pants'
[10]: - (45) total='/tmp/pants-'
[11]: s (115) total='/tmp/pants-s'
[12]: a (97) total='/tmp/pants-sa'
[13]: n (110) total='/tmp/pants-san'
[14]: d (100) total='/tmp/pants-sand'
[15]: b (98) total='/tmp/pants-sandb'
[16]: o (111) total='/tmp/pants-sandbo'
[17]: x (120) total='/tmp/pants-sandbox'
[18]: - (45) total='/tmp/pants-sandbox-'
[19]: I (73) total='/tmp/pants-sandbox-I'
[20]: 9 (57) total='/tmp/pants-sandbox-I9'
[21]: q (113) total='/tmp/pants-sandbox-I9q'
[22]: C (67) total='/tmp/pants-sandbox-I9qC'
[23]: z (122) total='/tmp/pants-sandbox-I9qCz'
[24]: Y (89) total='/tmp/pants-sandbox-I9qCzY'
[25]: '
[26]: . (46) total='.'
[27]: g (103) total='.g'
[28]: o (111) total='.go'
[29]: - (45) total='.go-'
[30]: 1 (49) total='.go-1'
[31]: . (46) total='.go-1.'
[32]: 2 (50) total='.go-1.2'
[33]: 1 (49) total='.go-1.21'
[34]: . (46) total='.go-1.21.'
[35]: 6 (54) total='.go-1.21.6'
Something with the 25th character there breaks...
c
heh, it's almost like null-terminated string in reverse. I wonder if at some point there's a slice that's misbehaving, I remember you could get in trouble by taking slices of slices if one of the slices got reallocated because it grew too large (although that was 5 years ago). I curious about when you reconstructing the string as above, if you inject characters or take slices, does it always break at index 25 or at the '/' character? Also the fact that the line is just `'`is super-cursed, like it activated some control character. I wonder if you wrote it to a file (instead of through stdout to Pants to the terminal, I guess?). Something I saw just today was where some interaction of the WSL, Windows Terminal, bash, and possibly
jq
and
base64
caused some terminal output to just be redrawn over itself, so a multiline block just appeared as the last characters of each column. Or, what happens if you base64 it on the golang side? does golang believe that the string is actually 36chars, or not? I'm fascinated
oh no, wait, I'm like 34% sure, doesn't pants sometimes strip the sandbox path in certain output contexts?
g
Do you have any idea where that would happen? I also thought so but couldn't find where...
Fwiw this is all happening in a Python rule, in the go backend. But the string itself comes from a go process, and I imagine it is meant to be the full path. So the slicing seems very interesting...
c
weirdly intriguing... do you have a small repro you can share?
g
https://github.com/pantsbuild/pants/tree/ts/go-toolchain
Copy code
pants test src/python/pants/backend/go/dependency_inference_test.py
I'd removed all the debugging but line 114 in goroot.py has some new prints:
Copy code
.go-1.21.6
36
['/', 't', 'm', 'p', '/', 'p', 'a', 'n', 't', 's', '-', 's', 'a', 'n', 'd', 'b', 'o', 'x', '-', 'I', '9', 'q', 'C', 'z', 'Y', '/', '.', 'g', 'o', '-', '1', '.', '2', '1', '.', '6']
👀 1
I think for test output (from the thread I posted in #development) it's likely in part caused by https://github.com/pantsbuild/pants/pull/19923, but that wouldn't affect
print
which is what is used for those strings...
c
Ah, yea, I see you found that already
your prints are run during a pants test run, so why wouldn't they be?
g
Hmm. I was thinking since they're a rule, running in a rule runner, they wouldn't be part of the output of the Test itself
c
this worked to print, with added spaces:
Copy code
" G O R O O T " :   " / p r i v a t e / t m p / p a n t s - s a n d b o x - 1 k d 8 C i / . g o - 1 . 2 1 . 6 " ,
did you have an issue with GOROOT, or this just got you fish-sniped? (red herring reference there...)
to further prove my print point, this print has the same issue, clearly nothing with the data itself:
Copy code
print("/private/tmp/pants-sandbox-XQVpfm/.go-123")
g
I think I have an issue with Go; and this was probably a huge red herring. But even just stripping this it ends up being weird and failing to build. So I'll probably have to patch this to not strip it from logs, but rather replace it with something readable at least.
c
it was a really interesting case -lol (I'm sorry you spent time on it..)
g
Probably this should be fairly easy to opt-out from since you might also have bugs where a captured log has the wrong sandbox path, but you wouldn't know even if we normalize it
c
the simplifier class has options to not strip... but I didn't see a good/easy way to get to that option
g
There's probably also still a bug since line 25 is malformed, not just having removed the path
c
yea, if we knew the sandbox path exactly, we could be more precise in only redacting that one, to
<sandbox>/...
g
👍 Could also find all and number them... <sandbox-1>, <sandbox-2>, ...
💯 1
That's probably more tractable
c
I like it
or keep the id part of the sandbox, so you actually can find it if you keep the sandboxes...
<sandbox-Tyec0k>/...
etc
g
I think that'd hit the original issue of log-uniqueness
c
also avoids confusion between runs if the numbers reset, but it's not the same sandbox
oh... didn't read the PR, was that a thing?
thought it was to make it less verbose
g
The PR is about improving remote cache hits, though the log pruning isn't mentioned explicitly.
c
ah..
g
Yeah, this bug very aggressively hid that there was paths into four different sandboxes in play here...
In particular, the path to the files we were trying to compile were into another sandbox