# general
b
After upgrading to Pants 2.4.0, I'm seeing some odd failures in our CI environment:
16:26:31.31 [WARN] Completed: lint - isort failed (exit code 2).
.cache/pex_root/venvs/d8105fd40948579fedc63bb8ddd363d251b7db7a/5dc7bde7966ce85a9f3820e78694dc0e8d63768d/pex: line 7: syntax error near unexpected token `('
.cache/pex_root/venvs/d8105fd40948579fedc63bb8ddd363d251b7db7a/5dc7bde7966ce85a9f3820e78694dc0e8d63768d/pex: line 7: `    venv_dir = os.path.abspath(os.path.dirname(__file__))'
I'm working on getting access to the build nodes to dig in further, but it looks like after the failing pants run that pex_root directory isn't present anymore, making it tricky to debug. Any ideas on how I might be able to debug this? Thanks 🙇
h
You can preserve the sandboxes that processes run in with --no-process-execution-local-cleanup
It'll log the preserved path for each process, and in that dir the __run.sh script will execute the process,
so you can tinker with that script as needed and re-run it directly inside that dir
That is often a really useful way to debug issues like this
That is a weird error
That
venv_dir = os.path.abspath(os.path.dirname(__file__))
line 7 is in the __main__.py that pex synthesizes to make its venv runnable.
And "syntax error near unexpected token" is a bash error
So bash is trying to execute that file as shell instead of as python
meaning that, probably, the shebang in that file isn't pointing to a valid python interpreter
So the next step would be to see what the shebang is for the failing process
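A minimal sketch of that check, assuming Python is available on the build node. The helper names `read_shebang` and `shebang_is_runnable` are made up for illustration and are not part of Pants or Pex; point them at the `pex` script inside the preserved sandbox or venv directory.

```python
import os


def read_shebang(script_path):
    """Return the interpreter path from a script's shebang line, or None if it has none."""
    with open(script_path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return None
    # The first token after "#!" is the interpreter; anything after it is an argument.
    parts = first_line[2:].decode("utf-8", errors="replace").split()
    return parts[0] if parts else None


def shebang_is_runnable(script_path):
    """True if the shebang names an existing, executable file."""
    interpreter = read_shebang(script_path)
    return (
        interpreter is not None
        and os.path.isfile(interpreter)
        and os.access(interpreter, os.X_OK)
    )
```

If `shebang_is_runnable` returns False, printing `read_shebang(...)` shows which interpreter path the script expects, which can then be compared against what actually exists on disk.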
b
ah, interesting... thanks for the pointers
I see on my workstation, these files are located in ~/.cache/pants/named_caches/pex_root, whereas this message says that pex_root is directly in the .cache directory. In CI, I'm using a pants.ci.toml file with the following contents:
local_store_dir = ".cache/pants/lmdb_store"
named_caches_dir = ".cache/pants/named_caches"

pants_ignore = [
  ".cache/pants/named_caches",
  ".cache/pants/lmdb_store",
]
But otherwise, I don't think I'm doing anything else different.
h
Are you able to share the path to your CI's build root, i.e. where the ./pants script is located? In particular, I'm wondering about its length
Ah, if you click through to the linked Pants issue from that Pex issue, indeed that is the exact same Bash error
b
the build root is /var/lib/buildkite-agent/builds/default-i-XXXXXXXXXXXXXXXXX-1/grapl/testing which now seems rather long 😕
h
Got it, which is much longer than the default of /users/foo/.cache, for example. Almost certainly this is the issue.
According to https://github.com/pantsbuild/pex/pull/1254, the average shebang length is now ~60 characters, but your build root path is much longer, so it is almost certainly exceeding the shebang length limit.
I think Pex's original fix will need to be revisited so that it is not as dependent on the length of the cache dir's prefix.
In the meantime, I don't imagine you can shorten that build root path, right?
I'm afk, but I think you can try changing your OS to allow longer shebangs - not sure if that can be updated.
Last option is to not cache named_caches in your build root for now.
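To make the length argument concrete, here is a rough sketch that measures the shebang such a venv would need. Two things are assumptions rather than facts from this thread: the 127-byte limit (the figure usually cited for older Linux kernels; newer kernels and other OSes differ) and the `bin/python` location of the interpreter inside the venv directory. The build root and venv path are the ones quoted above.

```python
# Assumption: older Linux kernels only read the first 128 bytes of a script
# when parsing "#!", which leaves 127 usable bytes. Adjust for your OS.
ASSUMED_SHEBANG_LIMIT = 127


def shebang_length(interpreter_path):
    """Length in bytes of the line '#!<interpreter_path>'."""
    return len(("#!" + interpreter_path).encode("utf-8"))


def fits(interpreter_path, limit=ASSUMED_SHEBANG_LIMIT):
    """True if the shebang for this interpreter fits within the assumed limit."""
    return shebang_length(interpreter_path) <= limit


BUILD_ROOT = "/var/lib/buildkite-agent/builds/default-i-XXXXXXXXXXXXXXXXX-1/grapl/testing"
VENV_REL = (
    ".cache/pex_root/venvs/"
    "d8105fd40948579fedc63bb8ddd363d251b7db7a/"
    "5dc7bde7966ce85a9f3820e78694dc0e8d63768d"
)
# Assumption: the venv's interpreter lives at bin/python inside the venv dir.
interpreter = f"{BUILD_ROOT}/{VENV_REL}/bin/python"

print(shebang_length(interpreter), fits(interpreter))
```

With two 40-character hashes in the path, most of the budget is spent before the build root is even counted, which is why the prefix length matters so much here.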
b
I'll do some digging... there may be some knobs I can tweak. Thanks for providing some illumination on this; I was going a little crazy 😅
❤️ 1
This worked fine with Pants 2.2.0... were there changes with PEX or something else in the meantime that would have changed how this worked?
h
Thanks for the report! It's tricky to know beforehand all the edge cases out there, and it's invaluable to get error reports like this
Yes, Pex added a new venv mode, which removes nearly all overhead from running a Pex, after a one-time cost on the first run, by creating a virtualenv for the PEX. It makes Pants much faster when running Pytest, isort, etc
b
ah, got it; makes sense!
🤔 Is there any Pants configuration to perhaps disable the use of this venv mode?
(just as a temporary thing, in my case)
h
There is not, it's hardcoded into the implementation of things like running Pytest. It should be an implementation detail users don't need to care about, minus this bug
b
Cool 👍 Thought I'd check 😄
Update: I was able to tweak the build path for my CI jobs so it's very short now, and the shebang lines work fine. One semi-unfortunate side effect I've discovered, though, is that my CI caching of the Pants cache is now complicated somewhat. On each machine in my build cluster, I can have 1-n agents running and servicing jobs. To keep them from stepping on each other, they each have their own build directory. The caching mechanism currently can only cache things relative to the build directory itself, and not some global location (like the build agent's $HOME directory). Ultimately, this means that my Pants caches will be in the agent-specific build directory; something like /builds/1/cache, /builds/2/cache, etc. This in turn means that the shebang lines in the cached PEX venvs will also have this agent specificity. It seems I will now need a cache per agent to ensure that the restored caches have PEX venvs that refer to the appropriate path (while experimenting with this, I had some builds that failed because they were using a cache with shebang lines that pointed to a nonexistent path). At the end of the day, it's not a big deal (S3 storage for the caches is cheap, and I don't know that I'm going to have more than a handful of agents on any given box), but it was an interesting realization. I'm not mentioning this to request any kind of change in Pants or anything, just more of an FYI. Thanks again for the assistance on this; I really appreciate it!
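One way to picture the cache-per-agent idea is to derive the cache key from the agent's build directory, so a cache is only ever restored into the path its shebangs were written for. This is only a sketch: `cache_key_for_agent` and the `pants-cache` prefix are hypothetical names, and how the key is passed to the CI system's caching mechanism is left out.

```python
import hashlib


def cache_key_for_agent(build_dir, prefix="pants-cache"):
    """Derive a stable cache key that differs for each agent build directory."""
    digest = hashlib.sha256(build_dir.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}-{digest}"


# Two agents on the same box get distinct keys; the same agent always gets the same key.
print(cache_key_for_agent("/builds/1"))
print(cache_key_for_agent("/builds/2"))
```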
h
Interesting. Trying to think if there's some way around this.
h
The solution I was thinking of is to have the shebang point to some path of fixed length, e.g. always point to ~/.cache/pex_venv_launcher, which then forwards the venv hash to the actual venv. But I don't think that works, e.g. if that fixed location is not writeable. (But perhaps the fixed location could be configurable.)
e
A shebang must be an absolute path.
I'm not fully following the various CI environment restrictions, but the way to go at first blush is to configure https://www.pantsbuild.org/docs/reference-global#section-named-caches-dir to a short global path like /var/cache/pants for each agent (this is safe to do, named caches must be concurrency safe) and then, if you can't sync a cache from a global location, symlink that directory into an agent-local location.
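The symlink half of that suggestion could look roughly like the sketch below. `link_named_caches` is a made-up helper, not a Pants feature, and the agent-local path in the usage note is hypothetical. Whether a given CI caching tool follows symlinks when archiving is also an open question to check.

```python
import os


def link_named_caches(global_dir, agent_local_link):
    """Make agent_local_link a symlink to the short, shared global_dir."""
    os.makedirs(global_dir, exist_ok=True)
    parent = os.path.dirname(agent_local_link)
    if parent:
        os.makedirs(parent, exist_ok=True)
    if os.path.islink(agent_local_link):
        if os.readlink(agent_local_link) == global_dir:
            return  # already linked to the right place
        os.unlink(agent_local_link)  # replace a stale link
    elif os.path.exists(agent_local_link):
        # Refuse to clobber a real directory or file.
        raise FileExistsError(f"{agent_local_link} exists and is not a symlink")
    os.symlink(global_dir, agent_local_link)
```

For example, `link_named_caches("/var/cache/pants", "/builds/1/.cache/pants/named_caches")` would expose the shared directory under the agent's build directory, while the shebangs that Pex writes keep using the short global path configured as the named caches dir.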
👍 1
b
@enough-analyst-54434 Good suggestion; I may give that approach a try later today.
e
Excellent. Afaict this will be the only route here. Pex can't do more than it is already except to shave off maybe 5 characters. Fundamentally we need to be able to support placing the named caches in a moderately prefixed dir and warn or document better when folks like you run into issues with a long prefix dir.
👍 1
b
👍
e
Just checking in, Chris. Hopefully you had some success?
b
Sorry, I got a bit sidetracked with trying out that symlink approach... haven't tried it yet, but the solution we have in place is working fine for now. Once I get a bit of free time, I'll give it a shot, though. Thanks for the follow up!
e
Sounds good.