# general
r
Hello, we are running into a "Resource temporarily unavailable" error when using Pants (2.14) in a concurrent CI environment (Jenkins, Docker, Kubernetes). The example command and debug logs are attached. CC: @gray-shoe-19951
16:13:57  ./pants tailor --check update-build-files --check ::

16:13:57  20:13:56.57 [DEBUG] acquiring lock: <pants.pantsd.lock.OwnerPrintingInterProcessFileLock object at 0x7ff8eaa8cdf0>

16:13:57  20:13:56.57 [DEBUG] releasing lock: <pants.pantsd.lock.OwnerPrintingInterProcessFileLock object at 0x7ff8eaa8cdf0>

16:13:57  20:13:56.57 [DEBUG] Connecting to pantsd on port 39833

16:13:57  20:13:56.57 [DEBUG] Connecting to pantsd on port 39833 attempt 1/3

16:13:57  20:13:56.58 [DEBUG] Connected to pantsd

16:13:57  20:13:56.60 [DEBUG] Launching 1 roots (poll=false).

16:13:57  20:13:56.60 [DEBUG] Dependency SessionValues of Some("@rule(pants.engine.internals.options_parsing.parse_options())") changed.

16:13:57  20:13:56.62 [DEBUG] Dependency @rule(pants.engine.internals.options_parsing.parse_options()) of Some("@rule(pants.engine.internals.options_parsing.scope_options())") changed.

16:13:57  20:13:56.62 [DEBUG] computed 1 nodes in 0.023255 seconds. there are 7 total nodes.

16:13:57  20:13:56.67 [DEBUG] Launching 1 roots (poll=false).

16:13:57  20:13:56.67 [DEBUG] computed 1 nodes in 0.000454 seconds. there are 7 total nodes.

16:13:57  20:13:56.70 [DEBUG] specs are: Specs(includes=RawSpecs(description_of_origin='CLI arguments', address_literals=(), file_literals=(), file_globs=(), dir_literals=(), dir_globs=(), recursive_globs=(RecursiveGlobSpec(directory=''),), ancestor_globs=(), unmatched_glob_behavior=<GlobMatchErrorBehavior.error: 'error'>, filter_by_global_options=True, from_change_detection=False), ignores=RawSpecs(description_of_origin='CLI arguments', address_literals=(), file_literals=(), file_globs=(), dir_literals=(), dir_globs=(), recursive_globs=(), ancestor_globs=(), unmatched_glob_behavior=<GlobMatchErrorBehavior.error: 'error'>, filter_by_global_options=False, from_change_detection=False))

16:13:57  20:13:56.70 [DEBUG] changed_options are: ChangedOptions(since=None, diffspec=None, dependees=<DependeesOption.NONE: 'none'>)

16:13:57  20:13:56.70 [DEBUG] Launching 1 roots (poll=false).

16:13:57  20:13:56.70 [DEBUG] Dependency SessionValues of Some("@rule(pants.engine.internals.options_parsing.parse_options())") changed.

16:13:57  20:13:56.70 [DEBUG] Dependency @rule(pants.engine.internals.options_parsing.parse_options()) of Some("@rule(pants.engine.internals.options_parsing.scope_options())") changed.

16:13:57  20:13:56.70 [DEBUG] Dependency @rule(pants.engine.internals.options_parsing.parse_options()) of Some("@rule(pants.engine.internals.options_parsing.scope_options())") changed.

16:13:57  20:13:56.71 [DEBUG] computed 1 nodes in 0.004453 seconds. there are 17 total nodes.

16:13:57  20:13:56.71 [DEBUG] requesting <class 'pants.core.goals.tailor.TailorGoal'> to satisfy execution of `tailor` goal

16:13:57  20:13:56.71 [DEBUG] Launching 1 roots (poll=false).

16:13:57  20:13:56.71 [DEBUG] Starting: `tailor` goal

16:13:57  20:13:56.71 [DEBUG] Starting: Find all sources from input specs

16:13:57  20:13:56.71 [DEBUG] Starting: Find targets from input specs

16:13:57  20:13:56.71 [DEBUG] Starting: Find targets from input specs

16:13:57  20:13:56.71 [DEBUG] Completed: Find targets from input specs

16:13:57  20:13:56.72 [DEBUG] Completed: Find targets from input specs

16:13:57  20:13:56.72 [DEBUG] Completed: Find all sources from input specs

16:13:57  20:13:56.72 [DEBUG] Completed: `tailor` goal

16:13:57  20:13:56.72 [DEBUG] computed 1 nodes in 0.017771 seconds. there are 115 total nodes.

16:13:57  20:13:56.72 [ERROR] 1 Exception encountered:

16:13:57  

16:13:57    Exception: Couldn't find file contents for "src/bread_service/BUILD": Failed to begin read transaction: Resource temporarily unavailable
e
Can you detail the container setup? What paths are mounted in vs which are resident in the image.
As detailed a picture as you can provide of the container setup and how / where pants is run inside it will be crucial for having any idea what's going on here.
r
Sorry, I was trying to post from my phone and the output got garbled. It is updated now. ☝️ Pants is running in containers which are managed by Jenkins/Kubernetes. I will see if I can find a more detailed setup. Project files are cloned from Git inside the container. All Pants files are stored on a shared volume (Kubernetes PVC).
e
The nature of the PVC will be the key detail. It's almost certainly the problem, assuming it's network attached storage and the category "Pants files" includes what would normally be located under ~/.cache/pants on your machine.
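One way to check what kind of filesystem backs a cache directory is to look it up in the Linux mount table. This is only a sketch: `fs_type_for` and `NETWORK_FS_TYPES` are hypothetical names, the list of network filesystem types is an assumed, non-exhaustive set, and escaped characters in mount points (such as `\040` for a space) are not handled.

```python
import os

# Assumed, non-exhaustive set of filesystem types that are network-backed.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "ceph", "glusterfs"}


def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point that contains `path`.

    `mounts_text` is text in /proc/mounts format:
    device, mount point, fs type, options, two numeric fields.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # "/" becomes the prefix "/", "/mnt/pants" becomes "/mnt/pants/".
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def looks_network_backed(path, mounts_text):
    """True if the mount containing `path` has a type in NETWORK_FS_TYPES."""
    return fs_type_for(path, mounts_text) in NETWORK_FS_TYPES
```

Inside the container, it could be called as `looks_network_backed(os.environ["PANTS_LOCAL_STORE_DIR"], open("/proc/mounts").read())`. A `False` result is not proof that the storage is local, since the type list above is only a guess at common cases.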
r
Here are the additional details... We set the following environment variables to point Pants to network storage:
PANTS_LOCAL_STORE_DIR
PANTS_NAMED_CACHES_DIR
PANTS_LOCAL_EXECUTION_ROOT_DIR
The PVC provider is Portworx. Full spec:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pants-1
  namespace: bai-rnd-jenkins
  uid: b3f8a546-b749-46f2-8074-690524b4488c
  resourceVersion: '884042807'
  creationTimestamp: '2023-02-07T16:34:21Z'
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/portworx-volume
    volume.kubernetes.io/storage-provisioner: kubernetes.io/portworx-volume
    volume.kubernetes.io/storage-resizer: kubernetes.io/portworx-volume
  finalizers:
    - kubernetes.io/pvc-protection
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 250Gi
  volumeName: pvc-b3f8a546-b749-46f2-8074-690524b4488c
  storageClassName: pwx-sc-ssd-shared
  volumeMode: Filesystem
status:
  phase: Bound
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 250Gi
e
That's almost certainly the problem. The portworx site is sales speak word salad; so I'm not 100% sure this is a network filesystem, but it likely is. Neither Pants, which uses both file locks and LMDB (a memory mapped database), nor Pex, which uses file locks, works on a network filesystem.
So you can't share caches this way.
If you need a shared cache you might check out https://www.pantsbuild.org/docs/remote-caching
r
Yes, it is a network filesystem. We use network filesystems a lot, and from a user's perspective they work just like a local filesystem (apart from latency, obviously). I could double check, but my understanding is that file locks should also work there. Do you think there could be any other reason why it does not work with Pants? Could you confirm that Pants supports concurrent execution (running multiple ./pants commands at the same time) while using the same cache? E.g. if I have multiple branches with slightly different versions of the code and I run Pants commands from them at the same time. We could look into remote caching, but it seems like a much more complicated setup. I wonder if there is any short-term solution to mitigate the issue.
e
POSIX file locks definitely do not work on network filesystems in general. They can on NFS v4.
With the information you've provided, which is a bit minimal on the front half, this is my best guess. Maybe you can re-run the Pants command with -ldebug and provide the full output, command and all of stderr / stdout, and we can discover more.
r
Is it the same as setting PANTS_LEVEL=debug? If yes, I already did that and the log above includes it. Or do you also need logs from pantsd?
e
Ah yes, you did. I think you'll just need to dig; it's not clear to me where exactly that message comes from. As far as Pants code goes, the only likely place is LMDB, which, like I said, doesn't work with network filesystems.
So, backing up, you're a bit tight-lipped on circumstances. What is new here? Did you change Pants version then hit this? Did you add the shared cache then hit this? What changed between before hitting this and after?
Re-reading above, yes, Pants is known to work fine with a local shared cache across multiple concurrent Pants runs. It is likewise known to not work when the cache is on a network filesystem.
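For context on the message itself: "Resource temporarily unavailable" is the standard text for the `EAGAIN` errno on Linux, which is commonly returned when a non-blocking lock attempt finds the lock already held. The sketch below reproduces that errno with a plain `flock` on a temporary file. It is an illustration of the errno only, not a reproduction of what Pants or LMDB does internally, and `contended_lock_message` is a hypothetical helper name.

```python
import errno
import fcntl
import os
import tempfile


def contended_lock_message():
    """Hold an exclusive flock, then try a non-blocking lock via a second open.

    Returns (errno, message) from the failed second attempt, or None if the
    second lock unexpectedly succeeded.
    """
    with tempfile.NamedTemporaryFile() as holder:
        # First open file description takes the exclusive lock.
        fcntl.flock(holder.fileno(), fcntl.LOCK_EX)
        # A separate open of the same file is a different open file
        # description, so its lock request conflicts with the first one.
        with open(holder.name, "rb") as contender:
            try:
                fcntl.flock(contender.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as exc:
                return exc.errno, os.strerror(exc.errno)
    return None
```

On a local Linux filesystem this is expected to return `EAGAIN` together with its `strerror` text. How the same calls behave on a given network filesystem depends on that filesystem, which is the crux of the discussion above.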
r
Even if it is NFSv4?
And yes, you are right, it seems we started to see such sporadic errors only this week. No changes to pants version, still on 2.14
e
See the 2nd to last caveat here: http://www.lmdb.tech/doc/
You are barking up the wrong tree. You need to use a local filesystem for Pants caches or use remote caching.
Fwiw, remote caching is far easier to set up than the k8s config you've shown. There are many providers of this service. If you don't want to pay a service provider and you want to run your own remote cache instead, then, yes, it's likely more complex. That said, it's your only option for a shared non-local cache.
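As a rough sketch of the "local filesystem" option, a CI wrapper could point the three environment variables mentioned earlier in the thread at a directory on node-local disk before invoking Pants. The helper name `local_cache_env`, the base directory, and the subdirectory names are all assumptions made for illustration.

```python
import os

# Environment variables named earlier in the thread, mapped to
# hypothetical subdirectory names under a node-local base directory.
PANTS_CACHE_SUBDIRS = {
    "PANTS_LOCAL_STORE_DIR": "lmdb_store",
    "PANTS_NAMED_CACHES_DIR": "named_caches",
    "PANTS_LOCAL_EXECUTION_ROOT_DIR": "execution_root",
}


def local_cache_env(base_dir, environ=None):
    """Return a copy of the environment with Pants cache dirs under `base_dir`."""
    env = dict(os.environ if environ is None else environ)
    for var, subdir in PANTS_CACHE_SUBDIRS.items():
        env[var] = os.path.join(base_dir, subdir)
    return env
```

The result could then be passed to something like `subprocess.run(["./pants", "tailor", "--check", "::"], env=local_cache_env("/var/tmp/pants"))`, where `/var/tmp/pants` stands in for whatever local disk path the container actually has. The cache would no longer be shared between containers, which is the trade-off that remote caching is meant to address.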