lemon-noon-33245
05/08/2023, 7:20 PM[WARN] Failed to write to remote cache (1 occurrences so far): ResourceExhausted: "grpc: received message larger than max (5812170 vs. 4194304)"
If I increase the max receivable size on the server from the 4MB default, the cache starts working. Theoretically, AFAIK, client and server should automatically agree on a max message size and honor that throughout the message exchange. I tried looking up how pants is using tonic
but couldn’t see anything exotic there in terms of message size.
Has anyone seen anything like that happen?fast-nail-55400
05/08/2023, 7:31 PM--remote-store-batch-api-size-limit
to lower the threshold before the streaming API is used.lemon-noon-33245
05/08/2023, 7:33 PM4194304
according to docs. If that was the case it should never be above that, right? I’m getting 5812170
in this particular error messages but already had times where I saw close to 10MBremote-store-batch-api-size-limit
but it seems to not limit the sizefast-nail-55400
05/08/2023, 7:36 PMGetCapabilites
response?lemon-noon-33245
05/08/2023, 7:44 PMself.get_capabilities
to check what it’s returningfast-nail-55400
05/08/2023, 7:44 PMlemon-noon-33245
05/08/2023, 7:45 PMIf the streaming API is triggering the too big error, then that would be a different issue.How can I hone in to know which API is triggering the error?
fast-nail-55400
05/08/2023, 7:46 PMtonic
RUST_LOG
should work.-ldebug
or -ltrace
enabled. (Been a while since I had turned on Rust logging.)lemon-noon-33245
05/08/2023, 7:53 PMRUST_LOG=trace
adds a little more info but nothing on grpc even with -ltrace
. Any other ideas?max_batch_total_size_bytes
seems to be being sent correctly. This is what I get printing ServerCapabilities:
ServerCapabilities { cache_capabilities: Some(CacheCapabilities { digest_functions: [Sha256], action_cache_update_capabilities: Some(ActionCacheUpdateCapabilities { update_enabled: true }), cache_priority_capabilities: Some(PriorityCapa
bilities { priorities: [PriorityRange { min_priority: 0, max_priority: 0 }] }), max_batch_total_size_bytes: 4194304, symlink_absolute_path_strategy: Allowed, supported_compressors: [Zstd], supported_batch_update_compressors: [Zstd] }),
execution_capabilities: None, deprecated_api_version: None, low_api_version: Some(SemVer { major: 2, minor: 0, patch: 0, prerelease: "" }), high_api_version: Some(SemVer { major: 2, minor: 3, patch: 0, prerelease: "" }) }
fast-nail-55400
05/08/2023, 8:49 PMgrpc: received message larger than max
is gRPC level config. I had suggested the REAPI level options since it might have an indirect effect on message size. I am not sure Pants configures any message size limits in tonic
itself. (https://github.com/pantsbuild/pants/tree/main/src/rust/engine/grpc_util is relevant Pants crate.)tonic
has a config we should be using if we are not already using it?lemon-noon-33245
05/08/2023, 9:03 PMfast-nail-55400
05/08/2023, 9:12 PMpants should use the streaming api, right?It should (and the logs would show
ByteStream
RPCs). Of course, it begs the question whether the gRPC max message size is being exceeded for the ByteStream
API stream messages.lemon-noon-33245
05/08/2023, 10:04 PMbazel-remote
.
I’d love to see the rpc logs but can’t turn them on for the life of me. I’m calling pants with:
RUST_LOG=trace pants export-codegen //pkg:build -ltrace --keep-sandboxes=on_failure --no-local-cache
Am I missing anything here?fast-nail-55400
05/08/2023, 10:54 PM--log-show-rust-3rdparty
lemon-noon-33245
05/09/2023, 2:11 PMfast-nail-55400
05/09/2023, 4:34 PM09:51:57.86 [ERROR] 1 Exception encountered:
Engine traceback:
in `export-codegen` goal
IntrinsicError: Failed to create symlink to ../../../../node_modules/.pnpm/@amplitude+analytics-browser@1.9.1/node_modules/@amplitude/analytics-browser at /Users/theoribeiro/Dev/trinio/trinio/dist/codegen/node/apps/checkout/node_modul
es/@amplitude/analytics-browser: File exists (os error 17)
an error unrelated to remote cache09:52:00.56 [WARN] Failed to write to remote cache (1 occurrences so far): ResourceExhausted: "grpc: received message larger than max (5812170 vs. 4194304)"
09:51:57.86 [TRACE] Session end task `remote cache write Digest { hash: Fingerprint<089ee036609c36c38e6abcca42dd7e5039b7bc29b5e2051a71735a724e41b610>, size_bytes: 138 }` completed successfully
09:51:57.86 [TRACE] Session end task `remote cache write Digest { hash: Fingerprint<8a44c11b4a792d93843b654f9924f215b3c761e45852c33de4db22ba0109acc0>, size_bytes: 138 }` completed successfully
09:52:00.56 [TRACE] Session end task `remote cache write Digest { hash:Fingerprint<d686e419484f63fd28c18983ea994cf6fbe751ddb4ab55b80190698993438701>, size_bytes: 142 }` completed successfully
--stats-log
and provide it here? The stats will contain a histogram of the remote cache blob sizes. I am wondering if Pants is trying to write a large blob or not. (Although with the streaming API, that should be fine.) But it can help refine the inquiry into the issue.lemon-noon-33245
05/09/2023, 5:42 PMan error unrelated to remote cacheYeah, this one happens when my dist folder is not empty. Not really too big of an issue.
--stats-log
13:44:55.99 [INFO] Counters:
backtrack_attempts: 0
docker_execution_errors: 0
docker_execution_requests: 0
docker_execution_successes: 0
local_cache_read_errors: 0
local_cache_requests: 0
local_cache_requests_cached: 0
local_cache_requests_uncached: 0
local_cache_total_time_saved_ms: 0
local_cache_write_errors: 0
local_execution_requests: 16
local_process_total_time_run_ms: 52139
remote_cache_read_errors: 0
remote_cache_request_timeouts: 0
remote_cache_requests: 8
remote_cache_requests_cached: 6
remote_cache_requests_uncached: 1
remote_cache_speculation_local_completed_first: 1
remote_cache_speculation_remote_completed_first: 6
remote_cache_total_time_saved_ms: 271
remote_cache_write_attempts: 2
remote_cache_write_errors: 1
remote_cache_write_successes: 1
remote_execution_errors: 0
remote_execution_requests: 0
remote_execution_rpc_errors: 0
remote_execution_rpc_execute: 0
remote_execution_rpc_retries: 0
remote_execution_rpc_wait_execution: 0
remote_execution_success: 0
remote_execution_timeouts: 0
remote_process_total_time_run_ms: 0
remote_store_missing_digest: 0
remote_store_request_timeouts: 0
fast-nail-55400
05/09/2023, 6:40 PMhdrhistogram
to [GLOBAL].plugins
in pants.toml
, that will enable the histogramslemon-noon-33245
05/09/2023, 6:47 PM14:51:03.56 [INFO] Observation histogram summaries:
14:51:03.73 [INFO] Summary of `local_store_read_blob_size` observation histogram:
min: 1
max: 520959
mean: 3861.533
std dev: 20559.834
total observations: 104811
sum: 404731094
p25: 236
p50: 645
p75: 1598
p90: 5435
p95: 11359
p99: 64383
14:51:03.74 [INFO] Summary of `remote_store_blob_bytes_uploaded` observation histogram:
min: 0
max: 428
mean: 103.370
std dev: 98.918
total observations: 27
sum: 2791
p25: 6
p50: 125
p75: 139
p90: 142
p95: 304
p99: 428
14:51:03.80 [INFO] Summary of `local_process_time_run_ms` observation histogram:
min: 6
max: 49183
mean: 3122.812
std dev: 11888.894
total observations: 16
sum: 49965
p25: 22
p50: 39
p75: 93
p90: 107
p95: 166
p99: 49183
14:51:03.86 [INFO] Summary of `remote_cache_get_action_result_time_micros` observation histogram:
min: 11464
max: 42079
mean: 25393.714
std dev: 12466.941
total observations: 7
sum: 177756
p25: 13039
p50: 18527
p75: 36319
p90: 40063
p95: 42079
p99: 42079
14:51:03.92 [INFO] Summary of `remote_cache_get_action_result_network_time_micros` observation histogram:
min: 6852
max: 39839
mean: 22771.143
std dev: 12556.225
total observations: 7
sum: 159398
p25: 10415
p50: 18159
p75: 33151
p90: 36991
p95: 39839
p99: 39839
14:51:04.06 [INFO] Summary of `local_store_read_blob_time_micros` observation histogram:
min: 13
max: 3938303
mean: 2100973.634
std dev: 889304.076
total observations: 104811
sum: 220205147530
p25: 1538047
p50: 1998847
p75: 2609151
p90: 3567615
p95: 3817471
p99: 3907583
14:51:04.07 [TRACE] deregistering event source from poller
fast-nail-55400
05/09/2023, 6:53 PM14:51:03.74 [INFO] Summary of `remote_store_blob_bytes_uploaded` observation histogram:
min: 0
max: 428
mean: 103.370
std dev: 98.918
total observations: 27
sum: 2791
p25: 6
p50: 125
p75: 139
p90: 142
p95: 304
p99: 428
lemon-noon-33245
05/09/2023, 7:01 PMfast-nail-55400
05/09/2023, 7:01 PMtcpdump
to capture a packet trace and then view it with Wireshark.lemon-noon-33245
05/09/2023, 7:17 PMfast-nail-55400
05/09/2023, 7:22 PM-o
option to tcpdump
)lemon-noon-33245
05/09/2023, 7:27 PMfast-nail-55400
05/09/2023, 7:28 PMlemon-noon-33245
05/09/2023, 7:30 PMfast-nail-55400
05/09/2023, 7:30 PMByteStream
gRPC protobuflemon-noon-33245
05/09/2023, 7:37 PMfast-nail-55400
05/09/2023, 7:39 PMlemon-noon-33245
05/09/2023, 7:47 PMfast-nail-55400
05/09/2023, 7:55 PMDissector bug, protocol HTTP2: epan/dissectors/packet-http2.c:3086: failed assertion "!((pinfo)->fd->visited) && datalen == length"
lemon-noon-33245
05/09/2023, 7:57 PMAssuming the numbers like 91, 100, 125 are chunk sizes, nothing in the trace stands out as a huge number.Yeah, those are the package lengths
fast-nail-55400
05/09/2023, 8:00 PMResourceExhausted
is occurring.lemon-noon-33245
05/09/2023, 8:06 PMfast-nail-55400
05/09/2023, 8:09 PMResourceExhausted
error from somewhere else?lemon-noon-33245
05/09/2023, 8:10 PMfast-nail-55400
05/09/2023, 8:12 PMlemon-noon-33245
05/09/2023, 8:14 PM