what is the reason behind asking for a len on http...
# development
c
What is the reason for requiring a len on http_source when the sha256 already provides integrity? Most checksum files only include the sha256.
e
What if a 10TB rogue file is on the other end? Sure, it will fail the hash check, but only after potentially DoSing your client or crashing it, depending on how it's written.
c
If you are concerned about things like that, add a max_size with a default of 2 GB and you are done,
with a configurable limit of course.
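A minimal sketch of what such a cap could look like, assuming the download arrives as an iterable of byte chunks. The names `read_capped`, `DownloadTooLarge` and `max_size` are hypothetical and not part of Pants:

```python
import hashlib
from typing import Iterable


class DownloadTooLarge(Exception):
    """Raised when a stream exceeds the configured size cap."""


def read_capped(
    chunks: Iterable[bytes],
    expected_sha256: str,
    max_size: int = 2 * 1024**3,  # hypothetical default of 2 GiB
) -> bytes:
    """Consume chunks, aborting as soon as the cap is exceeded."""
    hasher = hashlib.sha256()
    received = bytearray()
    for chunk in chunks:
        # Stop before buffering more than the cap allows.
        if len(received) + len(chunk) > max_size:
            raise DownloadTooLarge(f"stream exceeded {max_size} bytes")
        hasher.update(chunk)
        received.extend(chunk)
    # The hash is only checked once the whole stream has been read.
    if hasher.hexdigest() != expected_sha256:
        raise ValueError("sha256 mismatch")
    return bytes(received)
```

The cap bounds how much a rogue server can make the client read, while the hash check still guards integrity at the end.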
e
It's a bit deeper I think. If you look at the remote execution API (Pants, Bazel and many servers implement it), the size + hash is the lingua franca. It pervades the codebase as a result. Perhaps we could be less strict for this user facing bit, but what is the practical problem you're encountering? Is it just annoyance or something bigger?
c
It is an annoyance, and using Bazel I did not encounter that.
I usually scrape the checksum file of a GitHub release and use it to auto-update my Skylark.
Similarly, the current Terraform plugin of Pants is lagging by several versions and needs to be updated; I would need to write a script to auto-update it and open a PR.
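For illustration, a sketch of the scraping step, assuming the release publishes a checksum file in the usual `sha256sum` layout (hex digest, whitespace, file name). The function name `parse_sha256sums` is hypothetical. Note that such a file carries no length, which is the gap being discussed:

```python
import string
from typing import Dict


def parse_sha256sums(text: str) -> Dict[str, str]:
    """Map file name -> lowercase sha256 hex digest."""
    checksums: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, name = line.partition(" ")
        # sha256sum marks binary mode with a leading '*' on the name.
        name = name.strip().lstrip("*")
        is_hex = all(c in string.hexdigits for c in digest)
        if len(digest) != 64 or not is_hex or not name:
            raise ValueError(f"unrecognized checksum line: {raw!r}")
        checksums[name] = digest.lower()
    return checksums
```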
c
Also, the length is but a cli command away..
❯ curl -sLI $URL | grep -i Content-Length
content-length: 0
content-length: 189537029
(here, the first one is from a redirect response..)
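If that output is fed to a script, a rough sketch like the following could pick the length from the final response rather than from a redirect. It assumes the text is a `curl -sLI` style header dump with one block per response; `final_content_length` is a hypothetical helper name:

```python
from typing import Dict, List, Optional


def final_content_length(header_dump: str) -> Optional[int]:
    """Return Content-Length of the last response in the dump, if any."""
    blocks: List[Dict[str, str]] = []
    for raw in header_dump.splitlines():
        line = raw.strip()
        if line.upper().startswith("HTTP/"):
            # A status line starts a new response block.
            blocks.append({})
        elif ":" in line and blocks:
            name, _, value = line.partition(":")
            blocks[-1][name.strip().lower()] = value.strip()
    if not blocks:
        return None
    value = blocks[-1].get("content-length")
    return int(value) if value is not None else None
```

It returns `None` when the final response does not send the header at all.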
b
Yeah the true technical answer is shasum+len is the cache key in the underlying cache. So unless you want to download every time, you need both.
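A toy illustration of that point, not the actual Pants store: an in-memory cache keyed by a (sha256, length) pair, similar in shape to the `Digest` message of the remote execution API, which carries a hash and a size in bytes. `Digest` and `ToyStore` here are hypothetical names for the sketch:

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Digest:
    fingerprint: str  # hex sha256 of the content
    size_bytes: int   # length of the content


class ToyStore:
    """In-memory blob cache keyed by the full digest."""

    def __init__(self) -> None:
        self._blobs: Dict[Digest, bytes] = {}

    def put(self, content: bytes) -> Digest:
        digest = Digest(hashlib.sha256(content).hexdigest(), len(content))
        self._blobs[digest] = content
        return digest

    def get(self, digest: Digest) -> Optional[bytes]:
        # Both fields must match; the hash alone is not a valid key.
        return self._blobs.get(digest)
```

With a key shaped like this, a lookup cannot be attempted until both the hash and the length are known.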
c
@curved-television-6568 https://github.com/pantsbuild/pants/blob/main/src/python/pants/backend/terraform/tool.py the script should probably be run, the latest version is 1.4.6 now
@bitter-ability-32190 thanks for the answer, so it is primarily a technical constraint
Content-Length is not a mandatory header
c
no, not mandatory, so if your download source does not include it, I guess you need to do the full download and look at the size, unless there’s an out-of-band way to find the size
👍 1
c
Regarding the Bazel approach: https://bazel.build/rules/lib/repo/http you can see sha256 as optional and no size
c
the script should probably be run, the latest version is 1.4.6 now
Addressed in https://github.com/pantsbuild/pants/pull/19004
👍 1
you can see sha256 as optional and no size
Well, optional as a temporary convenience only:
It is optional to make development easier but either this attribute or integrity should be set before shipping.
c
I agree, I think sha256 should not be optional; my point is that the length is more questionable 🙂
c
There are certainly ways to avoid it, but as Josh pointed out, it’s part of the cache key, so as long as that is a thing, it’s difficult to get rid of. I imagine it could be possible to leave the values out and have Pants log the actual ones, which may then be used to fill them in, manually or automagically.
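A sketch of the "fill it in automagically" idea, assuming the pipe-separated `version|platform|sha256|length` layout that the linked tool.py appears to use for its known versions. Both function names are hypothetical, and the bytes are assumed to come from a download that has already happened:

```python
import hashlib
from typing import Iterable, Tuple


def fingerprint_and_length(chunks: Iterable[bytes]) -> Tuple[str, int]:
    """Compute the sha256 hex digest and total size of a byte stream."""
    hasher = hashlib.sha256()
    total = 0
    for chunk in chunks:
        hasher.update(chunk)
        total += len(chunk)
    return hasher.hexdigest(), total


def known_version_entry(
    version: str, platform: str, chunks: Iterable[bytes]
) -> str:
    """Format one entry in the assumed version|platform|sha256|length layout."""
    sha256, length = fingerprint_and_length(chunks)
    return f"{version}|{platform}|{sha256}|{length}"
```

An update script could print such entries after one full download per platform, so nobody has to look up the length by hand.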
c
Yes, I am basically setting up a new monorepo with IaC, Kubernetes deployments, and TypeScript projects, and I want it to be open to Python/Golang/Rust projects
b
Having a default global length limit (eg 500MB or something) and only requiring a length when larger seems like a possibility that might balance resilience and convenience. Worth a feature request?
👍 1
b
FWIW that's not why there's a length in there. It very much is because that's the cache key. So you'd have to untangle that first
👍 1
b
As in, the file download is cached directly using that info, rather than as the output of a (cached) process?
b
Precisely. Straight to the cache with the value. Nothing in the middle