# general
a
I have a requirements.txt file with around 70 requirements. Some of them have minimum versions (`>=`), some use exact versions, and for some the version is unspecified. I've run into OOM errors a couple of times already when trying to do `pants generate-lockfiles` on a machine with 8 GB of RAM. IIRC last time I resolved it by pinning everything to an exact version in requirements.txt. Is there a better way to work around it? I'm not even sure why I run into these memory errors in the first place
e
There is no obvious reason or correlation here from the facts given. Can you provide the Pants version along with the full command-line output of an OOM event? Best is if you can run with `-ldebug`, which will include enough information to repro and rule various things in or out.
a
sure, `pants_version = "2.15.0"`, and here's the output of running `./pants -ldebug generate-lockfiles --resolve="python-default"`. I only removed the address of our private PyPI index. It's only used to host wheels for spaCy models, as they're not hosted on PyPI: https://pastebin.com/rMb4WWvi
e
Great, thank you. I'll have time to attempt a repro for this in ~3 hours and I'll update this thread.
a
Great, thank you 🙂
e
Ok, I'm not sure I can repro well here. I'm using systemd-run to enforce an 8G cgroup, but I fail quickly - I think because of the missing private PyPI index:
$ systemd-run --scope -p MemoryMax=8G --user pex.venv/bin/pex3 lock create --output=lock.json --no-emit-warnings --style=universal --resolver-version pip-2020-resolver --target-system linux --target-system mac --indent=2 --no-pypi --index=<https://pypi.org/simple/> --index=<https://download.pytorch.org/whl/cu117> --manylinux manylinux2014 --interpreter-constraint "CPython==3.9.*" "SQLAlchemy>=2.0.6" "bertopic>=0.14.1" "boto3>=1.26.91" "ca_core_news_sm==3.5.0" "celery[redis]>=5.2.7" "cleanlab>=2.3.0" "click>=8.1.3" "da_core_news_sm==3.5.0" "datasets>=2.10.1" "dateparser>=1.1.7" "de_core_news_sm==3.5.0" "doccano-client>=1.1.0" "el_core_news_sm==3.5.0" "elasticsearch>=7.17.6" "emoji>=2.2.0" "en_core_web_sm==3.5.0" "es_core_news_sm==3.5.0" "eventlet" "fastapi>=0.94.1" "fasttext==0.9.2" "fr_core_news_sm==3.5.0" "gevent" "hdbscan>=0.8.29" "httpx>=0.23.3" "it_core_news_sm==3.5.0" "ja_core_news_sm==3.5.0" "kombu" "lightgbm" "lt_core_news_sm==3.5.0" "matplotlib>=3.7.1" "millify>=0.1.1" "mk_core_news_sm==3.5.0" "motor>=3.1.1" "nb_core_news_sm==3.5.0" "nl_core_news_sm==3.5.0" "numpy>=1.24.2" "openai>=0.27.2" "pandas>=1.5.3" "pika>=1.3.1" "pl_core_news_sm==3.5.0" "pt_core_news_sm==3.5.0" "pydantic" "pymongo>=4.3.3" "python-dateutil>=2.8.2" "pytorch-lightning>=1.9.4" "ray[serve]>=2.3.0" "regex>=2022.10.31" "requests>=2.28.2" "retry>=0.9.2" "ro_core_news_sm==3.5.0" "ru_core_news_sm==3.5.0" "scikit-learn>=1.2.2" "scipy>=1.10.1" "seaborn>=0.12.2" "sentence-transformers>=2.2.2" "sentry-sdk>=1.16.0" "setuptools" "slack-sdk>=3.20.2" "spacy[lookups]>=3.5.1" "statsd>=4.0.1" "torch>=1.13.1" "tqdm>=4.65.0" "transformers>=4.26.1" "umap-learn>=0.5.3" "uvicorn>=0.21.0" "xx_ent_wiki_sm==3.5.0" "zh_core_web_sm==3.5.0"
Running scope as unit: run-r0040121299ca4f44b15b3e4f8475a5c3.scope
pid 1831 -> /home/jsirois/.pex/venvs/948f96a8e809ae4371e74f85a8a888c4de89ea61/108a3ddc84230ab282ea6312e06cb68f51008ce5/bin/python -sE /home/jsirois/.pex/venvs/948f96a8e809ae4371e74f85a8a888c4de89ea61/108a3ddc84230ab282ea6312e06cb68f51008ce5/pex --disable-pip-version-check --no-python-version-warning --exists-action a --no-input --isolated -q --cache-dir /home/jsirois/.pex/pip_cache --log /tmp/pex-pip-log.m2oko96u/pip.log download --dest /tmp/tmpx6u2ix1l/home.jsirois.support.pants.Szymon.pex.venv.bin.python3.9 SQLAlchemy>=2.0.6 bertopic>=0.14.1 boto3>=1.26.91 ca_core_news_sm==3.5.0 celery[redis]>=5.2.7 cleanlab>=2.3.0 click>=8.1.3 da_core_news_sm==3.5.0 datasets>=2.10.1 dateparser>=1.1.7 de_core_news_sm==3.5.0 doccano-client>=1.1.0 el_core_news_sm==3.5.0 elasticsearch>=7.17.6 emoji>=2.2.0 en_core_web_sm==3.5.0 es_core_news_sm==3.5.0 eventlet fastapi>=0.94.1 fasttext==0.9.2 fr_core_news_sm==3.5.0 gevent hdbscan>=0.8.29 httpx>=0.23.3 it_core_news_sm==3.5.0 ja_core_news_sm==3.5.0 kombu lightgbm lt_core_news_sm==3.5.0 matplotlib>=3.7.1 millify>=0.1.1 mk_core_news_sm==3.5.0 motor>=3.1.1 nb_core_news_sm==3.5.0 nl_core_news_sm==3.5.0 numpy>=1.24.2 openai>=0.27.2 pandas>=1.5.3 pika>=1.3.1 pl_core_news_sm==3.5.0 pt_core_news_sm==3.5.0 pydantic pymongo>=4.3.3 python-dateutil>=2.8.2 pytorch-lightning>=1.9.4 ray[serve]>=2.3.0 regex>=2022.10.31 requests>=2.28.2 retry>=0.9.2 ro_core_news_sm==3.5.0 ru_core_news_sm==3.5.0 scikit-learn>=1.2.2 scipy>=1.10.1 seaborn>=0.12.2 sentence-transformers>=2.2.2 sentry-sdk>=1.16.0 setuptools slack-sdk>=3.20.2 spacy[lookups]>=3.5.1 statsd>=4.0.1 torch>=1.13.1 tqdm>=4.65.0 transformers>=4.26.1 umap-learn>=0.5.3 uvicorn>=0.21.0 xx_ent_wiki_sm==3.5.0 zh_core_web_sm==3.5.0 --index-url <https://pypi.org/simple/> --extra-index-url <https://download.pytorch.org/whl/cu117> --retries 5 --timeout 15 exited with 1 and STDERR:
ERROR: Could not find a version that satisfies the requirement ca_core_news_sm==3.5.0
ERROR: No matching distribution found for ca_core_news_sm==3.5.0
Yeah. If I switch that one to Spanish, I next get hit with the same error for `da_core_news_sm`
a
yeah, you might have a problem with those packages. I'm not sure if the problem persists if you remove the model packages
e
Ok, trying with all those removed save for Spanish (`es_core_news_sm==3.1.0`)
Adding `--preserve-pip-download-log` and tailing that log, torch is killing things (`torch>=1.13.1`). Pip is having to try several versions, and 1.13.1, for example, is 1.8GB. This may not be the OOM reason, but it is slow to download just to check its requirements for resolve recursion!
Do you really mean torch 13 is ok?
Or do you mean anything greater than 1.13.1 but less than 2?
a
hmm, I suppose the parameter you shared will already be helpful in debugging this. I didn't know it had to try multiple versions; I thought 1.13.1 was the latest release.
I just looked it up and it says they released 2.0 an hour ago lol
e
Nope, 2.0.0 was released ~1 hour ago.
a
Oh well, so yeah. I'll definitely change it to <2 for now
e
Yeah, but that aside, you really have to ask yourself if you're ok with an unbounded upper range for any dep. Is a major release - which is always allowed to break compatibility - ok to float up to?
Granted, you should not have to care with a lockfile. It just makes solving potentially slower.
Ok, I'm retrying with torch pinned. That was too slow for my internet connection.
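[Editor's note: the bounded-vs-unbounded point above can be sketched with a toy version check. This is a simplified illustration only - it is not PEP 440-compliant, and `parse`/`satisfies` are made-up helper names; real tools should rely on pip/pex or the `packaging` library.]

```python
def parse(v):
    # Naive dotted-integer parse; real specifier handling follows PEP 440.
    return tuple(int(p) for p in v.split("."))

def satisfies(version, minimum, upper=None):
    """True if `version` meets `minimum` and is below the optional `upper` bound."""
    v = parse(version)
    if v < parse(minimum):
        return False
    if upper is not None and v >= parse(upper):
        return False
    return True

# Unbounded torch>=1.13.1 happily floats up to the brand-new major:
assert satisfies("2.0.0", "1.13.1")
# Bounded torch>=1.13.1,<2 excludes it:
assert not satisfies("2.0.0", "1.13.1", upper="2")
```

The point being: `>=` alone lets the resolver consider every future major release, including ones that are allowed to break you.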
a
Yeah, I thought a lockfile would solve potential issues with upgrades. I'm in the process of cleaning up our dependencies, so I need to figure out a good way to approach this
It's a shame you need to download whole packages in order to resolve versions of dependencies. But that's a problem with pip, isn't it?
e
No, not a problem with Pip; a problem with Python / the PyPA. There is no standard for storing metadata separately from the package.
There is a recent JSON API supported by PyPI, but it's not required, not universal, and not always available yet.
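[Editor's note: as one concrete illustration of metadata-without-download, PyPI has long exposed a per-release JSON endpoint that includes declared dependencies. A minimal sketch - `metadata_url` and `requires_dist` are hypothetical helper names, and the actual fetch assumes network access to pypi.org:]

```python
import json
from urllib.request import urlopen

def metadata_url(project: str, version: str) -> str:
    # PyPI's per-release JSON endpoint; no distribution download needed.
    return f"https://pypi.org/pypi/{project}/{version}/json"

def requires_dist(project: str, version: str) -> list:
    # Declared dependencies; may be empty if the uploader omitted them.
    with urlopen(metadata_url(project, version)) as resp:
        return json.load(resp)["info"].get("requires_dist") or []

# e.g. requires_dist("torch", "1.13.1") would list torch's deps
# without pulling the ~1.8GB wheel.
```

Resolvers can't rely on this universally, which is the gap described above.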
a
Oh, but it sounds like it could be a huge improvement once it's adopted
e
Yeah.
a
I think there's one more issue with the command I shared with you. There's a leftover index there: https://download.pytorch.org/whl/cu117. It's no longer needed, but I didn't remove it
e
Maybe good news:
$ systemd-run --scope -p MemoryMax=8G --user pex.venv/bin/pex3 lock create --output=lock.json --no-emit-warnings --style=universal --resolver-version pip-2020-resolver --target-system linux --target-system mac --indent=2 --no-pypi --index=<https://pypi.org/simple/> --index=<https://download.pytorch.org/whl/cu117> --manylinux manylinux2014 --interpreter-constraint "CPython==3.9.*" "SQLAlchemy>=2.0.6" "bertopic>=0.14.1" "boto3>=1.26.91" "es_core_news_sm==3.1.0" "celery[redis]>=5.2.7" "cleanlab>=2.3.0" "click>=8.1.3" "datasets>=2.10.1" "dateparser>=1.1.7" "doccano-client>=1.1.0" "elasticsearch>=7.17.6" "emoji>=2.2.0" "eventlet" "fastapi>=0.94.1" "fasttext==0.9.2" "gevent" "hdbscan>=0.8.29" "httpx>=0.23.3" "kombu" "lightgbm" "matplotlib>=3.7.1" "millify>=0.1.1" "motor>=3.1.1" "numpy>=1.24.2" "openai>=0.27.2" "pandas>=1.5.3" "pika>=1.3.1" "pydantic" "pymongo>=4.3.3" "python-dateutil>=2.8.2" "pytorch-lightning>=1.9.4" "ray[serve]>=2.3.0" "regex>=2022.10.31" "requests>=2.28.2" "retry>=0.9.2" "scikit-learn>=1.2.2" "scipy>=1.10.1" "seaborn>=0.12.2" "sentence-transformers>=2.2.2" "sentry-sdk>=1.16.0" "setuptools" "slack-sdk>=3.20.2" "spacy[lookups]>=3.5.1" "statsd>=4.0.1" "torch==1.13.1" "tqdm>=4.65.0" "transformers>=4.26.1" "umap-learn>=0.5.3" "uvicorn>=0.21.0" --preserve-pip-download-log
Running scope as unit: run-rdc0b29a6f94e4b63bd6c15c1dba1f458.scope
pex: Preserving `pip download` log at /tmp/pex-pip-log.iqxjfggr/pip.log
pid 3873 -> /home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/bin/python -sE /home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/pex --disable-pip-version-check --no-python-version-warning --exists-action a --no-input --isolated -q --cache-dir /home/jsirois/.pex/pip_cache --log /tmp/pex-pip-log.iqxjfggr/pip.log download --dest /tmp/tmpn232vwyh/home.jsirois.pex.venv.bin.python3.9 SQLAlchemy>=2.0.6 bertopic>=0.14.1 boto3>=1.26.91 es_core_news_sm==3.1.0 celery[redis]>=5.2.7 cleanlab>=2.3.0 click>=8.1.3 datasets>=2.10.1 dateparser>=1.1.7 doccano-client>=1.1.0 elasticsearch>=7.17.6 emoji>=2.2.0 eventlet fastapi>=0.94.1 fasttext==0.9.2 gevent hdbscan>=0.8.29 httpx>=0.23.3 kombu lightgbm matplotlib>=3.7.1 millify>=0.1.1 motor>=3.1.1 numpy>=1.24.2 openai>=0.27.2 pandas>=1.5.3 pika>=1.3.1 pydantic pymongo>=4.3.3 python-dateutil>=2.8.2 pytorch-lightning>=1.9.4 ray[serve]>=2.3.0 regex>=2022.10.31 requests>=2.28.2 retry>=0.9.2 scikit-learn>=1.2.2 scipy>=1.10.1 seaborn>=0.12.2 sentence-transformers>=2.2.2 sentry-sdk>=1.16.0 setuptools slack-sdk>=3.20.2 spacy[lookups]>=3.5.1 statsd>=4.0.1 torch==1.13.1 tqdm>=4.65.0 transformers>=4.26.1 umap-learn>=0.5.3 uvicorn>=0.21.0 --index-url <https://pypi.org/simple/> --extra-index-url <https://download.pytorch.org/whl/cu117> --retries 5 --timeout 15 exited with 2 and STDERR:
ERROR: Exception:
Traceback (most recent call last):
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 171, in _merge_into_criterion
    crit = self.state.criteria[name]
KeyError: 'torch'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 223, in _main
    status = self.run(options, args)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", line 180, in wrapper
    return func(self, options, args)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/commands/download.py", line 130, in run
    requirement_set = resolver.resolve(
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 121, in resolve
    self._result = resolver.resolve(
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 453, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 318, in resolve
    name, crit = self._merge_into_criterion(r, parent=None)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 173, in _merge_into_criterion
    crit = Criterion.from_requirement(self._p, requirement, parent)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 82, in from_requirement
    if not cands:
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_vendor/resolvelib/structs.py", line 124, in __bool__
    return bool(self._sequence)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 99, in __bool__
    return any(self)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 239, in iter_index_candidates
    candidate = self._make_candidate_from_link(
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 167, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 296, in __init__
    super(LinkCandidate, self).__init__(
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 144, in __init__
    self.dist = self._prepare()
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 222, in _prepare
    dist = self._prepare_distribution()
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 307, in _prepare_distribution
    return self._factory.preparer.prepare_linked_requirement(
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 480, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 503, in _prepare_linked_requirement
    local_file = unpack_url(
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 253, in unpack_url
    file = get_http_url(
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 130, in get_http_url
    from_path, content_type = download(link, temp_dir.path)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/network/download.py", line 150, in __call__
    resp = _http_get_download(self._session, link)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/network/download.py", line 131, in _http_get_download
    resp = session.get(target_url, headers=HEADERS, stream=True)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_vendor/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_internal/network/session.py", line 428, in request
    return super(PipSession, self).request(method, url, *args, **kwargs)
  File
...
"/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_vendor/msgpack/fallback.py", line 670, in _unpack
    ret[key] = self._unpack(EX_CONSTRUCT)
  File "/home/jsirois/.pex/venvs/a1afe18d558b9e69ce154ed593514cf84783ea42/108a3ddc84230ab282ea6312e06cb68f51008ce5/lib/python3.9/site-packages/pip/_vendor/msgpack/fallback.py", line 683, in _unpack
    return bytes(obj)
MemoryError
I'm not sure what a systemd cgroup OOM looks like. Digging more to see if this is what it looks like. Note: I did halve the memory, since not even 4GB should be used in a resolve.
Ah, no. I did not halve it that time; that was with 8G.
Ok, yeah - that stack trace is repeatable, and it's during Pip examining its cache for torch. It tries to deserialize a ~2GB cache entry. I need to dig to see how that step blows up enough to top an 8GB limit, but it appears to be the proximate issue.
@adamant-magazine-16751 the best I can come up with so far falls short. That backtrace seems to account for at most ~1.8GB x2 - copying a
>>> obj: 139838261863216 type=<class 'bytearray'> len=1801768980 size=1801769037
into a `bytes(...)`. That still leaves half the 8GB unaccounted for.
That's close though! Ish. I tried modern Pex (2.1.129) and modern Pip (`--pip-version 23.0.1`) and neither helped. Same error in the same spot.
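[Editor's note on the memory math: `bytes(some_bytearray)` makes a full copy, so both buffers are alive at the peak - that's where the ~1.8GB x2 estimate comes from. A tiny illustration with a smaller buffer:]

```python
import sys

buf = bytearray(10_000_000)   # stand-in for pip's ~1.8GB cached payload
copy = bytes(buf)             # bytes() copies; buf and copy coexist at the peak

assert len(copy) == len(buf)            # full duplicate, not a view
assert sys.getsizeof(copy) >= len(buf)  # the copy owns its own storage
```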
a
Well, thank you very much for your help anyway. It's not the end of the world; it's not the only machine we've got. I was just curious what the problem had been and whether I could do anything to decrease the memory consumption. One last question: from what I understand, the way to decrease the runtime of `generate-lockfiles` would be to decrease the number of compatible interpreters and to pin the dependencies. Is that correct?
e
Constraining ICs is the first thing to do. It should be the easiest. The second is to use `--pip-version` to select modern Pip (only available in Pants 2.16+: https://www.pantsbuild.org/v2.16/docs/reference-python#pip_version). Pinning requirement versions is not awesome, since lockfiles mean you shouldn't have to. Maybe place an upper bound on versions, though. There is one final option, documented here: https://github.com/pantsbuild/pex/issues/2044
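[Editor's note: the two suggestions above map to pants.toml roughly like this - a sketch only; the `interpreter_constraints` value is illustrative, and `pip_version` is accepted only on Pants 2.16+:]

```toml
[python]
# Fewer compatible interpreters -> smaller solve space.
interpreter_constraints = ["CPython==3.9.*"]
# Pants 2.16+ only: use a modern Pip for resolves.
pip_version = "23.0.1"
```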
a
Great! Once again, thank you very much for all the help