I've written a Lambda function using pandas, s3fs ...
# general
b
I've written a Lambda function using pandas, s3fs and fastparquet. I defined it as:
python_awslambda(
    name = "lambda",
    dependencies = [
        "!!:types-requests",
        ":s3fs",
        ":fastparquet",
    ],
    handler = "lambda.py:handler",
    runtime = "python3.9",
)
However, when running it on AWS, I get:
{
  "errorMessage": "Install s3fs to access S3",
  "errorType": "ImportError",
[...]
I tried to run it locally and I get:
$ lambdex test ./dist/packages.data-fred.src.data_fred/lambda.zip /tmp/event.json 
Traceback (most recent call last):
  File "/home/pmuller/.local/bin/lambdex", line 8, in <module>
    sys.exit(main())
  File "/home/pmuller/.local/lib/python3.9/site-packages/lambdex/bin/lambdex.py", line 291, in main
    args.func(args)
  File "/home/pmuller/.local/lib/python3.9/site-packages/lambdex/bin/lambdex.py", line 207, in test_lambdex
    runner = EntryPoint.parse("run = %s" % lambdex_entry_point).resolve()
  File "/home/pmuller/.local/lib/python3.9/site-packages/pex/vendor/_vendored/setuptools/pkg_resources/__init__.py", line 2481, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/pmuller/.pex/unzipped_pexes/2c5fccf932b3d8caf958b1b9e39ecf41464b6f78/data_fred/lambda.py", line 6, in <module>
    from pandas import DataFrame
  File "/home/pmuller/.pex/installed_wheels/0e2227a9f4be0a9ef2995a85f08c068f142d53f2fd8289da259c29ee633290de/pandas-1.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl/pandas/__init__.py", line 16, in <module>
    raise ImportError(
ImportError: Unable to import required dependencies:
numpy: 

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    <https://numpy.org/devdocs/user/troubleshooting-importerror.html>

Please note and check the following:

  * The Python version is: Python3.9 from "/usr/bin/python3.9"
  * The NumPy version is: "1.21.5"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: No module named 'numpy.core._multiarray_umath'
However, I tried to package it with pex and it works fine:
pex_binary(
    name = "pex",
    dependencies = [
        ":s3fs",
        ":fastparquet",
    ],
    entry_point = "lambda.py",
)
Any idea what I did wrong? (simple lambdas, without heavy dependencies like pandas & co work fine)
BTW numpy and s3fs are present in the Lambda bundle:
$ unzip -l ./dist/packages.data-fred.src.data_fred/lambda.zip | fgrep .deps/s3fs- | wc -l
24
$ unzip -l ./dist/packages.data-fred.src.data_fred/lambda.zip | fgrep .deps/numpy- | wc -l
1052
When I run `lambda.py` directly, it doesn't find the `fastparquet` package:
$ ./pants run packages/data-fred/src/data_fred/lambda.py 
INFO:data_fred.api:Downloading observations for series WALCL
INFO:__main__:Writing observations to <s3://data-sources-XXX/fred/WALCL.parquet>
Traceback (most recent call last):
  File "/home/pmuller/.cache/pants/named_caches/pex_root/venvs/2301562cab68abc8768f002da997fdda620d26de/7b7f23c6248e806b986f5af0bb1896fbed295aaa/pex", line 235, in <module>
    runpy.run_module(module_name, run_name="__main__", alter_sys=True)
  File "/usr/lib/python3.9/runpy.py", line 225, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmp/pants-sandbox-KA6n0t/packages/data-fred/src/data_fred/lambda.py", line 32, in <module>
    handler({"series": "WALCL"}, None)
  File "/tmp/pants-sandbox-KA6n0t/packages/data-fred/src/data_fred/lambda.py", line 27, in handler
    DataFrame(observations).to_parquet(url)
  File "/home/pmuller/.cache/pants/named_caches/pex_root/venvs/2301562cab68abc8768f002da997fdda620d26de/7b7f23c6248e806b986f5af0bb1896fbed295aaa/lib/python3.9/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/home/pmuller/.cache/pants/named_caches/pex_root/venvs/2301562cab68abc8768f002da997fdda620d26de/7b7f23c6248e806b986f5af0bb1896fbed295aaa/lib/python3.9/site-packages/pandas/core/frame.py", line 2975, in to_parquet
    return to_parquet(
  File "/home/pmuller/.cache/pants/named_caches/pex_root/venvs/2301562cab68abc8768f002da997fdda620d26de/7b7f23c6248e806b986f5af0bb1896fbed295aaa/lib/python3.9/site-packages/pandas/io/parquet.py", line 424, in to_parquet
    impl = get_engine(engine)
  File "/home/pmuller/.cache/pants/named_caches/pex_root/venvs/2301562cab68abc8768f002da997fdda620d26de/7b7f23c6248e806b986f5af0bb1896fbed295aaa/lib/python3.9/site-packages/pandas/io/parquet.py", line 52, in get_engine
    raise ImportError(
ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
A suitable version of pyarrow or fastparquet is required for parquet support.
Trying to import the above resulted in these errors:
 - Missing optional dependency 'pyarrow'. pyarrow is required for parquet support. Use pip or conda to install pyarrow.
 - Missing optional dependency 'fastparquet'. fastparquet is required for parquet support. Use pip or conda to install fastparquet.
But everything works fine when using the REPL:
$ ./pants repl ::
In [1]: import pandas
In [2]: pandas.DataFrame().to_parquet('<s3://data-sources-XXX/wtf.parq>')
In [3]: pandas.read_parquet('<s3://data-sources-XXX/wtf.parq>')
Out[3]: 
Empty DataFrame
Columns: []
Index: []
e
Both the lambda and the pex are zip files that contain a json PEX-INFO file at the top-level. Can you share both PEX-INFO files?
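In case it helps anyone following along, here is a rough sketch of pulling `PEX-INFO` out of both zips and comparing the bundled wheels. `read_pex_info` and `only_in_first` are hypothetical helper names for illustration, not part of Pex, and the zip paths you pass in are up to you.

```python
import json
import zipfile


def read_pex_info(zip_path):
    """Return the parsed top-level PEX-INFO JSON from a PEX or lambdex zip."""
    with zipfile.ZipFile(zip_path) as zf:
        return json.loads(zf.read("PEX-INFO"))


def only_in_first(first_info, second_info):
    """Wheel names under "distributions" present in first_info but not in second_info."""
    first = set(first_info.get("distributions", {}))
    second = set(second_info.get("distributions", {}))
    return sorted(first - second)
```

Calling `only_in_first(read_pex_info("app.pex"), read_pex_info("lambda.zip"))` would list the wheels that only the first archive carries, which makes differences between the two builds easy to spot.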
b
Lambda:
{
  "bootstrap_hash": "9ae3753c8b8c0bbf289e195e0bb72485c276bb20",
  "build_properties": {
    "pex_version": "2.1.102"
  },
  "code_hash": "0fbe85458d796338e23ecfd5bbbc9cc2868b6827",
  "distributions": {
    "aiobotocore-2.4.0-py3-none-any.whl": "d525bdba3051dbce14156221f35dbfc4c6d561642b0b0683288cad30b2ace8d2",
    "aiohttp-3.8.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "bbeefcbf8e7f0ba0c95faf36a70bb70ec5b3de8c3e40998f5d46e59915bba973",
    "aioitertools-0.11.0-py3-none-any.whl": "4f0ed38f3d141ce857cd541945da59c929ddc84f78e87962b05f0133742a9b2f",
    "aiosignal-1.2.0-py3-none-any.whl": "4dcd54c010a637de9258977e9b043a7e3ce7609de2de7d338b3af6dc512bd769",
    "async_timeout-4.0.2-py3-none-any.whl": "7a304d75499b9c9070da920a841c61b687b137718ca94e5ee8f0230e1fd84f7d",
    "attrs-22.1.0-py2.py3-none-any.whl": "ba3b3a2a12f1974564901d180c62dfd8f34855516420772702fda17f280db9b5",
    "botocore-1.27.59-py3-none-any.whl": "81bc68c3656887489cd9e2f6a0316bf4bb1c7e91baee53dee8262c8ae6983c8a",
    "certifi-2022.9.24-py3-none-any.whl": "195a12b6b851945bf7cfcbbee22018272ff45f8f1a515b90f7144c8e27e2afdc",
    "charset_normalizer-2.1.1-py3-none-any.whl": "c4dcf068d12625d4115f5e98cca5b90c18403837015e82bdf2a5cbc80f01d32d",
    "cramjam-2.5.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl": "46c4b611c873dbdd5d730c9ef279b51a5bd4a4e4703285618c69f83a010dbaee",
    "fastparquet-0.8.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "bb1da2c01e9a09382f093ca5693b8bf45a7bf78887004360e450e76cd8603a72",
    "frozenlist-1.3.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "c6a8e543bf63695ebd5c8811da7bd775d2fe054bd88231611ac0333117a2b5e8",
    "fsspec-2022.8.2-py3-none-any.whl": "951027ba0e5feec3cf1ed388b6b1b4159178759223b595af59293df32926e880",
    "idna-3.4-py3-none-any.whl": "3baea0ffbdea9f349783bacbdb82dbd50eadfec75339388aaf1597754786e3ed",
    "jmespath-1.0.1-py3-none-any.whl": "299c3a18595a39d4d54252e98a3c39343899b9bd1997e3eebfee63ea5757588b",
    "multidict-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "ff719624f5d5b5b03a124812fb93d52d84c26339e40d7e08cd43fc500e3ec9a9",
    "numpy-1.23.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "0bbc0ebba28ce792957b68df8f21e58e4099ce0a899a0a756e20c7f6a8d17303",
    "packaging-21.3-py3-none-any.whl": "a728bb5d35998463aee41e8a69df75fe81e027553b019f51236950d765d9f132",
    "pandas-1.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "0e2227a9f4be0a9ef2995a85f08c068f142d53f2fd8289da259c29ee633290de",
    "pyparsing-3.0.9-py3-none-any.whl": "a0d0acca03cd18219d6c8d6af5f1804e7142b9df0e3e05f9f7e6ae8f0b157511",
    "python_dateutil-2.8.2-py2.py3-none-any.whl": "202c27a293331dd8fa9d41d1fcdd5ba4f4d6d2de0a2f00fa8547adc7c1aac629",
    "pytz-2022.4-py2.py3-none-any.whl": "1cec25904f0a6d911ba9ed7fdb4108e6e044e49ce7273e69f0a768a2b31844e2",
    "requests-2.28.1-py3-none-any.whl": "aceadd0c9564fd6a3ded05afec41e816beee638b94feeccf30ec61da94a6c91d",
    "s3fs-2022.8.2-py3-none-any.whl": "e705415bdc2cb1b14aba27fb0579d6c126a49b196d849a1afca21b4592668bdf",
    "six-1.16.0-py2.py3-none-any.whl": "3e1c439c88d2e7681372427bab751b3fc99969891e95a714fed9604bf7710213",
    "typing_extensions-4.4.0-py3-none-any.whl": "219f52e56e51e3b9c1bc8cbdc0674e1efc73c626a2b5ebf8b87510a0530ad7b9",
    "urllib3-1.26.12-py2.py3-none-any.whl": "cc8d9aa8d61580df595254e4bd08cee6ff223cd393abbcd754fc708c7ac09260",
    "wrapt-1.14.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "7357d5567dc387b9ad388dddebd149ee811ed64f1128451d2a3cbee2bd25326d",
    "yarl-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "812f9b5f99e8a611aa97a3d3f5c5b2e57fef748c89f2c8a88f8b1639841210ef"
  },
  "emit_warnings": false,
  "ignore_errors": false,
  "includes_tools": false,
  "inherit_path": "false",
  "interpreter_constraints": [],
  "pex_hash": "2c5fccf932b3d8caf958b1b9e39ecf41464b6f78",
  "pex_path": null,
  "requirements": [
    "fastparquet",
    "pandas",
    "requests",
    "s3fs"
  ],
  "strip_pex_env": true,
  "venv": false,
  "venv_bin_path": "false",
  "venv_copies": false,
  "venv_site_packages_copies": false
}
PEX:
{
  "bootstrap_hash": "9ae3753c8b8c0bbf289e195e0bb72485c276bb20",
  "build_properties": {
    "pex_version": "2.1.102"
  },
  "code_hash": "0fbe85458d796338e23ecfd5bbbc9cc2868b6827",
  "distributions": {
    "aiobotocore-2.4.0-py3-none-any.whl": "d525bdba3051dbce14156221f35dbfc4c6d561642b0b0683288cad30b2ace8d2",
    "aiohttp-3.8.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "9828075458cc80897f7ffc92b358239fa248bdfdad4ad6b13b8ef83b02b4a287",
    "aiohttp-3.8.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "bbeefcbf8e7f0ba0c95faf36a70bb70ec5b3de8c3e40998f5d46e59915bba973",
    "aioitertools-0.11.0-py3-none-any.whl": "4f0ed38f3d141ce857cd541945da59c929ddc84f78e87962b05f0133742a9b2f",
    "aiosignal-1.2.0-py3-none-any.whl": "4dcd54c010a637de9258977e9b043a7e3ce7609de2de7d338b3af6dc512bd769",
    "async_timeout-4.0.2-py3-none-any.whl": "7a304d75499b9c9070da920a841c61b687b137718ca94e5ee8f0230e1fd84f7d",
    "attrs-22.1.0-py2.py3-none-any.whl": "ba3b3a2a12f1974564901d180c62dfd8f34855516420772702fda17f280db9b5",
    "botocore-1.27.59-py3-none-any.whl": "81bc68c3656887489cd9e2f6a0316bf4bb1c7e91baee53dee8262c8ae6983c8a",
    "certifi-2022.9.24-py3-none-any.whl": "195a12b6b851945bf7cfcbbee22018272ff45f8f1a515b90f7144c8e27e2afdc",
    "charset_normalizer-2.1.1-py3-none-any.whl": "c4dcf068d12625d4115f5e98cca5b90c18403837015e82bdf2a5cbc80f01d32d",
    "cramjam-2.5.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl": "e30587453ab784285db8a13758863d3f2eab8b016f8b03066e750f8f98eb7913",
    "cramjam-2.5.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl": "46c4b611c873dbdd5d730c9ef279b51a5bd4a4e4703285618c69f83a010dbaee",
    "fastparquet-0.8.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "2a74dfbf017fb7f6281f2e33be95cb87788ab6d24bfcbdd7de85ce0015e2d7b7",
    "fastparquet-0.8.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "bb1da2c01e9a09382f093ca5693b8bf45a7bf78887004360e450e76cd8603a72",
    "frozenlist-1.3.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "7d628183e8987d4199cd33c7fc502950fcd8684a5feb049c51742a8a8132d2ae",
    "frozenlist-1.3.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "c6a8e543bf63695ebd5c8811da7bd775d2fe054bd88231611ac0333117a2b5e8",
    "fsspec-2022.8.2-py3-none-any.whl": "951027ba0e5feec3cf1ed388b6b1b4159178759223b595af59293df32926e880",
    "idna-3.4-py3-none-any.whl": "3baea0ffbdea9f349783bacbdb82dbd50eadfec75339388aaf1597754786e3ed",
    "jmespath-1.0.1-py3-none-any.whl": "299c3a18595a39d4d54252e98a3c39343899b9bd1997e3eebfee63ea5757588b",
    "multidict-6.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "94eb3be9cc9a1659da4832cc53912a5fc5914eedff48ef911e4cdc387f36dd4b",
    "multidict-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "ff719624f5d5b5b03a124812fb93d52d84c26339e40d7e08cd43fc500e3ec9a9",
    "numpy-1.23.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "3d50c13171cf7354542cc0aed9941de1b59cd38bca67307ccd20c5100929e3a1",
    "numpy-1.23.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "0bbc0ebba28ce792957b68df8f21e58e4099ce0a899a0a756e20c7f6a8d17303",
    "packaging-21.3-py3-none-any.whl": "a728bb5d35998463aee41e8a69df75fe81e027553b019f51236950d765d9f132",
    "pandas-1.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "88bcadcfdf4da04007789226d95a41ecd54fbf5d42dae61dfed54ed562890b58",
    "pandas-1.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "0e2227a9f4be0a9ef2995a85f08c068f142d53f2fd8289da259c29ee633290de",
    "pyparsing-3.0.9-py3-none-any.whl": "a0d0acca03cd18219d6c8d6af5f1804e7142b9df0e3e05f9f7e6ae8f0b157511",
    "python_dateutil-2.8.2-py2.py3-none-any.whl": "202c27a293331dd8fa9d41d1fcdd5ba4f4d6d2de0a2f00fa8547adc7c1aac629",
    "pytz-2022.4-py2.py3-none-any.whl": "1cec25904f0a6d911ba9ed7fdb4108e6e044e49ce7273e69f0a768a2b31844e2",
    "requests-2.28.1-py3-none-any.whl": "aceadd0c9564fd6a3ded05afec41e816beee638b94feeccf30ec61da94a6c91d",
    "s3fs-2022.8.2-py3-none-any.whl": "e705415bdc2cb1b14aba27fb0579d6c126a49b196d849a1afca21b4592668bdf",
    "six-1.16.0-py2.py3-none-any.whl": "3e1c439c88d2e7681372427bab751b3fc99969891e95a714fed9604bf7710213",
    "types_requests-2.28.11.2-py3-none-any.whl": "fa23a37e6fc398c11cb931b58a229132a820056376c9098522e70825d9967e9d",
    "types_urllib3-1.26.25-py3-none-any.whl": "8bd934ca6ca8f9fb4b6feca1181b0a229fb6dd0e3530b917591d71b59f35fe79",
    "typing_extensions-4.4.0-py3-none-any.whl": "219f52e56e51e3b9c1bc8cbdc0674e1efc73c626a2b5ebf8b87510a0530ad7b9",
    "urllib3-1.26.12-py2.py3-none-any.whl": "cc8d9aa8d61580df595254e4bd08cee6ff223cd393abbcd754fc708c7ac09260",
    "wrapt-1.14.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "cb54f7edaf503121bbe87a8474875fa5b512c07dc0eaeb671faef0f5bc2d45c7",
    "wrapt-1.14.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "7357d5567dc387b9ad388dddebd149ee811ed64f1128451d2a3cbee2bd25326d",
    "yarl-1.8.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "e15860a428e5f58b5ea726ff0fb86867139889c175e03f7fa21d2daa0a8816aa",
    "yarl-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl": "812f9b5f99e8a611aa97a3d3f5c5b2e57fef748c89f2c8a88f8b1639841210ef"
  },
  "emit_warnings": false,
  "entry_point": "data_fred.lambda",
  "ignore_errors": false,
  "includes_tools": false,
  "inherit_path": "false",
  "interpreter_constraints": [
    "CPython<4,>=3.9"
  ],
  "pex_hash": "9116caed4bcfdccf86c6235116cf48df57ae7464",
  "pex_path": null,
  "requirements": [
    "fastparquet",
    "pandas",
    "requests",
    "s3fs",
    "types-requests"
  ],
  "strip_pex_env": true,
  "venv": false,
  "venv_bin_path": "false",
  "venv_copies": false,
  "venv_site_packages_copies": false
}
e
Thank you. So does the packaged lambda work for you on your machine if you run it as `python3.9 lambda.zip`?
Aha, How did you "try to run it locally" for the lambda? Your log from that shows:
* The NumPy version is: "1.21.5"
But the PEX-INFO shows numpy-1.23.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
So somehow your run of the lambda picks up some other installation of numpy. PEX should prevent this, but clearly it's not. I wonder if the same thing is happening on AWS? Can you grab the detailed logs from there and compare?
Even better, if you can set a `PEX_VERBOSE=3` environment variable for both your local run of the lambda and the run on AWS, that will reveal the sys.path PEX assembles at runtime, which should reveal how the rogue numpy leaks in.
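Independently of the PEX logging, a small sketch like the following can show which file a module would be loaded from and what `sys.path` looks like at that moment. `locate_module` and `report` are made-up names for illustration; only `importlib.util.find_spec` and `sys.path` are standard library.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def report(names):
    """Build a text report: where each module resolves, then sys.path in order."""
    lines = ["{}: {}".format(name, locate_module(name)) for name in names]
    lines.append("sys.path:")
    lines.extend("  " + entry for entry in sys.path)
    return "\n".join(lines)
```

Printing `report(["numpy", "botocore"])` at the top of the handler, both locally and on AWS, should make an unexpected install location stand out.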
One final thing - how do you run your lambda on AWS? Do you use a zip file directly or do you install that in a container? If you just use a zip file directly, you can try adding `execution_mode="venv"` to your `pex_binary` target and deploying that instead of the lambda. The only change you need is to set your lambda handler as prefixed with `__pex__.`; so I think `__pex__.data_fred.lambda` (I'm not sure what the `:function` name in your lambda.py is, but tack that on as needed).
b
Thank you. So does the packaged lambda work for you on your machine if you run it as `python3.9 lambda.zip`?
$ python3.9 ./dist/packages.data-fred.src.data_fred/lambda.zip 
Python 3.9.14 (main, Sep  7 2022, 23:43:48) 
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>
Not sure how I can run it that way?
Aha, How did you "try to run it locally" for the lambda?
I ran `python3.9 -mpip install --user lambdex`, which is probably why the environment differs. What would be a good way to install lambdex in my monorepo environment?
Do you use a zip file directly or do you install that in a container?
I simply upload the `lambda.zip` file.
Can you grab the detailed logs from there and compare?
https://pastebin.com/UYvZR65d
One final thing - how do you run your lambda on AWS? Do you use a zip file directly or do you install that in a container? If you just use a zip file directly, you can try adding `execution_mode="venv"` to your `pex_binary` target and deploying that instead of the lambda. The only change you need is to set your lambda handler as prefixed with `__pex__.`; so I think `__pex__.data_fred.lambda` (I'm not sure what the `:function` name in your lambda.py is, but tack that on as needed).
Cannot do that, because the package gets too large and I get this error on AWS side:
Unzipped size must be smaller than 262144000 bytes
The file itself is 81M, and once unzipped, it's 331M.
For comparison, `lambda.zip` is 48M.
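For what it's worth, the unzipped size can be checked before uploading. This is only a sketch: the limit is the number quoted in the AWS error above, and `unzipped_size` / `fits_in_lambda` are hypothetical helper names.

```python
import zipfile

# Limit quoted in the AWS error message above, in bytes.
LAMBDA_UNZIPPED_LIMIT = 262144000


def unzipped_size(zip_path):
    """Sum of the uncompressed sizes of every member in the archive, in bytes."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist())


def fits_in_lambda(zip_path, limit=LAMBDA_UNZIPPED_LIMIT):
    """True when the archive would unpack to fewer bytes than the limit."""
    return unzipped_size(zip_path) < limit
```

Running `fits_in_lambda` on the build output right after packaging catches an oversized bundle without a round trip to AWS.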
What would be a good way to install lambdex in my monorepo environment?
I simply added a `python_requirement()` in my project, then exported the virtualenv. That way the lambda function works fine locally:
$ lambdex test ./dist/packages.data-fred.src.data_fred/lambda.zip /tmp/event.json 
INFO:data_fred.api:Downloading observations for series WDTGAL
INFO:data_fred.lambda:Writing observations to <s3://data-sources-XXX/fred/WDTGAL.parquet>
(Is there an easier way to achieve this?)
Here is the local run verbose logging as requested: https://pastebin.com/EjJYjuFS
I tried to understand the difference of how s3fs is handled in the local run and the remote one, but I did not notice a difference. Did I miss some important hint?
Oh, the lambda works when I set `PEX_INHERIT_PATH=fallback`. TBH I made a note of it after reading https://pantsbuild.slack.com/archives/C046T6T9U/p1660086097619209 which is unrelated to this issue, so I am not sure why it works.
Here is the log with `PEX_VERBOSE=3` and `PEX_INHERIT_PATH=fallback` for comparison: https://pastebin.com/aqZHqmwX
No diff except the import issue
e
Alright - thank you for all that data. The full story is complex, but the upshot is PEX was failing in its contract of isolating you from the environment. Exactly as the AWS docs promise (https://docs.aws.amazon.com/lambda/latest/dg/lambda-python.html), botocore-1.23.32 was on the `sys.path` (via `/var/runtime`), but this path element was not being scrubbed by PEX. As such, that botocore would leak into the PEX environment and the underlying error actually was this:
ImportError: cannot import name 'apply_request_checksum' from 'botocore.client' (/var/runtime/botocore/client.py)
  File "/var/task/data_fred/lambda_function.py", line 10, in debug_import
    module = importlib.import_module(module_name)
  File "/var/lang/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/var/task/.deps/s3fs-2022.8.2-py3-none-any.whl/s3fs/__init__.py", line 1, in <module>
    from .core import S3FileSystem, S3File
  File "/var/task/.deps/s3fs-2022.8.2-py3-none-any.whl/s3fs/core.py", line 26, in <module>
    import aiobotocore.session
  File "/var/task/.deps/aiobotocore-2.4.0-py3-none-any.whl/aiobotocore/session.py", line 12, in <module>
    from .client import AioBaseClient, AioClientCreator
  File "/var/task/.deps/aiobotocore-2.4.0-py3-none-any.whl/aiobotocore/client.py", line 2, in <module>
    from botocore.client import (
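The `debug_import` frame in that trace is a small helper used to surface the real error behind the "Install s3fs to access S3" message. Only the name and the `importlib.import_module` call are visible in the traceback; the rest of this sketch, including the return shape, is an assumption.

```python
import importlib
import traceback


def debug_import(module_name):
    """Try to import module_name; return (module, None) or (None, full traceback text)."""
    try:
        return importlib.import_module(module_name), None
    except ImportError:
        # Keep the whole chain so the innermost failing import stays visible.
        return None, traceback.format_exc()
```

Calling `debug_import("s3fs")` from inside the handler and logging the second element shows the innermost failing import rather than the generic message.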
By using `PEX_INHERIT_PATH=fallback` you were forcing the `/var/runtime` sys.path leak to the back of the line, which had the effect of getting the import from the PEX version of botocore, so the above backtrace no longer occurs. To drive this home, the botocore that ships with the Lambda Python 3.9 runtime is not compatible with your requirements. If I add them in I get:
$ pex --python python3.9 fastparquet panda requests s3fs boto3==1.20.32 botocore==1.23.32
Failed to resolve compatible distributions:
1: aiobotocore==2.4.0 requires botocore<1.27.60,>=1.27.59 but botocore 1.23.32 was resolved
In closing, you found a workaround and hopefully you understand what it works around / why it works. I now do at any rate, and I have a fix for PEX that gets it scrubbing `/var/runtime` from the `sys.path` like it should, which obviates the need for the workaround. I'll post a link to the Pex issue here shortly once I've created it. Thanks again for all the data you provided here Phillippe.
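To make the ordering effect concrete, here is a toy reproduction that does not touch PEX or botocore at all. It writes two copies of a made-up module, `fake_botocore`, into temp directories standing in for `/var/runtime` and the bundled `.deps`, and shows that whichever directory comes first on `sys.path` wins. All names here are invented for the demo.

```python
import importlib
import os
import sys
import tempfile


def make_fake_package(root, version):
    """Write a one-file module named fake_botocore exposing VERSION under root."""
    os.makedirs(root, exist_ok=True)
    with open(os.path.join(root, "fake_botocore.py"), "w") as fh:
        fh.write("VERSION = {!r}\n".format(version))
    return root


def imported_version(path_entries):
    """Import fake_botocore with path_entries at the front of sys.path; return VERSION."""
    saved_path = list(sys.path)
    sys.modules.pop("fake_botocore", None)  # forget any earlier import
    importlib.invalidate_caches()
    sys.path[:0] = path_entries
    try:
        return importlib.import_module("fake_botocore").VERSION
    finally:
        sys.path[:] = saved_path
        sys.modules.pop("fake_botocore", None)


workdir = tempfile.mkdtemp()
runtime = make_fake_package(os.path.join(workdir, "var_runtime"), "1.23.32")
bundled = make_fake_package(os.path.join(workdir, "deps"), "1.27.59")

leaked_first = imported_version([runtime, bundled])    # runtime copy shadows the bundled one
fallback_order = imported_version([bundled, runtime])  # runtime copy pushed to the back
print(leaked_first, fallback_order)  # prints: 1.23.32 1.27.59
```

The second ordering corresponds to what the `fallback` setting achieved here: the runtime's copy is still importable, but the bundled one is found first.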
b
Thanks for your precious help @enough-analyst-54434, you put me on the right track to find that workaround 🙂
e
@bumpy-noon-80834 it sounds like you have a workaround, but if you're eager to get rid of it you can upgrade your Pants configuration to pull in the new Pex release with the fix: https://pantsbuild.slack.com/archives/C18RRR4JK/p1665439516603199 You'd do that with:
[pex-cli]
version = "v2.1.110"
known_versions = [
  "v2.1.110|macos_arm64|f0b291e8d8fb3386eba98548ede694b7f5b9e60e926eec6a580b6f35eb778209|4062599",
  "v2.1.110|macos_x86_64|f0b291e8d8fb3386eba98548ede694b7f5b9e60e926eec6a580b6f35eb778209|4062599",
  "v2.1.110|linux_x86_64|f0b291e8d8fb3386eba98548ede694b7f5b9e60e926eec6a580b6f35eb778209|4062599",
  "v2.1.110|linux_arm64|f0b291e8d8fb3386eba98548ede694b7f5b9e60e926eec6a580b6f35eb778209|4062599"
]
b
Great, I'll give it a try!
@enough-analyst-54434 Doesn't work on my side. I added your snippet to `pants.toml` and removed `PEX_INHERIT_PATH=fallback` from the Lambda environment:
$ git diff
diff --git a/pants.toml b/pants.toml
index 988ad2d..3dd0566 100644
--- a/pants.toml
+++ b/pants.toml
@@ -30,3 +30,14 @@ resolves = { python-default = "python/default.lock" }
 
 [repl]
 shell = "ipython"
+
+# Workaround for Lambda package dependencies issue
+# ref: <https://pantsbuild.slack.com/archives/C046T6T9U/p1665439829888769?thread_ts=1665268569.124139&cid=C046T6T9U>
+[pex-cli]
+version = "v2.1.110"
+known_versions = [
+  "v2.1.110|macos_arm64|f0b291e8d8fb3386eba98548ede694b7f5b9e60e926eec6a580b6f35eb778209|4062599",
+  "v2.1.110|macos_x86_64|f0b291e8d8fb3386eba98548ede694b7f5b9e60e926eec6a580b6f35eb778209|4062599",
+  "v2.1.110|linux_x86_64|f0b291e8d8fb3386eba98548ede694b7f5b9e60e926eec6a580b6f35eb778209|4062599",
+  "v2.1.110|linux_arm64|f0b291e8d8fb3386eba98548ede694b7f5b9e60e926eec6a580b6f35eb778209|4062599"
+]
diff --git a/typescript/infrastructure/src/deployment/lambda/data-downloader-fred.ts b/typescript/infrastructure/src/deployment/lambda/data-downloader-fred.ts
index c5465c8..055a9b8 100644
--- a/typescript/infrastructure/src/deployment/lambda/data-downloader-fred.ts
+++ b/typescript/infrastructure/src/deployment/lambda/data-downloader-fred.ts
@@ -23,7 +23,6 @@ export const deployDataDownloaderFredLambdaFunction = ({
         variables: {
           DATA_SOURCES_BUCKET_ID,
           FRED_API_KEY: new Config().requireSecret("data_fred_api_key"),
-          PEX_INHERIT_PATH: "fallback",
         },
       },
       tags: {
Then I rebuilt the Lambda bundle and deployed it. And I end up with the initial issue:
e
Hrm, that directly contradicts my experiments deploying lambdas and testing them using the AWS console as detailed on https://github.com/pantsbuild/pex/issues/1944. One difference is I did not use lambdex, I just used the PEX (with .zip extension) directly + the magic `__pex__` import prefix.
Let me see if I repro using lambdex + pex 2.1.110 which is ... ah, wait a sec. I think I led you to use the wrong pants.toml config change. I think the lambdex rules use an independent lock file that pins the Pex version.
Right, sorry about that. You can remove the configuration I asked you to use and add this instead:
[lambdex]
extra_requirements = ["pex==2.1.110"]
lockfile = "some/relative/path/in/your/repo/you/want/this/to/be/stored/lambdex.lock"
Then run `./pants generate-lockfiles --resolve=lambdex` and then proceed to re-build and re-deploy the lambdex.
I'm sorry about the bad config advice!
And a pro-tip that we should be integrating at some point here in the Pants lambdex builds - at which point you won't need to worry - you want to set `PEX_ROOT: "/tmp/.pex"` in your AWS config for a fast lambdex.
b
Same result, unfortunately!
BTW, I did not pass the `--resolve=lambdex` parameter to `generate-lockfiles` and it updated python-default's lockfile too:
diff --git a/python/default.lock b/python/default.lock
index 350b8ac..8f70163 100644
--- a/python/default.lock
+++ b/python/default.lock
@@ -1237,13 +1237,13 @@
           "artifacts": [
             {
               "algorithm": "sha256",
-              "hash": "d405477cb1fb625d753d619fc4f76fb45942cd866780d96b7e042a293a2a0c7f",
-              "url": "<https://files.pythonhosted.org/packages/6d/bf/6b947016519801aace42feeb2000be13cdf924c22afb05beafc6666cf680/pex-2.1.109-py2.py3-none-any.whl>"
+              "hash": "6dff72a3cf579a114418c642022583cd96945a3f061e9e622e581371f1cccc24",
+              "url": "<https://files.pythonhosted.org/packages/4d/9b/b19610eee28259e635f37e331b32d2b43a5248120f0d52e450a40e3e4b0d/pex-2.1.110-py2.py3-none-any.whl>"
             },
             {
               "algorithm": "sha256",
-              "hash": "1bd1dc0cb56f441f8e5e6729d92629dab494b1b39c2a3acdf0a7543103fc0caa",
-              "url": "<https://files.pythonhosted.org/packages/e6/c6/15d8bc7a7877ad5bb7b11889d961e1fadd14b80604f0480873506ca8424b/pex-2.1.109.tar.gz>"
+              "hash": "3181d44f155ce658752673b0d74b6b07a9c1888204c66f5cc5e1d1c8ac15ce15",
+              "url": "<https://files.pythonhosted.org/packages/81/67/ea3c2b17d1d9f4c860b8b7b2db12e48cc99c2fa604c92477d60c4def25ef/pex-2.1.110.tar.gz>"
             }
           ],
           "project_name": "pex",
@@ -1251,7 +1251,7 @@
             "subprocess32>=3.2.7; extra == \"subprocess\" and python_version < \"3\""
           ],
           "requires_python": "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,<3.12,>=2.7",
-          "version": "2.1.109"
+          "version": "2.1.110"
         },
         {
           "artifacts": [
Is that expected?
For reference:
diff --git a/pants.toml b/pants.toml
index 988ad2d..1c8d43a 100644
--- a/pants.toml
+++ b/pants.toml
@@ -30,3 +30,9 @@ resolves = { python-default = "python/default.lock" }
 
 [repl]
 shell = "ipython"
+
+# Workaround for Lambda package dependencies issue
+# ref: <https://pantsbuild.slack.com/archives/C046T6T9U/p1665512318605209?thread_ts=1665268569.124139&cid=C046T6T9U>
+[lambdex]
+extra_requirements = ["pex==2.1.110"]
+lockfile = "python/lambdex.lock"
$ grep -F pex-2.1. python/lambdex.lock 
              "url": "<https://files.pythonhosted.org/packages/4d/9b/b19610eee28259e635f37e331b32d2b43a5248120f0d52e450a40e3e4b0d/pex-2.1.110-py2.py3-none-any.whl>"
              "url": "<https://files.pythonhosted.org/packages/81/67/ea3c2b17d1d9f4c860b8b7b2db12e48cc99c2fa604c92477d60c4def25ef/pex-2.1.110.tar.gz>"
What change should I see in `lambda.zip` with that change?
I double-checked, and in my environment, `lambda.zip` has the same checksum with and without the `pants.toml` change.
e
Well, there are no new files, but there is this new class: https://github.com/pantsbuild/pex/blob/f6089f6822e34c9fa614fe77999cfbc6a54c63e2/pex/pex.py#L49-L124 So, `unzip -qc lambda.zip .bootstrap/pex/pex.py | grep IsolatedSysPath` should return some hits.
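If `unzip` and `grep` aren't handy, roughly the same check can be done with the standard library. This is just a sketch; `member_mentions` is a made-up helper name, and it raises `KeyError` if the member is not in the archive at all.

```python
import zipfile


def member_mentions(zip_path, member, needle):
    """Return the lines of a text member inside the zip that contain needle."""
    with zipfile.ZipFile(zip_path) as zf:
        text = zf.read(member).decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if needle in line]
```

An empty list from `member_mentions("lambda.zip", ".bootstrap/pex/pex.py", "IsolatedSysPath")` would mean the bundled Pex bootstrap predates the fix.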
@#$! OK, yeah - here is how lambdex calls into Pex: https://github.com/pantsbuild/pex/blob/f6089f6822e34c9fa614fe77999cfbc6a54c63e2/pex/pex_bootstrapper.py#L674-L683 It bypasses the PEX class that does the scrubbing.
b
Without the change:
$ md5sum dist/python.data-fred.src.data_fred/lambda.zip; unzip -qc dist/python.data-fred.src.data_fred/lambda.zip .bootstrap/pex/pex.py | grep IsolatedSysPath; echo $?
b29939f992c90a79efafd73b8097c615  dist/python.data-fred.src.data_fred/lambda.zip
1
$ grep /pex- python/lambdex.lock 
grep: python/lambdex.lock: No such file or directory
$ grep /pex- python/default.lock 
              "url": "<https://files.pythonhosted.org/packages/6d/bf/6b947016519801aace42feeb2000be13cdf924c22afb05beafc6666cf680/pex-2.1.109-py2.py3-none-any.whl>"
              "url": "<https://files.pythonhosted.org/packages/e6/c6/15d8bc7a7877ad5bb7b11889d961e1fadd14b80604f0480873506ca8424b/pex-2.1.109.tar.gz>"
With the change:
$ rm dist/python.data-fred.src.data_fred/lambda.zip
$ git stash pop
[..]
$ grep /pex- python/lambdex.lock 
              "url": "<https://files.pythonhosted.org/packages/4d/9b/b19610eee28259e635f37e331b32d2b43a5248120f0d52e450a40e3e4b0d/pex-2.1.110-py2.py3-none-any.whl>"
              "url": "<https://files.pythonhosted.org/packages/81/67/ea3c2b17d1d9f4c860b8b7b2db12e48cc99c2fa604c92477d60c4def25ef/pex-2.1.110.tar.gz>"
$ ./pants package ./python/data-fred/::
22:09:27.94 [INFO] Initializing scheduler...
22:09:28.13 [INFO] Scheduler initialized.
22:09:29.17 [INFO] Wrote dist/python.data-fred.src.data_fred/lambda.zip
    Runtime: python3.9
    Handler: lambdex_handler.handler
$ md5sum dist/python.data-fred.src.data_fred/lambda.zip; unzip -qc dist/python.data-fred.src.data_fred/lambda.zip .bootstrap/pex/pex.py | grep IsolatedSysPath; echo $?
b29939f992c90a79efafd73b8097c615  dist/python.data-fred.src.data_fred/lambda.zip
1
Checked GH tag 2.1.110 and https://files.pythonhosted.org/packages/81/67/ea3c2b17d1d9f4c860b8b7b2db12e48cc99c2fa604c92477d60c4def25ef/pex-2.1.110.tar.gz Both have IsolatedSysPath. Looks like a weird caching issue on my side!?
e
Sorry about this. So, the answer is to not use `python_awslambda`, unfortunately. Use this instead (and go back to the 1st pants.toml edit I asked you to make!):
Copy code
pex_binary(
  output_path="whatever/you/want/but/end/in/zip/to/appease/aws/lambda.zip",
  platforms=["linux_x86_64-cp-39-cp39"],
  dependencies=[<same as python_awslambda target>],
)
The only other change is to the handler name you hand AWS. It's now your actual handler's <module name>.<function name>, but prefixed with `__pex__.`, so the module part becomes:
__pex__.data_fred.lambda_function
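A minimal sketch of how that handler string is put together. `pex_handler` is a hypothetical helper for illustration only (it is not part of Pex, Pants or AWS), and the function name `handler` is assumed from the original `python_awslambda` target:

```python
def pex_handler(module: str, function: str) -> str:
    """Build the handler string: `__pex__.` + <module name> + `.` + <function name>."""
    return f"__pex__.{module}.{function}"


# Assumed names: module data_fred.lambda_function, function handler.
handler_string = pex_handler("data_fred.lambda_function", "handler")
print(handler_string)
```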
b
On it!
e
I did this `__pex__` magic importer work a while back to obviate the need for things like the Lambdex project, but I just haven't gotten around to retiring the project and updating Pants docs and rules.
FWIW, assuming this works for you (it's now the exact experiment I used for https://github.com/pantsbuild/pex/issues/1944), you gain more - notably you can say `execution_mode="venv"` in your `pex_binary` target to gain the maximum compatibility with the Python ecosystem. Your example case did not need this, but some distributions do.
A final note - avoid `lambda.py`! That name can be problematic: if anything tries to import it you'll run into issues, since `lambda` is a keyword. This is why I renamed it to `lambda_function.py` in my experiment.
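A small sketch of why the name is risky: a plain `import lambda` statement cannot even be compiled, because `lambda` is a reserved word. `import_statement_compiles` is a hypothetical helper used only for this illustration:

```python
import keyword


def import_statement_compiles(module_name: str) -> bool:
    """Return True if `import <module_name>` is valid Python syntax."""
    try:
        # compile() only parses the statement; it does not run the import.
        compile(f"import {module_name}", "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(keyword.iskeyword("lambda"))
print(import_statement_compiles("lambda"))
print(import_statement_compiles("lambda_function"))
```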
b
Okay, I integrated all your recommendations and deployed it. Now it's validation time, suspense!
Hum, "Unable to import module '__pex__.data_fred.lambda_function': No module named '__pex__.data_fred'"
e
Is `data_fred/lambda_function.py` in the zip?
b
lol, nope!
e
This is the bit about renaming. lambda.py is a very dangerous name
b
This must be bogus:
Copy code
pex_binary(
    name = "lambda",
    dependencies = [
        "!!:types-requests",
        ":s3fs",
        ":fastparquet",
    ],
    output_path = "data-fred-downloader-lambda.zip",
    platforms = ["linux_x86_64-cp-39-cp39"],
)
e
Don't use it!
Where is the dependency that adds the lambda function source file?
b
I did not, I renamed lambda.py to lambda_function.py
e
Ok, looks like you need a `python_source()` target to own that file.
And add a dep on that target in the `pex_binary` dependencies list.
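A sketch of what that could look like in the BUILD file. The target name `lambda_function` is an assumption made for illustration; the other fields are copied from the `pex_binary` target posted above:

```python
# BUILD (sketch; the "lambda_function" target name is illustrative)
python_source(
    name="lambda_function",
    source="lambda_function.py",
)

pex_binary(
    name="lambda",
    dependencies=[
        ":lambda_function",  # owns the handler source so it lands in the zip
        "!!:types-requests",
        ":s3fs",
        ":fastparquet",
    ],
    output_path="data-fred-downloader-lambda.zip",
    platforms=["linux_x86_64-cp-39-cp39"],
)
```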
b
Where is the dependency that adds the lambda function source file?
That's my mistake! I am still learning Pants... 😅
e
Yeah, understood. Just check the zip before deploying to speed up iteration.
You can even run
python3.9 lambda.zip -c 'from __pex__.data_fred import lambda_function; lambda_function.handler({}, None)'
to try running the function 1st locally (with `handler` being whatever your function is called).
Or maybe add a print:
python3.9 lambda.zip -c 'from __pex__.data_fred import lambda_function; print(lambda_function.handler({}, None))'
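Checking the zip before deploying can also be scripted. A minimal sketch using only the standard library; `zip_contains` is a hypothetical helper, and the demo builds a tiny in-memory archive rather than reading the real `lambda.zip`:

```python
import io
import zipfile


def zip_contains(zip_file, member: str) -> bool:
    """Return True if `member` is an entry in the given zip archive."""
    with zipfile.ZipFile(zip_file) as zf:
        return member in zf.namelist()


# Demo archive standing in for the packaged lambda zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "data_fred/lambda_function.py",
        "def handler(event, context):\n    return 'ok'\n",
    )

print(zip_contains(buf, "data_fred/lambda_function.py"))
print(zip_contains(buf, "data_fred/lambda.py"))
```

For the real artifact you would pass the path of the built zip instead of `buf`.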
b
And it works well this time! Thanks @enough-analyst-54434!
😄
e
Thank you for hanging in there!
b
My pleasure sir 🙂
e
And a pro-tip that we should be integrating at some point here in the Pants lambdex builds - at which point you won't need to worry - you want to set `PEX_ROOT: "/tmp/.pex"` in your AWS config for a fast lambdex.
b
I understand that without setting `PEX_ROOT`, a temporary directory is created every time the function is called, even if the Lambda is warm on the AWS side, and that you suggest setting it to a static directory so the venv is reused across calls of a pre-warmed Lambda? I ask because in this specific case the function will only be called once every few hours, so I wouldn't expect any improvement.
But yeah, even in this case, it looks like a good habit to set it in my deployment code, so I'll have it on all lambdas, including others I'd call more often.
e
"I understand that without setting `PEX_ROOT`, a temporary directory is created every time the function is called, even if the Lambda is warm on the AWS side, and that you suggest setting it to a static directory so the venv is reused across calls of a pre-warmed Lambda?"
Thanks for typing out those words - those words make it clear you probably don't need this setting. As you say, once warm the function is pre-imported ... so no new tmp directory should be created per call. My thinking was fuzzy there. I'll update the issue and maybe close it!
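The "pre-imported once warm" point can be illustrated with a small local simulation. This is only a sketch of ordinary Python import behavior in a single process, not a measurement of the Lambda runtime; `INIT_COUNT` and `_initialize` are made-up names:

```python
# Module-level code runs once per process, i.e. once per cold start.
INIT_COUNT = 0


def _initialize() -> None:
    """Stand-in for one-time setup work done at import time."""
    global INIT_COUNT
    INIT_COUNT += 1


_initialize()


def handler(event, context):
    # Runs on every invocation; the module above is not re-imported.
    return {"init_count": INIT_COUNT}


# Simulate three invocations inside the same warm process.
results = [handler({}, None) for _ in range(3)]
print(results)
```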
b
It definitely helps me get a better understanding of the PEX machinery!
e
And me of the Lambda runtime machinery! Do more science is the cure I think.
c
So… this is quite the necro, and a testament to how great the Pants Slack is that I found a reference to a problem I hit - I apologize for the resurrection! I managed to have a pinned botocore be overridden by a Lambda default today… 🙂
I'm on `pants_version = "2.16.0"`, using `python_awslambda`. I'm sure I'm a bit behind the updates, but not 2 years out of date, so I wondered if the Lambda base image has changed something sneaky? I can see my pinned botocore 1.30.0 in the zip, but I still got broken by the newer botocore for some reason.
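One way to narrow this down is to log which copy of a module actually got imported at runtime. A minimal sketch; `describe_module` is a hypothetical helper, and the demo uses the standard-library `json` module so it runs anywhere (inside the function you would pass `"botocore"` instead):

```python
import importlib


def describe_module(name: str) -> dict:
    """Report the version and file location of the module that was imported."""
    module = importlib.import_module(name)
    return {
        "name": name,
        # Not every module defines __version__, so fall back to None.
        "version": getattr(module, "__version__", None),
        # The path shows whether the copy came from the zip or from elsewhere.
        "file": getattr(module, "__file__", None),
    }


info = describe_module("json")
print(info)
```

If the reported path points outside the packaged zip, something earlier on `sys.path` is shadowing the pinned copy.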