# general
r
Hey Pants community! 👖 Does anyone here have experience managing spaCy language models with Pants? I'm trying to add the `nb_core_news_lg` model to my dependencies. In a regular requirements.txt file I would just add the following link: https://github.com/explosion/spacy-models/releases/download/nb_core_news_lg-3.6.0/nb_core_news_lg-3.6.0-py3-none-any.whl However, this results in a parse error. I've tried a bunch of combinations: adding that link to the requirements.txt, prefixing it with `git+`, specifying it like this: `spacy[nb_core_news_lg]==3.60`, and putting it into my BUILD file somehow. I'm having a hard time including both the model package and the original spaCy dependency. Has anyone faced this problem before? Could someone please point me in the right direction for how to solve this? Thanks a lot! 💫
b
What’s the error output you see?
r
If I just use the regular link in the requirements.txt I get:
Engine traceback:
  in `package` goal

ValueError: Invalid requirement 'https://github.com/explosion/spacy-models/releases/download/nb_core_news_lg-3.6.0/nb_core_news_lg-3.6.0-py3-none-any.whl' in requirements.txt at line 28: Parse error at "'://githu'": Expected string_end
If I try
nb_core_news_lg @ git+https://github.com/explosion/spacy-models/releases/download/nb_core_news_lg-3.6.0/nb_core_news_lg-3.6.0-py3-none-any.whl#sha256=b0d83b282d68300c8d3fb00abd3e9403a3c71070ce9bc7ecec67e2fe7f9b0198
I get
OSError: [E050] Can't find model 'nb_core_news_lg'. It doesn't seem to be a Python package or a valid path to a data directory.
I tried adding `nb_core_news_lg` as a requirement to my BUILD file:
python_requirement(
    name = "nb_core_news_lg",
    requirements = ["git+https://github.com/explosion/spacy-models/releases/download/nb_core_news_lg-3.6.0/nb_core_news_lg-3.6.0-py3-none-any.whl#sha256=b0d83b282d68300c8d3fb00abd3e9403a3c71070ce9bc7ecec67e2fe7f9b0198"]
)
but I still get a parsing error:
InvalidTargetException: BUILD:17: Invalid requirement 'git+https://github.com/explosion/spacy-models/releases/download/nb_core_news_lg-3.6.0/nb_core_news_lg-3.6.0-py3-none-any.whl#sha256=b0d83b282d68300c8d3fb00abd3e9403a3c71070ce9bc7ecec67e2fe7f9b0198' in the 'requirements' field for the target //:nb_core_news_lg: Parse error at "'+https:/'": Expected string_end
h
This error sounds like the model was resolved successfully, but then could not be used?
OSError: [E050] Can't find model 'nb_core_news_lg'. It doesn't seem to be a Python package or a valid path to a data directory.
FWIW I get an error even entirely outside of Pants:
$ pip install 'nb_core_news_lg@git+https://github.com/explosion/spacy-models/releases/download/nb_core_news_lg-3.6.0/nb_core_news_lg-3.6.0-py3-none-any.whl#sha256=b0d83b282d68300c8d3fb00abd3e9403a3c71070ce9bc7ecec67e2fe7f9b0198'
...
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
Is this wheel intended to be pip-installed?
Hmm, but
pip install 'https://github.com/explosion/spacy-models/releases/download/nb_core_news_lg-3.6.0/nb_core_news_lg-3.6.0-py3-none-any.whl'
works
e
Why is no one trying "name @ url" instead of skipping straight to VCS?
Pex and Pip both handle a straight URL, but Pants has gotten in the way here for a long time now in various ways, trying to parse reqs for its own needs.
Yeah, works fine:
$ pex "nb-core-news-lg @ https://github.com/explosion/spacy-models/releases/download/nb_core_news_lg-3.6.0/nb_core_news_lg-3.6.0-py3-none-any.whl"
Python 3.9.5 (default, Nov 23 2021, 15:27:38)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import nb_core_news_lg
>>> nb_core_news_lg.__file__
'/home/jsirois/.pex/installed_wheels/b0d83b282d68300c8d3fb00abd3e9403a3c71070ce9bc7ecec67e2fe7f9b0198/nb_core_news_lg-3.6.0-py3-none-any.whl/nb_core_news_lg/__init__.py'
>>>
Note the URL is a GH releases URL, not a repo. As such, a VCS URL makes no sense.
h
John got ahead of me, but yes, `nb_core_news_lg@https://github.com/explosion/spacy-models/releases/download/nb_core_news_lg-3.6.0/nb_core_news_lg-3.6.0-py3-none-any.whl` in your requirements.txt should work fine
Works for me in an example
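For anyone assembling these lines by hand, here is a minimal sketch of how the `name @ url` form can be derived from a wheel URL. The helper name `direct_reference` is made up for illustration and is not part of Pants, Pex, or pip; it only assumes the URL path ends in a standard wheel filename, where the distribution name is everything before the first hyphen.

```python
import posixpath
from urllib.parse import urlparse


def direct_reference(wheel_url: str) -> str:
    """Build a 'name @ url' requirement string from a wheel URL (hypothetical helper)."""
    # Take the last path segment; any '#sha256=...' fragment is not part of the path.
    filename = posixpath.basename(urlparse(wheel_url).path)
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel URL: {wheel_url!r}")
    # Wheel filenames start with '{distribution}-{version}-...', and the
    # distribution part itself contains no hyphens.
    distribution = filename.split("-", 1)[0]
    return f"{distribution} @ {wheel_url}"


if __name__ == "__main__":
    url = (
        "https://github.com/explosion/spacy-models/releases/download/"
        "nb_core_news_lg-3.6.0/nb_core_news_lg-3.6.0-py3-none-any.whl"
    )
    # The printed line is what would go on its own line in requirements.txt.
    print(direct_reference(url))
```

Note that there is no `git+` prefix anywhere: the URL points at a release artifact, so it is used as a plain direct reference.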
r
Thanks a lot for your help, I really appreciate it! It's working now 🥳
By declaring the language model as a requirement in requirements.txt, I was able to import the module, but wasn't able to load it with `spacy.load("nb_core_news_lg")`. For anyone facing the same difficulties: I ended up not putting the model in requirements.txt, and instead declared it in the BUILD file like this:
python_requirement(
    name = "nb_core_news_sm",
    requirements = [
        "nb_core_news_sm@https://github.com/explosion/spacy-models/releases/download/nb_core_news_sm-3.6.0/nb_core_news_sm-3.6.0-py3-none-any.whl",
    ],
)

python_requirement(
    name = "spacy",
    requirements = [
        "spacy<3.7.0",
    ],
    dependencies = [
        # "//:de_core_news_sm",
        "//:nb_core_news_sm",
    ],
)
This allowed me to load the language model using spaCy's load mechanism. I have a feeling that this won't scale, as it will always need to pull in every language model declared here, but it will have to do for now.
h
Hmm, that is not necessary. That explicit `python_requirement` is entirely equivalent to having the dep in `requirements.txt` (there is an implicit `python_requirement` created for each entry in `requirements.txt`).
What makes the difference is not that, but the explicit dependency you added. That dep could also have been on `3rdparty/python#nb_core_news_lg`, or wherever your requirements.txt lives.
The reason you need that explicit dep is that Pants can't infer one, because there is no `import` statement. You're doing custom runtime data loading with `spacy.load("nb_core_news_lg")`, and Pants doesn't know about those calls.
If you end up with a lot of these, you could write a custom plugin that infers dependencies from those `spacy.load()` calls.
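The detection half of such a plugin could look roughly like the sketch below. It is plain standard-library Python, not Pants plugin API: the function name `find_spacy_models` is hypothetical, it only recognizes calls written literally as `spacy.load("...")` with a string constant, and wiring the result into Pants' dependency inference is left out entirely.

```python
import ast
from typing import List


def find_spacy_models(source: str) -> List[str]:
    """Return the model names passed as string literals to spacy.load(...)."""
    models = set()
    for node in ast.walk(ast.parse(source)):
        # Match calls of the exact shape: spacy.load(<string literal>, ...)
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "load"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "spacy"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            models.add(node.args[0].value)
    return sorted(models)


if __name__ == "__main__":
    example = 'import spacy\nnlp = spacy.load("nb_core_news_lg")\n'
    print(find_spacy_models(example))
```

Calls where the model name is a variable, or where `spacy` is imported under an alias, are skipped by this sketch, so an explicit dependency would still be needed for those.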