# general
p
Hi, I'm trying to solve a problem that occurs when I run `./pants export ::` in a package that has Horovod as a dependency. During installation, Horovod looks for TensorFlow and if it finds it, it compiles C++ code that is linked against the TensorFlow library. If not, the support for TensorFlow is disabled. Although I'm installing both Horovod and TensorFlow, the respective Horovod submodule is not being built. I've reduced my problem to the call to `pex`, for which a minimal example would be the following:
# First try, as pants calls pex:
$ python -m pex --disable-cache --output-file hvd.pex horovod[tensorflow]
...
$ ./hvd.pex
>>> import tensorflow
>>> import horovod.tensorflow
ImportError: Extension horovod.tensorflow has not been built

# Retry, forcing Horovod to build its Tensorflow submodule:
$ HOROVOD_WITH_TENSORFLOW=1 python -m pex --disable-cache --output-file hvd.pex horovod[tensorflow]
  ERROR: Command errored out with exit status 1:
  [...lots of output...]
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
  ModuleNotFoundError: No module named 'tensorflow'
  CMake Error at /tmp/horovod-cmake-tmpmdzj4w8z/cmake/data/share/cmake-3.13/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
    Could NOT find Tensorflow (missing: Tensorflow_LIBRARIES) (Required is at
    least version "1.15.0")
  [...more output...]

# Just for comparison, pip works:
$ python -m venv tmpenv
$ source tmpenv/bin/activate
(tmpenv) $ HOROVOD_WITH_TENSORFLOW=1 pip install --no-cache-dir horovod[tensorflow]
[...]
(tmpenv) $ python
>>> import tensorflow
>>> import horovod.tensorflow
>>> ^D
(tmpenv) $
I realize that it's mainly a problem of Horovod as a package, because it needs TensorFlow to already be installed in order to build its submodule, while it does not list TensorFlow as a build dependency. However, pip follows the order of the dependencies for installation, installing TensorFlow first and then Horovod. Is there a way to make it work with pants/pex? E.g. forcing the order of package installation as in pip, specifying (somehow) an additional build dependency for horovod, or some other way I might be missing? Thanks! PS. If this is not the right place to post this since it is about pex and not pants directly, please let me know where I should post it instead.
e
There is no way to do this. Pex installs every dependency in isolation in its own dedicated directory and so no installation can see any other. The installations are only combined at runtime when the PEX unpacks itself.
Pip, on the other hand, mutates a single global venv.
You would need to pre-build a horovod wheel using Pip, and then make that wheel available to Pex via Pants support for a custom find links repo or custom index: https://www.pantsbuild.org/docs/reference-python-repos
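The pre-build route above could be scripted roughly as follows. This is only a sketch under assumptions: TensorFlow is already installed in the environment that runs the build, the `wheels` directory name is hypothetical, and `--no-build-isolation` is used so that the Horovod build can see that TensorFlow install. It calls Pex directly with `--find-links`; with Pants, the wheel directory would instead be configured through the repos options linked above.

```python
import os
import subprocess
import sys


def wheel_build_command(wheel_dir, requirement="horovod"):
    """Return the pip argv that builds a single wheel into wheel_dir."""
    return [
        sys.executable, "-m", "pip", "wheel",
        "--no-cache-dir",
        "--no-deps",             # build only this project, not its dependencies
        "--no-build-isolation",  # let the build see packages installed in this env
        "--wheel-dir", wheel_dir,
        requirement,
    ]


def pex_command(wheel_dir, output_file="hvd.pex", requirement="horovod[tensorflow]"):
    """Return the pex argv that also resolves from the local wheel directory."""
    return [
        sys.executable, "-m", "pex",
        "--find-links", wheel_dir,
        "--output-file", output_file,
        requirement,
    ]


def build_env(base=None):
    """Copy the environment and force Horovod's TensorFlow extension on."""
    env = dict(os.environ if base is None else base)
    env["HOROVOD_WITH_TENSORFLOW"] = "1"
    return env


def prebuild_and_package(wheel_dir="wheels"):
    """Build the Horovod wheel first, then build the PEX against it."""
    subprocess.run(wheel_build_command(wheel_dir), env=build_env(), check=True)
    subprocess.run(pex_command(wheel_dir), check=True)
```

Calling `prebuild_and_package()` from a venv that already contains the intended TensorFlow version would run both steps. Whether the resulting wheel is picked over the sdist on PyPI depends on the resolver's usual preference for wheels of the same version.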
p
Thanks for the quick answer! I see, so a direct solution is not possible... I'm trying to avoid a pre-built wheel because I'm afraid it will be fragile, e.g. if it is built against a slightly different version of TensorFlow. I also tried to define a small dummy package that depends on Horovod and includes TensorFlow in its build dependencies, but it didn't work either. Shouldn't that work even with pex?
e
No. The rough build-time process is: Pex first downloads all transitive dependencies. That results in a set of sdists and wheels. It then builds all sdists into wheels in parallel, each in its own isolated build directory. Finally, it installs each wheel into its own private ~venv directory in parallel. Then, at runtime, the proper set of pre-installed wheel directories are either added to sys.path (in the case of a zipapp PEX) or added to site-packages (in the case of a --venv PEX).
The short answer is that if the distribution does not build as you hope in isolation, it won't work with Pex or Pants. Your only alternatives at that point are to pre-build the wheel just so or chip in and fix the project.
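The isolation described above can be pictured with a toy model. This is not Pex's code. `SourceDist` and `missing_at_build` are hypothetical names, and the model assumes only what the thread states: a build sees its own declared build requirements and nothing else from the same resolve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDist:
    name: str
    build_requires: frozenset = frozenset()    # declared build-time requirements
    imports_at_build: frozenset = frozenset()  # what the build script tries to import


def missing_at_build(sdist, other_resolved=()):
    """Return the names the build tries to import but cannot see.

    other_resolved is deliberately ignored: in this model, other distributions
    from the same resolve are never visible to an isolated build.
    """
    visible = set(sdist.build_requires)
    return sorted(sdist.imports_at_build - visible)


# Horovod as published in the thread: it probes for TensorFlow but does not declare it.
horovod = SourceDist("horovod", imports_at_build=frozenset({"tensorflow"}))
as_published = missing_at_build(horovod, other_resolved=["tensorflow"])

# The same project with TensorFlow declared as a build requirement.
fixed = SourceDist(
    "horovod",
    build_requires=frozenset({"tensorflow"}),
    imports_at_build=frozenset({"tensorflow"}),
)
with_declared_requirement = missing_at_build(fixed)
```

In this model `as_published` is `['tensorflow']` even though TensorFlow is part of the same resolve, while `with_declared_requirement` is empty, which matches the point that only the distribution's own declared build requirements help.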
p
Ok, so even in this case, Horovod is built independently of TensorFlow or my dummy package. I'll try pre-building the wheel for now. Just to make sure I understand correctly: if I changed the build dependencies of Horovod to include TensorFlow, it would work then, right?
e
Yes. That said, setup.py `setup_requires` is known to be buggy. You'd want to accomplish this with modern PEP-517 / PEP-518 build-system requires instead.
p
Perfect, thanks for the help and the quick response!
e
I assume, if I were a project owner, I'd want that to only happen if an extra were specified by the user, i.e.: `horovod[tensorflow]` or something similar.
p
I think there is no mechanism for that, at least not that I know of. Extras can only define runtime dependencies, but in this case it's both a build and a runtime dependency. It could be done with an environment variable at build time, but not with a PEP-517 build system (as its requirements are static and cannot depend on the environment).
e
There is, but it's a bit convoluted. Taking Pip as the example (since Pex uses it and supports `-r requirements.txt`), you can, only in a requirements file, say: `horovod --config-settings extra_build_requires=tensorflow` (see: https://pip.pypa.io/en/stable/reference/requirements-file-format/#per-requirement-options) and then the PEP-517 build backend can use that setting to implement https://peps.python.org/pep-0517/#get-requires-for-build-wheel
That said, Pants does not allow you to use Pex support for `-r requirements` yet; so you couldn't do this anyway with Pants today, only with Pex directly.
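On the backend side, the hook could look roughly like this. It is a minimal sketch, not Horovod's actual build code. The `extra_build_requires` key is simply the name used in the example above, and accepting a comma-separated string or a list of strings is an assumption. A complete backend would also have to provide the mandatory PEP 517 hooks `build_wheel` and `build_sdist`, for instance by delegating to an existing backend.

```python
def _split_requirements(value):
    """Normalise a config setting into a list of requirement strings."""
    if value is None:
        return []
    # Assumption: the value is one string or a list of strings, and each
    # string may hold several comma-separated requirements.
    values = [value] if isinstance(value, str) else list(value)
    return [req.strip() for item in values for req in item.split(",") if req.strip()]


def get_requires_for_build_wheel(config_settings=None):
    """Optional PEP 517 hook: extra requirements needed to build a wheel.

    Returns whatever the user passed under the (hypothetical) key
    extra_build_requires, or an empty list when nothing was passed.
    """
    settings = config_settings or {}
    return _split_requirements(settings.get("extra_build_requires"))
```

With the requirements-file line from the message above, the frontend would call the hook with `{"extra_build_requires": "tensorflow"}`, get `["tensorflow"]` back, and install it into the isolated build environment before building the wheel.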
p
Wow, good to know, thanks!