Prepare release of TF-DF 1.9.0 and update installation instructions
PiperOrigin-RevId: 615495475
rstz authored and copybara-github committed Mar 13, 2024
1 parent 632d813 commit 7d9a245
Showing 10 changed files with 263 additions and 99 deletions.
9 changes: 8 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,6 @@
# Changelog

## 1.9.0rc0 - 2024-02-26
## 1.9.0 - 2024-03-12

### Fix

@@ -10,8 +10,15 @@
### Features

- Compatibility with TensorFlow 2.16.0rc0.
- Expose new parameter sparse_oblique_max_num_projections.
- Using tf_keras instead of tf.keras in examples and documentation.
- Support NAConditions for fast engine.
- Faster model loading for models with many features and dense oblique
conditions.

### Documentation

- Clarified documentation of parameters for oblique splits.

## 1.8.1 - 2023-11-17

2 changes: 0 additions & 2 deletions README.md
@@ -68,8 +68,6 @@ The following resources are available:
- [Issue tracker](https://github.com/tensorflow/decision-forests/issues)
- [Known issues](documentation/known_issues.md)
- [Changelog](CHANGELOG.md)
- [TensorFlow Forum](https://discuss.tensorflow.org) (on
discuss.tensorflow.org)
- [More examples](documentation/more_examples.md)

## Installation
6 changes: 3 additions & 3 deletions WORKSPACE
@@ -11,9 +11,9 @@ load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
# absl used by tensorflow.
http_archive(
name = "org_tensorflow",
strip_prefix = "tensorflow-2.15.0",
sha256 = "9cec5acb0ecf2d47b16891f8bc5bc6fbfdffe1700bdadc0d9ebe27ea34f0c220",
urls = ["https://github.com/tensorflow/tensorflow/archive/v2.15.0.zip"],
strip_prefix = "tensorflow-2.16.1",
sha256 = "c729e56efc945c6df08efe5c9f5b8b89329c7c91b8f40ad2bb3e13900bd4876d",
urls = ["https://github.com/tensorflow/tensorflow/archive/v2.16.1.tar.gz"],
# Starting with TF 2.14, disable hermetic Python builds.
patch_args = ["-p1"],
patches = ["//third_party/tensorflow:tf.patch"],
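Bazel's `http_archive` compares the downloaded archive against the `sha256` attribute, so bumping the TensorFlow version also means updating the checksum. The following is a minimal sketch of how such a checksum could be computed locally with Python's standard library. The helper names `sha256_of_file` and `archive_matches` are illustrative and are not part of the repository.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches(path, expected_sha256):
    """Check a downloaded archive against the checksum pinned in WORKSPACE."""
    return sha256_of_file(path) == expected_sha256.lower()
```

After downloading the archive referenced in `urls`, calling `archive_matches` with the local path and the pinned checksum should return `True` if the pin is correct.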
17 changes: 11 additions & 6 deletions configure/setup.py
@@ -21,20 +21,20 @@
from setuptools.command.install import install
from setuptools.dist import Distribution

_VERSION = "1.9.0rc0"
_VERSION = "1.9.0"

with open("README.md", "r", encoding="utf-8") as fh:
long_description = fh.read()

REQUIRED_PACKAGES = [
"numpy",
"pandas",
"tensorflow~=2.16.0rc0",
"tensorflow~=2.16.1",
"six",
"absl_py",
"wheel",
"wurlitzer",
"tf_keras~=2.16.0rc2",
"tf_keras~=2.16",
]


@@ -84,8 +84,10 @@ def get_tag(self):
name="tensorflow_decision_forests",
version=_VERSION,
author="Google Inc.",
author_email="[email protected]",
description="Collection of training and inference decision forest algorithms.",
author_email="[email protected]",
description=(
"Collection of training and inference decision forest algorithms."
),
long_description=long_description,
long_description_content_type="text/markdown",
url="https://github.com/tensorflow/decision-forests",
@@ -113,7 +115,10 @@ def get_tag(self):
packages=setuptools.find_packages(),
python_requires=">=3.9",
license="Apache 2.0",
keywords="tensorflow tensor machine learning decision forests random forest gradient boosted decision trees",
keywords=(
"tensorflow tensor machine learning decision forests random forest"
" gradient boosted decision trees"
),
install_requires=REQUIRED_PACKAGES,
include_package_data=True,
zip_safe=False,
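The `REQUIRED_PACKAGES` change in this file relies on the PEP 440 compatible-release operator: `~=2.16.1` is equivalent to `>=2.16.1, ==2.16.*`, while `~=2.16` is equivalent to `>=2.16, ==2.*`. The sketch below illustrates that rule for plain numeric versions only. It ignores pre-releases such as `rc0`, and the function name is hypothetical. Real tooling should rely on pip or the `packaging` library instead.

```python
def _parse(version):
    """Parse a plain numeric version such as '2.16.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def _pad(parts, length):
    """Right-pad a version tuple with zeros up to the given length."""
    return parts + (0,) * (length - len(parts))


def matches_compatible_release(spec_version, candidate):
    """Return True if `candidate` satisfies `~=spec_version` (numeric only)."""
    base = _parse(spec_version)
    cand = _parse(candidate)
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    width = max(len(base), len(cand))
    # Lower bound: candidate >= spec version.
    at_least_base = _pad(cand, width) >= _pad(base, width)
    # Prefix match: all but the last component of the spec must be equal.
    prefix = base[:-1]
    same_prefix = _pad(cand, len(prefix))[: len(prefix)] == prefix
    return at_least_base and same_prefix
```

Under this reading, the `tensorflow` pin accepts only 2.16.x patch releases from 2.16.1 upward, whereas the `tf_keras` pin accepts any 2.x release from 2.16 upward.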
157 changes: 104 additions & 53 deletions documentation/installation.md
@@ -10,15 +10,15 @@
* [Table of Contents](#table-of-contents)
* [Installation with Pip](#installation-with-pip)
* [Build from source](#build-from-source)
* [Technical details](#technical-details)
* [Linux](#linux)
* [Setup](#setup)
* [Compilation](#compilation)
* [Docker build](#docker-build)
* [Manual build](#manual-build)
* [MacOS](#macos)
* [Setup](#setup-1)
* [Building / Packaging (Apple CPU)](#building---packaging-apple-cpu)
* [Setup](#setup)
* [Arm64 CPU](#arm64-cpu)
* [Cross-compiling for Intel CPUs](#cross-compiling-for-intel-cpus)
* [Final note](#final-note)
* [Troubleshooting](#troubleshooting)
* [Windows](#windows)

<!--te-->

@@ -44,24 +44,74 @@ python3 -c "import tensorflow_decision_forests as tfdf; print('Found TF-DF v' +

## Build from source

### Technical details

TensorFlow Decision Forests (TF-DF) implements custom ops for TensorFlow and
therefore depends on TensorFlow's ABI. Since the ABI can change between
versions, any TF-DF version is only compatible with one specific TensorFlow
version.

To avoid compiling and shipping all of TensorFlow with TF-DF, TF-DF
links against the libtensorflow shared library that is distributed with
TensorFlow's Pip package. Only a small part of TensorFlow is compiled, and
compilation takes only ~10 minutes on a strong workstation (instead of multiple
hours when compiling all of TensorFlow). To ensure this works, the version of
TensorFlow that is actually compiled and the libtensorflow shared library must
match exactly.

The `tools/test_bazel.sh` script configures the TF-DF build to ensure the
versions of the packages used match. For details on this process, see the source
code of this script. Since TensorFlow compilation changes often, it only
supports building with the most recent TensorFlow versions and nightly.

**Note**: When distributing builds, you may set the `__git_version__` string in
`tensorflow_decision_forests/__init__.py` to identify the commit you built from.

### Linux

#### Setup
#### Docker build

The easiest way to build TF-DF on Linux is to use TensorFlow's
[Build docker](https://github.com/tensorflow/build). Run the following
steps to build:

```shell
./tools/start_compile_docker.sh # Start the docker, might require root
export RUN_TESTS=1 # Whether to run tests after build
export PY_VERSION=3.9 # Python version to use for build
# TensorFlow version to compile against. This must match exactly the version
# of TensorFlow used at runtime, otherwise TF-DF may crash unexpectedly.
export TF_VERSION=2.16.1 # Set to "nightly" for building with tf-nightly
./tools/test_bazel.sh
```

This places the compiled C++ code in the `bazel-bin` directory. Note that this
is a symbolic link that is not exposed outside the container (i.e. the build is
gone after leaving the container).

For building the wheels, run
```shell
tools/build_pip_package.sh ALL_VERSIONS INSTALL_PYENV
```

This will install [Pyenv](https://github.com/pyenv/pyenv) and
[Pyenv-virtualenv](https://github.com/pyenv/pyenv-virtualenv) inside the docker
and use it to install Python in all supported versions for building. The wheels
are placed in the `dist/` subdirectory.

#### Manual build

Building TF-DF without the docker might be harder, and the team is probably not
able to help with this.

**Requirements**

- Bazel >= 3.7.2
- Bazel >= 6.3.0
- Python >= 3
- Git
- Python packages: numpy tensorflow pandas

Instead of installing the dependencies by hands, you can use the
[TensorFlow Build docker](https://github.com/tensorflow/build). If you choose
this options, install Docker:
- Pyenv, Pyenv-virtualenv (only if packaging for many Python versions)

- [Docker](https://docs.docker.com/get-docker/).

#### Compilation
**Building**

Download TensorFlow Decision Forests as follows:

@@ -71,31 +121,22 @@ git clone https://github.com/tensorflow/decision-forests.git
cd decision-forests
```

**Optional:** TensorFlow Decision Forests depends on
*Optional:* TensorFlow Decision Forests depends on
[Yggdrasil Decision Forests](https://github.com/google/yggdrasil-decision-forests)
. If you want to edit the Yggdrasil code, you can clone the Yggdrasil repository
and change the path accordingly in
`third_party/yggdrasil_decision_forests/workspace.bzl`.

**Optional:** If you want to use the docker option, run the
`start_compile_docker.sh` script and continue to the next step. If you don't
want to use the docker option, continue to the next step directly.

```shell
# Optional: Install and start the build docker.
./tools/start_compile_docker.sh
```

Compile and run the unit tests of TF-DF with the following command. Note that
`test_bazel.sh` is configured for `python3.8` and the default compiler on your
machine. Edit the file directly to change this configuration.
`test_bazel.sh` is configured for the default compiler on your machine. Edit the
file directly to change this configuration.

```shell
# Build and test TF-DF.
./tools/test_bazel.sh
RUN_TESTS=1 PY_VERSION=3.9 TF_VERSION=2.16.1 ./tools/test_bazel.sh
```

Create and test a pip package with the following command. Replace python3.8 by
Create and test a pip package with the following command. Replace python3.9 by
the version of python you want to use. Note that you don't have to use the same
version of Python as in the `test_bazel.sh` script.

@@ -154,25 +195,28 @@ For MacOS systems with ARM64 CPU, follow these steps:

1. Prepare your environment

```
```shell
git clone https://github.com/tensorflow/decision-forests.git
python3 -m venv venv
source venv/source/activate
source venv/bin/activate
```

1. Decide which Python version and TensorFlow version you want to use and run

```
```shell
cd decision-forests
export TF_VERSION=2.15.0 # Change to the TensorFlow Version you need.
export PY_VERSION=3.9 # Change to the Python you need.
export RUN_TESTS=1 # Change to 0 if you want to skip tests.
./tools/test_bazel.sh # Takes ~15 minutes on a modern Mac.
bazel clean --expunge # Remove old builds (esp. cross-compiled).
export RUN_TESTS=1 # Whether to run tests after build.
export PY_VERSION=3.9 # Python version to use for build.
# TensorFlow version to compile against. This must match exactly the version
# of TensorFlow used at runtime, otherwise TF-DF may crash unexpectedly.
export TF_VERSION=2.16.1
./tools/test_bazel.sh # Takes ~15 minutes on a modern Mac.
```

1. Package the code.
1. Package the build.

```
```shell
# Building the packages uses different virtualenvs through Pyenv.
deactivate
# Build the packages.
@@ -188,36 +232,43 @@ machines with Intel CPUs as follows.

1. Prepare your environment

```
```shell
git clone https://github.com/tensorflow/decision-forests.git
python3 -m venv venv
source venv/bin/activate
```

1. Decide which Python version you want to use and run

```
```shell
cd decision-forests
export TF_VERSION=2.15.0 # Change to the TensorFlow Version you need.
export PY_VERSION=3.9 # Change to the Python you need.
export RUN_TESTS=0 # Cross-compiled packages cannot be tested.
export MAC_INTEL_CROSSCOMPILE=1
./tools/test_bazel.sh # Takes ~15 minutes on a modern Mac.
bazel clean --expunge # Remove old builds (esp. cross-compiled).
export RUN_TESTS=0 # Cross-compiled builds can't run tests.
export PY_VERSION=3.9 # Python version to use for build.
# TensorFlow version to compile against. This must match exactly the version
# of TensorFlow used at runtime, otherwise TF-DF may crash unexpectedly.
export TF_VERSION=2.16.1
export MAC_INTEL_CROSSCOMPILE=1 # Enable cross-compilation.
./tools/test_bazel.sh # Takes ~15 minutes on a modern Mac.
```

1. Package the code.
1. Package the build.

```
```shell
# Building the packages uses different virtualenvs through Pyenv.
deactivate
# Build the packages.
./tools/build_pip_package.sh ALL_VERSIONS_MAC_INTEL_CROSSCOMPILE
```

1. The packages can be found in `decision-forests/dist/`.
1. The packages can be found in `decision-forests/dist/`. Note that they have
not been tested and it would be prudent to test them before distribution.

### Windows

## Final note
A Windows build has been successfully produced in the past, but is not
maintained at this point. See `tools/test_bazel.bat` and `tools/test_bazel.sh`
for (possibly outdated) pointers for compiling on Windows.

Compiling TF-DF relies on the TensorFlow Pip package *and* the TensorFlow Bazel
dependency. Only a small part of TensorFlow will be compiled.
Compiling TF-DF on a single powerful workstation takes ~10 minutes.
For Windows users, [YDF](https://ydf.readthedocs.io) offers official Windows
builds and most of the functionality (and more!) of TF-DF.
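The build steps in the installation instructions above all configure `tools/test_bazel.sh` through the environment variables `RUN_TESTS`, `PY_VERSION`, `TF_VERSION` and, for Intel cross-compilation, `MAC_INTEL_CROSSCOMPILE`. As a rough sketch, a wrapper could assemble that environment and reject the combination the instructions rule out, namely running tests on a cross-compiled build. The helper `build_env` is hypothetical and not part of the repository.

```python
import os


def build_env(tf_version, py_version="3.9", run_tests=True,
              mac_intel_crosscompile=False, base=None):
    """Assemble the environment for tools/test_bazel.sh (illustrative helper).

    `base` defaults to a copy of the current process environment.
    """
    if mac_intel_crosscompile and run_tests:
        # The instructions set RUN_TESTS=0 for cross-compiled builds.
        raise ValueError("cross-compiled builds cannot run tests")
    env = dict(os.environ if base is None else base)
    env["TF_VERSION"] = tf_version  # Must match the runtime TensorFlow exactly.
    env["PY_VERSION"] = py_version
    env["RUN_TESTS"] = "1" if run_tests else "0"
    if mac_intel_crosscompile:
        env["MAC_INTEL_CROSSCOMPILE"] = "1"
    return env
```

From the repository root, the result could then be passed to `subprocess.run(["./tools/test_bazel.sh"], env=build_env("2.16.1"), check=True)`.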
26 changes: 18 additions & 8 deletions documentation/known_issues.md
@@ -17,16 +17,26 @@ TensorFlow Decision Forests is not yet available as a Windows Pip package.
[Windows Subsystem for Linux (WSL)](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux)
on your Windows machine and follow the Linux instructions.

## Incompatibility with Keras 3

Compatibility with Keras 3 is not yet implemented. Use tf_keras or a TensorFlow
version before 2.16.

## Untested for conda

While TF-DF might work with Conda, this is not tested and we currently do not
maintain packages on conda-forge.

## Incompatibility with old or nightly versions of TensorFlow

TensorFlow [ABI](https://en.wikipedia.org/wiki/Application_binary_interface) is
not compatible in between releases. Because TF-DF relies on custom TensorFlow
TensorFlow's [ABI](https://en.wikipedia.org/wiki/Application_binary_interface)
is not compatible in between releases. Because TF-DF relies on custom TensorFlow
C++ ops, each version of TF-DF is tied to a specific version of TensorFlow. The
last released version of TF-DF is always tied to the last released version of
TensorFlow.

For reasons, the current version of TF-DF might not be compatible with older
versions or with the nightly build of TensorFlow.
For these reasons, the current version of TF-DF might not be compatible with
older versions or with the nightly build of TensorFlow.

If using incompatible versions of TF and TF-DF, you will see cryptic errors such
as:
@@ -37,16 +47,16 @@ tensorflow_decision_forests/tensorflow/ops/training/training.so: undefined symbo

- Use the version of TF-DF that is compatible with your version of TensorFlow.

Note that TF-DF is not compatible with Keras 3 at this time.

### Compatibility table

The following table shows the compatibility between
`tensorflow_decision_forests` and its dependencies:

tensorflow_decision_forests | tensorflow
--------------------------- | ---------------
1.6.0 | 2.14.0
1.9.0 | 2.16.1
1.8.0 - 1.8.1 | 2.15.0
1.6.0 - 1.7.0 | 2.14.0
1.5.0 | 2.13.0
1.3.0 - 1.4.0 | 2.12.0
1.1.0 - 1.2.0 | 2.11.0
@@ -72,7 +82,7 @@ does.

**Workarounds:**

- Use a model that support distribution strategies (e.g.
- Use a model that supports distribution strategies (e.g.
`DistributedGradientBoostedTreesModel`), or downsample your dataset so that
it fits on a single machine.
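Returning to the compatibility table earlier in this file's diff: because each TF-DF release is tied to one TensorFlow version, the table can be treated as a lookup. The sketch below encodes only the rows visible in this diff (the table continues beyond the shown hunk) and handles plain `X.Y.Z` versions only. The names `COMPATIBILITY` and `expected_tensorflow` are illustrative.

```python
# (first TF-DF version, last TF-DF version, TensorFlow version), inclusive
# ranges. Only the rows visible in this diff are listed.
COMPATIBILITY = [
    ((1, 9, 0), (1, 9, 0), "2.16.1"),
    ((1, 8, 0), (1, 8, 1), "2.15.0"),
    ((1, 6, 0), (1, 7, 0), "2.14.0"),
    ((1, 5, 0), (1, 5, 0), "2.13.0"),
    ((1, 3, 0), (1, 4, 0), "2.12.0"),
    ((1, 1, 0), (1, 2, 0), "2.11.0"),
]


def _parse(version):
    """Parse a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def expected_tensorflow(tfdf_version):
    """Return the TensorFlow version paired with a TF-DF version, or None."""
    parsed = _parse(tfdf_version)
    for first, last, tf_version in COMPATIBILITY:
        if first <= parsed <= last:
            return tf_version
    return None  # Not covered by the rows encoded above.
```

A check like this could run before importing TF-DF to give a clearer message than the undefined-symbol error quoted above.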
