Commit 5d82a1f

Merge branch 'master' into yiweny-patch-1

akihironitta authored Oct 14, 2024
2 parents 8bdc5e9 + d1f63cd commit 5d82a1f
Showing 36 changed files with 468 additions and 137 deletions.
4 changes: 2 additions & 2 deletions .github/actions/setup/action.yml
@@ -3,7 +3,7 @@ name: Setup
inputs:
python-version:
required: false
default: '3.8'
default: '3.9'
torch-version:
required: false
default: '2.2.0'
@@ -16,7 +16,7 @@ runs:

steps:
- name: Set up Python ${{ inputs.python-version }}
uses: actions/setup-python@v4.3.0
uses: actions/setup-python@v5
with:
python-version: ${{ inputs.python-version }}
check-latest: true
1 change: 0 additions & 1 deletion .github/workflows/documentation.yml
@@ -23,7 +23,6 @@ jobs:
uses: tj-actions/changed-files@v41
with:
files: |
docs/**
examples/**
README.md
CHANGELOG.md
2 changes: 1 addition & 1 deletion .github/workflows/linting.yml
@@ -15,7 +15,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: 3.8
python-version: 3.9
- name: Install dependencies
run: |
pip install -e '.[full,test]' -f https://download.pytorch.org/whl/cpu
1 change: 1 addition & 0 deletions .gitignore
@@ -19,3 +19,4 @@ venv/*
*.out
data/**
catboost_info/
.pt_tmp/
10 changes: 5 additions & 5 deletions .pre-commit-config.yaml
@@ -8,7 +8,7 @@ ci:

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
rev: v5.0.0
hooks:
- id: no-commit-to-branch
name: No commits to master
@@ -38,7 +38,7 @@ repos:
hooks:
- id: pyupgrade
name: Upgrade Python syntax
args: [--py38-plus]
args: [--py39-plus]

- repo: https://github.com/PyCQA/autoflake
rev: v2.3.1
@@ -67,21 +67,21 @@ repos:
name: Sort imports

- repo: https://github.com/PyCQA/flake8
rev: 7.1.0
rev: 7.1.1
hooks:
- id: flake8
name: Check PEP8
additional_dependencies: [Flake8-pyproject]

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.5.0
rev: v0.6.9
hooks:
- id: ruff
name: Ruff formatting
args: [--fix, --exit-non-zero-on-fix]

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.10.1
rev: v1.11.2
hooks:
- id: mypy
name: Check types
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Added

- Added a benchmark script to compare PyTorch Frame with PyTorch Tabular ([#398](https://github.com/pyg-team/pytorch-frame/pull/398), [#444](https://github.com/pyg-team/pytorch-frame/pull/444))
- Added `is_floating_point` method to `MultiNestedTensor` and `MultiEmbeddingTensor` ([#445](https://github.com/pyg-team/pytorch-frame/pull/445))
- Added support for inferring `stype.categorical` from boolean columns in `utils.infer_series_stype` ([#421](https://github.com/pyg-team/pytorch-frame/pull/421))

### Changed
@@ -17,8 +19,11 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Removed

- Dropped support for Python 3.8 ([#462](https://github.com/pyg-team/pytorch-frame/pull/462))

### Fixed

- Fixed size mismatch `RuntimeError` in `transforms.CatToNumTransform` ([#446](https://github.com/pyg-team/pytorch-frame/pull/446))
- Removed CUDA synchronizations from `nn.LinearEmbeddingEncoder` ([#432](https://github.com/pyg-team/pytorch-frame/pull/432))
- Removed CUDA synchronizations from N/A imputation logic in `nn.StypeEncoder` ([#433](https://github.com/pyg-team/pytorch-frame/pull/433), [#434](https://github.com/pyg-team/pytorch-frame/pull/434))

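To make the `infer_series_stype` entry in the changelog hunk above concrete, here is a minimal sketch; the import path `torch_frame.utils` and the exact return comparison are assumed from the changelog wording, not taken from the library documentation.

```python
import pandas as pd

from torch_frame import stype
from torch_frame.utils import infer_series_stype  # import path assumed from the changelog entry

# With PR #421, boolean columns are inferred as categorical.
ser = pd.Series([True, False, True, True])
assert infer_series_stype(ser) == stype.categorical
```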
2 changes: 1 addition & 1 deletion README.md
@@ -245,7 +245,7 @@ The benchmark script for Hugging Face text encoders is in this [file](https://gi

## Installation

PyTorch Frame is available for Python 3.8 to Python 3.11.
PyTorch Frame is available for Python 3.9 to Python 3.11.

```
pip install pytorch_frame
38 changes: 25 additions & 13 deletions benchmark/README.md
@@ -196,16 +196,28 @@ Experimental setting: 20 Optuna search trials for XGBoost, CatBoost and LightGBM

Experimental setting: 20 Optuna search trials for XGBoost, CatBoost and LightGBM. 3 Optuna search trials and 10 epochs training for deep learning models.

| | dataset_0 | dataset_1 | dataset_2 |
| :------------------ | :--------------------- | :--------------------- | :----------------------- |
| XGBoost | Too slow\* | Too slow\* | Too slow\* |
| CatBoost | Too slow\* | Too slow\* | Too slow\* |
| LightGBM | Too slow\* | Too slow\* | Too slow\* |
| Trompt | OOM | 0.889±0.063 (55428s) | 0.804±0.013 (23304s) |
| ResNet | 0.892±0.002 (417s) | **0.999±0.001 (396s)** | 0.915±0.001 (405s) |
| MLP | 0.770±0.001 (170s) | 0.549±0.000 (223s) | 0.895±0.001 (192s) |
| FTTransformerBucket | 0.897±0.004 (4436s) | 0.502±0.000 (1892s) | 0.888±0.009 (4414s) |
| ExcelFormer | OOM | TODO (code error) | **0.951±0.002 (10236s)** |
| FTTransformer | 0.872±0.005 (7004s) | 0.540±0.068 (3355s) | 0.908±0.004 (7514s) |
| TabNet | **0.912±0.004 (219s)** | 0.995±0.001 (301s) | 0.919±0.003 (187s) |
| TabTransformer | 0.843±0.003 (2810s) | 0.657±0.187 (2843s) | 0.854±0.001 (284s) |
| | dataset_0 | dataset_1 | dataset_2 |
| :------------------ | :--------------------- | :----------------------- | :----------------------- |
| XGBoost | Too slow\* | Too slow\* | Too slow\* |
| CatBoost | Too slow\* | Too slow\* | Too slow\* |
| LightGBM | Too slow\* | Too slow\* | Too slow\* |
| Trompt | OOM | 0.889±0.063 (55428s) | 0.804±0.013 (23304s) |
| ResNet | 0.892±0.002 (417s) | **0.999±0.001 (396s)** | 0.915±0.001 (405s) |
| MLP | 0.770±0.001 (170s) | 0.549±0.000 (223s) | 0.895±0.001 (192s) |
| FTTransformerBucket | 0.897±0.004 (4436s) | 0.502±0.000 (1892s) | 0.888±0.009 (4414s) |
| ExcelFormer | OOM | **0.999±0.001 (13952s)** | **0.951±0.002 (10236s)** |
| FTTransformer | 0.872±0.005 (7004s) | 0.540±0.068 (3355s) | 0.908±0.004 (7514s) |
| TabNet | **0.912±0.004 (219s)** | 0.995±0.001 (301s) | 0.919±0.003 (187s) |
| TabTransformer | 0.843±0.003 (2810s) | 0.657±0.187 (2843s) | 0.854±0.001 (284s) |

## Benchmarking pytorch-frame and pytorch-tabular

`pytorch_tabular_benchmark` compares the performance of `pytorch-frame` with `pytorch-tabular`. `pytorch-tabular` excels at making standard tabular tasks accessible, letting users quickly implement and experiment with existing tabular learning models, and it also offers useful training-loop modifications and an explainability feature. `pytorch-frame`, on the other hand, offers more flexibility for exploring and building novel tabular learning approaches while still providing access to established models: it supports a wider array of data types, more sophisticated encoding schemes, and streamlined integration with LLMs.

The following table shows the speed comparison between `pytorch-frame` and `pytorch-tabular` on their implementations of `TabNet` and `FTTransformer`; a rough sketch of how such a throughput metric can be measured follows the table.

| Package | Model | Num iters/sec |
| :-------------- | :------------ | :------------ |
| PyTorch Tabular | TabNet | 41.7 |
| PyTorch Frame | TabNet | 45.0 |
| PyTorch Tabular | FTTransformer | 40.1 |
| PyTorch Frame | FTTransformer | 43.7 |
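The numbers above are raw training-loop throughput (iterations per second). As an illustration only, not the actual logic of `pytorch_tabular_benchmark`, throughput measurement can be sketched as below; the helper name `iters_per_sec` and the warm-up/iteration counts are assumptions made for this example.

```python
import time


def iters_per_sec(step_fn, warmup: int = 10, iters: int = 100) -> float:
    """Measure training-loop throughput for a zero-argument training step.

    This is an illustrative sketch, not the benchmark script itself.
    """
    for _ in range(warmup):
        step_fn()  # warm up caches, CUDA kernels, etc.
    start = time.perf_counter()
    for _ in range(iters):
        step_fn()  # one forward/backward/optimizer step
    return iters / (time.perf_counter() - start)


if __name__ == "__main__":
    # Dummy step standing in for a real TabNet/FTTransformer training step.
    print(f"{iters_per_sec(lambda: sum(range(10_000))):.1f} iters/sec")
```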
8 changes: 4 additions & 4 deletions benchmark/data_frame_benchmark.py
@@ -3,7 +3,7 @@
import os
import os.path as osp
import time
from typing import Any, Dict, Optional, Tuple
from typing import Any, Optional

import numpy as np
import optuna
@@ -303,10 +303,10 @@ def test(


def train_and_eval_with_cfg(
model_cfg: Dict[str, Any],
train_cfg: Dict[str, Any],
model_cfg: dict[str, Any],
train_cfg: dict[str, Any],
trial: Optional[optuna.trial.Trial] = None,
) -> Tuple[float, float]:
) -> tuple[float, float]:
# Use model_cfg to set up training procedure
if args.model_type == 'FTTransformerBucket':
# Use LinearBucketEncoder instead
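The `Dict`/`Tuple` to `dict`/`tuple` changes in this file (and in the encoder benchmark below) follow directly from dropping Python 3.8: since Python 3.9, the built-in `dict` and `tuple` can be subscripted in annotations (PEP 585), so the `typing` aliases are no longer needed. A minimal before/after sketch:

```python
from typing import Any, Optional

# Python 3.8 style (needs `from typing import Dict, Tuple`):
#   def train_and_eval_with_cfg(model_cfg: Dict[str, Any], ...) -> Tuple[float, float]: ...


# Python 3.9+ style, matching the diff above:
def train_and_eval_with_cfg(
    model_cfg: dict[str, Any],
    train_cfg: dict[str, Any],
    trial: Optional[Any] = None,  # `optuna.trial.Trial` in the real script
) -> tuple[float, float]:
    ...
```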
3 changes: 1 addition & 2 deletions benchmark/encoder/encoder_benchmark.py
@@ -1,7 +1,6 @@
import time
from argparse import ArgumentParser
from contextlib import nullcontext
from typing import Dict

import torch
from line_profiler import profile
@@ -115,7 +114,7 @@
}


def make_stype_encoder_dict() -> Dict[stype, StypeEncoder]:
def make_stype_encoder_dict() -> dict[stype, StypeEncoder]:
stype_encoder_dict = {}
for stype_str, encoder_str in args.stype_kv:
encoder_kwargs = encoder_str2encoder_cls_kwargs[encoder_str]