Commit 5d82a1f

Merge branch 'master' into yiweny-patch-1

akihironitta authored Oct 14, 2024
2 parents 8bdc5e9 + d1f63cd commit 5d82a1f
Showing 36 changed files with 468 additions and 137 deletions.
4 changes: 2 additions & 2 deletions .github/actions/setup/action.yml
@@ -3,7 +3,7 @@ name: Setup
inputs:
python-version:
required: false
default: '3.8'
default: '3.9'
torch-version:
required: false
default: '2.2.0'
@@ -16,7 +16,7 @@ runs:

steps:
- name: Set up Python ${{ inputs.python-version }}
uses: actions/setup-python@v4.3.0
uses: actions/setup-python@v5
with:
python-version: ${{ inputs.python-version }}
check-latest: true
1 change: 0 additions & 1 deletion .github/workflows/documentation.yml
@@ -23,7 +23,6 @@ jobs:
uses: tj-actions/changed-files@v41
with:
files: |
docs/**
examples/**
README.md
CHANGELOG.md
2 changes: 1 addition & 1 deletion .github/workflows/linting.yml
@@ -15,7 +15,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: 3.8
python-version: 3.9
- name: Install dependencies
run: |
pip install -e '.[full,test]' -f https://download.pytorch.org/whl/cpu
1 change: 1 addition & 0 deletions .gitignore
@@ -19,3 +19,4 @@ venv/*
*.out
data/**
catboost_info/
.pt_tmp/
10 changes: 5 additions & 5 deletions .pre-commit-config.yaml
@@ -8,7 +8,7 @@ ci:

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
rev: v5.0.0
hooks:
- id: no-commit-to-branch
name: No commits to master
@@ -38,7 +38,7 @@ repos:
hooks:
- id: pyupgrade
name: Upgrade Python syntax
args: [--py38-plus]
args: [--py39-plus]

- repo: https://github.com/PyCQA/autoflake
rev: v2.3.1
@@ -67,21 +67,21 @@ repos:
name: Sort imports

- repo: https://github.com/PyCQA/flake8
rev: 7.1.0
rev: 7.1.1
hooks:
- id: flake8
name: Check PEP8
additional_dependencies: [Flake8-pyproject]

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.5.0
rev: v0.6.9
hooks:
- id: ruff
name: Ruff formatting
args: [--fix, --exit-non-zero-on-fix]

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.10.1
rev: v1.11.2
hooks:
- id: mypy
name: Check types
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Added

- Added a benchmark script to compare PyTorch Frame with PyTorch Tabular ([#398](https://github.com/pyg-team/pytorch-frame/pull/398), [#444](https://github.com/pyg-team/pytorch-frame/pull/444))
- Added `is_floating_point` method to `MultiNestedTensor` and `MultiEmbeddingTensor` ([#445](https://github.com/pyg-team/pytorch-frame/pull/445))
- Added support for inferring `stype.categorical` from boolean columns in `utils.infer_series_stype` ([#421](https://github.com/pyg-team/pytorch-frame/pull/421))

### Changed
@@ -17,8 +19,11 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Removed

- Dropped support for Python 3.8 ([#462](https://github.com/pyg-team/pytorch-frame/pull/462))

### Fixed

- Fixed size mismatch `RuntimeError` in `transforms.CatToNumTransform` ([#446](https://github.com/pyg-team/pytorch-frame/pull/446))
- Removed CUDA synchronizations from `nn.LinearEmbeddingEncoder` ([#432](https://github.com/pyg-team/pytorch-frame/pull/432))
- Removed CUDA synchronizations from N/A imputation logic in `nn.StypeEncoder` ([#433](https://github.com/pyg-team/pytorch-frame/pull/433), [#434](https://github.com/pyg-team/pytorch-frame/pull/434))

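To make the `infer_series_stype` entry in the changelog hunk above concrete, here is a minimal sketch; the import path `torch_frame.utils` and the exact return comparison are assumed from the changelog wording, not taken from the library documentation.

```python
import pandas as pd

from torch_frame import stype
from torch_frame.utils import infer_series_stype  # import path assumed from the changelog entry

# With PR #421, boolean columns are inferred as categorical.
ser = pd.Series([True, False, True, True])
assert infer_series_stype(ser) == stype.categorical
```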
2 changes: 1 addition & 1 deletion README.md
@@ -245,7 +245,7 @@ The benchmark script for Hugging Face text encoders is in this [file](https://gi

## Installation

PyTorch Frame is available for Python 3.8 to Python 3.11.
PyTorch Frame is available for Python 3.9 to Python 3.11.

```
pip install pytorch_frame
38 changes: 25 additions & 13 deletions benchmark/README.md
@@ -196,16 +196,28 @@ Experimental setting: 20 Optuna search trials for XGBoost, CatBoost and LightGBM

Experimental setting: 20 Optuna search trials for XGBoost, CatBoost and LightGBM. 3 Optuna search trials and 10 epochs training for deep learning models.

| | dataset_0 | dataset_1 | dataset_2 |
| :------------------ | :--------------------- | :--------------------- | :----------------------- |
| XGBoost | Too slow\* | Too slow\* | Too slow\* |
| CatBoost | Too slow\* | Too slow\* | Too slow\* |
| LightGBM | Too slow\* | Too slow\* | Too slow\* |
| Trompt | OOM | 0.889±0.063 (55428s) | 0.804±0.013 (23304s) |
| ResNet | 0.892±0.002 (417s) | **0.999±0.001 (396s)** | 0.915±0.001 (405s) |
| MLP | 0.770±0.001 (170s) | 0.549±0.000 (223s) | 0.895±0.001 (192s) |
| FTTransformerBucket | 0.897±0.004 (4436s) | 0.502±0.000 (1892s) | 0.888±0.009 (4414s) |
| ExcelFormer | OOM | TODO (code error) | **0.951±0.002 (10236s)** |
| FTTransformer | 0.872±0.005 (7004s) | 0.540±0.068 (3355s) | 0.908±0.004 (7514s) |
| TabNet | **0.912±0.004 (219s)** | 0.995±0.001 (301s) | 0.919±0.003 (187s) |
| TabTransformer | 0.843±0.003 (2810s) | 0.657±0.187 (2843s) | 0.854±0.001 (284s) |
| | dataset_0 | dataset_1 | dataset_2 |
| :------------------ | :--------------------- | :----------------------- | :----------------------- |
| XGBoost | Too slow\* | Too slow\* | Too slow\* |
| CatBoost | Too slow\* | Too slow\* | Too slow\* |
| LightGBM | Too slow\* | Too slow\* | Too slow\* |
| Trompt | OOM | 0.889±0.063 (55428s) | 0.804±0.013 (23304s) |
| ResNet | 0.892±0.002 (417s) | **0.999±0.001 (396s)** | 0.915±0.001 (405s) |
| MLP | 0.770±0.001 (170s) | 0.549±0.000 (223s) | 0.895±0.001 (192s) |
| FTTransformerBucket | 0.897±0.004 (4436s) | 0.502±0.000 (1892s) | 0.888±0.009 (4414s) |
| ExcelFormer | OOM | **0.999±0.001 (13952s)** | **0.951±0.002 (10236s)** |
| FTTransformer | 0.872±0.005 (7004s) | 0.540±0.068 (3355s) | 0.908±0.004 (7514s) |
| TabNet | **0.912±0.004 (219s)** | 0.995±0.001 (301s) | 0.919±0.003 (187s) |
| TabTransformer | 0.843±0.003 (2810s) | 0.657±0.187 (2843s) | 0.854±0.001 (284s) |

## Benchmarking pytorch-frame and pytorch-tabular

`pytorch_tabular_benchmark` compares the performance of `pytorch-frame` with `pytorch-tabular`. `pytorch-tabular` excels at making standard tabular tasks accessible, letting users quickly implement and experiment with existing tabular learning models, and it also offers useful training-loop modifications and an explainability feature. `pytorch-frame`, on the other hand, offers more flexibility for exploring and building novel tabular learning approaches while still providing access to established models: it supports a wider array of data types, more sophisticated encoding schemes, and streamlined integration with LLMs.

The following table shows the speed comparison between `pytorch-frame` and `pytorch-tabular` on their implementations of `TabNet` and `FTTransformer`; a rough sketch of how such a throughput metric can be measured follows the table.

| Package | Model | Num iters/sec |
| :-------------- | :------------ | :------------ |
| PyTorch Tabular | TabNet | 41.7 |
| PyTorch Frame | TabNet | 45.0 |
| PyTorch Tabular | FTTransformer | 40.1 |
| PyTorch Frame | FTTransformer | 43.7 |
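The numbers above are raw training-loop throughput (iterations per second). As an illustration only, not the actual logic of `pytorch_tabular_benchmark`, throughput measurement can be sketched as below; the helper name `iters_per_sec` and the warm-up/iteration counts are assumptions made for this example.

```python
import time


def iters_per_sec(step_fn, warmup: int = 10, iters: int = 100) -> float:
    """Measure training-loop throughput for a zero-argument training step.

    This is an illustrative sketch, not the benchmark script itself.
    """
    for _ in range(warmup):
        step_fn()  # warm up caches, CUDA kernels, etc.
    start = time.perf_counter()
    for _ in range(iters):
        step_fn()  # one forward/backward/optimizer step
    return iters / (time.perf_counter() - start)


if __name__ == "__main__":
    # Dummy step standing in for a real TabNet/FTTransformer training step.
    print(f"{iters_per_sec(lambda: sum(range(10_000))):.1f} iters/sec")
```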
8 changes: 4 additions & 4 deletions benchmark/data_frame_benchmark.py
@@ -3,7 +3,7 @@
import os
import os.path as osp
import time
from typing import Any, Dict, Optional, Tuple
from typing import Any, Optional

import numpy as np
import optuna
@@ -303,10 +303,10 @@ def test(


def train_and_eval_with_cfg(
model_cfg: Dict[str, Any],
train_cfg: Dict[str, Any],
model_cfg: dict[str, Any],
train_cfg: dict[str, Any],
trial: Optional[optuna.trial.Trial] = None,
) -> Tuple[float, float]:
) -> tuple[float, float]:
# Use model_cfg to set up training procedure
if args.model_type == 'FTTransformerBucket':
# Use LinearBucketEncoder instead
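The `Dict`/`Tuple` to `dict`/`tuple` changes in this file (and in the encoder benchmark below) follow directly from dropping Python 3.8: since Python 3.9, the built-in `dict` and `tuple` can be subscripted in annotations (PEP 585), so the `typing` aliases are no longer needed. A minimal before/after sketch:

```python
from typing import Any, Optional

# Python 3.8 style (needs `from typing import Dict, Tuple`):
#   def train_and_eval_with_cfg(model_cfg: Dict[str, Any], ...) -> Tuple[float, float]: ...


# Python 3.9+ style, matching the diff above:
def train_and_eval_with_cfg(
    model_cfg: dict[str, Any],
    train_cfg: dict[str, Any],
    trial: Optional[Any] = None,  # `optuna.trial.Trial` in the real script
) -> tuple[float, float]:
    ...
```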
3 changes: 1 addition & 2 deletions benchmark/encoder/encoder_benchmark.py
@@ -1,7 +1,6 @@
import time
from argparse import ArgumentParser
from contextlib import nullcontext
from typing import Dict

import torch
from line_profiler import profile
@@ -115,7 +114,7 @@
}


def make_stype_encoder_dict() -> Dict[stype, StypeEncoder]:
def make_stype_encoder_dict() -> dict[stype, StypeEncoder]:
stype_encoder_dict = {}
for stype_str, encoder_str in args.stype_kv:
encoder_kwargs = encoder_str2encoder_cls_kwargs[encoder_str]