
Commit

Update README.md (#4)
* Update README.md

* Update ReadMe.md

* Update README.md

* Update data_prepare.py

* Update readme.md

* Update README.md

* Update readme.md

* Update ReadMe.md

* Update ReadMe.md

* Update ReadMe.md

* Update readme.md

* Update ReadMe.md

* Update data_preprcoss.py

* Update run_ssr_link.sh

* Update ReadMe.md

* Update readme.md

* Update readme.md

* Update readme.md

* Update readme.md

* Update readme.md

* Update readme.md

* fix directory bug

* tiny fix for nasa

* revise install

* Update unit_testing.yml

---------

Co-authored-by: dalong.zdl <[email protected]>
stevenHust and dalong.zdl authored Sep 6, 2023
1 parent 05e1274 commit f62b619
Showing 16 changed files with 74 additions and 455 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/unit_testing.yml
@@ -18,4 +18,4 @@ jobs:
docker run --net=host --rm -i -m 30000m -v ${GITHUB_WORKSPACE}:/graph_ml -w /graph_ml aglimage/agl:agl-ubuntu-gcc9.4.0-py3.8-cuda11.8-pytorch2.0.1-0825 /bin/bash -c 'cd agl/java && mvn -B package --file pom.xml'
- name: python cpp unit testing
if: always()
- run: docker run --net=host --rm -i -m 30000m -v ${GITHUB_WORKSPACE}:/graph_ml -w /graph_ml aglimage/agl:agl-ubuntu-gcc9.4.0-py3.8-cuda11.8-pytorch2.0.1-0825 /bin/bash -c 'git config --global --add safe.directory /graph_ml && bash build.sh'
+ run: docker run --net=host --rm -i -m 30000m -v ${GITHUB_WORKSPACE}:/graph_ml -w /graph_ml aglimage/agl:agl-ubuntu-gcc9.4.0-py3.8-cuda11.8-pytorch2.0.1-0825 /bin/bash -c 'bash build.sh'
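For reference, the updated test step can be reproduced outside CI with a sketch like the following. The image tag is taken from the workflow above; running against a local checkout (falling back to the current directory when `GITHUB_WORKSPACE` is unset) is an assumption, and the command is only printed here so the sketch is safe to run without Docker:

```shell
# Sketch: assemble the same docker invocation the workflow uses.
# IMAGE comes from the workflow above; WORKSPACE is an assumed local checkout path.
IMAGE="aglimage/agl:agl-ubuntu-gcc9.4.0-py3.8-cuda11.8-pytorch2.0.1-0825"
WORKSPACE="${GITHUB_WORKSPACE:-$(pwd)}"
CMD="docker run --net=host --rm -i -m 30000m -v ${WORKSPACE}:/graph_ml -w /graph_ml ${IMAGE} /bin/bash -c 'bash build.sh'"
echo "${CMD}"   # print rather than execute; remove the echo to actually run it
```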
4 changes: 3 additions & 1 deletion agl/python/examples/drgst/README.md
@@ -20,12 +20,13 @@ stage = 6
## Benchmark

* Data download:
  Download the data files prefixed with ind.citeseer. from https://github.com/tkipf/gcn/tree/master/gcn/data and place them under the data_process/data/ directory
* Data preprocessing and subgraph sampling:
  Run submit.sh to preprocess the data and run Spark sampling, producing the training, test, and validation sets
* Model
  python drgst_citeseer.py
* Results
```
In stage 0
test loss:0.9647, test acc:0.7070
In stage 1
@@ -38,3 +39,4 @@ In stage 4
test loss:0.9230, test acc:0.7440
In stage 5
test loss:0.9430, test acc:0.7540
```
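A minimal sketch of the data-download step above. The file suffixes follow the layout of the tkipf/gcn repository linked in the README, and the raw.githubusercontent.com URL form is an assumption; the wget line is left commented so the sketch runs offline:

```shell
# Sketch: place the ind.citeseer.* files under data_process/data/ as the README asks.
base_url="https://raw.githubusercontent.com/tkipf/gcn/master/gcn/data"
mkdir -p data_process/data
for suffix in x y tx ty allx ally graph test.index; do
  f="ind.citeseer.${suffix}"
  echo "${base_url}/${f} -> data_process/data/${f}"
  # wget -q "${base_url}/${f}" -O "data_process/data/${f}"   # uncomment to actually download
done
```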
2 changes: 1 addition & 1 deletion agl/python/examples/geniepath_ppi/ReadMe.md
@@ -9,7 +9,7 @@
## Notes

* Data download:
  Download the data from https://github.com/sufeidechabei/PPI-Inductive/tree/master/ppi and place it under the data_process/ppi/ directory
* Data preprocessing and subgraph sampling:
  Run submit.sh to preprocess the data and run Spark sampling, producing the training, test, and validation sets
* Model
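The PPI download step above can be sketched as follows. The concrete file names are assumptions based on the GraphSAGE-style PPI layout (the preprocessing code below does reference train_graph_id.npy and a -class_map.json suffix); check the linked repository for the exact list:

```shell
# Sketch: create the directory the README expects and list the files to fetch there.
# File names are assumed from the GraphSAGE-style PPI layout; verify against the linked repo.
mkdir -p data_process/ppi
for f in ppi-G.json ppi-class_map.json ppi-id_map.json ppi-feats.npy train_graph_id.npy; do
  echo "place ${f} into data_process/ppi/"
done
```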
@@ -49,7 +49,7 @@ def load_data(prefix):
if node["test"] and node["val"]:
print("both test and val error:" + nid)
exit(0)
- train_graph_id = np.load(prefix + "-train_graph_id.npy")
+ train_graph_id = np.load("ppi/train_graph_id.npy")
class_map = json.load(open(prefix + "-class_map.json"))
with open('ppi_label.csv', 'w') as outfile:
outfile.write('node_id,seed,label,train_flag\n')
Expand Down
5 changes: 3 additions & 2 deletions agl/python/examples/hegnn_acm/ReadMe.md
@@ -10,13 +10,13 @@

### Data

Download the data from https://drive.google.com/drive/folders/1koV0rGhj-UXrEMOCZezK1tnwC6zb69uB?usp=sharing and copy the node.csv, edge.csv, and label.csv files into the data_process directory

### Data preprocessing and subgraph sampling:
Run submit.sh for Spark sampling, producing the training, test, and validation sets
### Results
python model_hegnn.py

```
Epoch: 01, Loss: 0.6549, val_micro_f1: 0.3533, test_micro_f1: 0.3812, time_cost:10.1865
(Epoch: 01, best_val_micro_f1: 0.3533, best_test_micro_f1: 0.3812) <br>
Epoch: 02, Loss: 0.5937, val_micro_f1: 0.8100, test_micro_f1: 0.8640, time_cost:7.7812
@@ -218,3 +218,4 @@ Epoch: 99, Loss: 0.0003, val_micro_f1: 0.8767, test_micro_f1: 0.9068, time_cost:
Epoch: 100, Loss: 0.0003, val_micro_f1: 0.8767, test_micro_f1: 0.9068, time_cost:7.9536
(Epoch: 07, best_val_micro_f1: 0.8900, best_test_micro_f1: 0.9144) <br>
sucess
```
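The hegnn_acm steps above can be sketched end to end. The Google Drive download itself is manual; the file placement and run commands follow the README, and the Spark/training commands are left commented because they need the sampled data in place:

```shell
# Sketch: after manually downloading node.csv, edge.csv, label.csv from the Drive link,
# copy them into data_process/ and run sampling followed by training.
mkdir -p data_process
for f in node.csv edge.csv label.csv; do
  echo "copy ${f} into data_process/"     # manual step, see the Drive link above
done
# bash data_process/submit.sh             # Spark sampling (needs a configured Spark setup)
# python model_hegnn.py                   # training; prints per-epoch loss and micro-F1
```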
19 changes: 13 additions & 6 deletions agl/python/examples/kcan_movielens/readme.md
@@ -14,14 +14,17 @@

## Notes
> Warning: the model implementation differs slightly from the paper. The original paper alternates training between knowledge-graph representation learning and kcan; here only kcan is trained. Also, the open-source data contains only positive edges, so negative samples are drawn at random; the negatives therefore differ from those in the original paper, and the results cannot fully match it.

### Data download:
Download the data files from https://drive.google.com/drive/folders/12_mU1jt7ntuWEMQ-bogF0cLQjFJijnab?usp=sharing and place the graph data files node_table.csv, link_table.csv, and edge_table.csv in the data_process/ directory.

### Data preprocessing
Taking movielens as the example, we first compress the raw data into subgraphs (pb strings) using the command in data_process/submit.sh below.

Because link-mode sample volumes are huge, users need to set up a Spark cluster to run this step. Users who cannot set up a cluster can instead download the pre-sampled subgraph data part-subgraph_kcan_train_test.csv from the link above and place it in the data_process/output_graph_feature directory.

```
base=`dirname "$0"`
cd "$base"
@@ -50,7 +53,11 @@ python ../../run_spark.py \
- input_node_feature
- contains the two fields node_id and node_feature

```
cd data_process
python split_graph_features.py
```
Run the script above to split the subgraphs under the output_graph_feature directory by train_flag into subgraph_kcan_movielens_train.txt and subgraph_kcan_movielens_test.txt for downstream training.
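Putting the kcan_movielens preprocessing steps together, a minimal recap sketch follows. Paths come from the README; the Spark submit and split commands are commented because they need the cluster and data in place:

```shell
# Sketch of the preprocessing pipeline described above.
mkdir -p data_process/output_graph_feature
echo "1) place node_table.csv, link_table.csv, edge_table.csv in data_process/"
echo "2) run data_process/submit.sh for Spark sampling (or download part-subgraph_kcan_train_test.csv)"
echo "3) split the sampled subgraphs by train_flag:"
# cd data_process && python split_graph_features.py   # writes subgraph_kcan_movielens_{train,test}.txt
```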

### Running the model
```
@@ -62,7 +69,7 @@ python kcan_subgraph_adj.py
* kcan, untuned; at 100 epochs, AUC ~ 0.9 (original paper: 0.907)

* Efficiency: kcan
```
Epoch: 01, Loss: 0.4570, auc: 0.8826, best_auc: 0.8826, train_time: 154.442953, val_time: 30.473222
Epoch: 02, Loss: 0.4234, auc: 0.8841, best_auc: 0.8841, train_time: 151.677146, val_time: 29.973186
Epoch: 03, Loss: 0.4215, auc: 0.8842, best_auc: 0.8842, train_time: 153.905838, val_time: 32.188418