xDeepFM
潜心 edited this page Oct 8, 2020 · 3 revisions
xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems
Innovation: the CIN layer is the key!
Notes on the original paper: https://mp.weixin.qq.com/s/TohOmVpQzNlA3vXv0gpobg
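The model is tested on the Criteo dataset. The dataset preprocessing lives in the `utils` file and consists of the following steps:
- Because the Criteo file is very large, `read_part` and `sample_num` can be used to read only part of the data for testing;
- Fill in missing values;
- Normalize the dense features `I1-I13` and re-encode the sparse features `C1-C26` with `LabelEncoder`;
- Assemble `feature_columns`;
- Split the dataset and return `feature_columns, (train_X, train_y), (test_X, test_y)`.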
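The steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in `utils`: the fill values (`0.0` for dense, `"-1"` for sparse), the use of min-max scaling for the normalization, the dictionary layout of `feature_columns`, and all helper names (`fill_missing`, `min_max_normalize`, `label_encode`, `preprocess`) are hypothetical. The split here simply takes the tail of the data and does not shuffle.

```python
def fill_missing(rows, dense_cols, sparse_cols):
    """Fill missing dense values with 0.0 and missing sparse values with '-1' (assumed defaults)."""
    for row in rows:
        for col in dense_cols:
            if row.get(col) in (None, ""):
                row[col] = 0.0
        for col in sparse_cols:
            if row.get(col) in (None, ""):
                row[col] = "-1"


def min_max_normalize(rows, col):
    """Scale one dense column into [0, 1] (assumed normalization scheme)."""
    values = [float(r[col]) for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    for r, v in zip(rows, values):
        r[col] = (v - lo) / span


def label_encode(rows, col):
    """Map each distinct value to an integer id in sorted order; return the vocabulary size."""
    mapping = {v: i for i, v in enumerate(sorted({r[col] for r in rows}))}
    for r in rows:
        r[col] = mapping[r[col]]
    return len(mapping)


def preprocess(rows, dense_cols, sparse_cols, embed_dim=8, test_size=0.2):
    rows = [dict(r) for r in rows]  # work on copies, leave the input untouched
    fill_missing(rows, dense_cols, sparse_cols)
    for col in dense_cols:
        min_max_normalize(rows, col)
    # Hypothetical layout: dense feature names plus one descriptor per sparse feature.
    feature_columns = {"dense": list(dense_cols), "sparse": []}
    for col in sparse_cols:
        vocab_size = label_encode(rows, col)
        feature_columns["sparse"].append(
            {"feat": col, "feat_num": vocab_size, "embed_dim": embed_dim}
        )
    cols = list(dense_cols) + list(sparse_cols)
    cut = len(rows) - round(len(rows) * test_size)

    def to_xy(part):
        return [[r[c] for c in cols] for r in part], [r["label"] for r in part]

    return feature_columns, to_xy(rows[:cut]), to_xy(rows[cut:])


sample = [
    {"label": 1, "I1": 2.0, "C1": "b"},
    {"label": 0, "I1": None, "C1": "a"},
    {"label": 0, "I1": 4.0, "C1": ""},
    {"label": 1, "I1": 1.0, "C1": "b"},
    {"label": 0, "I1": 3.0, "C1": "a"},
]
feature_columns, (train_X, train_y), (test_X, test_y) = preprocess(sample, ["I1"], ["C1"])
print(feature_columns["sparse"])
print(train_X, test_X)
```

The vocabulary size recorded for each sparse feature is what an embedding layer would need as its input dimension, which is why encoding happens before `feature_columns` is assembled.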
from tensorflow import keras  # assumes TensorFlow 2.x


class xDeepFM(keras.Model):
    def __init__(self, feature_columns, hidden_units, cin_size, dnn_dropout=0, dnn_activation='relu',
                 embed_reg=1e-5, cin_reg=1e-5):
        """
        xDeepFM
        :param feature_columns: A list. Dense and sparse feature column information.
        :param hidden_units: A list. Number of units in each DNN hidden layer.
        :param cin_size: A list. Number of feature maps in each CIN layer.
        :param dnn_dropout: A scalar. Dropout rate of the DNN.
        :param dnn_activation: A string. Activation function of the DNN.
        :param embed_reg: A scalar. Regularization coefficient for the embeddings.
        :param cin_reg: A scalar. Regularization coefficient for the CIN.
        """
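Since the CIN layer is the key idea, here is a minimal pure-Python sketch of one CIN step as defined in the xDeepFM paper: feature map `h` of layer `k` is a weighted sum of element-wise products between every row of the previous layer and every row of the input embedding matrix. It is written with nested lists for readability and is not the TensorFlow implementation behind the class above; the names `cin_layer` and `sum_pool` and the toy weights are hypothetical.

```python
def cin_layer(x_prev, x0, weights):
    """One CIN layer.

    x_prev:  H_prev x D matrix, output of the previous CIN layer (x0 itself for layer 1)
    x0:      m x D matrix of field embeddings
    weights: H_next x H_prev x m tensor, one H_prev x m weight matrix per output feature map
    returns: H_next x D matrix
    """
    dim = len(x0[0])
    out = []
    for w in weights:  # one output feature map per weight matrix
        row = [0.0] * dim
        for i, xi in enumerate(x_prev):
            for j, xj in enumerate(x0):
                for d in range(dim):
                    # Hadamard product along the embedding dimension, weighted by w[i][j]
                    row[d] += w[i][j] * xi[d] * xj[d]
        out.append(row)
    return out


def sum_pool(feature_maps):
    """Sum each feature map over the embedding dimension."""
    return [sum(row) for row in feature_maps]


x0 = [[1.0, 2.0], [3.0, 4.0]]        # m = 2 fields, D = 2
w1 = [[[1.0, 1.0], [1.0, 1.0]]]      # layer 1: H_1 = 1, shape 1 x 2 x 2
w2 = [[[1.0, 1.0]]]                  # layer 2: H_2 = 1, shape 1 x 1 x 2

x1 = cin_layer(x0, x0, w1)           # second-order interactions: [[16.0, 36.0]]
x2 = cin_layer(x1, x0, w2)           # third-order interactions:  [[64.0, 216.0]]
cin_output = sum_pool(x1) + sum_pool(x2)
print(cin_output)                    # [52.0, 280.0]
```

Each extra layer multiplies in `x0` once more, so the interaction order grows by one per layer, and the embedding dimension `D` is preserved throughout. With `cin_size=(128, 128)` this would correspond to two such layers with 128 feature maps each.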
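The experiment uses the following settings:
- `file`: the Criteo file;
- `read_part`: whether to read only part of the data, `True`;
- `sample_num`: number of samples to read when reading part of the data, `5000000`;
- `test_size`: proportion of the data used as the test set, `0.2`;
- `embed_dim`: embedding dimension, `8`;
- `dnn_dropout`: dropout rate, `0.5`;
- `hidden_unit`: hidden units of the DNN, `[256, 128, 64]`;
- `cin_size`: CIN layer sizes, `(128, 128)`;
- `learning_rate`: learning rate, `0.001`;
- `batch_size`: `4096`;
- `epoch`: `10`.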
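To make the scale of this configuration concrete, the sketch below collects the settings in a dictionary and derives the split sizes and the number of batches per epoch. The dictionary itself is a hypothetical container, and the step count assumes every training row is used in each epoch (no separate validation split is taken from the training set).

```python
import math

config = {
    "read_part": True,
    "sample_num": 5000000,
    "test_size": 0.2,
    "embed_dim": 8,
    "dnn_dropout": 0.5,
    "hidden_unit": [256, 128, 64],
    "cin_size": (128, 128),
    "learning_rate": 0.001,
    "batch_size": 4096,
    "epoch": 10,
}

# Rows held out for testing and rows left for training.
n_test = round(config["sample_num"] * config["test_size"])
n_train = config["sample_num"] - n_test

# Batches per epoch; the last batch is smaller than batch_size.
steps_per_epoch = math.ceil(n_train / config["batch_size"])
total_steps = steps_per_epoch * config["epoch"]

print(n_train, n_test)               # 4000000 1000000
print(steps_per_epoch, total_steps)  # 977 9770
```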
Using the first 5 million records of the Criteo dataset, the final result on the test set is AUC: 0.738484.
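For readers unfamiliar with the metric, AUC is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The following is a small illustrative sketch of that definition, not the evaluation code used to produce the number above; the function name `pairwise_auc` is hypothetical, and the quadratic loop is only practical for small inputs.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(labels, scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of clicked over non-clicked samples.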