The biggest difficulty is that the dataset is very large (512 GB), so we narrow down the number of classes to fit the model. The selected classes are listed here.
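Because the task is multi-label classification (an image can belong to several classes at once), we change our evaluation metric to the F2 score and switch the output used in the loss from softmax to sigmoid, so that each class is predicted independently.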
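To make the two changes concrete, here is a minimal sketch in plain Python. It is not the project's actual training code: the helper names (`sigmoid`, `predict_labels`, `fbeta_score`), the 0.5 threshold, and the example logits are all assumptions chosen for illustration. It shows per-class sigmoid outputs being thresholded into a label vector, and the F2 score (F-beta with beta = 2, which weights recall more heavily than precision) computed for one sample.

```python
import math


def sigmoid(z):
    """Map one logit to an independent per-class probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, threshold=0.5):
    """Turn per-class logits into a 0/1 label vector.

    Unlike softmax, the probabilities are not forced to sum to 1,
    so several classes can be active at once.
    The 0.5 threshold is an assumption, not a tuned value.
    """
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def fbeta_score(y_true, y_pred, beta=2.0):
    """F-beta score for one sample's binary label vectors."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical example: four classes, three of them truly present.
logits = [2.0, -1.0, 0.5, -3.0]
y_true = [1, 1, 1, 0]
y_pred = predict_labels(logits)
score = fbeta_score(y_true, y_pred)
print(y_pred, round(score, 4))
```

In this example one true label is missed and none is wrongly predicted, so precision is perfect but the F2 score still drops noticeably, which is the recall-heavy behaviour the metric is chosen for. In a real training setup the sigmoid would normally be paired with a binary cross-entropy loss from the framework in use.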