Domain Adaptation on Amazon Reviews (four products) data

Resources for domain adaptation papers on sentiment analysis that use the Amazon reviews data

Dataset

The Multi-Domain Sentiment Dataset has been used in many domain adaptation papers on sentiment analysis. It was first used in Blitzer et al. (2007). It contains more than 340,000 reviews from 25 different types of products from Amazon.com (Chen et al. 2012). Some domains (books and DVDs) have hundreds of thousands of reviews; others (musical instruments) have only a few hundred. Reviews carry star ratings (1 to 5 stars) that can be converted into binary labels if needed.

Blitzer et al. (2007) used a subset of this dataset containing four product types: books, DVDs, electronics, and kitchen appliances. Reviews with rating > 3 were labeled positive and those with rating < 3 were labeled negative. Each domain (product type) consists of 2,000 labeled inputs and approximately 4,000 unlabeled ones (varying slightly between domains), and the two classes are exactly balanced. Many works follow that convention and experiment only on this smaller set, either on its more manageable twelve single-source adaptation tasks (every ordered source-target pair of the four domains) or in a multi-source setting where one domain is the target and all the others are source domains.

The representations differ between papers, for example in how many features are kept in the bag-of-words representation. Following the early deep learning approach of Glorot et al. (2011) (the SDA paper), many works use the representation from Chen et al. (2012) (the mSDA paper), in which each review is preprocessed into a feature vector of unigrams and bigrams; you can use either the 5,000 most frequent features or all the features.
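The conventions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the preprocessing code of any of the papers listed here: the function names, the domain strings, the whitespace tokenizer, the use of raw counts, and the handling of 3-star reviews (dropped here) are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import permutations

# Illustrative domain names; the actual file names in a given release may differ.
DOMAINS = ["books", "dvd", "electronics", "kitchen"]


def rating_to_label(stars):
    """Map a star rating to a binary label: > 3 is positive (1), < 3 is negative (0).

    Assumption: reviews with exactly 3 stars are treated as ambiguous and dropped (None).
    """
    if stars > 3:
        return 1
    if stars < 3:
        return 0
    return None


def single_source_tasks(domains):
    """All ordered (source, target) pairs: 4 domains give 4 * 3 = 12 tasks."""
    return list(permutations(domains, 2))


def multi_source_tasks(domains):
    """One task per target domain, with all remaining domains as sources."""
    return [([d for d in domains if d != target], target) for target in domains]


def ngrams(tokens):
    """Unigrams plus bigrams (bigrams joined with an underscore)."""
    return tokens + [a + "_" + b for a, b in zip(tokens, tokens[1:])]


def top_k_vocabulary(docs, k=5000):
    """Keep the k most frequent unigram/bigram features over a corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split()))  # naive whitespace tokenizer
    return [feature for feature, _ in counts.most_common(k)]


def vectorize(doc, vocab):
    """Bag-of-words count vector of one review over a fixed vocabulary."""
    counts = Counter(ngrams(doc.lower().split()))
    return [counts[feature] for feature in vocab]


if __name__ == "__main__":
    for source, target in single_source_tasks(DOMAINS):
        print(f"{source} -> {target}")
```

In practice the vocabulary would typically be built from the reviews of all domains involved in a task, and the released mSDA features can be used directly instead of recomputing them.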

Here is a brief list (to be continued) of papers that have used this dataset, along with their reported results and implementations where available.

Single source domain adaptation

  • SCL: Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. Association for Computational Linguistics [ACL2007] SCL paper reported accuracies

  • SFA: Cross-domain sentiment classification via spectral feature alignment [WWW10]

  • MCT: Multi-domain Adaptation for Sentiment Classification: using Multiple Classifier Combining Methods [NLPKE,IEEE,2008]

  • SDA: Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach [ICML2011]

  • mSDA: Marginalized Denoising Autoencoders for Domain Adaptation [ICML2012] [Python] [Matlab] [Data]

  • BTDNN: Bi-Transferring Deep Neural Networks for Domain Adaptation [ACL2016]

    Includes results for another method, TLDA, from [Supervised Representation Learning: Transfer Learning with Deep Autoencoders][IJCAI15], which was evaluated on the ImageNet dataset.

  • DANN: Domain-Adversarial Training of Neural Networks. [Journal of Machine Learning Research 2016] [Python for Reviews]

    Yaroslav Ganin and Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation [ICML15][project page(code)] - evaluated on image datasets (Office, Webcam, Amazon).

    Ajakan et al., 2014, Domain-Adversarial Neural Networks [NIPS 2014 workshop] - evaluated on the Amazon reviews dataset

    NOTE: The two groups came up with similar ideas independently and then published the journal paper together, as noted by Ganin and Lempitsky (2015): "a very similar idea to ours has been developed in parallel and independently for shallow architecture (with a single hidden layer) in (Ajakan et al., 2014). Their system is evaluated on a natural language task (sentiment analysis)."

  • CORAL: Return of Frustratingly Easy Domain Adaptation. [AAAI16] [Matlab official]

  • CORAL+mSDA: Domain Adaptation for Sentiment Analysis link

  • AsyTri-training: Asymmetric Tri-training for Unsupervised Domain Adaptation [ICML2017] [python code] Saito et al. 2017

  • DAS: Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification [EMNLP2018] [code] He et al. 2018

  • AMN and HATN: End-to-End Adversarial Memory Network for Cross-domain Sentiment Classification [IJCAI17]; Hierarchical Attention Transfer Network for Cross-Domain Sentiment Classification [AAAI-18] [Tensorflow]

  • AE-SCL and PBLM: Neural Structural Correspondence Learning for Domain Adaptation [CoNLL 2017] [Python], and also [SCL] as implemented by the authors; Pivot Based Language Modeling for Improved Neural Domain Adaptation, Yftah Ziser and Roi Reichart [http://www.aclweb.org/anthology/N18-1112] [Tensorflow]

  • BERT-DAAT: Adversarial and Domain-Aware BERT for Cross-Domain Sentiment Analysis [ACL2020]

  • CFd (pre-trained LM + feature self-distillation + self-training): Feature Adaptation of Pre-Trained Language Models across Languages and Domains for Text Classification [EMNLP2020]

    Note: This paper experiments with cross-domain and cross-language adaptation and shares authors with the DAS paper. The results for this Amazon benchmark dataset are in Appendix B. Another related cross-language adaptation paper by the authors:

    • XLM-UFD: Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [IJCAI2020]
  • PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models [Transactions of the Association for Computational Linguistics, Volume 8, 2020, pp. 504-521]

Multi Source DA

  • Mansour et al.(2009): Domain Adaptation with Multiple Sources [NIPS2009]

  • SST: Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification [ACL-HLT2011]

  • SDAMS: Sentiment Domain Adaptation with Multiple Sources [ACL16]

    Includes results for DAM: Domain adaptation from multiple sources via auxiliary classifiers [ICML09], originally evaluated on video data. Also compared with the method in: Multi-source domain adaptation and its application to early detection of fatigue [KDD2011]

  • MDAN: Adversarial Multiple Source Domain Adaptation [NIPS2018] [Pytorch] [Data]

  • MoE: Multi-Source Domain Adaptation with Mixture of Experts [EMNLP2018] [Pytorch] [Data]

  • MAN: Multinomial Adversarial Networks for Multi-Domain Text Classification [NAACL2018] [Pytorch] [tgzfromACL]

  • DACL: Dual Adversarial Co-Learning for Multi-Domain Text Classification [arxiv2019]

  • MDANet: Learning Multi-Domain Adversarial Neural Networks for Text Classification [IEEE Access 2019]

  • DSR-at: Learning Domain Representation for Multi-Domain Sentiment Classification [NAACL18]

  • Multi-Domain Sentiment Classification Based on Domain-Aware Embedding and Attention [IJCAI19] NOTE: This paper evaluates on another Amazon dataset (Liu et al., 2017), but compares with, for example, DSR-at.

  • Transformer Based Multi-Source Domain Adaptation [EMNLP2020] [Pytorch]

  • ****:

    NOTE: A few other domain adaptation papers at EMNLP 2020 also seem interesting.

    Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models [EMNLP2020] -- This paper evaluates on NER.

    • Low-Resource Domain Adaptation for Compositional Task-Oriented Semantic Parsing. Xilun Chen, Asish Ghoshal, Yashar Mehdad, Luke Zettlemoyer and Sonal Gupta.
    • End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems. Siamak Shakeri, Cicero Nogueira dos Santos, Henghui Zhu, Patrick Ng, Feng Nan, Zhiguo Wang, Ramesh Nallapati and Bing Xiang.
    • Unified Feature and Instance Based Domain Adaptation for Aspect-Based Sentiment Analysis. Chenggong Gong, Jianfei Yu and Rui Xia.
    • Multi-Stage Pre-training for Low-Resource Domain Adaptation. Rong Zhang, Revanth Gangi Reddy, Md Arafat Sultan, Vittorio Castelli, Anthony Ferritto, Radu Florian, Efsun Sarioglu Kayi, Salim Roukos, Avi Sil and Todd Ward.
    • Simple Data Augmentation with the Mask Token Improves Domain Adaptation for Dialog Act Tagging. Semih Yavuz, Kazuma Hashimoto, Wenhao Liu, Nitish Shirish Keskar, Richard Socher and Caiming Xiong.
  • Multi-Source Domain Adaptation for Text Classification via DistanceNet-Bandits [AAAI2020]

    • Note: the data used in this paper includes this Amazon product review dataset

Survey Papers

  • Amazon review results for multiple methods: A Comprehensive Survey on Transfer Learning [Proceedings of the IEEE, 07 July 2020]

  • Mainly computer vision: A Survey of Unsupervised Deep Domain Adaptation [ACM TIST]

Other Resources