This project aimed to apply different text analytics techniques and algorithms to identify a) emerging industries and b) the emerging technologies used in those industries. The approach can be summarized as follows:

- Preprocessing using regular expressions (`regexpr`)
- Benchmarking different preprocessing assumptions
- Corpus creation and boundary testing
- LDA training
- Identifying emerging topics through emergence analysis
- Verifying the LDA approach with a train/test KWIC (keyword-in-context) analysis
- Using bigrams to identify specific, complex industries

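The regex preprocessing, KWIC and bigram steps above can be illustrated with a short sketch using only Python's standard library. This is not the project's own implementation; the function names `preprocess`, `bigrams` and `kwic`, the cleaning pattern (keep lowercase letters and whitespace only) and the context window size are hypothetical choices made for illustration.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text, replace every non-letter with a space, and tokenize."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drops digits and punctuation
    return text.split()


def bigrams(tokens):
    """Count adjacent token pairs, e.g. ('machine', 'learning')."""
    return Counter(zip(tokens, tokens[1:]))


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    # Hypothetical company description, not taken from the project's data.
    tokens = preprocess("We build machine learning tools. Machine learning for drone delivery!")
    print(bigrams(tokens).most_common(1))
    for hit in kwic(tokens, "learning", window=2):
        print(hit)
```

Counting bigrams surfaces multi-word terms such as "machine learning" that unigram topics split apart, while the KWIC output lets a reader check whether a keyword is used in the sense a topic suggests.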
A results presentation (19 November 2019) with more details can be downloaded here: https://github.com/aleksejhoffaerber/EmergingTechnologies/blob/master/Pitch_LDA_Emerging%20Technologies.pdf
Training LDA with 75 topics and assigning the topics to individual companies by their gamma values (per-document topic probabilities) led to 7 emerging topics: