Poor Trader Joe. After learning in Project #1 that he'll never be able to afford a house through passive investing alone, he's decided to try his hand at speculative trading! Joe is looking to buy when there's 'blood in the streets' by using Natural Language Processing (NLP) to analyze sentiment and narrow down a list of candidate stocks. Then, Joe will run technical analysis to see if there are anomalies or outlier deviations from moving averages. Lastly, if the potential stock passes both of these filters, Joe will use a classification model to help him predict whether the stock will go up or down the next day. By using this 'goldilocks' approach as a pre-trade checklist, Joe takes control of his destiny instead of relying on blind recommendations from gossip or news articles!
Joe's on the prowl for hot-off-the-press bearish news of pummeled stocks. Unfortunately, Joe finds traditional media too unreliable and always late to the party! Instead, Joe turns to Twitter. Reasoning that any new info would arrive much quicker than average news articles, Joe decides to look through the tweets of some big-name traders. Joe builds a list of traders he respects, then begins his NLP analysis to narrow down a list of "beaten down" stocks. Once he narrows the list, he'll use technical analysis and a custom decision tree to further gauge the potential performance of the stock. Maybe this way he'll be able to post his 'gainz' on r/WSB!
- NLP Analysis : First, Joe needs to find a nice and bloody stock! He'll utilize the Twitter API to create a custom dataframe of tweets containing negative sentiment.
- Technical Analysis : Next, Joe will perform Technical Analysis and compare the stock price vs. the 21/50/200 EMAs. He'll look for extreme outliers, generate signals, then compare the general performance of the signals 1 month later relative to SPY.
- Decision Tree Model : Lastly, what does the decision tree model say about this stock? Will the model forecast the next trading session to move up or down?
We utilized the Twitter API and Yahoo Finance.
This project required the following libraries: pandas, numpy, tweepy, hvplot, graphviz, matplotlib, nltk, sklearn & yfinance.
Joe is nervous about choosing the right stocks to buy (after all, Wall Street looks scary when you live with your mom!). The only thing Joe is sure of is that the traditional news outlets are too slow to give him an edge. Instead, Joe uses Twitter to find out what's going on in the markets. He seeks the advice and chatter of top traders to help him make sense of it all. After curating a handful of the most trusted and popular active traders, he builds a collection of their tweets and uses the NLTK library to derive their sentiment. For Joe's 'Buy-the-Dip' strategy, ultra-negative sentiment represents a valued investment opportunity! Filtering through the data using NLTK is a much better use of his time (even if he did appreciate all the pictures of food!).
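A minimal sketch of how that sentiment filter might look. The write-up only says NLTK was used, so the choice of the VADER analyzer here is an assumption, as are the `ticker`/`text` column names, the `-0.5` cutoff, and both function names.

```python
import pandas as pd


def make_vader_scorer():
    """Return a function mapping text to a VADER compound score in [-1, 1].

    Assumption: VADER is the NLTK analyzer in use. Needs nltk installed and
    the 'vader_lexicon' resource to be downloadable.
    """
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]


def most_negative_tickers(tweets, scorer, threshold=-0.5):
    """Rank tickers by the mean score of their sufficiently negative tweets.

    tweets: DataFrame with 'ticker' and 'text' columns (hypothetical schema).
    scorer: any callable mapping a tweet's text to a numeric sentiment score.
    """
    scored = tweets.assign(compound=tweets["text"].map(scorer))
    negative = scored[scored["compound"] <= threshold]
    # Most negative ticker first
    return negative.groupby("ticker")["compound"].mean().sort_values()
```

With NLTK installed, `most_negative_tickers(tweets, make_vader_scorer())` would return the candidate list, bloodiest ticker at the top. Passing the scorer in as an argument keeps the ranking logic testable without downloading the lexicon.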
Success! He's found a stock with negative sentiment: $OLLI
Joe pulls 5 years of data for $OLLI and $SPY. He then plots the closing prices and the 21, 50, and 200 EMAs. He builds a dataframe comparing the current price vs. the three EMAs, standardizes the deltas, and looks for anomalies. He uses a 1.5 standard deviation equivalent (or 13th percentile) to find the extreme outliers of % moves away from the EMAs. From here, he can see the range of price moves away from the 21 EMA.
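The EMA comparison and standardization could be sketched roughly as follows. This is an illustration under assumptions rather than Joe's actual notebook: the function and column names are made up, seeded synthetic prices stand in for the Yahoo Finance download, and treating only moves below the EMA as interesting is inferred from the buy-the-dip goal.

```python
import numpy as np
import pandas as pd


def ema_zscores(close: pd.Series, spans=(21, 50, 200)) -> pd.DataFrame:
    """Percent distance of price from each EMA, plus its z-score."""
    out = pd.DataFrame(index=close.index)
    for span in spans:
        ema = close.ewm(span=span, adjust=False).mean()
        pct_away = (close - ema) / ema * 100.0
        out[f"pct_from_ema{span}"] = pct_away
        # Standardize so 1.0 means one standard deviation from the average gap
        out[f"z_ema{span}"] = (pct_away - pct_away.mean()) / pct_away.std()
    return out


# Synthetic closes (seeded random walk) in place of real downloaded prices
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=500)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.02, 500))), index=dates)

stats = ema_zscores(close)
# Days where price sits unusually far below the 21 EMA
oversold = stats[stats["z_ema21"] <= -1.5]
print(oversold[["pct_from_ema21", "z_ema21"]].head())
```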
He builds a dataframe, initiates a signal, then plots his findings.
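Turning the standardized deviations into a signal, and attaching the forward return each signal is later judged against, might look like this. The function name, column names, and default threshold are assumptions, and the numbers in the demo are made up purely to keep the arithmetic easy to follow.

```python
import pandas as pd


def add_signal_and_forward_returns(z, close, benchmark, threshold=-1.5, horizon=20):
    """Flag oversold days and attach forward returns for stock and benchmark.

    threshold and horizon are assumed defaults: 1.5 standard deviations below
    the mean, and 20 rows (about one trading month) ahead.
    """
    df = pd.DataFrame({"z": z, "close": close, "benchmark": benchmark})
    df["signal"] = (df["z"] <= threshold).astype(int)
    # Percent return from this row's close to the close `horizon` rows later
    df["fwd_ret"] = (df["close"].shift(-horizon) / df["close"] - 1) * 100
    df["bench_fwd_ret"] = (df["benchmark"].shift(-horizon) / df["benchmark"] - 1) * 100
    # Positive means the stock beat the benchmark over the horizon
    df["excess_ret"] = df["fwd_ret"] - df["bench_fwd_ret"]
    return df


# Tiny made-up example with a 2-row horizon
demo = add_signal_and_forward_returns(
    z=[0.0, -1.0, -2.0, -1.6, 0.0],
    close=[100.0, 90.0, 80.0, 88.0, 99.0],
    benchmark=[100.0, 100.0, 100.0, 100.0, 100.0],
    horizon=2,
)
print(demo[demo["signal"] == 1][["fwd_ret", "excess_ret"]])
```

Rows too close to the end of the data have no forward return yet, so they come out as NaN and drop out of any average taken over the signal days.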
To backtest the performance of his signal, Joe plots the average returns of his stock 1 trading month after the signal. Is the price higher or lower? How about % return relative to the SPY? Lastly, Joe wants to try his hand at linear regression. He's looking to see if there is any correlation between the "% away from the 21 EMA" and the average return of the stock 20 days later. Unfortunately, the r-squared value is incredibly low, and therefore his model is unreliable!
Joe chooses to build a classification model to help him determine if the stock's price will go up or down during the next trading session. Joe uses 7 years of historical pricing and volume data to build technical indicator features and train his model. He then uses PCA to reduce the dimensionality of his features in an effort to improve the model's effectiveness.
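A rough sketch of an up/down classifier with PCA in front of a decision tree, using scikit-learn. The specific features, the number of components, the tree depth, and the 80/20 chronological split are all assumptions, since the write-up does not list them. Seeded synthetic data stands in for the 7 years of price and volume history.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_features(prices: pd.DataFrame) -> pd.DataFrame:
    """prices needs 'Close' and 'Volume' columns. Feature choice is illustrative."""
    feats = pd.DataFrame(index=prices.index)
    feats["ret_1d"] = prices["Close"].pct_change()
    feats["ret_5d"] = prices["Close"].pct_change(5)
    feats["vol_chg"] = prices["Volume"].pct_change()
    for span in (21, 50):
        ema = prices["Close"].ewm(span=span, adjust=False).mean()
        feats[f"pct_from_ema{span}"] = prices["Close"] / ema - 1
    # Label: 1 if the next session closes higher than today, else 0
    feats["target"] = (prices["Close"].shift(-1) > prices["Close"]).astype(int)
    # Drop the last row (next close unknown) and warm-up rows containing NaN
    return feats.iloc[:-1].dropna()


# Synthetic price/volume history (seeded) in place of real data
rng = np.random.default_rng(42)
n = 600
prices = pd.DataFrame({
    "Close": 50 * np.exp(np.cumsum(rng.normal(0, 0.02, n))),
    "Volume": rng.integers(100_000, 1_000_000, n).astype(float),
})

data = build_features(prices)
X, y = data.drop(columns="target"), data["target"]

# Chronological split: train on the past, test on the most recent 20%
split = int(len(data) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = make_pipeline(
    StandardScaler(),                 # PCA is sensitive to feature scale
    PCA(n_components=3),              # assumed number of components
    DecisionTreeClassifier(max_depth=3, random_state=0),
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2%}")
```

Splitting by time rather than shuffling keeps future prices out of the training set, which matters when the goal is to predict the next session.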
Joe was able to identify a few 'beat down' stocks from Twitter based on sentiment analysis. OLLI was considered the best option available (because it was the worst!), so Joe ran his pre-trade checklist BTFD system. His NLP analysis found the stock, then his technical analysis measured some EMA vs. price anomalies, and finally his decision tree model provided the green light to invest! As for the accuracy of his system, although his technical analysis gave relatively good returns, unfortunately his decision tree model needs fine-tuning. The classification model only yielded 53% accuracy, which suggests it's not very reliable. Joe needs to take a better look at his feature selection and engineering, and probably use more data to better train his model. You live, you learn (and hopefully earn along the way!)