It was necessary to decide on the right set of models for our experiment with financial stock data, so we researched existing models to understand their implementation and compare their performance. To understand transformer models further, we looked into BERT (Bidirectional Encoder Representations from Transformers), introduced by Devlin et al. BERT refines fine-tuning-based approaches and obtained new state-of-the-art results on eleven natural language processing tasks, including GLUE (a 7.7% absolute improvement) and SQuAD v2.0 (a 5.1-point improvement). A later study of BERT pre-training, which is computationally expensive, found the original model to be significantly undertrained. Using datasets of various sizes, it measured the impact of key hyperparameters and training data size and introduced RoBERTa, which achieved state-of-the-art results on GLUE, RACE and SQuAD. BERT has also been applied to financial stock Twitter data. Araci et al. introduced FinBERT, a language model built on BERT by further pre-training it on a financial dataset and fine-tuning it for sentiment analysis. FinBERT achieved state-of-the-art results on FiQA sentiment scoring and Financial PhraseBank, and its results were compared against ULMFit and ELMo for financial sentiment analysis. Another study introduced a system that collects historical stock-related tweets and categorises them into positive and negative sentiment. It then applied machine learning algorithms such as a Support Vector Machine, which achieved an accuracy of 0.8, and Bernoulli Naïve Bayes, which achieved an accuracy of 0.79. Sentiment analysis combined with supervised machine learning has also been applied to tweets to analyse the correlation between stock market movement and the sentiment expressed in them, using Word2vec and N-gram representations to analyse public sentiment.
In this research, the polarity of StockTwits messages depends on changes in the stock price, and we incorporate this through a new data labelling method. The main objective is to identify the correlation between changes in stock prices and StockTwits data. To label the StockTwits dataset, we extract historical price data for the respective companies from Yahoo Finance. From this data, we calculate the price change by comparing the Close price with the Open price, either for the same day or relative to the previous day. We then label each StockTwits message based on this price change. We labelled the StockTwits data using three different techniques: (1) Binary classification: a message is labelled positive if the stock price increased and negative if the stock price decreased. (2) Percentage change, two labels: a message is labelled positive if the stock price increased by more than 0.5% and negative if it dropped by more than 0.5%. (3) Percentage change, three labels: the same as the previous method, with the addition of a third label, neutral, assigned when the price change is between −0.5% and +0.5%. We experimented with these labelling methods to find out which technique produces the best results. We trained traditional machine learning, BERT and FinBERT models on the labelled datasets, which helped us understand how these labels behave with different model architectures. Our proposed model, FinALBERT, is fine-tuned with these labels to achieve optimal results. The competitive advantage of our labelling method is that it helps analyse historical data effectively, and the underlying mathematical function can easily be customised to predict stock movement.
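The three labelling schemes can be sketched as small functions over a percentage price change. This is a minimal, hypothetical illustration, not the authors' code. It makes several assumptions. The price change is computed as (later − reference) / reference × 100, with the day's Open as the reference and the Close as the later price. Moves of exactly ±0.5% count as neutral. Messages on days with no price move (binary scheme), or with moves inside ±0.5% (two-label scheme), are left unlabelled (`None`). The function names and the sample prices are invented for illustration.

```python
from typing import Optional

# Threshold taken from the text: +/-0.5 percent.
THRESHOLD_PCT = 0.5


def pct_change(reference: float, current: float) -> float:
    """Percentage change from a reference price (assumed: the Open) to a later price (the Close)."""
    if reference <= 0:
        raise ValueError("reference price must be positive")
    return (current - reference) / reference * 100.0


def label_binary(change: float) -> Optional[str]:
    """(1) Binary classification: any rise is positive, any fall is negative.
    Assumption: an unchanged price leaves the message unlabelled (None)."""
    if change > 0:
        return "positive"
    if change < 0:
        return "negative"
    return None


def label_two(change: float, t: float = THRESHOLD_PCT) -> Optional[str]:
    """(2) Two labels: positive above +t%, negative below -t%.
    Assumption: moves within [-t, +t] leave the message unlabelled (None)."""
    if change > t:
        return "positive"
    if change < -t:
        return "negative"
    return None


def label_three(change: float, t: float = THRESHOLD_PCT) -> str:
    """(3) Three labels: as in (2), but moves within [-t, +t] are labelled neutral."""
    return label_two(change, t) or "neutral"


if __name__ == "__main__":
    # Hypothetical (Open, Close) prices keyed by date: illustrative values, not real data.
    prices = {"day1": (100.0, 101.0), "day2": (100.0, 99.8), "day3": (100.0, 99.0)}
    messages = [("day1", "looking strong"), ("day2", "flat day"), ("day3", "ouch")]
    for date, text in messages:
        change = pct_change(*prices[date])
        print(date, text, round(change, 2),
              label_binary(change), label_two(change), label_three(change))
```

On the sample prices, a 0.2% drop is negative under the binary scheme, unlabelled under the two-label scheme, and neutral under the three-label scheme. This shows how the same message can receive a different label depending on the technique. A previous-day comparison would only change which price is passed as `reference`. The source does not specify which price that is.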
Stock price prediction can be made more efficient by considering price fluctuations and understanding people's sentiments. However, only a limited number of models understand financial jargon or are trained on datasets labelled according to stock price change. To overcome this challenge, we introduced FinALBERT, an ALBERT-based model trained to handle financial-domain text classification tasks, with StockTwits text data labelled according to stock price change. We collected more than ten years of StockTwits data for 25 different companies, including the five major FAANG companies (Facebook, Amazon, Apple, Netflix, Google).