
What follows is an overview of the general field my forthcoming thesis will be based on. I would love to start a dialogue with anyone interested in it, as it's a field I was unaware of myself as recently as 10 or 12 months ago. Without further ado:

Data is constantly being produced in the world around us. According to a 2012 statement from IBM, the figure was as high as 2.5 exabytes of data per day, or approximately 2,500,000,000 GB. For context, that is the rough equivalent of every American finding an 8 GB flash drive on their doorstep every single day. Using the census.gov estimate of almost 7.5 billion people on Earth, that works out to roughly 333 MB generated, on average, by every single person on the planet every single day. For perspective, if that were all video, it would take someone almost a week to produce a movie's worth of footage at that rate. If it were all typing, it would mean producing Word documents approximately this long about 14 times per minute, assuming 16 KB file sizes and an average generation rate of roughly 3.9 KB of data per person on Earth per second.
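To sanity-check those numbers, here is a quick back-of-the-envelope calculation in Python. It assumes decimal units (1 GB = 10^9 bytes) and the IBM and census.gov figures cited above:

```python
# Back-of-the-envelope check of the per-person figures above,
# using decimal units (1 GB = 10^9 bytes) throughout.

DAILY_DATA_BYTES = 2.5e18   # IBM's 2012 estimate: 2.5 exabytes per day
WORLD_POPULATION = 7.5e9    # rough figure cited from census.gov
SECONDS_PER_DAY = 86_400
DOC_SIZE_BYTES = 16e3       # the assumed 16 KB Word document

per_person_per_day = DAILY_DATA_BYTES / WORLD_POPULATION
per_person_per_second = per_person_per_day / SECONDS_PER_DAY
docs_per_minute = per_person_per_second * 60 / DOC_SIZE_BYTES

print(f"Per person per day:    {per_person_per_day / 1e6:.0f} MB")    # ~333 MB
print(f"Per person per second: {per_person_per_second / 1e3:.2f} KB") # ~3.86 KB
print(f"16 KB docs per minute: {docs_per_minute:.1f}")                # ~14.5
```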

Why is this relevant to a paper on financial markets? Because most data is not generated by individuals intentionally or even with their awareness. On the intentional side, governments, corporations, and individuals collect data about the world around them and, indirectly, about the people whose behavior those data sets describe. Whether it's the number of times someone runs a red light at a certain intersection or the decrease in short-term debt in a company's third-quarter earnings report, humans shape the values of almost all data being created, even when a company's computer program or employee is the one generating it. As for data generated without intention or awareness, look no further than social media. The data on sites like Facebook, Twitter, and others is now increasingly stored, indexed, and analyzed for business purposes that most people never think about. Gone are the days of a website showing every visitor the same ad; algorithms that predict the user's behavior rule the day. These capabilities have even extended to the financial markets, as start-ups cater to hedge funds and asset managers with promises of predicting aggregate buy and sell moves by collecting information from thousands of tweets in real time. To date, much of the research into applying this technology to financial markets has focused on social media-generated data sources.
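To make that concrete, here is a minimal, illustrative sketch of dictionary-based tweet scoring in Python. The word lists and tweets are invented, and real vendors use far more sophisticated models; this only shows the basic shape of the idea:

```python
import re

# Minimal sketch of dictionary-based sentiment scoring over a stream of
# tweets. The word lists and example tweets are invented for illustration.
POSITIVE = {"beat", "growth", "bullish", "upgrade", "strong"}
NEGATIVE = {"miss", "lawsuit", "bearish", "downgrade", "weak"}

def sentiment(text: str) -> int:
    """Count positive words minus negative words in one tweet."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = [
    "Strong quarter, earnings beat estimates, bullish on $XYZ",
    "Analyst downgrade and a pending lawsuit, weak outlook for $XYZ",
]

# Aggregate score across the stream; a positive total would be read as
# net buying interest, a negative total as net selling interest.
aggregate = sum(sentiment(t) for t in tweets)
print(aggregate)  # 3 + (-3) = 0 for these two example tweets
```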

In this paper, I examine a correlation between a different type of unstructured data and financial returns. The technology that reads tweets comes from the field of natural language processing: a collection of software, databases, and tools for working with human language. Within it, the technique of applying weightings to words and phrases within documents to infer opinion, known as sentiment analysis, provides the basis for predicting everything from a central bank rate hike to the likelihood of two friends changing their status from single to dating (or vice versa). Specifically, I'll be examining the relationship between the spot prices of publicly traded common stock over a 3 to 5-year period and statements made in 10-Qs and quarterly earnings call transcripts. By examining only those public companies that have low trading volume and/or no coverage from equity analysts at major institutions, one may be able to discover inefficiencies between a share's current market price and its true long-term value. Equity analysts and others who cover the stock market make biased decisions daily. By quantifying and regimenting the most opinionated, yet formal, statements about smaller firms, one can begin to see the bias in a firm's leadership more clearly. Whether it's an intentional obfuscation of fact or a coincidental correlation between how efficient a CFO was last month and the number of times they said "erm" during the Q&A with analysts, sentiment analysis and the necessary regressions stand to extend the kind of work big-name equity research firms do to companies those firms never had the manpower to cover.
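As a rough sketch of that pipeline, and with the caveat that the word lists, tone scores, and return figures below are all fabricated for illustration (in practice one would use a finance-specific dictionary such as Loughran-McDonald and real filings), the core steps look something like this:

```python
import re
import numpy as np

# Stand-in word lists; a real study would use a finance-specific
# dictionary such as Loughran-McDonald rather than these few words.
POSITIVE = {"improved", "gain", "favorable", "record", "strengthened"}
NEGATIVE = {"impairment", "decline", "adverse", "restructuring", "litigation"}

def tone(filing_text: str) -> float:
    """Net sentiment per 100 words of a 10-Q or earnings call transcript."""
    words = re.findall(r"[a-z]+", filing_text.lower())
    if not words:
        return 0.0
    net = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 100.0 * net / len(words)

# One (tone score, next-quarter return) pair per filing; all fabricated.
tones   = np.array([0.8, -0.3, 1.2, -0.9, 0.1, 0.5])
returns = np.array([0.04, -0.02, 0.07, -0.05, 0.01, 0.02])

# Simple OLS fit: return = alpha + beta * tone. A significantly positive
# beta for thinly covered firms would point at the inefficiency described above.
beta, alpha = np.polyfit(tones, returns, 1)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```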

If you're interested in learning more about this field, or are already an expert who would enjoy comparing notes with someone currently researching it, please contact me. I'd love to discuss it!