The U.S. Federal Reserve began raising the federal funds rate in March 2022. Since then, almost all asset classes have performed poorly while the correlation between fixed income assets and equities has increased, making fixed income securities ineffective in their traditional role as a hedging tool.
With the value of asset diversification having diminished at least temporarily, it has become increasingly crucial to have an objective and quantifiable understanding of the outlook of the Federal Open Market Committee (FOMC).
This is where machine learning (ML) and natural language processing (NLP) come in. We applied the Loughran-McDonald Sentiment Word Lists as well as the BERT and XLNet ML/NLP techniques to FOMC statements to see if they anticipated changes in the fed funds rate, and then examined whether our results had any correlation with stock market performance.
Loughran-McDonald Sentiment Word Lists
Before calculating sentiment scores, we first built word clouds to visualize the frequency/prominence of particular words in FOMC statements.
Word Cloud: March 2017 FOMC Statement
Word Cloud: July 2019 FOMC Statement
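A word cloud is driven by simple term frequencies. As a minimal sketch of the counting step behind such a visualization (the statement snippet and stop-word list here are illustrative stand-ins, not the actual FOMC corpus or filter):

```python
from collections import Counter
import re

# Illustrative snippet standing in for a full FOMC statement.
statement = (
    "The labor market has continued to strengthen and economic activity "
    "has been rising at a moderate rate. The Committee expects that "
    "economic activity will expand at a moderate pace."
)

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
stop_words = {"the", "has", "and", "at", "a", "that", "will", "to", "been"}

tokens = re.findall(r"[a-z]+", statement.lower())
freqs = Counter(t for t in tokens if t not in stop_words)

print(freqs.most_common(3))
```

The most frequent surviving terms would set the largest words in the cloud.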
Although the Fed raised the federal funds rate in March 2017 and lowered it in July 2019, the word clouds of the two corresponding statements look alike. This is because FOMC statements typically contain many non-sentimental words with little bearing on the FOMC’s outlook. Thus, the word clouds failed to distinguish signal from noise. But quantitative analysis can offer some clarity.
The Loughran-McDonald Sentiment Word Lists analyze 10-Ks, earnings call transcripts, and other texts by classifying words into the following categories: negative, positive, uncertainty, litigious, strong modal, weak modal, and constraining. We applied this technique to FOMC statements, designating words as positive/hawkish or negative/dovish, while filtering out less important text like dates, page numbers, voting members, and explanations of the implementation of monetary policy. We then calculated sentiment scores using the following formula:
Sentiment Score = (Positive Words – Negative Words) / (Positive Words + Negative Words)
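The formula above can be sketched in a few lines (the hawkish/dovish word sets below are tiny illustrative stand-ins, not the actual Loughran-McDonald lists):

```python
# Illustrative stand-ins for the positive/hawkish and negative/dovish word lists.
positive_words = {"strengthen", "expand", "increase", "rising"}
negative_words = {"weaken", "contract", "decline", "falling"}

def sentiment_score(tokens):
    """(positive - negative) / (positive + negative); None if no hits."""
    pos = sum(t in positive_words for t in tokens)
    neg = sum(t in negative_words for t in tokens)
    if pos + neg == 0:
        return None
    return (pos - neg) / (pos + neg)

tokens = "economic activity is rising while inflation pressures decline".split()
print(sentiment_score(tokens))  # one positive hit, one negative hit -> 0.0
```

The score ranges from -1 (all dovish hits) to +1 (all hawkish hits), with no score when a statement contains no listed words.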
FOMC Statements: Loughran-McDonald Sentiment Scores
As shown in the preceding chart, FOMC statements turned more positive/hawkish in March 2021 and peaked in July 2021. After softening for the next 12 months, sentiment surged again in July 2022. While these moves may be partly driven by the recovery from the COVID-19 pandemic, they also reflect the FOMC’s growing aggressiveness in the face of rising inflation over the past year or so.
But the large swings also point to an inherent flaw in the Loughran-McDonald analysis: Sentiment scores only assess words, not sentences. For example, in the sentence “Unemployment has fallen,” both words would be recorded as negative/dovish, even though the sentence as a whole indicates an improvement in the labor market, which most would interpret as positive/hawkish.
To solve this problem, we trained the BERT and XLNet models to analyze statements sentence by sentence.
BERT and XLNet
Bidirectional Encoder Representations from Transformers, or BERT, is a language representation model that uses a bidirectional rather than a unidirectional encoder, which allows for better fine-tuning. Indeed, thanks to its bidirectional encoder, we find that BERT outperforms OpenAI GPT, which uses a unidirectional encoder.
XLNet, on the other hand, is a generalized autoregressive pretraining method that also uses a bidirectional encoder but dispenses with masked language modeling (MLM). MLM feeds BERT a sentence in which a few tokens have been hidden, or masked, and optimizes BERT’s weights so that the model reproduces the original sentence on the other side. By avoiding this masking, XLNet is in effect an improved version of BERT.
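The masking step that defines MLM can be sketched in plain Python (the sentence is illustrative; the 15% masking rate follows the original BERT setup, and a real implementation works on subword tokens rather than whole words):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with [MASK], as in BERT's MLM
    pretraining objective; the model is then trained to predict the
    original tokens at the masked positions."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok  # the label the model must recover
        else:
            masked.append(tok)
    return masked, targets

tokens = "the labor market has continued to strengthen".split()
masked, targets = mask_tokens(tokens)
print(masked, targets)
```

XLNet reaches the same bidirectional context without this masking, by maximizing likelihood over permutations of the token order instead.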
To train these two models, we divided the FOMC statements into training, test, and out-of-sample datasets. We drew the training and test datasets from February 2017 to December 2020 and the out-of-sample datasets from June 2021 to July 2022. We then applied two different labeling techniques: automatic and manual. Using automatic labeling, we assigned sentences a value of 1, 0, or none depending on whether they indicated an increase, decrease, or no change in the federal funds rate, respectively. Using manual labeling, we labeled sentences as 1, 0, or none depending on whether they were hawkish, dovish, or neutral, respectively.
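The automatic labeling rule can be sketched as follows (the mapping of rate decisions to labels follows the description above; the list of decisions is an illustrative placeholder, not the actual training data):

```python
def auto_label(rate_change):
    """Map the fed funds rate decision tied to a statement to a label:
    1 for an increase, 0 for a decrease, None for no change."""
    if rate_change > 0:
        return 1
    if rate_change < 0:
        return 0
    return None

# Hypothetical (statement date, rate change in percentage points) pairs.
decisions = [("2017-03", +0.25), ("2019-07", -0.25), ("2020-06", 0.0)]
labels = {date: auto_label(chg) for date, chg in decisions}
print(labels)  # {'2017-03': 1, '2019-07': 0, '2020-06': None}
```

Manual labeling replaces the rate-change rule with a human judgment of each sentence as hawkish, dovish, or neutral.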
We then ran the following formula to generate a sentiment score:
Sentiment Score = (Positive Sentences – Negative Sentences) / (Positive Sentences + Negative Sentences)
Performance of AI Models
Predicted Sentiment Score (Automatic Labeling)
Predicted Sentiment Score (Manual Labeling)
The two charts above show that manual labeling better reflects the recent change in the FOMC’s stance. Every statement contains some hawkish (or dovish) sentences, even when the FOMC ultimately lowered (or raised) the fed funds rate. In this sense, sentence-by-sentence labeling trains these ML models well.
Since ML and AI models tend to be black boxes, how we interpret their results is extremely important. One approach is to apply Local Interpretable Model-agnostic Explanations (LIME), which fit a simple model to explain a much more complex one. The two figures below show how XLNet (with manual labeling) interprets sentences from the FOMC statements, reading the first sentence as positive/hawkish based on the strengthening labor market and moderately expanding economic activity, and the second as negative/dovish since consumer prices fell and inflation ran below 2%. The model’s judgment on economic activity and inflationary pressures seems appropriate.
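LIME’s core idea — perturb the input, score the perturbations with the black-box model, and fit a simple explanation to the results — can be illustrated without the library. This is a deliberately simplified leave-one-out sketch, not the actual LIME procedure (which fits a weighted linear surrogate over many random perturbations), and the toy scorer below stands in for XLNet:

```python
# Toy "black-box" scorer standing in for XLNet: counts hawkish-word hits.
HAWKISH = {"strengthen", "expanding", "risen"}

def black_box(tokens):
    return sum(t in HAWKISH for t in tokens)

def attributions(tokens):
    """Leave-one-out attribution: how much does the black-box score drop
    when each token is removed? A positive weight marks a token the model
    treats as hawkish evidence."""
    full = black_box(tokens)
    return {i: full - black_box(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))}

sentence = "the labor market has continued to strengthen".split()
weights = attributions(sentence)
print(weights)  # only index 6 ("strengthen") carries weight 1
```

The LIME figures in this article display exactly this kind of per-token weight, only estimated against the trained XLNet model.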
LIME Results: FOMC Sentence for a Strong Economy
LIME Results: FOMC Weak Inflationary Pressure Sentence
By extracting sentences from statements and then gauging their sentiment, these techniques have given us a better understanding of the policy perspective of the FOMC and have the potential to make central bank communications easier to interpret and understand in the future.
But was there a connection between shifts in sentiment from FOMC statements and US stock market returns? The chart below plots the cumulative returns of the Dow Jones Industrial Average (DJIA) and the NASDAQ Composite (IXIC) as well as the FOMC sentiment scores. We studied correlation, tracking error, excess return, and excess volatility to detect regime shifts in equity returns, which are measured by the vertical axis.
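As a minimal sketch of one of these measures, a rolling correlation between two return series can be computed as follows (the return numbers below are made up for illustration, not DJIA/NASDAQ data, and the window length is an arbitrary choice):

```python
import math

def correlation(xs, ys):
    """Pearson correlation of two equal-length return series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rolling_correlation(xs, ys, window):
    """Correlation over each consecutive window; sharp drops in this
    series are one signal of a regime shift."""
    return [correlation(xs[i:i + window], ys[i:i + window])
            for i in range(len(xs) - window + 1)]

# Made-up daily returns for two indexes.
a = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02]
b = [0.012, -0.018, 0.02, 0.004, -0.008, 0.025]
print(rolling_correlation(a, b, window=4))
```

Tracking error, excess return, and excess volatility can be windowed the same way to cross-check the regime-shift dates.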
Stock Returns and Sentiment Scores from FOMC Statements
The results show that, as expected, our sentiment scores detect regime shifts: stock market regime shifts and sudden moves in the FOMC sentiment score occur at roughly the same times. According to our analysis, the NASDAQ may be even more reactive to the FOMC sentiment score than the DJIA.
Taken as a whole, this review hints at the vast potential of machine learning techniques for the future of investment management. Of course, in the final analysis, how these techniques combine with human judgment will determine their ultimate value.
We would like to thank Yoshimasa Satoh, CFA, James Sullivan, CFA and Paul McCaffrey. Satoh organized and coordinated AI study groups as a moderator and reviewed and revised our report with thoughtful insights. Sullivan wrote the Python code that converts FOMC statements in PDF format to text and related excerpts and information. McCaffrey provided great support in finalizing this research report.
If you liked this article, don’t forget to subscribe to Enterprising Investor.
All posts are the opinion of the author. As such, they should not be construed as investment advice, and the opinions expressed do not necessarily reflect the views of the CFA Institute or the author’s employer.
Image credit: ©Getty Images/AerialPerspective Works
Professional Learning for CFA Institute Members
CFA Institute members are empowered to self-determine and self-report professional learning (PL) credits earned, including content on Enterprising Investor. Members can record credits easily using their online PL tracker.