With FRED data and applying gt_table
BT is known for our tough training-style practice questions, and we love it, but I wanted to take it further and add more realism. I’ve been writing a fresh question sets on the regression topics; I’m always writing new questions! It takes more time, but in these regression sets I’ve been pulling actual datasets. For this question, I pulled inflation and unemployment rates from the FRED database (see below). In combination with actual code-based applications, these new question sets are much more realistic than typical exam-prep fare. Let’s face it: in the real world, quantitative tasks (e.g., regression) are performed in software, not with calculators. In a way, it’s our responsibility to help candidates get exposure to tools that are actually useful. We’ve always been the only FRM exam prep provider (EPP) who develops virtually the entire, broad array of quantitative risk (FRM) topics in spreadsheet workbooks (this is a massive construction of hundreds of spreadsheets that has taken me the better part of a decade to build). Now I’ve started to develop associated code-based applications for the ultimate in realistic study material.
I thought I’d try the gt package to see if I could improve the presentation of the regression output table. It’s fairly intuitive but a little unexpected because you pipe (“%>%”) additional format features.
This question (my first in the new regression set) reads as follows:
“20.16.1. Debra is an analyst at a governmental agency. Her boss asked her to investigate whether the Phillips curve applies during high-inflation regimes. To answer the question, Debra collected data from the FRED database at the St. Louis Fed (https://fred.stlouisfed.org/). The Phillips curve describes an inverse relationship between unemployment rates and inflation rates; https://en.wikipedia.org/wiki/Phillips_curve. Debra collected monthly data and she regressed the inflation rate against the unemployment rate (conditional on high-inflation regimes). Her independent variable is unemployment rate (FRED code: UNRATE) and here dependent variable is the Inflation rate (CPIAUCSL). The units are percentages not decimals; e.g., the dataset includes the month of January in 1982 when the unemployment rate was 8.9 and the inflation rate was 6.38. Her regression results are presented below.”
We load the packages
And then see if we might find a super-simple (ie, linear) Phillips curve (BT Question P1.T2.20.16.1)
startdate <- "1980-01-01"
enddate <- "2020-01-01"
# Phillips x = unemployment
unrate <- get_fred_series("UNRATE", "Unemployment", observation_start = startdate, observation_end = enddate)
inflation <- get_fred_series("CPIAUCSL", "inflation", observation_start = startdate, observation_end = enddate)
inflation_rate <- get_fred_series("PCETRIM12M159SFRBDAL", "Inflation_Rate", observation_start = startdate, observation_end = enddate)
df1 <- cbind(inflation, unrate, inflation_rate)
df1 <- df1[ , c(1,2, 4, 6)]
df2 <- df1 %>% filter(Inflation_Rate > 4.3)
df_fit <- lm(Inflation_Rate ~ Unemployment, data = df2)
summary(df_fit)
Call:
lm(formula = Inflation_Rate ~ Unemployment, data = df2)
Residuals:
Min 1Q Median 3Q Max
-1.27424 -0.22839 -0.03048 0.25065 0.83494
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.39276 0.47342 34.63 <2e-16 ***
Unemployment -1.10612 0.05577 -19.83 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.473 on 38 degrees of freedom
Multiple R-squared: 0.9119, Adjusted R-squared: 0.9096
F-statistic: 393.4 on 1 and 38 DF, p-value: < 2.2e-16
df_fit_tidy <- tidy(df_fit)
gt_table <- gt(df_fit_tidy)
# This is the standard gt table which is an improvement
gt_table
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 16.392765 | 0.47342061 | 34.62622 | 2.380425e-30 |
Unemployment | -1.106116 | 0.05576808 | -19.83421 | 1.206197e-21 |
# But here we'll utilize the pipe to specifically style the table
gt_table <-
gt_table %>%
tab_options(
table.font.size = 14
) %>%
tab_style(
style = cell_text(weight = "bold"),
locations = cells_body()
) %>%
tab_header(
title = "Inflation Rate (CPIAUCSL) regressed against Unemployment Rate (UNRATE)",
subtitle = md("1980 to 2020 Monthly but conditioned on high inflation regimes")
) %>%
tab_source_note(
source_note = md("Source: FRED at https://fred.stlouisfed.org/")
) %>% cols_label(
term = "Coefficient",
estimate = "Estimate",
std.error = "Std Error",
statistic = "t-stat",
p.value = "p value"
) %>% fmt_number(
columns = vars(estimate, std.error, statistic),
decimals = 3
) %>% fmt_scientific(
columns = vars(p.value),
) %>%
tab_options(
heading.title.font.size = 14,
heading.subtitle.font.size = 12
) %>%
tab_footnote(
footnote = "Filtered on months with inflation >4.3% deliberately to generate regression results",
locations = cells_title("subtitle")
)
gt_table
Inflation Rate (CPIAUCSL) regressed against Unemployment Rate (UNRATE) | ||||
---|---|---|---|---|
1980 to 2020 Monthly but conditioned on high inflation regimes1 | ||||
Coefficient | Estimate | Std Error | t-stat | p value |
(Intercept) | 16.393 | 0.473 | 34.626 | 2.38 × 10−30 |
Unemployment | −1.106 | 0.056 | −19.834 | 1.21 × 10−21 |
Source: FRED at https://fred.stlouisfed.org/ | ||||
1 Filtered on months with inflation >4.3% deliberately to generate regression results |
The question does not utilize a scatterplot but here it is anyway
df2 %>% ggplot(aes(Unemployment, Inflation_Rate)) +
geom_point() +
geom_smooth(method = "lm")