The Box-Pierce statistic is a simplified version of the Ljung-Box statistic; both are joint tests of autocorrelation
20.24.2. Elizabeth is an economist tasked with modeling quarterly gross domestic product (GDP) in the United States. She starts with the plots displayed below. The raw data is displayed in the upper; she observes this GDP trend is obviously not stationary (why?). She then performs a typical transformation on the raw data: she takes the difference of the natural log of the quarterly GPD values. This plot is displayed in the lower panel. Because LN[GDP(t)] - LN(GDP(t-1)] = LN[GDP(t)/GDP(t-1)], this lower panel plots continuously compounded returns (aka, monthly log returns). First differencing the log returns occasionally renders the non-stationary trend into a stationary process.
library(tidyverse)
library(scales)
library(forecast)
library(tseries)
library(ggthemes)
library(ggfortify)
library(fpp2)
library(gt)
library(astsa)
library(patchwork)
# library(gridExtra)
gdp_log <- diff(log(gdp))
# ts.plot(gdp)
# ts.plot(gdp_log)
p1_gdp <- autoplot(gdp, size = 1.3) + labs(
  title = "US GDP, Quarterly (billion) 1947 to 2018"
) + theme_classic() + theme(
  axis.title.x = element_blank(),
  axis.title.y = element_blank(),
  axis.text.x = element_blank(),
  axis.text.y = element_text(size = 11, face = "bold")
) + scale_y_continuous(labels = dollar_format())
  
p2_gdp_log <- autoplot(gdp_log, size = 1.3) + labs(
  title = bquote("LN("~GDP[t]~ ") - LN("~GDP[t-1]~")")
) + theme_classic() + theme(
  axis.title.y = element_blank(),
  axis.text.x = element_text(size = 11, face = "bold"),
  axis.text.y = element_text(size = 11, face = "bold")
) + scale_y_continuous(labels = label_percent(accuracy = .1))
patchwork <- p1_gdp / p2_gdp_log
patchwork + plot_annotation(
  caption = "Source: https://tradingeconomics.com/united-states/gdp (via astsa package)"
  )

(on to the Box-Pierce…)
She then fits an autoregressive, AR(1), and moving average, MA(1), model to the log return series. This gives her three models: the log return series (called “diff_logs”), an AR(1) model, and an MA(1) model. She conducts a Box-Pierce test on each of these models; the test of the AR(1) and MA(1) is a test of the residuals. She selects two lags for the test, h = 10 and h = 20. The results of her Box-Pierce test are displayed below.
… Box-Pierce gt table (below) here …
If we assume her desired confidence level is 95.0%, which of the following statements is a TRUE statement with respect to an interpretation of her Box-Pierce test of the three models?
# install.packages("kableExtra")
# install.packages("gapminder")
ar1_gdp_log <- sarima(gdp_log, p = 1, d = 0, q = 0)
initial  value -4.673186 
iter   2 value -4.742918
iter   3 value -4.742921
iter   4 value -4.742923
iter   5 value -4.742925
iter   6 value -4.742925
iter   6 value -4.742925
final  value -4.742925 
converged
initial  value -4.742229 
iter   2 value -4.742234
iter   3 value -4.742245
iter   3 value -4.742245
iter   3 value -4.742245
final  value -4.742245 
converged 
ma1_gdp_log <- sarima(gdp_log, p = 0, d = 0, q = 1)
initial  value -4.672758 
iter   2 value -4.716609
iter   3 value -4.723220
iter   4 value -4.723481
iter   5 value -4.723483
iter   5 value -4.723483
iter   5 value -4.723483
final  value -4.723483 
converged
initial  value -4.723444 
iter   1 value -4.723444
final  value -4.723444 
converged 
ma2_gdp_log <- sarima(gdp_log, p = 0, d = 0, q = 2)
initial  value -4.672758 
iter   2 value -4.749239
iter   3 value -4.750696
iter   4 value -4.750723
iter   5 value -4.750724
iter   6 value -4.750725
iter   7 value -4.750725
iter   7 value -4.750725
iter   7 value -4.750725
final  value -4.750725 
converged
initial  value -4.751078 
iter   2 value -4.751080
iter   3 value -4.751080
iter   4 value -4.751081
iter   5 value -4.751081
iter   5 value -4.751081
iter   5 value -4.751081
final  value -4.751081 
converged 
h_values <- c(10, 20)
# diff of logs
model = "diff_logs"
results_log_list <- h_values %>% map(~Box.test(gdp_log, lag = .))
log_statistic <- results_log_list %>% map_dbl("statistic")
log_p.value <- results_log_list %>% map_dbl("p.value")
log_cols <- data.frame(cbind(h_values, log_statistic, log_p.value))
log_all <- cbind(model, log_cols)
log_all <- log_all %>% rename(
  'h (lags)' = h_values,
  statistic = log_statistic,
  'p-value' = log_p.value
)
# AR(1)
model = "AR(1)"
results_ar1_list <- h_values %>% map(~Box.test(ar1_gdp_log$fit$residuals, lag = .))
ar1_statistic <- results_ar1_list %>% map_dbl("statistic")
ar1_p.value <- results_ar1_list %>% map_dbl("p.value")
ar1_cols <- data.frame(cbind(h_values, ar1_statistic, ar1_p.value))
ar1_all <- cbind(model, ar1_cols)
ar1_all <- ar1_all %>% rename(
  'h (lags)' = h_values,
  statistic = ar1_statistic,
  'p-value' = ar1_p.value
)
# MA(1)
model = "MA(1)"
results_ma1_list <- h_values %>% map(~Box.test(ma1_gdp_log$fit$residuals, lag = .))
ma1_statistic <- results_ma1_list %>% map_dbl("statistic")
ma1_p.value <- results_ma1_list %>% map_dbl("p.value")
ma1_cols <- data.frame(cbind(h_values, ma1_statistic, ma1_p.value))
ma1_all <- cbind(model, ma1_cols)
ma1_all <- ma1_all %>% rename(
  'h (lags)' = h_values,
  statistic = ma1_statistic,
  'p-value' = ma1_p.value
)
models_all <- rbind(log_all, ar1_all, ma1_all)
models_gt <- gt(models_all)
models_gt <- 
  models_gt %>% 
  tab_options(
    table.font.size = 14
  ) %>% tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_body()
  ) %>% tab_style(
    style = cell_text(color = "cadetblue"),
    locations = cells_column_labels(
      columns = vars(model, 'h (lags)', statistic, 'p-value')
    )
  ) %>% tab_header(
    title = md("**Box-Pierce test statistics and p-values**"),
    subtitle = "AR(1) and MA(1) at lags of h = 10 and 20"
  ) %>% fmt_number(
    columns = vars(statistic, 'p-value'),
    decimals = 4
  ) %>% tab_source_note(
    source_note = md("Note: diff_logs = LN[GDP(t)] - LN[GPD(t-1)]")
  ) %>% tab_source_note(
    source_note = md("AR(1) and MA(1) are tests of residuals")
  ) %>% cols_width(
    vars(model, 'h (lags)') ~ px(80),
    vars(statistic, 'p-value') ~ px(90)
  ) %>% cols_label (
    model = md("**model**"),
    'h (lags)' = md("**h (lags)**"),
    statistic = md("**test stat**"),
    'p-value' = md("**p-value**")
  ) %>% cols_align(
    align = "center",
    columns = vars('h (lags)')
  ) %>% tab_options(
    heading.title.font.size = 16,
    heading.subtitle.font.size = 14
  )
models_gt
| Box-Pierce test statistics and p-values | |||
|---|---|---|---|
| AR(1) and MA(1) at lags of h = 10 and 20 | |||
| model | h (lags) | test stat | p-value | 
| diff_logs | 10 | 62.6228 | 0.0000 | 
| diff_logs | 20 | 80.1439 | 0.0000 | 
| AR(1) | 10 | 15.5465 | 0.1134 | 
| AR(1) | 20 | 30.3091 | 0.0650 | 
| MA(1) | 10 | 24.3689 | 0.0067 | 
| MA(1) | 20 | 39.6618 | 0.0055 | 
| Note: diff_logs = LN[GDP(t)] - LN[GPD(t-1)] | |||
| AR(1) and MA(1) are tests of residuals | |||