
4. Multiple Regression Analysis: Inference

This notebook delves into the crucial aspect of inference in the context of multiple regression analysis. Building upon the concepts of estimation from previous chapters, we will now focus on drawing conclusions about the population parameters based on our sample data. This involves hypothesis testing and constructing confidence intervals, allowing us to assess the statistical significance and practical importance of our regression results.

import numpy as np
import statsmodels.formula.api as smf
import wooldridge as wool
from scipy import stats

4.1 The $t$ Test

The $t$ test is a fundamental tool for hypothesis testing about individual regression coefficients in multiple regression models. It allows us to formally examine whether a specific independent variable has a statistically significant effect on the dependent variable, holding other factors constant.

4.1.1 General Setup

In a multiple regression model, we are often interested in testing hypotheses about a single population parameter, say $\beta_j$. We might want to test if $\beta_j$ is equal to some specific value, $a_j$. The null hypothesis ($H_0$) typically represents a statement of no effect or a specific hypothesized value, while the alternative hypothesis ($H_1$) represents what we are trying to find evidence for.

The general form of the null and alternative hypotheses for a $t$ test is:

$$H_0: \beta_j = a_j$$
$$H_1: \beta_j \neq a_j \quad \text{or} \quad H_1: \beta_j > a_j \quad \text{or} \quad H_1: \beta_j < a_j$$
  • $H_0: \beta_j = a_j$: This is the null hypothesis, stating that the population coefficient $\beta_j$ is equal to a specific value $a_j$. Often, $a_j = 0$, implying no effect of the $j^{th}$ independent variable on the dependent variable, ceteris paribus.
  • $H_1: \beta_j \neq a_j$: This is a two-sided alternative hypothesis, stating that $\beta_j$ is different from $a_j$. We reject $H_0$ if the estimate is either significantly greater or significantly less than $a_j$.
  • $H_1: \beta_j > a_j$: This is a one-sided alternative hypothesis, stating that $\beta_j$ is greater than $a_j$. We reject $H_0$ only if the estimate is significantly greater than $a_j$.
  • $H_1: \beta_j < a_j$: This is another one-sided alternative hypothesis, stating that $\beta_j$ is less than $a_j$. We reject $H_0$ only if the estimate is significantly less than $a_j$.

To test the null hypothesis, we use the $t$ statistic:

$$t = \frac{\hat{\beta}_j - a_j}{se(\hat{\beta}_j)}$$
  • $\hat{\beta}_j$: This is the estimated coefficient for the $j^{th}$ independent variable from our regression.
  • $a_j$: This is the value of $\beta_j$ under the null hypothesis (from $H_0: \beta_j = a_j$).
  • $se(\hat{\beta}_j)$: This is the standard error of the estimated coefficient $\hat{\beta}_j$, which measures the precision of our estimate.

Under the null hypothesis and the CLM assumptions, this $t$ statistic follows a $t$ distribution with $n-k-1$ degrees of freedom, where $n$ is the sample size and $k$ is the number of independent variables in the model.
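To make the formula concrete, here is a minimal sketch that computes the $t$ statistic and its two-sided p-value for purely hypothetical values of the estimate, the null value, the standard error, and the degrees of freedom (none of these numbers come from a real regression):

# purely hypothetical numbers, just to illustrate the general formula:
b_hat = 0.50  # estimated coefficient beta_j-hat
a_j = 0.0  # value of beta_j under H0
se_b = 0.20  # standard error of the estimate
df = 120  # degrees of freedom n - k - 1

tstat = (b_hat - a_j) / se_b  # t statistic
pval = 2 * stats.t.cdf(-abs(tstat), df)  # two-sided p-value
print(f"t statistic: {tstat:.3f}, two-sided p-value: {pval:.4f}")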

4.1.2 Standard Case

The most common hypothesis test is to check if a particular independent variable has no effect on the dependent variable in the population, which corresponds to testing if the coefficient is zero. In this standard case, we set $a_j = 0$.

$$H_0: \beta_j = 0, \qquad H_1: \beta_j \neq 0$$

The $t$ statistic simplifies to:

$$t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}$$

To decide whether to reject the null hypothesis, we compare the absolute value of the calculated $t$ statistic, $|t_{\hat{\beta}_j}|$, to a critical value ($c$) from the $t$ distribution, or we examine the p-value ($p_{\hat{\beta}_j}$).

Rejection Rule using Critical Value:

$$\text{reject } H_0 \text{ if } |t_{\hat{\beta}_j}| > c$$
  • $c$: The critical value is obtained from the $t$ distribution with $n-k-1$ degrees of freedom for a chosen significance level ($\alpha$). For a two-sided test at significance level $\alpha$, we use $c = t_{n-k-1, 1-\alpha/2}$, which is the $(1-\alpha/2)$ quantile of the $t_{n-k-1}$ distribution. Common significance levels are $\alpha = 0.05$ (5%) and $\alpha = 0.01$ (1%).

Rejection Rule using p-value:

$$p_{\hat{\beta}_j} = 2 \cdot F_{t_{n-k-1}}(-|t_{\hat{\beta}_j}|)$$
$$\text{reject } H_0 \text{ if } p_{\hat{\beta}_j} < \alpha$$
  • $p_{\hat{\beta}_j}$: The p-value is the probability of observing a $t$ statistic as extreme as, or more extreme than, the one calculated from our sample, assuming the null hypothesis is true. It is a measure of the evidence against the null hypothesis: a small p-value indicates strong evidence against $H_0$.
  • $F_{t_{n-k-1}}$: This denotes the cumulative distribution function (CDF) of the $t$ distribution with $n-k-1$ degrees of freedom. The formula calculates the area in both tails of the $t$ distribution beyond $|t_{\hat{\beta}_j}|$, hence the factor of 2 for a two-sided test.

In summary: we reject the null hypothesis if the absolute value of the $t$ statistic is large enough (greater than the critical value) or if the p-value is small enough (less than the significance level $\alpha$). Both methods lead to the same conclusion.

Example 4.3: Determinants of College GPA

Let’s consider an example investigating the factors influencing college GPA (colGPA). We hypothesize that high school GPA (hsGPA), ACT score (ACT), and number of skipped classes (skipped) are determinants of college GPA. The model is specified as:

$$\text{colGPA} = \beta_0 + \beta_1 \text{hsGPA} + \beta_2 \text{ACT} + \beta_3 \text{skipped} + u$$

We will perform hypothesis tests on the coefficients $\beta_1$, $\beta_2$, and $\beta_3$ to see which of these variables are statistically significant predictors of college GPA. We will use the standard null hypothesis $H_0: \beta_j = 0$ for each variable.

# CV for alpha=5% and 1% using the t distribution with 137 d.f.:
alpha = np.array([0.05, 0.01])
cv_t = stats.t.ppf(1 - alpha / 2, 137)  # Two-sided critical values
print(
    f"Critical values from t-distribution (df=137):\nFor alpha={alpha[0] * 100}%: +/-{cv_t[0]:.3f}\nFor alpha={alpha[1] * 100}%: +/-{cv_t[1]:.3f}\n",
)
Critical values from t-distribution (df=137):
For alpha=5.0%: +/-1.977
For alpha=1.0%: +/-2.612

This code calculates the critical values from the $t$ distribution for significance levels of 5% and 1% with 137 degrees of freedom (which, as we will see, is exactly the residual degrees of freedom in our regression). These are the thresholds against which we will compare our calculated $t$ statistics.

# CV for alpha=5% and 1% using the normal approximation:
cv_n = stats.norm.ppf(1 - alpha / 2)  # Two-sided critical values
print(
    f"Critical values from standard normal distribution:\nFor alpha={alpha[0] * 100}%: +/-{cv_n[0]:.3f}\nFor alpha={alpha[1] * 100}%: +/-{cv_n[1]:.3f}\n",
)
Critical values from standard normal distribution:
For alpha=5.0%: +/-1.960
For alpha=1.0%: +/-2.576

For large degrees of freedom, the $t$ distribution approaches the standard normal distribution. This code shows the critical values from the standard normal distribution for comparison. Notice that for these common significance levels, the critical values are quite similar for the $t$ and normal distributions when the degrees of freedom are reasonably large (like 137).

gpa1 = wool.data("gpa1")

# store and display results:
reg = smf.ols(formula="colGPA ~ hsGPA + ACT + skipped", data=gpa1)
results = reg.fit()
print(f"Regression summary:\n{results.summary()}\n")
Regression summary:
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 colGPA   R-squared:                       0.234
Model:                            OLS   Adj. R-squared:                  0.217
Method:                 Least Squares   F-statistic:                     13.92
Date:                Tue, 18 Feb 2025   Prob (F-statistic):           5.65e-08
Time:                        10:44:13   Log-Likelihood:                -41.501
No. Observations:                 141   AIC:                             91.00
Df Residuals:                     137   BIC:                             102.8
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.3896      0.332      4.191      0.000       0.734       2.045
hsGPA          0.4118      0.094      4.396      0.000       0.227       0.597
ACT            0.0147      0.011      1.393      0.166      -0.006       0.036
skipped       -0.0831      0.026     -3.197      0.002      -0.135      -0.032
==============================================================================
Omnibus:                        1.917   Durbin-Watson:                   1.881
Prob(Omnibus):                  0.383   Jarque-Bera (JB):                1.636
Skew:                           0.125   Prob(JB):                        0.441
Kurtosis:                       2.535   Cond. No.                         300.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

This code runs the OLS regression of colGPA on hsGPA, ACT, and skipped using the gpa1 dataset from the wooldridge package. The results.summary() provides a comprehensive output of the regression results, including estimated coefficients, standard errors, t-statistics, p-values, and other relevant statistics.

# manually confirm the formulas, i.e. extract coefficients and SE:
b = results.params
se = results.bse

# reproduce t statistic:
tstat = b / se
print(f"Calculated t-statistics:\n{tstat}\n")

# reproduce p value:
pval = 2 * stats.t.cdf(
    -abs(tstat),
    results.df_resid,
)  # df_resid is the degrees of freedom
print(f"Calculated p-values:\n{pval}\n")
Calculated t-statistics:
Intercept    4.191039
hsGPA        4.396260
ACT          1.393319
skipped     -3.196840
dtype: float64

Calculated p-values:
[4.95026897e-05 2.19205015e-05 1.65779902e-01 1.72543113e-03]

This section manually calculates the $t$ statistics and p-values using the formulas we discussed. It extracts the estimated coefficients (b) and standard errors (se) from the regression results. Then, it calculates the $t$ statistic by dividing each coefficient by its standard error. Finally, it computes the two-sided p-value using the CDF of the $t$ distribution with the correct degrees of freedom (results.df_resid). The calculated values should match those reported in results.summary(), confirming our understanding of how these values are derived.
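For reference, statsmodels also stores these quantities directly on the results object, so the manual computation above can be cross-checked without retyping the formulas:

# t statistics and p-values as reported by statsmodels (should match the manual results):
print(f"t-statistics from statsmodels:\n{results.tvalues}\n")
print(f"p-values from statsmodels:\n{results.pvalues}\n")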

Interpreting the results from results.summary() and manual calculations for Example 4.3:

  • hsGPA (High School GPA): The estimated coefficient is positive and statistically significant (p-value < 0.01). The t-statistic is large (about 4.4). We reject the null hypothesis that $\beta_{hsGPA} = 0$. This suggests that a higher high school GPA is associated with a significantly higher college GPA, holding ACT score and skipped classes constant.

  • ACT (ACT Score): The estimated coefficient is positive but not statistically significant at conventional levels (the p-value is about 0.17). The t-statistic is about 1.4. We fail to reject the null hypothesis that $\beta_{ACT} = 0$, even at the 10% significance level. This indicates that, once high school GPA and skipped classes are controlled for, ACT score has a positive but much weaker relationship with college GPA. More data might be needed to confidently conclude that ACT score is a significant predictor, or perhaps its effect is largely captured by the other variables.

  • skipped (Skipped Classes): The estimated coefficient is negative and statistically significant (p-value < 0.01). The t-statistic is large in absolute value (about -3.2). We reject the null hypothesis that $\beta_{skipped} = 0$. This indicates that skipping more classes is associated with a significantly lower college GPA, holding high school GPA and ACT score constant.

4.1.3 Other Hypotheses

While testing if $\beta_j = 0$ is the most common case, we can also test other hypotheses of the form $H_0: \beta_j = a_j$ where $a_j \neq 0$. This might be relevant if we have a specific theoretical value in mind for $\beta_j$.

We can also conduct one-tailed tests if we have a directional alternative hypothesis (either $H_1: \beta_j > a_j$ or $H_1: \beta_j < a_j$). In these cases, the rejection region lies in only one tail of the $t$ distribution, and the p-value calculation is adjusted accordingly (we do not multiply by 2). One-tailed tests should be used cautiously and only when there is a strong prior expectation about the direction of the effect.

Example 4.1: Hourly Wage Equation

Let’s consider another example, examining the determinants of hourly wage. We model the logarithm of wage ($\log(\text{wage})$) as a function of education (educ), experience (exper), and tenure (tenure):

$$\log(\text{wage}) = \beta_0 + \beta_1 \text{educ} + \beta_2 \text{exper} + \beta_3 \text{tenure} + u$$

We will focus on testing hypotheses about the return to education ($\beta_1$). We might want to test if the return to education is greater than some specific value, or simply if it is different from zero (a one-sided test of the former kind is sketched after the regression output below).

# CV for alpha=5% and 1% using the t distribution with 522 d.f.:
alpha = np.array([0.05, 0.01])
cv_t = stats.t.ppf(1 - alpha / 2, 522)  # Two-sided critical values
print(
    f"Critical values from t-distribution (df=522):\nFor alpha={alpha[0] * 100}%: +/-{cv_t[0]:.3f}\nFor alpha={alpha[1] * 100}%: +/-{cv_t[1]:.3f}\n",
)
Critical values from t-distribution (df=522):
For alpha=5.0%: +/-1.965
For alpha=1.0%: +/-2.585

Similar to the previous example, we calculate the critical values from the $t$ distribution for significance levels of 5% and 1%, but now with 522 degrees of freedom (the residual degrees of freedom in this regression).

# CV for alpha=5% and 1% using the normal approximation:
cv_n = stats.norm.ppf(1 - alpha / 2)  # Two-sided critical values
print(
    f"Critical values from standard normal distribution:\nFor alpha={alpha[0] * 100}%: +/-{cv_n[0]:.3f}\nFor alpha={alpha[1] * 100}%: +/-{cv_n[1]:.3f}\n",
)
Critical values from standard normal distribution:
For alpha=5.0%: +/-1.960
For alpha=1.0%: +/-2.576

Again, we compare these to the critical values from the standard normal distribution. With 522 degrees of freedom, the $t$ and normal critical values are almost identical.

wage1 = wool.data("wage1")

reg = smf.ols(formula="np.log(wage) ~ educ + exper + tenure", data=wage1)
results = reg.fit()
print(f"Regression summary:\n{results.summary()}\n")
Regression summary:
                            OLS Regression Results                            
==============================================================================
Dep. Variable:           np.log(wage)   R-squared:                       0.316
Model:                            OLS   Adj. R-squared:                  0.312
Method:                 Least Squares   F-statistic:                     80.39
Date:                Tue, 18 Feb 2025   Prob (F-statistic):           9.13e-43
Time:                        10:44:13   Log-Likelihood:                -313.55
No. Observations:                 526   AIC:                             635.1
Df Residuals:                     522   BIC:                             652.2
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.2844      0.104      2.729      0.007       0.080       0.489
educ           0.0920      0.007     12.555      0.000       0.078       0.106
exper          0.0041      0.002      2.391      0.017       0.001       0.008
tenure         0.0221      0.003      7.133      0.000       0.016       0.028
==============================================================================
Omnibus:                       11.534   Durbin-Watson:                   1.769
Prob(Omnibus):                  0.003   Jarque-Bera (JB):               20.941
Skew:                           0.021   Prob(JB):                     2.84e-05
Kurtosis:                       3.977   Cond. No.                         135.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

This code runs the regression of $\log(\text{wage})$ on educ, exper, and tenure using the wage1 dataset.
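As an illustration of a one-sided test with a non-zero null value (as discussed in Section 4.1.3), the following sketch tests $H_0: \beta_{educ} = 0.08$ against $H_1: \beta_{educ} > 0.08$ using the results object just estimated; the null value of 0.08 is hypothetical, chosen purely to show the mechanics.

# illustration only: one-sided test of H0: beta_educ = 0.08 vs. H1: beta_educ > 0.08
# (the null value 0.08 is hypothetical; it is not part of the original example)
a_educ = 0.08
t_educ = (results.params["educ"] - a_educ) / results.bse["educ"]

# one-sided p-value: area to the right of the t statistic (no factor of 2)
pval_one_sided = 1 - stats.t.cdf(t_educ, results.df_resid)
print(f"One-sided t statistic: {t_educ:.3f}")
print(f"One-sided p-value: {pval_one_sided:.4f}")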

Interpreting the results from results.summary() for Example 4.1:

  • educ (Education): The estimated coefficient for educ is approximately 0.092. This means that, holding experience and tenure constant, an additional year of education is associated with an estimated 9.2% increase in hourly wage (since the dependent variable is the log of wage, the coefficient multiplied by 100 approximates the percentage change for small changes; the exact effect is computed in the sketch after this list). The t-statistic for educ is very large (about 12.6) and the p-value is extremely small (< 0.001). We strongly reject the null hypothesis $H_0: \beta_{educ} = 0$ and conclude that education has a statistically significant positive effect on wages.

  • exper (Experience): The coefficient for exper is positive and statistically significant at the 5% level (p-value ≈ 0.017), indicating that more experience is associated with higher wages, holding education and tenure constant.

  • tenure (Tenure): Similarly, the coefficient for tenure is positive and statistically significant (p-value < 0.001), suggesting that longer tenure with the current employer is associated with higher wages, controlling for education and overall experience.
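As noted for the education coefficient above, multiplying a log-level coefficient by 100 only approximates the percentage effect; the exact percentage change implied by a coefficient $\hat{\beta}_j$ is $100 \cdot (\exp(\hat{\beta}_j) - 1)$. A minimal sketch using the estimate for educ from the results object above:

# approximate vs. exact percentage effect of one more year of education on wage:
b_educ = results.params["educ"]
approx_effect = 100 * b_educ  # approximation: coefficient times 100
exact_effect = 100 * (np.exp(b_educ) - 1)  # exact effect in a log-level model
print(f"Approximate effect: {approx_effect:.2f}%")
print(f"Exact effect: {exact_effect:.2f}%")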

4.2 Confidence Intervals

Confidence intervals provide a range of plausible values for a population parameter, such as a regression coefficient. They give us a measure of the uncertainty associated with our point estimate ($\hat{\beta}_j$). A confidence interval is constructed around the estimated coefficient.

The general formula for a $(1-\alpha) \cdot 100\%$ confidence interval for $\beta_j$ is:

$$\hat{\beta}_j \pm c \cdot se(\hat{\beta}_j)$$
  • $\hat{\beta}_j$: The estimated coefficient.
  • $se(\hat{\beta}_j)$: The standard error of the estimated coefficient.
  • $c$: The critical value from the $t$ distribution with $n-k-1$ degrees of freedom for a $(1-\alpha) \cdot 100\%$ confidence level. For a 95% confidence interval ($\alpha = 0.05$), we use $c = t_{n-k-1, 0.975}$. For a 99% confidence interval ($\alpha = 0.01$), we use $c = t_{n-k-1, 0.995}$.

Interpretation of a Confidence Interval: We are $(1-\alpha) \cdot 100\%$ confident that the true population coefficient $\beta_j$ lies within the calculated interval. It is important to remember that the confidence interval is constructed from sample data, and it is the interval that varies from sample to sample, not the true population parameter $\beta_j$, which is fixed.

Example 4.8: Model of R&D Expenditures

Let’s consider a model explaining research and development (R&D) expenditures (rd) as a function of sales (sales) and profit margin (profmarg):

$$\log(\text{rd}) = \beta_0 + \beta_1 \log(\text{sales}) + \beta_2 \text{profmarg} + u$$

We will construct 95% and 99% confidence intervals for the coefficients $\beta_1$ and $\beta_2$.

rdchem = wool.data("rdchem")

# OLS regression:
reg = smf.ols(formula="np.log(rd) ~ np.log(sales) + profmarg", data=rdchem)
results = reg.fit()
print(f"Regression summary:\n{results.summary()}\n")
Regression summary:
                            OLS Regression Results                            
==============================================================================
Dep. Variable:             np.log(rd)   R-squared:                       0.918
Model:                            OLS   Adj. R-squared:                  0.912
Method:                 Least Squares   F-statistic:                     162.2
Date:                Tue, 18 Feb 2025   Prob (F-statistic):           1.79e-16
Time:                        10:44:13   Log-Likelihood:                -22.511
No. Observations:                  32   AIC:                             51.02
Df Residuals:                      29   BIC:                             55.42
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept        -4.3783      0.468     -9.355      0.000      -5.335      -3.421
np.log(sales)     1.0842      0.060     18.012      0.000       0.961       1.207
profmarg          0.0217      0.013      1.694      0.101      -0.004       0.048
==============================================================================
Omnibus:                        0.670   Durbin-Watson:                   1.859
Prob(Omnibus):                  0.715   Jarque-Bera (JB):                0.671
Skew:                           0.308   Prob(JB):                        0.715
Kurtosis:                       2.649   Cond. No.                         70.6
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

This code runs the OLS regression of $\log(\text{rd})$ on $\log(\text{sales})$ and profmarg using the rdchem dataset.

# 95% CI:
CI95 = results.conf_int(0.05)  # alpha = 0.05 for 95% CI
print(f"95% Confidence Intervals:\n{CI95}\n")
95% Confidence Intervals:
                      0         1
Intercept     -5.335478 -3.421068
np.log(sales)  0.961107  1.207332
profmarg      -0.004488  0.047799

This code uses the conf_int() method of the regression results object to calculate the 95% confidence intervals for all coefficients.

# 99% CI:
CI99 = results.conf_int(0.01)  # alpha = 0.01 for 99% CI
print(f"99% Confidence Intervals:\n{CI99}\n")
99% Confidence Intervals:
                      0         1
Intercept     -5.668313 -3.088234
np.log(sales)  0.918299  1.250141
profmarg      -0.013578  0.056890

Similarly, this calculates the 99% confidence intervals.
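To connect these intervals back to the formula $\hat{\beta}_j \pm c \cdot se(\hat{\beta}_j)$, the following sketch reproduces the 95% intervals by hand from the coefficients, their standard errors, and the appropriate $t$ critical value; the numbers should match conf_int(0.05) up to rounding.

# manually reproduce the 95% CIs: beta_hat +/- c * se(beta_hat)
c = stats.t.ppf(0.975, results.df_resid)  # critical value t_{n-k-1, 0.975}
ci_lower = results.params - c * results.bse
ci_upper = results.params + c * results.bse
print(f"Manual 95% CI lower bounds:\n{ci_lower}\n")
print(f"Manual 95% CI upper bounds:\n{ci_upper}\n")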

Interpreting the Confidence Intervals from Example 4.8:

  • np.log(sales) (Log of Sales):

    • 95% CI: Approximately [0.96, 1.21]. We are 95% confident that the true elasticity of R&D with respect to sales (the percentage change in R&D for a 1% change in sales) lies between 0.96 and 1.21. Since 1 is within this interval, we cannot reject the hypothesis that the elasticity is exactly 1 at the 5% significance level.
    • 99% CI: Approximately [0.92, 1.25]. The 99% confidence interval is wider than the 95% interval, reflecting the higher level of confidence.
  • profmarg (Profit Margin):

    • 95% CI: Approximately [-0.004, 0.048]. We are 95% confident that the true coefficient for profit margin lies between about -0.004 and 0.048. Since 0 is inside this interval, we cannot reject the null hypothesis that $\beta_{profmarg} = 0$ at the 5% significance level (consistent with the p-value of about 0.10 in the summary output).
    • 99% CI: Approximately [-0.014, 0.057]. This wider interval also contains 0, so profit margin is not statistically significant at the 1% level either. The evidence that profit margin affects R&D spending is weak in this sample.

As expected, the 99% confidence intervals are wider than the 95% confidence intervals. This is because to be more confident that we capture the true parameter, we need to consider a wider range of values.

4.3 Linear Restrictions: $F$ Tests

The $t$ test is useful for testing hypotheses about a single coefficient. However, we often want to test hypotheses involving multiple coefficients simultaneously. For example, we might want to test if several independent variables are jointly insignificant, or if there is a specific linear relationship between multiple coefficients. For these situations, we use the $F$ test.

Consider the following model of baseball player salaries:

$$\log(\text{salary}) = \beta_0 + \beta_1 \text{years} + \beta_2 \text{gamesyr} + \beta_3 \text{bavg} + \beta_4 \text{hrunsyr} + \beta_5 \text{rbisyr} + u$$

Suppose we want to test if batting average (bavg), home runs per year (hrunsyr), and runs batted in per year (rbisyr) have no joint effect on salary, after controlling for years in the league (years) and games played per year (gamesyr). This translates to testing the following joint null hypothesis:

$$H_0: \beta_3 = 0, \beta_4 = 0, \beta_5 = 0$$
$$H_1: \text{at least one of } \beta_3, \beta_4, \beta_5 \neq 0$$

To perform an $F$ test, we need to estimate two regressions:

  1. Unrestricted Model: The original, full model (with all variables included). In our example, this is the model above with years, gamesyr, bavg, hrunsyr, and rbisyr. Let $SSR_{ur}$ be the sum of squared residuals from the unrestricted model.

  2. Restricted Model: The model obtained by imposing the restrictions specified in the null hypothesis. In our example, under $H_0$, $\beta_3 = \beta_4 = \beta_5 = 0$, so the restricted model is:

    $$\log(\text{salary}) = \beta_0 + \beta_1 \text{years} + \beta_2 \text{gamesyr} + u$$

    Let $SSR_r$ be the sum of squared residuals from the restricted model.

The $F$ statistic is calculated as:

$$F = \frac{SSR_r - SSR_{ur}}{SSR_{ur}} \cdot \frac{n - k - 1}{q} = \frac{R^2_{ur} - R^2_r}{1 - R^2_{ur}} \cdot \frac{n - k - 1}{q}$$
  • $SSR_r$: Sum of squared residuals from the restricted model.
  • $SSR_{ur}$: Sum of squared residuals from the unrestricted model.
  • $R^2_r$: R-squared from the restricted model.
  • $R^2_{ur}$: R-squared from the unrestricted model.
  • $n$: Sample size.
  • $k$: Number of independent variables in the unrestricted model.
  • $q$: Number of restrictions being tested (in our example, $q = 3$ because we are testing three restrictions: $\beta_3 = 0$, $\beta_4 = 0$, $\beta_5 = 0$).

Under the null hypothesis and the CLM assumptions, the $F$ statistic follows an $F$ distribution with $(q, n-k-1)$ degrees of freedom. We reject the null hypothesis if the calculated $F$ statistic is large enough or, equivalently, if the p-value is small enough.

Rejection Rule:

  • Reject $H_0$ if $F > c$, where $c$ is the critical value from the $F_{q, n-k-1}$ distribution at the chosen significance level.
  • Reject $H_0$ if $p\text{-value} < \alpha$, where $p\text{-value} = 1 - F_{F_{q, n-k-1}}(F)$ and $\alpha$ is the significance level.
mlb1 = wool.data("mlb1")
n = mlb1.shape[0]

# unrestricted OLS regression:
reg_ur = smf.ols(
    formula="np.log(salary) ~ years + gamesyr + bavg + hrunsyr + rbisyr",
    data=mlb1,
)
fit_ur = reg_ur.fit()
r2_ur = fit_ur.rsquared
print(f"R-squared of unrestricted model (r2_ur): {r2_ur:.4f}\n")
R-squared of unrestricted model (r2_ur): 0.6278

This code estimates the unrestricted model and extracts the R-squared value.

# restricted OLS regression:
reg_r = smf.ols(formula="np.log(salary) ~ years + gamesyr", data=mlb1)
fit_r = reg_r.fit()
r2_r = fit_r.rsquared
print(f"R-squared of restricted model (r2_r): {r2_r:.4f}\n")
R-squared of restricted model (r2_r): 0.5971

This code estimates the restricted model (by omitting bavg, hrunsyr, and rbisyr) and extracts its R-squared. As expected, the R-squared of the restricted model is lower than that of the unrestricted model because we have removed variables.

# F statistic:
k = 5  # Number of independent variables in unrestricted model
q = 3  # Number of restrictions
fstat = (r2_ur - r2_r) / (1 - r2_ur) * (n - k - 1) / q
print(f"Calculated F statistic: {fstat:.3f}\n")
Calculated F statistic: 9.550

This code calculates the $F$ statistic using the formula based on the R-squared values.
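Because the two expressions for the $F$ statistic are algebraically equivalent, the same value can be computed from the sums of squared residuals, which statsmodels exposes via the ssr attribute. A quick check using the fitted models above:

# equivalent F statistic based on the sums of squared residuals:
ssr_ur = fit_ur.ssr  # SSR of the unrestricted model
ssr_r = fit_r.ssr  # SSR of the restricted model
fstat_ssr = (ssr_r - ssr_ur) / ssr_ur * (n - k - 1) / q
print(f"F statistic from SSR formula: {fstat_ssr:.3f}\n")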

# CV for alpha=1% using the F distribution with 3 and 347 d.f.:
cv = stats.f.ppf(1 - 0.01, q, n - k - 1)  # Degrees of freedom (q, n-k-1)
print(
    f"Critical value from F-distribution (df=({q}, {n - k - 1})) for alpha=1%: {cv:.3f}\n",
)
Critical value from F-distribution (df=(3, 347)) for alpha=1%: 3.839

This calculates the critical value from the $F$ distribution with 3 and $n-k-1$ degrees of freedom for a 1% significance level.

# p value = 1-cdf of the appropriate F distribution:
fpval = 1 - stats.f.cdf(fstat, q, n - k - 1)
print(f"Calculated p-value: {fpval:.4f}\n")
Calculated p-value: 0.0000

This calculates the p-value for the $F$ test using the CDF of the $F$ distribution.

Interpreting the results of the F-test for joint significance of batting stats:

The calculated $F$ statistic is about 9.55, and the p-value is essentially zero (it rounds to 0.0000 at four decimal places). Since the $F$ statistic far exceeds the 1% critical value of about 3.84 (equivalently, the p-value is far below any conventional significance level), we strongly reject the null hypothesis $H_0: \beta_3 = 0, \beta_4 = 0, \beta_5 = 0$.

Conclusion: We conclude that batting average, home runs per year, and runs batted in per year are jointly statistically significant determinants of baseball player salaries, even after controlling for years in the league and games played per year. In other words, at least one of these batting statistics has a significant impact on salary.

mlb1 = wool.data("mlb1")

# OLS regression:
reg = smf.ols(
    formula="np.log(salary) ~ years + gamesyr + bavg + hrunsyr + rbisyr",
    data=mlb1,
)
results = reg.fit()

# automated F test:
hypotheses = ["bavg = 0", "hrunsyr = 0", "rbisyr = 0"]  # List of restrictions
ftest = results.f_test(hypotheses)  # Perform F-test using statsmodels
fstat = ftest.statistic
fpval = ftest.pvalue

print(f"F statistic from automated test: {fstat:.3f}\n")
print(f"P-value from automated test: {fpval:.4f}\n")
F statistic from automated test: 9.550

P-value from automated test: 0.0000

This code demonstrates how to perform the same $F$ test using the f_test() method in statsmodels, which provides a more convenient way to conduct $F$ tests for linear restrictions. The results are identical to our manual calculation (within rounding).

mlb1 = wool.data("mlb1")

# OLS regression:
reg = smf.ols(
    formula="np.log(salary) ~ years + gamesyr + bavg + hrunsyr + rbisyr",
    data=mlb1,
)
results = reg.fit()

# automated F test with a more complex hypothesis:
hypotheses = [
    "bavg = 0",
    "hrunsyr = 2*rbisyr",
]  # Testing two restrictions: beta_bavg = 0 and beta_hrunsyr = 2*beta_rbisyr
ftest = results.f_test(hypotheses)  # Perform F-test
fstat = ftest.statistic
fpval = ftest.pvalue

print(f"F statistic for complex hypotheses: {fstat:.3f}\n")
print(f"P-value for complex hypotheses: {fpval:.4f}\n")
F statistic for complex hypotheses: 0.512

P-value for complex hypotheses: 0.5999

This final example shows the flexibility of the f_test() method. Here, we test a different joint hypothesis: $H_0: \beta_{bavg} = 0 \text{ and } \beta_{hrunsyr} = 2\beta_{rbisyr}$. This is a more complex linear restriction involving a relationship between two coefficients. The f_test() method easily handles such hypotheses, demonstrating its power and convenience for testing various linear restrictions in multiple regression models.

Interpreting the results of the second F-test:

In this case, the F-statistic is only about 0.51 and the p-value is approximately 0.60. We therefore fail to reject the null hypothesis at any conventional significance level: the data are consistent with batting average having no effect on salary and with the effect of home runs per year being twice the effect of runs batted in per year.

Summary of F-tests:

  • $F$ tests are used to test joint hypotheses about multiple regression coefficients.
  • They compare an unrestricted model to a restricted model that imposes the null hypothesis.
  • The $F$ statistic measures the relative increase in the sum of squared residuals (or decrease in R-squared) when moving from the unrestricted to the restricted model.
  • A large $F$ statistic (or small p-value) provides evidence against the null hypothesis.
  • $F$ tests are essential for testing the joint significance of sets of variables and for testing more complex linear restrictions on regression coefficients.

This notebook has provided a comprehensive overview of inference in multiple regression analysis, covering $t$ tests for individual coefficients, confidence intervals, and $F$ tests for joint hypotheses. These tools are fundamental for drawing meaningful conclusions from regression models and for rigorously testing economic and statistical hypotheses.