Week 3 tips and FAQs

FAQs
Posted Tuesday, February 3, 2026 at 9:44 AM

What’s the difference between statistical significance and substantial significance?

All that statistical significance really tells you is how confident you are that a measured effect is not zero. Or, more formally, using our null world idea, if something is significant, it means that in a hypothetical world where the effect is actually zero, the probability of seeing an effect at least as large as the one you measured is really low (less than 5%).

But most of the time we don’t care if an effect is maybe zero or not zero. We want to know if the effect actually matters!
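If that null world idea feels abstract, here’s a rough simulation sketch of it (the $137 effect, the sample size, and the noise level are all made-up numbers, just for illustration): build a world where the true effect is zero, then count how often noise alone produces an effect as big as the one you measured.

library(tidyverse)

# Pretend we measured a $137 difference between groups. In a world with NO
# real effect, how often does random noise alone produce a difference at
# least that big?
measured_effect <- 137

null_world_effects <- map_dbl(1:5000, \(i) {
  fake <- tibble(
    income = rnorm(325, mean = 3500, sd = 600),  # same noise for everyone; no effect
    group = sample(c("Treatment", "Control"), 325, replace = TRUE)
  )
  mean(fake$income[fake$group == "Treatment"]) -
    mean(fake$income[fake$group == "Control"])
})

# The share of null worlds with a difference at least this extreme is
# essentially a p-value
mean(abs(null_world_effects) >= measured_effect)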

Here’s a little example. Let’s say you work for a nonprofit that is launching a new educational program for people who have recently been released from prison. You run an RCT with 325 people, randomly assigning some of them to the program, and your main outcome of interest is monthly income.

Here’s what that data might look like:

library(tidyverse)
library(parameters)
library(tinytable)

n_people <- 325

withr::with_seed(1234, {
  example_data1 <- tibble(
    person_id = 1:n_people,
    pre_income = rnorm(n = n_people, mean = 3500, sd = 200)
  ) |>
    mutate(
      group = sample(
        c("Treatment", "Control"),
        size = n_people,
        replace = TRUE
      ),
      causal_effect = rnorm(n = n_people, mean = 200, sd = 600),
      null_effect = rnorm(n = n_people, mean = 0, sd = 600),
      post_income = ifelse(
        group == "Treatment",
        pre_income + causal_effect,
        pre_income + null_effect
      )
    ) |>
    select(person_id, group, pre_income, post_income)
})
example_data1
## # A tibble: 325 × 4
##    person_id group     pre_income post_income
##        <int> <chr>          <dbl>       <dbl>
##  1         1 Treatment      3259.       3755.
##  2         2 Treatment      3555.       3328.
##  3         3 Treatment      3717.       4045.
##  4         4 Control        3031.       3554.
##  5         5 Control        3586.       3585.
##  6         6 Control        3601.       4262.
##  7         7 Control        3385.       3375.
##  8         8 Treatment      3391.       3033.
##  9         9 Treatment      3387.       3796.
## 10        10 Treatment      3322.       2751.
## # ℹ 315 more rows

Let’s look at the difference in post-program incomes between the treatment and control groups. Because people were randomly assigned to the program, this difference is our estimate of its causal effect:

model1 <- lm(post_income ~ group, data = example_data1)
model_parameters(model1)
## Parameter         | Coefficient |    SE |             95% CI | t(323) |      p
## ------------------------------------------------------------------------------
## (Intercept)       |     3515.45 | 48.92 | [3419.21, 3611.70] |  71.86 | < .001
## group [Treatment] |      136.71 | 66.67 | [   5.55,  267.87] |   2.05 | 0.041
## 
## Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed using a Wald t-distribution approximation.

Cool. It looks like the program caused monthly incomes to increase by $136.71, on average.

That result is statistically significant, with a p-value of 0.041. That means that in a world where the actual program effect is ≈0, there’s a 4.1% chance of seeing an effect of $136.71 or larger. That’s not super rare to see in a null world, but it’s below the completely arbitrary 5% threshold, so we declare that it’s significant.

The confidence interval shows that we’re 95% confident that the range from $5.55 to $267.87 has captured the true effect. Zero isn’t in that range, so we’re again pretty certain that the effect isn’t zero.
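As a side note, if you ever want to grab just that interval from the model object, base R’s confint() does it:

# Extract the 95% confidence interval directly from the fitted model
confint(model1)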

Statistically significant? ✓

Does that finding matter? Are you, as the program manager, happy that the program is boosting monthly incomes by $137 on average? Probably!

Let’s look at another example. Let’s say you run the RCT and get this data instead:

n_people <- 4000

withr::with_seed(1234, {
  example_data2 <- tibble(
    person_id = 1:n_people,
    pre_income = rnorm(n = n_people, mean = 3500, sd = 5)
  ) |>
    mutate(
      group = sample(
        c("Treatment", "Control"),
        size = n_people,
        replace = TRUE
      ),
      causal_effect = 0.5,
      null_effect = 0,
      post_income = ifelse(
        group == "Treatment",
        pre_income + causal_effect,
        pre_income + null_effect
      )
    ) |>
    select(person_id, group, pre_income, post_income)
})
example_data2
## # A tibble: 4,000 × 4
##    person_id group     pre_income post_income
##        <int> <chr>          <dbl>       <dbl>
##  1         1 Control        3494.       3494.
##  2         2 Treatment      3501.       3502.
##  3         3 Control        3505.       3505.
##  4         4 Control        3488.       3488.
##  5         5 Treatment      3502.       3503.
##  6         6 Control        3503.       3503.
##  7         7 Control        3497.       3497.
##  8         8 Treatment      3497.       3498.
##  9         9 Treatment      3497.       3498.
## 10        10 Treatment      3496.       3496.
## # ℹ 3,990 more rows

Let’s look at this new causal effect:

model2 <- lm(post_income ~ group, data = example_data2)
model_parameters(model2)
## Parameter         | Coefficient |   SE |             95% CI |  t(3998) |      p
## -------------------------------------------------------------------------------
## (Intercept)       |     3500.00 | 0.11 | [3499.78, 3500.22] | 31330.00 | < .001
## group [Treatment] |        0.51 | 0.16 | [   0.20,    0.82] |     3.25 | 0.001
## 
## Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed using a Wald t-distribution approximation.

The effect of the program is still positive—it causes a $0.51 increase in monthly incomes, on average.

Importantly, the p-value is really really tiny: it’s 0.001164, which model_parameters() rounds and displays as 0.001. The effect is definitely statistically significant! In a world where the effect of the program is ≈0, there’s a 0.1164% chance of seeing an effect at least as big as $0.51.

That’s, like, super significant.¹

BUUUUUT does it matter? Are you, as the program manager, happy that the program is boosting monthly incomes by $0.51 on average? Should you spend tons of organizational resources to run this program if it’s only giving people fifty more cents every month? I’d guess probably not.

Both of these causal effects are statistically significant. But only the first is substantively important.

Can we measure substantial significance with statistics?

No, not really. Kind of.

You have to rely on your knowledge of the subject and the data and the context to know if the measured effect really matters.

There are some more systematic ways to assess substantiveness beyond just managerial vibes, like these:

  • Justin Esarey and Nathan Danneman, “A Quantitative Method for Substantive Robustness Assessment,” Political Science Research and Methods 3, no. 1 (2015): 95–111, https://doi.org/10.1017/psrm.2014.14.
  • Justin H. Gross, “Testing What Matters (If You Must Test at All): A Context‐Driven Approach to Substantive and Statistical Significance,” American Journal of Political Science 59, no. 3 (2015): 775–88, https://doi.org/10.1111/ajps.12149.

But often—especially for practical program evaluation situations where stakeholders have a lot of sway in program implementation and other decisions—the determination of a result’s substantiveness is mostly vibes-based.

What are all the different ways we can look at model coefficients?

In problem set 2, you fed your model objects to model_parameters() to see the coefficients, and you also fed them to modelsummary(). What’s going on with these different functions?

As I noted in the little mini-lesson on regression in the middle of problem set 2, R returns regression models as objects that you can do stuff with.

For instance, here’s a little model based on penguins data:

some_model <- lm(body_mass ~ bill_dep + species, data = penguins)

That new some_model object has the model results (coefficients, standard errors, p-values, diagnostics like R², etc.) stored inside it. If you want to know what those are or do things with them, you have to look at the object.
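For a quick peek, base R has small accessor functions that pull individual pieces out of the object. A couple of examples:

# Pull individual pieces out of the model object
coef(some_model)  # named vector of coefficients
nobs(some_model)  # number of observations the model used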

There are lots of different ways to do this!

Use summary()

You can feed some_model to the summary() function to see a more detailed set of results:

summary(some_model)
## 
## Call:
## lm(formula = body_mass ~ bill_dep + species, data = penguins)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -867.73 -255.61  -27.41  242.41 1190.86 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1007.28     323.56  -3.113  0.00201 ** 
## bill_dep           256.61      17.56  14.611  < 2e-16 ***
## speciesChinstrap    13.38      52.95   0.253  0.80069    
## speciesGentoo     2238.67      73.68  30.383  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 362.4 on 338 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.7975, Adjusted R-squared:  0.7957 
## F-statistic: 443.8 on 3 and 338 DF,  p-value: < 2.2e-16

↑ That gives you a big wall of text (similar to what you see in Stata, if you’re used to Stata), with coefficients, standard errors, t-statistics, p-values, and model details like R² and the number of observations and so on.

That’s fine, but it’s really gross when that wall of text appears in a rendered Quarto document. Also, it’s really hard to do anything with those values. Like, what if you want to round the values? Or plot them? Or only show some of the coefficients? They’re all locked into this chunk of text.

Use tidy() from the {broom} package

The {broom} package (tangentially part of the tidyverse ecosystem) provides functions for converting model objects into data frames that are much easier to work with.

There are two general functions: tidy() for extracting coefficient details and glance() for extracting model details:

library(broom)

tidy(some_model)
## # A tibble: 4 × 5
##   term             estimate std.error statistic  p.value
##   <chr>               <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)       -1007.      324.     -3.11  2.01e- 3
## 2 bill_dep            257.       17.6    14.6   8.10e-38
## 3 speciesChinstrap     13.4      52.9     0.253 8.01e- 1
## 4 speciesGentoo      2239.       73.7    30.4   1.16e-98
glance(some_model)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic   p.value    df logLik   AIC   BIC  deviance df.residual  nobs
##       <dbl>         <dbl> <dbl>     <dbl>     <dbl> <dbl>  <dbl> <dbl> <dbl>     <dbl>       <int> <int>
## 1     0.798         0.796  362.      444. 7.72e-117     3 -2499. 5007. 5026. 44399670.         338   342

Because tidy() returns a data frame, you can do all the regular dplyr stuff to it, like filtering, mutating, selecting, grouping, summarizing, and so on. Like, here’s just the coefficient for bill depth, without the statistic column:

tidy(some_model) |> 
  filter(term == "bill_dep") |> 
  select(term, estimate, std.error, p.value)
## # A tibble: 1 × 4
##   term     estimate std.error  p.value
##   <chr>       <dbl>     <dbl>    <dbl>
## 1 bill_dep     257.      17.6 8.10e-38

Since this is a data frame, we can even plot these coefficients with ggplot to see if 0 is in their confidence intervals:

tidy(some_model, conf.int = TRUE) |> 
  filter(term == "bill_dep") |> 
  ggplot(aes(x = estimate, y = term)) + 
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
  geom_vline(xintercept = 0)

Use model_parameters() and model_performance() from the {parameters} and {performance} packages

{broom} is great and nice and I use it all the time in my own work, but it has a few inconveniences:

  1. tidy() doesn’t show confidence intervals by default. If you want to see those, you have to include conf.int = TRUE. It’s not a big deal, just kind of annoying to have to do all the time:

    tidy(some_model, conf.int = TRUE)
    ## # A tibble: 4 × 7
    ##   term             estimate std.error statistic  p.value conf.low conf.high
    ##   <chr>               <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
    ## 1 (Intercept)       -1007.      324.     -3.11  2.01e- 3  -1644.      -371.
    ## 2 bill_dep            257.       17.6    14.6   8.10e-38    222.       291.
    ## 3 speciesChinstrap     13.4      52.9     0.253 8.01e- 1    -90.8      118.
    ## 4 speciesGentoo      2239.       73.7    30.4   1.16e-98   2094.      2384.
  2. tidy() shows the raw p-value, often in scientific notation like 8.10e-38. Students will often see that and forget that it really means 8.1 × 10⁻³⁸ and will think that the p-value is 8.1 or something impossible. Again, not a deal-breaker, just an extra thing to have to think about when looking at the results.

  3. The function names tidy() and glance() don’t really describe what they’re doing. You just have to remember that tidy() shows the model parameters and coefficients and glance() shows the model performance details.

To make life easier, there’s a collection of packages (similar to the tidyverse package ecosystem idea) called easystats that contains a bunch of packages and functions to make it easier to work with models. Two of the easystats packages that you’ve already used are {parameters} and {performance}.

These packages fix all the minor inconveniences of {broom} and give you a lot of other neat features.

  1. model_parameters() shows confidence intervals by default.
  2. model_parameters() shows APA-style p-values, displaying p < .001 when the value is really small. No more 8.10e-38!
  3. Instead of remembering that tidy() shows you model parameters and glance() shows you model performance details, you can use model_parameters() and model_performance():
library(parameters)
library(performance)

model_parameters(some_model)
## Parameter           | Coefficient |     SE |              95% CI | t(338) |      p
## ----------------------------------------------------------------------------------
## (Intercept)         |    -1007.28 | 323.56 | [-1643.73, -370.83] |  -3.11 | 0.002 
## bill dep            |      256.61 |  17.56 | [  222.07,  291.16] |  14.61 | < .001
## species [Chinstrap] |       13.38 |  52.95 | [  -90.77,  117.52] |   0.25 | 0.801 
## species [Gentoo]    |     2238.67 |  73.68 | [ 2093.74, 2383.60] |  30.38 | < .001
## 
## Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed using a Wald t-distribution approximation.
model_performance(some_model)
## # Indices of model performance
## 
## AIC    |   AICc |    BIC |    R2 | R2 (adj.) |    RMSE |   Sigma
## ----------------------------------------------------------------
## 5007.2 | 5007.4 | 5026.4 | 0.798 |     0.796 | 360.310 | 362.436

You can also display these as actual tables in a Quarto document if you pipe them to display():

model_parameters(some_model) |> 
  display()
Parameter           | Coefficient |     SE |              95% CI | t(338) |      p
----------------------------------------------------------------------------------
(Intercept)         |    -1007.28 | 323.56 | (-1643.73, -370.83) |  -3.11 | 0.002
bill dep            |      256.61 |  17.56 | (222.07, 291.16)    |  14.61 | < .001
species (Chinstrap) |       13.38 |  52.95 | (-90.77, 117.52)    |   0.25 | 0.801
species (Gentoo)    |     2238.67 |  73.68 | (2093.74, 2383.60)  |  30.38 | < .001

You can tell model_parameters() which parameters you want to show or not show without needing to use filter():

model_parameters(some_model, keep = "bill_dep", verbose = FALSE)
## Parameter | Coefficient |    SE |           95% CI | t(338) |      p
## --------------------------------------------------------------------
## bill dep  |      256.61 | 17.56 | [222.07, 291.16] |  14.61 | < .001

Even though it’s printing nicely as a table, the results from model_parameters() are actually a data frame, so you can do all the regular dplyr stuff with them just like with tidy():

model_parameters(some_model) |> 
  filter(Parameter == "bill_dep") |> 
  ggplot(aes(x = Coefficient, y = Parameter)) + 
  geom_pointrange(aes(xmin = CI_low, xmax = CI_high)) +
  geom_vline(xintercept = 0)

Make nice polished side-by-side regression tables with {modelsummary}

model_parameters() and model_performance() (or tidy() and glance() from {broom}) are great for working with models on their own, but often you’ll want to show a bunch of models all at once. In problem set 2, I had you do this with modelsummary() from the {modelsummary} package. Check out its documentation for all the details:

Note: modelsummary() and easystats

modelsummary() actually uses model_parameters() and model_performance() behind the scenes to extract all these numbers from the models and build the fancy table!

library(modelsummary)

modelsummary(
  list(some_model)
)
                        (1)
(Intercept)       -1007.281
                  (323.561)
bill_dep            256.615
                   (17.563)
speciesChinstrap     13.377
                   (52.947)
speciesGentoo      2238.668
                   (73.682)
Num.Obs.                342
R2                    0.798
R2 Adj.               0.796
AIC                  5007.2
BIC                  5026.4
Log.Lik.          -2498.619
F                   443.839
RMSE                 360.31

You can customize these tables a lot (again, see the documentation). Like, here we can rename the coefficients, change the column names, show confidence intervals instead of standard errors, and control which goodness-of-fit statistics (GOF) appear in the bottom half:

modelsummary(
  list("Example model" = some_model),
  statistic = "conf.int",
  coef_rename = c(
    "bill_dep" = "Bill depth (mm)",
    "speciesChinstrap" = "Species (Chinstrap)",
    "speciesGentoo" = "Species (Gentoo)"
  ),
  gof_map = c("nobs", "r.squared", "adj.r.squared", "F")
)
                            Example model
(Intercept)                     -1007.281
                    [-1643.728, -370.834]
Bill depth (mm)                   256.615
                       [222.068, 291.161]
Species (Chinstrap)                13.377
                       [-90.770, 117.525]
Species (Gentoo)                 2238.668
                     [2093.735, 2383.601]
Num.Obs.                              342
R2                                  0.798
R2 Adj.                             0.796
F                                 443.839
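The real payoff of modelsummary(), though, is side-by-side tables of several models at once. Here’s a sketch with a second, simpler model thrown in (the bivariate model here is just for illustration):

# A second model, just for comparison purposes
model_bivariate <- lm(body_mass ~ bill_dep, data = penguins)

modelsummary(
  list("Bivariate" = model_bivariate, "With species" = some_model),
  gof_map = c("nobs", "r.squared", "adj.r.squared")
)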

Make automatic coefficient plots with modelplot() from {modelsummary}

If you like coefficient plots but don’t want to build them yourself with ggplot, like we did above, {modelsummary} also comes with a modelplot() function (see the full documentation here):

modelplot(some_model) + 
  geom_vline(xintercept = 0)

You can automatically plot multiple models too, either as colored lines:

another_model <- lm(body_mass ~ bill_dep + species + sex, data = penguins)

modelplot(
  list("First model" = some_model, "A different model" = another_model)
) +
  geom_vline(xintercept = 0)

Or in facets:

modelplot(
  list("First model" = some_model, "A different model" = another_model),
  facet = TRUE
) +
  geom_vline(xintercept = 0)

Plot model predictions and marginal effects

You can do some other really neat things with model objects with the {marginaleffects} package, which lets you plot predictions and slopes and other things you might be interested in.

Like, here’s how predicted body mass changes across a range of possible bill depths:

library(marginaleffects)

plot_predictions(some_model, condition = "bill_dep")

Or across a range of possible bill depths in each of the species:

plot_predictions(some_model, condition = c("bill_dep", "species"))

This is really helpful when you have interaction terms (where slopes are different for each species):

model_interaction <- lm(body_mass ~ bill_dep * species, data = penguins)

plot_predictions(model_interaction, condition = c("bill_dep", "species"))

That Gentoo line has a steeper slope than the other two species. We could figure out those slopes by looking at the coefficients and then adding/subtracting the different interaction terms:

model_parameters(model_interaction)
## Parameter                      | Coefficient |     SE |              95% CI | t(336) |      p
## ---------------------------------------------------------------------------------------------
## (Intercept)                    |     -283.28 | 437.94 | [-1144.73,  578.17] |  -0.65 | 0.518 
## bill dep                       |      217.15 |  23.82 | [  170.30,  264.00] |   9.12 | < .001
## species [Chinstrap]            |      247.06 | 829.77 | [-1385.14, 1879.26] |   0.30 | 0.766 
## species [Gentoo]               |     -175.71 | 658.43 | [-1470.88, 1119.47] |  -0.27 | 0.790 
## bill dep × species [Chinstrap] |      -12.53 |  45.01 | [ -101.06,   76.01] |  -0.28 | 0.781 
## bill dep × species [Gentoo]    |      152.29 |  40.49 | [   72.64,  231.94] |   3.76 | < .001
## 
## Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed using a Wald t-distribution approximation.
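For instance, each species-specific slope is the baseline bill depth slope (for Adelies, the reference category) plus that species’s interaction term:

# Species-specific slopes by hand: baseline slope + interaction term
coefs <- coef(model_interaction)
coefs["bill_dep"]                                       # Adelie: 217.15
coefs["bill_dep"] + coefs["bill_dep:speciesChinstrap"]  # Chinstrap: 217.15 - 12.53 = 204.62
coefs["bill_dep"] + coefs["bill_dep:speciesGentoo"]     # Gentoo: 217.15 + 152.29 = 369.44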

…but that’s annoying! avg_slopes() lets us get species-specific slopes automatically:

avg_slopes(model_interaction, variables = "bill_dep", by = "species")
## 
##    species Estimate Std. Error     z Pr(>|z|)    S 2.5 % 97.5 %
##  Adelie         217       23.8  9.12   <0.001 63.5   170    264
##  Chinstrap      205       38.2  5.36   <0.001 23.5   130    279
##  Gentoo         369       32.7 11.28   <0.001 95.6   305    434
## 
## Term: bill_dep
## Type: response
## Comparison: dY/dX

For Gentoos, a 1 mm increase in bill depth is associated with a 369 g increase in body mass, while for Chinstraps a 1 mm increase in bill depth is associated with a 205 g increase in body mass. Neat.

Automatic interpretation with {report}

Finally, remember that whole template thing you learned in past stats classes about how to interpret regression coefficients? The {report} package (which is part of the larger easystats ecosystem) does this for you!

library(report)

report_text(some_model)
## We fitted a linear model (estimated using OLS) to predict body_mass with bill_dep and species
## (formula: body_mass ~ bill_dep + species). The model explains a statistically significant and
## substantial proportion of variance (R2 = 0.80, F(3, 338) = 443.84, p < .001, adj. R2 = 0.80). The
## model's intercept, corresponding to bill_dep = 0 and species = Adelie, is at -1007.28 (95% CI
## [-1643.73, -370.83], t(338) = -3.11, p = 0.002). Within this model:
## 
##   - The effect of bill dep is statistically significant and positive (beta = 256.61, 95% CI [222.07,
## 291.16], t(338) = 14.61, p < .001; Std. beta = 0.63, 95% CI [0.55, 0.72])
##   - The effect of species [Chinstrap] is statistically non-significant and positive (beta = 13.38,
## 95% CI [-90.77, 117.52], t(338) = 0.25, p = 0.801; Std. beta = 0.02, 95% CI [-0.11, 0.15])
##   - The effect of species [Gentoo] is statistically significant and positive (beta = 2238.67, 95% CI
## [2093.74, 2383.60], t(338) = 30.38, p < .001; Std. beta = 2.79, 95% CI [2.61, 2.97])
## 
## Standardized parameters were obtained by fitting the model on a standardized version of the
## dataset. 95% Confidence Intervals (CIs) and p-values were computed using a Wald t-distribution
## approximation.
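Relatedly, if you’d rather have those details as numbers than as prose, {report} also has report_table(), which returns the same information in tabular form:

report_table(some_model)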

How do I change the reference category for a categorical variable?

By default, R uses the first category (first in alphabetical order, unless you’ve changed it) as the reference category and omits it from the regression. Like here, there are three penguin species: Adelie, Chinstrap, and Gentoo. Adelie comes first alphabetically, so it is omitted and all the coefficients are in relation to Adelies:

model_species <- lm(body_mass ~ species, data = penguins)
model_parameters(model_species)
## Parameter           | Coefficient |    SE |             95% CI | t(339) |      p
## --------------------------------------------------------------------------------
## (Intercept)         |     3700.66 | 37.62 | [3626.67, 3774.66] |  98.37 | < .001
## species [Chinstrap] |       32.43 | 67.51 | [-100.37,  165.22] |   0.48 | 0.631 
## species [Gentoo]    |     1375.35 | 56.15 | [1264.91, 1485.80] |  24.50 | < .001
## 
## Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed using a Wald t-distribution approximation.

Chinstraps are a little heavier than Adelies; Gentoos are a lot heavier than Adelies.

But what if you want to change the reference category or base case? Like, what if we want Gentoos to be omitted?

To do this, you need to change the order of the levels in the species column. You can see what the current level order is using levels(), like this:

levels(penguins$species)
## [1] "Adelie"    "Chinstrap" "Gentoo"

Adelie comes first.

The fct_relevel() function lets you move other levels up to the front, like this:

penguins_modified <- penguins |> 
  mutate(species = fct_relevel(species, "Gentoo"))

levels(penguins_modified$species)
## [1] "Gentoo"    "Adelie"    "Chinstrap"

Now Gentoo comes first. If we make a regression model with this new column, Gentoos will be omitted:

model_species1 <- lm(body_mass ~ species, data = penguins_modified)
model_parameters(model_species1)
## Parameter           | Coefficient |    SE |               95% CI | t(339) |      p
## ----------------------------------------------------------------------------------
## (Intercept)         |     5076.02 | 41.68 | [ 4994.03,  5158.00] | 121.78 | < .001
## species [Adelie]    |    -1375.35 | 56.15 | [-1485.80, -1264.91] | -24.50 | < .001
## species [Chinstrap] |    -1342.93 | 69.86 | [-1480.34, -1205.52] | -19.22 | < .001
## 
## Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed using a Wald t-distribution approximation.

Cool cool. Adelies and Chinstraps are both way lighter than Gentoos on average.
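As a side note, if you’re not using {forcats}, base R’s relevel() can set the reference level directly. A minimal equivalent sketch:

# Base R equivalent of fct_relevel(): set the reference level directly
penguins_base <- penguins
penguins_base$species <- relevel(penguins_base$species, ref = "Gentoo")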

Importantly, the underlying model hasn’t actually changed: the fitted values and group means are exactly the same. The reference category ultimately doesn’t matter. All that changed is the baseline the coefficients are compared against, so these numbers are now in relation to Gentoos.

Another reason this doesn’t matter is that you can use things like predictions() from {marginaleffects} to see predicted values of species weights, and it’ll automatically work with whatever the reference category is.

For instance, here are the predicted weights for the three species using the model where Adelies are the reference category:

predictions(model_species, newdata = datagrid(species = unique))
## 
##    species Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
##  Adelie        3701       37.6  98.4   <0.001 Inf  3627   3774
##  Chinstrap     3733       56.1  66.6   <0.001 Inf  3623   3843
##  Gentoo        5076       41.7 121.8   <0.001 Inf  4994   5158
## 
## Type: response

And here are the predicted weights for the three species using the model where Gentoos are the reference category:

predictions(model_species1, newdata = datagrid(species = unique))
## 
##    species Estimate Std. Error     z Pr(>|z|)   S 2.5 % 97.5 %
##  Gentoo        5076       41.7 121.8   <0.001 Inf  4994   5158
##  Adelie        3701       37.6  98.4   <0.001 Inf  3627   3774
##  Chinstrap     3733       56.1  66.6   <0.001 Inf  3623   3843
## 
## Type: response

They’re the same!

Footnotes

  1. Not really! We can’t think of p-values as almost significant or super significant—they’re either significant or not.