To estimate the effect of the championship in different countries, we constructed a Bayesian model that uses the reported case numbers in 12 countries. Ethical approval was not sought as we only worked with openly available data. A graphical overview of the inference model is given in Fig. 4 and model variables, prior distributions, indices, country-dependent priors, and sampling performance are summarized in Tables 1, 2, 3, 4 and 5, respectively.

### Modeling the spreading dynamics, including gender imbalance

The model simulates the spread of COVID-19 in each country separately using a discrete renewal process^{22,29,43}. We infer a time-dependent effective reproduction number with gender interactions between genders *g* and \({g}^{{\prime} }\), \({R}_{{{{{{{{\rm{eff}}}}}}}},g,{g}^{{\prime} }}(t)\), for each country^{21}.

Even though participation of women in football fan activity has increased in the last decades^{44}, football fans are still predominantly male^{24}. Hence one expects a higher infection probability at the days of the match for the male compared to the female population. Integrating this information into the model by using gender resolved case numbers, allows improved inference of the Euro 2020’s impact. In the following, genders “male” and “female” are denoted by the subscripts •_{g=1} and •_{g=2}, respectively. Furthermore, we modeled the spreading dynamics of COVID-19 in each country separately.

In the discrete renewal process for disease dynamics of the respective country, we define for each gender *g* a susceptible pool *S*_{g} and an infected pool *I*_{g}. With *N* denoting the population size, the spreading dynamics with daily time resolution *t* reads as

$${I}_{g}(t)=\frac{{S}_{g}(t)}{N}\mathop{\sum }\limits_{{g}^{{\prime} }=1}^{2}{{{{{{{{\bf{R}}}}}}}}}_{{{{{{{{\rm{eff}}}}}}}},g,{g}^{{\prime} }}(t)\mathop{\sum }\limits_{\tau=0}^{10}{I}_{{g}^{{\prime} }}(t-1-\tau )\,G(\tau ),$$

(1)

$${S}_{g}(t)={S}_{g}(t-1)-{E}_{g}(t-1),$$

(2)

$$G(\tau )={{{{{{{\rm{Gamma}}}}}}}}(\tau ;\mu=4,\sigma=1.5).$$

(3)

We apply a discrete convolution in Eq. (1) to account for the latent period and subsequent infection (red box in Fig. 4). This generation interval (between infections) is modeled by a Gamma distribution *G*(*τ*) with a mean *μ* of four days and standard deviation *σ* of one and a half days. This is a little longer than the estimates of the generation interval of the Delta variant^{45,46}, but shorter than the estimated generation interval of the original strain^{47,48}. The impact of the choice of generation interval has negligible impact on our results (Supplementary Fig. S16). The infected compartment (commonly *I*) is not modeled explicitly as a separate compartment, but implicitly with the assumed generation interval kernel.

The effective spread in a given country is described by the country-dependent effective reproduction numbers for infections of individuals of gender *g* by individuals of gender \({g}^{{\prime} }\)

$${{{{{{{{\bf{R}}}}}}}}}_{{{{{{{{\rm{eff}}}}}}}},g,{g}^{{\prime} }}(t)={R}_{{{{{{{{\rm{base}}}}}}}}}(t){C}_{{{{{{{{\rm{base}}}}}}}},g,{g}^{{\prime} }}+\Delta {R}_{{{{{{{{\rm{football}}}}}}}}}(t){C}_{{{{{{{{\rm{match}}}}}}}},g,{g}^{{\prime} }}+\Delta {R}_{{{{{{{{\rm{noise}}}}}}}}}(t){C}_{{{{{{{{\rm{noise}}}}}}}},g,{g}^{{\prime} }},$$

(4)

where \({C}_{{{{{{{{\rm{base}}}}}}}},g,{g}^{{\prime} }}\), \({C}_{{{{{{{{\rm{match}}}}}}}},g,{g}^{{\prime} }}\), and \({C}_{{{{{{{{\rm{noise}}}}}}}},g,{g}^{{\prime} }}\) describe the entries of the contact matrices **C**_{base}, **C**_{match}, **C**_{noise} respectively (purple boxes in Fig. 4).

This effective reproduction number is a function of three different reproduction numbers (yellow and orange boxes in Fig. 4):

- 1.
A slowly changing base reproduction number

*R*_{base}(22) that has the same effect on both genders; besides incorporating the epidemiological information given by the basic reproduction number*R*_{0}, it represents the day-to-day contact behavior, including the impact of non-pharmaceutical interventions (NPIs), voluntary preventive measures, immunity status, etc. - 2.
The reproduction number associated with social gatherings in the context of a football match

*R*_{match}(*t*) (11); this number is only different from zero on days with matches that the respective country’s team participates in and it has a larger effect on men than on women. - 3.
A slowly changing noise term Δ

*R*_{noise}(*t*) (31), which subsumes all additional effects which might change the incidence ratio between males and females (gender imbalance).

The interaction between persons of specific genders is implemented by effective contact matrices **C**_{match}, **C**_{base} and **C**_{noise}. All three are assumed to be symmetric.

**C**_{base} describes non-football related contacts outside the context of Euro 2020 matches (left purple box in Fig. 4):

$${{{{{{{{\bf{C}}}}}}}}}_{{{{{{{{\rm{base}}}}}}}}}=\left(\begin{array}{cc}1-{c}_{{{{{{{{\rm{off}}}}}}}}}&{c}_{{{{{{{{\rm{off}}}}}}}}}\\ {c}_{{{{{{{{\rm{off}}}}}}}}}&1-{c}_{{{{{{{{\rm{off}}}}}}}}}\end{array}\right),$$

(5)

$${{{{{{{\rm{with}}}}}}}}\,{c}_{{{{{{{{\rm{off}}}}}}}}} \sim {{{{{{{\rm{Beta}}}}}}}}(\alpha=8,\, \beta=8).$$

(6)

Here, we have the prior assumption that contacts between women, contacts between men, and contacts between women and men are equally probable. Hence, we chose the parameters for the Beta distribution such that *c*_{off} has a mean of 50% with a 2.5th and 97.5th percentile of [27%, 77%]. This prior is chosen such that it is rather uninformative. As shown in Supplementary Fig. S17, this and other priors of auxiliary parameters do not affect the parameter of interest if their width is varied within a factor of 2 up and down.

**C**_{match} describes the contact behavior in the context of the Euro 2020 football matches (right purple box in Fig. 4). Here, we assume as a prior that the female participation in football-related gatherings accounts for ≃ 33% (95% percentiles [18%,51%]) of the total participation. Hence, we get the following contact matrix

$${{{{{{{{\bf{C}}}}}}}}}_{{{{{{{{\rm{match}}}}}}}},{{{{{{{\rm{unnorm}}}}}}}}.}=\left(\begin{array}{cc}{(1-{\omega }_{{{{{{{{\rm{gender}}}}}}}}})}^{2}&{\omega }_{{{{{{{{\rm{gender}}}}}}}}}(1-{\omega }_{{{{{{{{\rm{gender}}}}}}}}})\\ {\omega }_{{{{{{{{\rm{gender}}}}}}}}}(1-{\omega }_{{{{{{{{\rm{gender}}}}}}}}})&{\omega }_{{{{{{{{\rm{gender}}}}}}}}}^{2}\end{array}\right)$$

(7)

$${{{{{{{{\bf{C}}}}}}}}}_{{{{{{{{\rm{match}}}}}}}}}=\frac{{{{{{{{{\bf{C}}}}}}}}}_{{{{{{{{\rm{match}}}}}}}},{{{{{{{\rm{unnorm.}}}}}}}}}}{{\left|{{{{{{{{\bf{C}}}}}}}}}_{{{{{{{{\rm{match}}}}}}}},{{{{{{{\rm{unnorm.}}}}}}}}}\cdot {\left(0.5,0.5\right)}^{T}\right|}_{2}}$$

(8)

$${\omega }_{{{{{{{{\rm{gender}}}}}}}}} \sim {{{{{{{\rm{Beta}}}}}}}}\left(\alpha=10,\, \beta=20\right).$$

(9)

The prior beta distribution of *ω*_{gender} is bounded between at 0 and 1 and with the parameter values of *α* = 10 and *β* = 20 has the expectation value of 1/3. The robustness of the choice of this parameter is explored in Supplementary Fig. S15. **C**_{match} is normalized such that for balanced case numbers (equal case numbers for men and women) and an additive reproduction number *R*_{match} = 1 will lead to a unitary increase of total case numbers. The reproduction number of women will therefore increase by 2*ω*_{gender}Δ*R*_{match}(*t*) on match days whereas the one of men will increase by 2(1 − *ω*_{gender})Δ*R*_{match}, assuming balanced case numbers beforehand.

**C**_{noise} describes the effect of an additional noise term, which changes gender balance without being related to football matches (middle purple box in Fig. 4). For simplicity, it is implemented as

$${{{{{{{{\bf{C}}}}}}}}}_{{{{{{{{\rm{noise}}}}}}}}}=\left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right),$$

(10)

whereby we center the diagonal elements such that the cases introduced by the noise term sum up to zero, i.e. ∑_{i,j}*R*_{noise} ⋅ *C*_{noise,i,j} = 0.

### Football-related effect

Our aim is to quantify the number of cases (or equivalently the fraction of cases) associated with the Euro 2020, \({{{\Gamma }}}_{g}^{{{{{{{{\rm{Euro}}}}}}}}}\). To that end we assume that infections can occur at public or private football screenings in the two countries participating in the respective match *m* (parameterized by Δ*R*_{match,m}). Note that for the Euro 2020 not a single country, but a set of 11 countries hosted the matches. The participation of a team or the staging of a match in a country may have different effect sizes. Thus, we define the football related additive reproduction number as

$$\Delta {R}_{{{{{{{{\rm{football}}}}}}}}}(t)=\mathop{\sum}\limits_{m}\Delta {R}_{{{{{{{{\rm{match}}}}}}}},m}\cdot \delta ({t}_{m}-t).$$

(11)

We assume the effect of each match to only be effective in a small time window centered around the day of a match *m*, *t*_{m} (light orange box in Fig. 4). Thus, we apply an approximate delta function *δ*(*t*_{m} − *t*). To guarantee differentiability and hence better convergence of the model, we did not use a delta distribution but instead a narrow normal distribution centered around *t*_{m}, with a standard deviation of one day:

$$\delta (t)=\frac{1}{\sqrt{2\pi }}\exp \left(-\frac{{t}^{2}}{2}\right).$$

(12)

We distinguish between the effect size of each match *m* on the spread of COVID-19. For modeling the effect Δ*R*_{match,m}, associated with public or private football screenings in the home country, we introduce one base effect \(\Delta {R}_{{{{{{{{\rm{match}}}}}}}}}^{{{{{{{{\rm{mean}}}}}}}}}\) and a match specific offset Δ*α*_{m} for a typical hierarchical modeling approach (dark orange box in Fig. 4). As prior we assume that the base effect \(\Delta {R}_{{{{{{{{\rm{match}}}}}}}}}^{{{{{{{{\rm{mean}}}}}}}}}\) is centered around zero, which means that in principle also a negative effect of the football matches can be inferred:

$$\Delta {R}_{{{{{{{{\rm{match}}}}}}}},m}={\alpha }_{{{{{{{{\rm{prior}}}}}}}},m}\left(\Delta {R}_{{{{{{{{\rm{match}}}}}}}}}^{{{{{{{{\rm{mean}}}}}}}}}+\Delta {\alpha }_{m}\right)$$

(13)

$$\Delta {R}_{{{{{{{{\rm{match}}}}}}}}}^{{{{{{{{\rm{mean}}}}}}}}} \sim {{{{{{{\mathcal{N}}}}}}}}\left(0,5\right)$$

(14)

$$\Delta {\alpha }_{m} \sim {{{{{{{\mathcal{N}}}}}}}}\left(0,{\sigma }_{\alpha }\right)$$

(15)

$${\sigma }_{\alpha } \sim {{{{{{{\rm{HalfNormal}}}}}}}}\;\left(5\right).$$

(16)

*α*_{prior,m} is the *m*-th element of the vector that encodes the prior expectation of the effect of a match on the reproduction number. If a country participated in a match, the entry is 1 and otherwise 0. The robustness of the results with respect to the hyperprior *σ*_{α} is explored in Supplementary Fig. S17.

For Supplementary Fig. S9, we expand the model by including the effect of infections happening in stadiums and in the vicinity of it as well as during travel towards the venue of the match. In detail, we add to the football related additive reproduction number (Eq. (11)) an additive effect Δ*R*_{stadium,m}:

$$\Delta {R}_{{{{{{{{\rm{football}}}}}}}}}(t)=\mathop{\sum}\limits_{m}(\Delta {R}_{{{{{{{{\rm{match}}}}}}}},m}+\Delta {R}_{{{{{{{{\rm{stadium}}}}}}}},m})\cdot \delta ({t}_{m}-t).$$

(17)

Analogously to the gathering-related effect we apply the same hierarchy to the effect caused by hosting a match in the stadium – but change the prior of the day of the effect:

$$\Delta {R}_{{{{{{{{\rm{stadium}}}}}}}},m}={\beta }_{{{{{{{{\rm{prior}}}}}}}},m}\left(\Delta {R}_{{{{{{{{\rm{stadium}}}}}}}}}^{{{{{{{{\rm{mean}}}}}}}}}+\Delta {\beta }_{m}\right)$$

(18)

$$\Delta {R}_{{{{{{{{\rm{stadium}}}}}}}}}^{{{{{{{{\rm{mean}}}}}}}}} \sim {{{{{{{\mathcal{N}}}}}}}}\left(0,5\right)$$

(19)

$$\Delta {\beta }_{m} \sim {{{{{{{\mathcal{N}}}}}}}}(0,{\sigma }_{\beta })$$

(20)

$${\sigma }_{\beta } \sim {{{{{{{\rm{HalfNormal}}}}}}}}\left(5\right).$$

(21)

*β*_{prior,m} encodes whether or not a match was *hosted* by the respective country, i.e equates 1 if the match took place in the country and otherwise equates 0.

### Non-football-related reproduction number

To account for effects not related to the football matches, e.g., non-pharmaceutical interventions, vaccinations, seasonality or variants, we introduce a slowly changing reproduction number *R*_{base}(*t*), which is identical for both genders and should map all other not specifically modeled gender independent effects (left yellow box in Fig. 4):

$${R}_{{{{{{{{\rm{base}}}}}}}}}(t)={R}_{0}\exp \left(\mathop{\sum}\limits_{n}{\gamma }_{n}(t)\right)$$

(22)

$${R}_{0} \sim {{{{{{{\rm{LogNormal}}}}}}}}\left(\mu=1,\sigma=1\right)$$

(23)

This base reproduction number is modeled as a superposition of logistic change points *γ*(*t*) every 10 days, which are parameterized by the transient length of the change points *l*, the date of the change point *d* and the effect of the change point Δ*γ*_{n}. The subscripts *n* denotes the discrete enumeration of the change points:

$${\gamma }_{n}(t)=\frac{1}{1+{e}^{-4/{l}_{n}\cdot (t-{d}_{n})}}\cdot \Delta {\gamma }_{n}$$

(24)

$$\Delta {\gamma }_{n} \sim {{{{{{{\mathcal{N}}}}}}}}(0,{\sigma }_{\Delta \gamma })\,\forall n$$

(25)

$${\sigma }_{\Delta \gamma } \sim {{{{{{{\rm{HalfCauchy}}}}}}}}\left(0.5\right)$$

(26)

$${l}_{n}=\log \big(1+\exp ({l}_{n}^{{{{\dagger}}} })\big)$$

(27)

$${l}_{n}^{{{{\dagger}}} } \sim {{{{{{{\mathcal{N}}}}}}}}\left(4,1\right)\,\forall n\,({{{{{\rm{unit}}}}}}\;{{{{{\rm{is}}}}}}\;{{{{{\rm{days}}}}}})$$

(28)

$${d}_{n}=2{7}^{{{{{{{{\rm{th}}}}}}}}}\;{{{{{{{\rm{May\;2021}}}}}}}}\;+10\cdot n+\Delta {d}_{n}\,{{{{{{{\rm{for}}}}}}}}\; n=0,\ldots,9$$

(29)

$$\Delta {d}_{n} \sim {{{{{{{\mathcal{N}}}}}}}}\left(0,3.5\right)\,\forall n\,({{{{{\rm{unit}}}}}}\;{{{{{\rm{is}}}}}}\;{{{{{\rm{days}}}}}}).$$

(30)

The idea behind this parameterization is that Δ*γ*_{n} models the change of R-value, which occurs at times *d*_{n}. These changes are then summed in Eq. (24). Change points that have not occurred yet at time *t* do not contribute in a significant way to the sum as the sigmoid function tends to zero for *t* < < *d*_{n}. The robustness of the results regarding the spacing of the change-points *d*_{n} is explored in Supplementary Fig. S14 and the robustness of the choice of the hyperprior *σ*_{Δγ} is explored in Supplementary Fig. S17.

Similarly, to account for small changes in the gender imbalance, the noise on the ratio between infections in men and women is modeled by a slowly varying reproduction number (middle yellow box in Fig. 4), parameterized by series of change points every 10 days:

$$\Delta {R}_{{{{{{{{\rm{noise}}}}}}}}}(t)=\Delta {R}_{0,{{{{{{{\rm{noise}}}}}}}}}+\bigg(\mathop{\sum}\limits_{n}{\tilde{\gamma }}_{n}(t)\bigg)$$

(31)

$$\Delta {R}_{0,{{{{{{{\rm{noise}}}}}}}}} \sim {{{{{{{\mathcal{N}}}}}}}}\left(\mu=0,\sigma=0.1\right)$$

(32)

$${\tilde{\gamma }}_{n}(t)=\frac{1}{1+{e}^{-4/{\tilde{l}}_{n}\cdot (t-{\tilde{d}}_{n})}}\cdot \Delta {\tilde{\gamma }}_{n}$$

(33)

$$\Delta {\tilde{\gamma }}_{n} \sim {{{{{{{\mathcal{N}}}}}}}}(0,{\sigma }_{\Delta \tilde{\gamma }})$$

(34)

$${\sigma }_{\Delta \tilde{\gamma }} \sim {{{{{{{\rm{HalfCauchy}}}}}}}}\left(0.2\right)$$

(35)

$${\tilde{l}}_{n}=\log \left(1+\exp ({\tilde{l}}_{n}^{{{{\dagger}}} })\right)$$

(36)

$${\tilde{l}}_{n}^{{{{\dagger}}} } \sim {{{{{{{\mathcal{N}}}}}}}}\left(4,1\right)\,\forall n\,({{{{{\rm{unit}}}}}}\,{{{{{\rm{is}}}}}}\,{{{{{\rm{days}}}}}})$$

(37)

$${\tilde{d}}_{n}=2{7}^{{{{{{{{\rm{th}}}}}}}}}\,{{{{{{{\rm{May}}}}}}}}\,{{{{{{{\rm{2021}}}}}}}}+10\cdot n+\Delta {\tilde{d}}_{n}\,{{{{{{{\rm{for}}}}}}}}\,n=0,\ldots,9$$

(38)

$$\Delta {\tilde{d}}_{n} \sim {{{{{{{\mathcal{N}}}}}}}}\left(0,3.5\right)\,\forall n\,({{{{{{{\rm{unit}}}}}}}}\,{{{{{{{\rm{is}}}}}}}}\,{{{{{{{\rm{days}}}}}}}}).$$

(39)

### Delay

Modeling the delay between the time of infection and the reporting of it is an important part of the model (blue boxes in Fig. 4); it allows for a precise identification of changes in the infection dynamics because of football matches and the reported cases. We split the delay into two different parts: First we convolved the number of newly infected people with a kernel, which delays the cases between 4 and 7 days. Second, to account for delays that occur because of the weekly structure (some people might delay getting tested until Monday if they have symptoms on Saturday or Sunday), we added a variable fraction that delays cases depending on the day of the week.

#### Constant delay

To account for the latent period and an eventual apparition of symptoms we apply a discrete convolution, a Gamma kernel, to the infected pool (right blue box in Fig. 4). The prior delay distribution *D* is defined by incorporating knowledge about the country specific reporting structure: If the reported date corresponds to the moment of the sample collection (which is the case in England, Scotland and France) or if the reported date corresponds to the onset of symptoms (which is the case in the Netherlands), we assumed 4 days as the prior median of the delay between infection and case. If the reported date corresponds to the transmission of the case data to the authorities, we assumed 7 days as prior median of the delay. If we do not know what the published date corresponds to, we assumed a median \({\bar{D}}_{{{{{{{{\rm{country}}}}}}}}}\) of 5 days, with a larger prior standard deviation \({\sigma }_{\log \bar{D}}\) (see Table 4):

$${C}_{g}^{{{{\dagger}}} }\left(t\right)=\mathop{\sum }\limits_{\tau=1}^{T}{E}_{g}(t-\tau )\cdot {{{{{{{\rm{Gamma}}}}}}}}(\tau ;\mu=D,\sigma={\sigma }_{D})$$

(40)

$$D=\log \left({D}^{{{{\dagger}}} }\right)$$

(41)

$${D}^{{{{\dagger}}} } \sim {{{{{{{\mathcal{N}}}}}}}}\big(\mu=\exp \big({\bar{D}}_{{{{{{{{\rm{country}}}}}}}}}\big),\sigma={\sigma }_{\log \bar{D}}\big)$$

(42)

$${\sigma }_{D} \sim {{{{{{{\mathcal{N}}}}}}}}\big(\mu=0.2\cdot {\bar{D}}_{{{{{{{{\rm{country}}}}}}}}},\sigma=0.08\cdot {\bar{D}}_{{{{{{{{\rm{country}}}}}}}}}\big).$$

(43)

Here, Gamma represents the delay kernel. We obtain a delayed number of infected persons \({C}_{g}^{{{{\dagger}}} }\) by delaying the newly infected number of persons *I*_{g}(*t*) of gender *g* from Eq. (1). The robustness of the choice of the width of the delay kernel *σ*_{D} is explored in Supplementary Fig. S17.

#### Weekday-dependent delay

Because of the different availability of testing resources during a week, we further delay a fraction of persons, depending on the day of the week (left blue box in Fig. 4). We model the fraction *η*_{t} of delayed tests on a day *t* in a recurrent fashion, meaning that if a certain fraction gets delayed on Saturday, these same individuals can still get delayed on Sunday (Eq. (44)). The fraction *η*_{t} is drawn separately for each individual day. However, the prior is the same for certain days of the week *d* (Eq. (45)): we assume that few tests get delayed on Tuesday, Wednesday, and Thursday, using a prior with mean 0.67% (Eq. (48)), whereas we assume that more tests might be delayed on Monday, Friday, Saturday and Sunday. Hence compared to \({C}_{g}^{{{{\dagger}}} }\), we obtain slightly more delayed numbers of cases \({\hat{C}}_{g}\), which now include a weekday-dependent delay:

$${\hat{C}}_{g}\left(t\right)=\left(1-{\eta }_{t}\right)\cdot \left({C}_{g}^{{{{\dagger}}} }\left(t\right)+{\eta }_{t-1}{\hat{C}}_{g}\left(t-1\right)\right)\;{{{{{{{\rm{with}}}}}}}}\;{\hat{C}}_{g}\left(0\right)={C}_{g}^{{{{\dagger}}} }\left(0\right)$$

(44)

$${\eta }_{t} \sim {{{{{{{\rm{Beta}}}}}}}}\left(\alpha=\frac{{r}_{d}}{e},\beta=\frac{1-{r}_{d}}{e}\right)\;{{{{{{{\rm{with}}}}}}}}\;d=\,{{{{{\rm{Monday}}}}}}\,,…,\,{{{{{\rm{Sunday}}}}}}\,$$

(45)

$${r}_{d}={{{{{{{\rm{sigmoid}}}}}}}}\left({r}_{d}^{{{{\dagger}}} }\right)$$

(46)

$${r}_{d}^{{{{\dagger}}} }={r}_{{{{{{{{\rm{base}}}}}}}},d}^{{{{\dagger}}} }+\Delta {r}_{d}^{{{{\dagger}}} }$$

(47)

$${r}_{{{{{{{{\rm{base}}}}}}}},d}^{{{{\dagger}}} } \sim {{{{{{{\mathcal{N}}}}}}}}\left(-5,1\right)\;{{{{{{{\rm{for}}}}}}}}\;d=\,{{{{{\rm{Tuesday}}}}}},\,{{{{{\rm{Wednesday}}}}}},\,{{{{{\rm{Thursday}}}}}}$$

(48)

$${r}_{{{{{{{{\rm{base}}}}}}}},d}^{{{{\dagger}}} } \sim {{{{{{{\mathcal{N}}}}}}}}\left(-3,2\right)\;{{{{{{{\rm{for}}}}}}}}\;d=\,{{{{{\rm{Friday}}}}}},\,{{{{{\rm{Saturday}}}}}},\,{{{{{\rm{Sunday}}}}}},\,{{{{{\rm{Monday}}}}}}$$

(49)

$$\Delta {r}_{d}^{{{{\dagger}}} } \sim {{{{{{{\mathcal{N}}}}}}}}\left(0,{\sigma }_{r}\right)$$

(50)

$${\sigma }_{r} \sim {{{{{{{\rm{HalfNormal}}}}}}}}\left(1\right)$$

(51)

$$e \sim {{{{{{{\rm{HalfCauchy}}}}}}}}\left(0.2\right).$$

(52)

The parameter *r*_{d} is defined such that it models the mean of the Beta distribution of Eq. (45), whereas *e* models the scale of the Beta distribution. *r*_{d} is then transformed to an unbounded space by the sigmoid \(f\left(x\right)=\frac{1}{1+\exp \left(-x\right)}\) (Eq. (46)). This allows to define the hierarchical prior structure for the different weekdays. We chose the prior of \({r}_{{{{{{{{\rm{base}}}}}}}},d}^{{{{\dagger}}} }\) for Tuesday, Wednesday, and Thursday such that only a small fraction of cases are delayed during the week. The chosen prior in Eq. (48) corresponds to a 2.5th and 97.5th percentile of *r*_{d} of [0%; 5%]. For the other days (Friday, Saturday, Sunday, Monday), the chosen prior leaves a lot of freedom for inferring the delay. Equation (49) corresponds to a 2.5th and 97.5th percentile of *r*_{d} of [0%; 72%]. The robustness of the other priors *σ*_{r} and *e* is explored in Supplementary Fig. S17.

### Likelihood

Next we want to define the goodness of fit of our model to the sample data. The likelihood of that is modeled by a Student’s *t*-distribution, which allows for some outliers because of its heavier tails compared to a Normal distribution (green box in Fig. 4). The error of the Student’s *t*-distribution is proportional to the square root of the number of cases, which corresponds to the scaling of the errors in a Poisson or Negative Binomial distribution:

$${C}_{g}(t) \sim {{{{{{{{\rm{StudentT}}}}}}}}}_{\nu \!=\! 4}\left(\mu={\hat{C}}_{g}(t),\sigma=\kappa \sqrt{{\hat{C}}_{g}(t)+1}\right)$$

(53)

$$\kappa \sim {{{{{{{\rm{HalfCauchy}}}}}}}}(\sigma=30).$$

(54)

Here *C*_{g}(*t*) is the measured number of cases in the population of gender *g* as reported by the respective health authorities, whereas \({\hat{C}}_{g}(t)\) is the modeled number of cases (Eq. (44)). The robustness of the prior *κ* is explored in Supplementary Fig. S17.

### Average effect across countries

In order to calculate the mean effect size across countries (Fig. 1b, c), we average the individual effects of each country. To be consistent in our approach, we build an hierarchical Bayesian model accounting for the individual uncertainties of each country estimated from the width of the posterior distributions. As effect size, we use the fraction of primary cases associated with football matches during the championship. Then our estimated mean effect size \({\hat{I}}_{g}\) across all countries *c* (except the Netherlands) for the gender *g* is inferred with the following model:

$${\hat{I}}_{g} \sim {{{{{{{\rm{Normal}}}}}}}}(\mu=0,\sigma=2)\,\,\,{{{{{{{\rm{with}}}}}}}}\,\,g=\{{{{{{{{\rm{male}}}}}}}},\,{{{{{{{\rm{female}}}}}}}}\}$$

(55)

$${\tau }_{g} \sim {{{{{{{\rm{HalfCauchy}}}}}}}}(\beta=10)$$

(56)

$${I}_{c,g}^{{{{\dagger}}} } \sim {{{{{{{\rm{Normal}}}}}}}}(\mu={\hat{I}}_{g},\sigma={\tau }_{g})$$

(57)

$${\hat{\sigma }}_{c,g} \sim {{{{{{{\rm{HalfCauchy}}}}}}}}(\beta=10)$$

(58)

$${I}_{s,c,g} \sim {{{{{{{{\rm{StudentT}}}}}}}}}_{\nu=4}\left(\mu={I}_{c,g}^{{{{\dagger}}} },\sigma={\hat{\sigma }}_{c,g}\right).$$

(59)

The estimated effect size of each country (the fraction of primary cases) is denoted by \({I}_{c,g}^{{{{\dagger}}} }\) and the effect size of individual samples *s* from the posterior of the main model is denoted by *I*_{s,c,g}.

We applied a similar hierarchical model but without gender dimensions and with slightly different priors to calculate the average mean match effect \(\Delta {R}_{{{{{{{{\rm{match}}}}}}}}}^{{{{{{{{\rm{mean}}}}}}}}}\) (Fig. 1a). Here by reusing the same notation:

$$\hat{I} \sim {{{{{{{\rm{Normal}}}}}}}}(\mu=0,\sigma=10)$$

(60)

$$\tau \sim {{{{{{{\rm{HalfCauchy}}}}}}}}(\beta=10)$$

(61)

$${I}_{c}^{{{{\dagger}}} } \sim {{{{{{{\rm{Normal}}}}}}}}(\mu=\hat{I},\sigma=\tau )$$

(62)

$${\hat{\sigma }}_{c} \sim {{{{{{{\rm{HalfCauchy}}}}}}}}(\beta=10)$$

(63)

$$\Delta {R}_{{{{{{{{\rm{match}}}}}}}},c,s}^{{{{{{{{\rm{mean}}}}}}}}} \sim {{{{{{{{\rm{StudentT}}}}}}}}}_{\nu=4}\left(\mu={I}_{c}^{{{{\dagger}}} },\sigma={\hat{\sigma }}_{c}\right),$$

(64)

where \(\Delta {R}_{{{{{{{{\rm{match}}}}}}}},c,s}^{{{{{{{{\rm{mean}}}}}}}}}\) are the posterior samples from the main model runs of the \(\Delta {R}_{{{{{{{{\rm{match}}}}}}}}}^{{{{{{{{\rm{mean}}}}}}}}}\) variable.

### Calculating the primary and subsequent cases

We compute the number of primary football related infected *I*_{primary,g}(*t*) as the number of infections happening at football related gathering. The percentage of primary cases *f*_{g} is then computed by dividing by the total number of infected *I*_{g}(*t*).

$${I}_{{{{{{{{\rm{primary}}}}}}}},g}(t)=\frac{S(t){R}_{{{{{{{{\rm{football}}}}}}}}}(t)}{N}\mathop{\sum}\limits_{{g}^{{\prime} }}{I}_{{g}^{{\prime} }}(t){{{{{{{{\bf{C}}}}}}}}}_{{{{{{{{\rm{football}}}}}}}},{g}^{{\prime} },g}$$

(65)

$${f}_{g}=\mathop{\sum}\limits_{t}\frac{{I}_{{{{{{{{\rm{primary}}}}}}}},g}(t)}{{I}_{g}(t)}\,t\in [11{{{{{{{\rm{th}}}}}}}}\,{{{{{{{\rm{June}}}}}}}},\,31{{{{{{{\rm{st}}}}}}}}\,{{{{{{{\rm{July}}}}}}}}]$$

(66)

To obtain the subsequent infected *I*_{subsequent,g}(*t*), we subtract infected obtained from a hypothetical scenario without football games *I*_{none,g}(*t*) from the total number of infected.

$${I}_{{{{{{{{\rm{subsequent}}}}}}}},g}={I}_{g}(t)-{I}_{{{{{{{{\rm{primary}}}}}}}},g}(t)-{I}_{{{{{{{{\rm{none}}}}}}}},g}(t)$$

(67)

Specific, we consider a counterfactual scenario, where we sample from our model leaving all inferred parameters the same expect for the football related reproduction number *R*_{football,g}(*t*), which we set to zero.

### Sampling

The sampling was done using PyMC3^{49}. We use a NUTS sampler^{50}, which is a Hamiltonian Monte-Carlo sampler. As random initialization often leads to some chains getting stuck in local minima, we run 32 chains for 500 initialization steps and chose the 8 chains with the highest unnormalized posterior to continue tuning and sampling. We then let these chains tune for additional 2000 steps and draw 4000 samples. The maximum tree depth was set to 12.

The quality of the mixing was tested with the \({{{{{{{\mathcal{R}}}}}}}}\)-hat measure^{51} (Table 5). The \({{{{{{{\mathcal{R}}}}}}}}\)-hat value measures how well chains with different starting values mix; optimal are values near one. We measured twice: (1) for all variables and (2) for the subset of variables encoding the reproduction number. Variables modeling the reproduction number are the central part of our model (lower half of Fig. 4). As such, we are satisfied if the \({{{{{{{\mathcal{R}}}}}}}}\)-hat values is sufficiently good for these variables, which it is (≤1.07). The high \({{{{{{{\mathcal{R}}}}}}}}\)-hat when calculated over all variables is mostly due to the weekday-dependent delay, which we assume is not central to the results we are interested in.

### Robustness tests

In the base model for each country, we only consider the matches in which the respective country participated. It is reasonable to ask whether the matches of foreign countries occurring in local stadium have an effect on the case numbers, caused by transmission in and around the stadium and related travel. To investigate this question we ran a model with an additional parameter (in-country effect) associating the case numbers to the in-country matches (Eq. (17)). In some countries the in-country effect parameter and the original fan gathering effect are covariant, as a large number of matches are played by the country at home, whereas in other countries the additional parameter had no significant effect (Supplementary Fig. S9).

We checked that the inferred fractions of football related cases are robust against changes in the priors of the width *σ*_{D} of the delay parameter *D* (see Supplementary Fig. S13) and the intervals of change points of *R*_{base} (Supplementary Fig. S14). The results are also, to a very large degree, robust against a more uninformative prior on the fraction of female participants in the fan activities dominating the additional transmission *ω*_{gender} (Supplementary Fig. S15). To reduce CO_{2} emissions, we performed fewer runs for these robustness tests: We only ran the models for which the original posterior distributions might indicate that one could find a significant effect. Each country required eight cores for about 10 days to finish sampling.

In order to further test the robustness of the association between individual matches and infections, we varied the dates of the matches, i.e., shifted them forward and backward in time. The results for the twelve countries under investigation are shown in Supplementary Figs. S10 and S12. In the countries where sensitivity to a championship-related case surge exists, a stable association is obtained for shifts by up to 2 days. As shown for the examples of England and Scotland in Fig. S19, such a shift is compensated by the model by a complementary adjustment of the delay parameter *D*. For larger shifts, the model might associate other matches to the increase of cases, as matches took place approximately every 4 days.

### Correlations

In order to calculate the correlation between the effect size and various explainable variables (Fig. 3 and Supplementary Figs. S4 and S6), we built a Bayesian regression model, using the previously computed posterior samples from the individual runs of each country. Let us denote the previously computed cumulative primary and subsequent cases related to the Euro 2020 by *Y*_{s,c}, for every sample *s* and analyzed country *c*, and the explainable variable from auxiliary data by *X*_{c}. We used a simple linear model to check for pairwise correlation between *Y*_{s,c} and *X*_{c}:

$${\hat{Y}}_{c}={\beta }_{0}+{\beta }_{1}{\hat{X}}_{c}$$

(68)

$${\beta }_{0} \sim {{{{{{{\rm{Normal}}}}}}}}(\mu=0,\sigma=10000)$$

(69)

$${\beta }_{1} \sim {{{{{{{\rm{Normal}}}}}}}}(\mu=0,\sigma=100000)$$

(70)

We used every sample *s* obtained from the main analysis to incorporate uncertainties on the variable *Y*_{c} from our prior results. The auxiliary data *X*_{c} might also have errors *ϵ*_{c}, which we model using a Normal distribution. Additionally, we allow our estimate for the effect size \({\hat{Y}}_{c}\) to have an error for each country *c* in a typical hierarchical manner and choose uninformative priors for the scale hyper-parameter *τ*. As prior we considered 10k a reasonable choice for the *β* parameter as our data *X*_{c} is normally in a range multiple magnitudes smaller:

$${\hat{X}}_{c} \sim {{{{{{{\rm{Normal}}}}}}}}(\mu={X}_{c},\sigma={\epsilon }_{c})\,\forall c$$

(71)

$$\tau \sim {{{{{{{\rm{HalfCauchy}}}}}}}}(\beta=10000)$$

(72)

$${Y}_{c}^{{{{\dagger}}} } \sim {{{{{{{\rm{Normal}}}}}}}}(\mu={\hat{Y}}_{c},\sigma=\tau )\,\forall c.$$

(73)

Again using uninformative priors for the error, the likelihood to obtain our results given the individual country effect size estimate \({Y}_{c}^{{{{\dagger}}} }\) from the hierarchical linear model is

$${Y}_{s,c} \sim {{{{{{{{\rm{StudentT}}}}}}}}}_{\nu=4}\left(\mu={Y}_{c}^{{{{\dagger}}} },\sigma={\hat{\sigma }}_{c}\right)\,with$$

(74)

$${\hat{\sigma }}_{c} \sim {{{{{{{\rm{HalfCauchy}}}}}}}}(\beta=10000).$$

(75)

Therefore, our regression model includes the “measurement error” \({\hat{\sigma }}_{c}\), which models the heteroscadistic effect size of every country, and an additional model error *τ* which models the homoscedastic deviations of the country effect sizes from the linear model. In the plots, we plot the regression line \({\hat{Y}}_{c}\) with its shaded 95% CI, and data points (\({\hat{X}}_{c}\), \({Y}_{c}^{{{{\dagger}}} }\)) where the whiskers correspond to the one standard deviation, modeled here by *ϵ*_{c} and \({\hat{\sigma }}_{c}\).

The coefficient of determination, *R*^{2}, is calculated following the procedure suggested by Gelman and colleagues^{52}. Their *R*^{2} measure is intended for Bayesian regression models as it notably uses the expected data variance given the model instead of the observed data variance. For our model, it is defined as

$${R}^{2}=\frac{{{{{{{{\rm{Explained\;variance}}}}}}}}}{{{{{{{{\rm{Residual\;variance}}}}}}}}+{{{{{{{\rm{Explained\;variance}}}}}}}}}=\frac{\frac{1}{{n}_{c}-1}{\sum }_{c}{\hat{Y}}_{c}^{2}}{{\tau }^{2}+\frac{1}{{n}_{c}-1}{\sum }_{c}{\hat{Y}}_{c}^{2}},$$

(76)

where *n*_{c} is the number of countries. With this formula, one obtains the posterior distribution of *R*^{2} by evaluating it for every sample.

As auxiliary data, we used:

- 1.
*Mobility data*: We use the mobility index*m*_{c,t}provided by the “Google COVID-19 Community Mobility Reports”^{53}for each country*c*at day*t*during the Euro 2020 (*t*∈ [June 11 2021, July 11 2021]), where*N*denotes the number of days in the interval. The error is the standard deviation of the mean:$${X}_{c}=\frac{1}{N}\mathop{\sum}\limits_{t}{m}_{c,t}$$

(77)

$${\epsilon }_{c}=\sqrt{\frac{1}{{N}^{2}}\mathop{\sum}\limits_{t}{({m}_{c,t}-{X}_{c})}^{2}}$$

(78)

- 2.
*Reproduction number*: We use the base reproduction number \({R}_{{{{{{{{\rm{pre}}}}}}}},c}\) for each country*c*as inferred from our model 2 weeks prior to the Euro 2020 (*t*∈ [May 28 2021, June 11 2021]).$${X}_{c}=\frac{1}{N}\mathop{\sum}\limits_{t}{R}_{{{{{{{{\rm{pre}}}}}}}},c}(t)$$

(79)

$${\epsilon }_{c}=\sqrt{\frac{1}{N}\mathop{\sum}\limits_{t}{({R}_{{{{{{{{\rm{pre}}}}}}}},c}(t)-{X}_{c})}^{2}}$$

(80)

- 3.
*Cumulative reported cases*: From the daily reported cases*C*(*t*) two weeks prior to the Euro 2020 (*t*∈ [May 28 2021, June 11 2021]), we computed the cumulative reported cases normalized by the number of inhabitants*p*_{c}in each country*c*. Note: We also used reported cases without gender assignment here.$${X}_{c}=\frac{{\sum }_{t}C(t)}{{p}_{c}}$$

(81)

$${\epsilon }_{c}=\overrightarrow{0}$$

(82)

- 4.
*Potential for COVID-19 spread*: As for the cumulative cases we used the daily reported cases*C*(*t*) two weeks prior to the Euro 2020 (*t*∈ [May 28 2021, June 11 2021]), and we computed the cumulative reported cases normalized by the number of inhabitants*p*_{c}in each country*c*. Furthermore, we used the base reproduction number*R*_{base}(*t*) 2 weeks prior to the Euro 2020, as well as the duration of a country participating in the championship*T*_{c}(Table S5) to compute the potential for spread:$${N}_{0}=\frac{{\sum }_{t}C(t)}{{p}_{c}}$$

(83)

$${X}_{c}={N}_{0}\cdot \frac{{\sum }_{t}{R}_{{{{{{{{\rm{pre}}}}}}}},c}^{{T}_{c}/4}(t)}{N}$$

(84)

$${\epsilon }_{c}=\overrightarrow{0}$$

(85)

- 5.
*Proxy for popularity*: To represent popularity of the Euro 2020 in country*c*, we used the union of the number of matches played by each country*n*_{match,c}and the number of matches hosted by each country*n*_{hosted,c}(Table S5). By “union” we mean the sum without the overlap, i.e., we take the sum of these numbers and subtract the number of home matches*n*_{home,c}$${X}_{c}={n}_{{{{{{{{\rm{match}}}}}}}},c}+{n}_{{{{{{{{\rm{hosted}}}}}}}},c}-{n}_{{{{{{{{\rm{home}}}}}}}},c}$$

(86)

$${\epsilon }_{c}=\overrightarrow{0}$$

(87)

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.