Main

Flavonoids are (poly)phenolic compounds that occur abundantly in the human diet1. Sources are quite diverse, ranging from fruits and vegetables to nuts and legumes, as well as wines and teas2. A wide range of flavonoids are found in foods and beverages, and these can be classified into several subclasses including flavonols, anthocyanins, flavan-3-ols, flavanones and flavones1. Following their consumption and absorption, flavonoids—through their downstream metabolites—have the potential to improve human health1. Since the early 1990s3, numerous prospective cohort studies have observed that a higher habitual consumption of several flavonoid subclasses is associated with a lower risk of all-cause mortality4,5,6, cardiovascular disease (CVD)2,7, type 2 diabetes (T2DM)8,9, cancer10, respiratory disease11 and neurodegenerative disease12,13. Due to variations in their chemical structure, bioavailability and metabolism, different flavonoid compounds exert a range of biological effects14. Among these, some of their most widely recognized activities include anti-inflammatory and antioxidative stress effects, which are fundamental mechanisms underlying the development and progression of many chronic diseases15. Additionally, flavonoids exhibit more specific protective functions, including promoting endothelial integrity and function16, crucial for cardiovascular health, and anti-senescence effects17 that may delay age-related tissue deterioration, in addition to antiproliferative activities18 that contribute to cancer prevention. These represent just some examples of the many mechanisms through which flavonoids exert their beneficial effects across diverse chronic conditions1,15.

Because different flavonoid compounds can exert different biological benefits, we hypothesized that consuming a higher diversity of dietary flavonoids may afford better health protection than consuming a low diversity of flavonoids. However, to date, no prospective studies have considered the impact of consuming a higher diversity of dietary flavonoids on the risk of all-cause mortality or major chronic disease. In several research fields, including in the assessment of gut microbial diversity19,20,21, the diversity of a system can be calculated using Shannon’s equation for entropy22 converted to Hill’s effective numbers23,24. Using this approach, we can determine the diversity of flavonoid intake, accounting for both the variation (or number of different flavonoids consumed) and their distribution of intake (wherein those flavonoids consumed in smaller amounts relative to others are weighted less). The aims of this study, therefore, were: (1) to estimate diversity of flavonoid intake across levels of total dietary flavonoids, individual flavonoid subclasses and flavonoid-rich foods, and then examine their associations with the risk of all-cause mortality and incidence of chronic disease including CVD, T2DM, total cancer, respiratory disease and neurodegenerative disease; and (2) to assess the potential benefits of consuming both a higher quantity and a wider diversity of flavonoid intake on the risk of these outcomes in participants from the UK Biobank.

Results

Cohort characteristics

In this cohort of 124,805 UK adults, aged ≥40 yr (median [Q1–Q3], 60.2 [53.0–65.2] yr; Q, quintile), ~56% (n = 69,674) were female and most were non-smokers (>90%; n = 115,961) (Table 1). Around 60% (n = 75,111) of participants were either overweight or obese (Table 1). At baseline, ~4% (n = 5,162) had diabetes (type 1 or 2), ~25% (n = 32,877) were hypertensive and ~15% (n = 19,827) had high cholesterol. Over median follow-up periods of 8.7–10.6 yr across the different outcomes (maximum, 11.8 yr), there were 5,780 deaths, 6,920 CVD cases, 3,421 T2DM cases, 9,441 cancer cases, 12,945 respiratory disease cases and 1,921 cases of neurodegenerative disease. Participants had a median flavonoid intake of 792 mg d−1 (range, 0.05–3,611 mg d−1), with a diversity corresponding to an effective (Hill) number of 9.4 flavonoid types per day (range, 1.8–19.0) (Fig. 1). Flavan-3-ols were the main subclass contributing to total flavonoid intake, accounting for 87% of consumption. Anthocyanins, flavonols and flavanones each contributed ~4.5% of total flavonoid intake; <1% was from flavones. Tea (black and green) was the main source of total flavonoid intake (67%), followed by apples (5.8%), red wine (4.7%), grapes (1.9%), berries (1.9%), dark chocolate (1.2%), oranges and satsumas (1.1%) and orange juice (1.1%), which collectively comprised ~85% of total intake; numerous other food sources contributed to the remaining intake (Fig. 1 and Supplementary Table 1). Overall, those with a higher quantity of flavonoid intake tended to have a lower diversity (r = −0.44), although this varied for individual subclasses (Fig. 1 and Supplementary Table 2).
Compared to participants with the lowest diversity, those with the highest diversity had a better distribution of flavonoid intake, consuming more anthocyanins (for example, malvidin, cyanidin), flavanones (for example, hesperidin, naringenin) and proanthocyanidins (for example, dimers to polymers) relative to thearubigin, a compound derived exclusively from tea, and which dominated intake in those with the least diverse consumption (Fig. 1 and Supplementary Table 3). Analysis of flavonoid-rich foods showed those with the lowest diversity consumed mostly tea, and those with the highest diversity consumed relatively more berries, apples, grapes, red wine and oranges (Supplementary Table 4). Those with the highest flavonoid diversity were more likely to be female, older, have a lower body mass index (BMI), be more physically active and have a higher education and were less likely to be current smokers (Table 1).

Table 1 Baseline characteristics of study population
Fig. 1: Flavonoid intake in the UK Biobank.

a, Composition of flavonoid intake. b, Major dietary contributors to flavonoid intake, showing the topmost contributors to intake only; blank spaces up to 100% represent other smaller contributors that are not shown. c, Two-sided Pearson correlation between quantity and diversity of flavonoid intake. d, Diversity of flavonoid consumption among participants with the most (Q5) and least (Q1) diverse intakes. In d, the bar charts are matched for quantity of flavonoid intake (1,000 mg d−1) and show the average abundance (% intake) of each flavonoid per day. The dotted areas represent each diet, where each circle is an individual flavonoid and each colour is a different flavonoid (corresponding to the colours and distribution on the bar charts). Data from participants with ≥2 Oxford WebQ dietary questionnaires (n = 124,805).


Total flavonoids, all-cause mortality and chronic disease

Following mutual adjustment, and after accounting for sociodemographic, lifestyle, dietary and medical risk factors, both the quantity and diversity of total dietary flavonoid intake were independently associated with a lower risk of all-cause mortality and several chronic diseases (model 5; Fig. 2). Holding the quantity of flavonoid intake constant, participants with the highest (compared to lowest) diversity (Q5 versus Q1), characterized as consuming an additional 6.7 effective flavonoid types per day, had a 14% lower risk of all-cause mortality (hazard ratio (HR) (95% confidence interval (CI)), 0.86 (0.78, 0.95)), a 10% lower risk of CVD (0.90 (0.82, 0.98)), a 20% lower risk of T2DM (0.80 (0.70, 0.91)), an 8% lower risk of total cancer (0.92 (0.85, 0.99)) and an 8% lower risk of respiratory disease (0.92 (0.86, 0.98)); no association was observed for neurodegenerative disease (model 5; Table 2 and Fig. 2). For quantity of flavonoid intake, when holding diversity constant, participants in the second quintile (median intake, ~500 mg d−1), were at a 16% (0.84 (0.78, 0.92)), 9% (0.91 (0.84, 0.98)), 12% (0.88 (0.79, 0.98)) and 13% (0.87 (0.83, 0.92)) lower risk of all-cause mortality, CVD, T2DM and respiratory disease, respectively, compared with those in Q1 (median intake, ~230 mg d−1 (model 5; Table 2)). At higher levels of exposure, these HRs remained relatively constant, except for T2DM, for which the lowest risks were observed for those in Q5 (0.75 (0.66, 0.84)). The lowest risks for cancer and neurodegenerative diseases were seen in Q5 (median intake, ~1,400 mg d−1), reaching an 8% (0.92 (0.85, 0.99)) and 20% (0.80 (0.68, 0.94)) lower disease risk, respectively, compared with Q1 (model 5; Table 2 and Fig. 2). In general, progressive adjustment for participant demographics (model 2), lifestyle (model 3), dietary (model 4) and medical risk factors (model 5) attenuated, but did not materially alter, the associations (Table 2). 
We then tested for interactions between quantity and diversity of flavonoid intake across the aforementioned outcomes. No interactions were observed (Pinteraction all >0.05 (model 5)); nevertheless, because quantity and diversity were each independently associated with all-cause mortality and several chronic diseases, the results still suggest that higher intakes of both are associated with greater disease risk reduction than a higher intake of either aspect alone.
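The percentages quoted alongside each HR above follow from the usual reading of a hazard ratio below 1 as a relative reduction, (1 − HR) × 100. The short Python sketch below illustrates this arithmetic for the diversity (Q5 versus Q1) estimates reported in the text; the function name is hypothetical, and because the published percentages may have been derived from unrounded HRs, values recomputed from two-decimal HRs can occasionally differ by a percentage point.

```python
def percent_lower_risk(hazard_ratio):
    """Relative risk reduction (%) implied by a hazard ratio below 1."""
    return (1.0 - hazard_ratio) * 100.0


# HRs for highest versus lowest diversity (Q5 versus Q1), as reported in the text.
reported_hrs = {
    "all-cause mortality": 0.86,
    "CVD": 0.90,
    "T2DM": 0.80,
    "total cancer": 0.92,
    "respiratory disease": 0.92,
}

for outcome, hr in reported_hrs.items():
    print(f"{outcome}: HR {hr:.2f} -> {percent_lower_risk(hr):.0f}% lower risk")
```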

Fig. 2: Quantity and diversity of dietary flavonoid intake and risk of all-cause mortality and chronic disease.

HRs (95% CI) for all-cause mortality and major chronic disease according to the quantity and diversity of dietary flavonoid intake (in quintiles). HRs are from Cox proportional-hazards models using age as the underlying timescale. Quantity of flavonoid intake is mutually adjusted for diversity of flavonoid intake and vice versa. Further adjustments are made for covariates in model 5 including sex, region of residence, number of dietary assessments, BMI, smoking status, physical activity, alcohol intake, education, ethnicity, socioeconomic status plus intakes of red and processed meat, refined grains, whole grains, sugary drinks, coffee, saturated fatty acids, sodium and dietary energy, and history of diabetes (type 1 or 2; not adjusted in T2DM analysis), hypertension and hypercholesterolaemia. For the analysis of all-cause mortality, further adjustments were made for prevalent CVD, cancer, respiratory disease and neurodegenerative disease at baseline. Corresponding sample sizes, event rates and additional details are provided in Table 2.

Table 2 Quantity and diversity of dietary flavonoid intake associate with risk of all-cause mortality and incidence of major chronic disease

Flavonoid subclasses, all-cause mortality and chronic disease

Minimally adjusted (model 1) and multivariable-adjusted models (models 2–5) for diversity of individual flavonoid subclasses and the risk of all-cause mortality and major chronic disease are presented in Supplementary Table 5. Overall, following adjustment for demographic and lifestyle factors (model 3), further adjustments for diet (model 4) and medical history (model 5) did not substantially alter the findings. In the fully adjusted model (model 5), wider diversities of intake of compounds within the flavan-3-ol and flavanone subclasses were each associated with a lower risk of all-cause mortality, independent of absolute intake; the HRs remained stable beyond Q4 and Q2, respectively (HR (95% CI) for flavan-3-ols Q4 versus Q1, 0.91 (0.83, 0.99); flavanones Q2 versus Q1, 0.90 (0.83, 0.98); model 5; Table 3 and Supplementary Table 5). When the corresponding model terms for quantity of consumption were examined, only flavan-3-ol intake was associated with lower risk of all-cause mortality; the HRs were relatively stable beyond Q2 (Q2 versus Q1, 0.85 (0.78, 0.93); Supplementary Table 6). The data for chronic disease outcomes reveal that, compared with the lowest quintile (Q1), significant associations mostly emerged in those with the widest diversity at and above Q4; for flavan-3-ols there was a 13% (Q5, 0.87 (0.77, 0.98)) and an 8% (Q4, 0.92 (0.86, 0.99)) lower risk of T2DM and cancer; for flavanones there was a 7% (Q5, 0.93 (0.88, 0.99)) and a 6% (Q5, 0.93 (0.87, 0.99)) lower risk of cancer and respiratory disease; and for flavones there was a 13% (Q4, 0.89 (0.80, 0.99)) and an 18% (Q5, 0.82 (0.71, 0.95)) lower risk of T2DM and neurodegenerative disease, respectively (model 5; Table 3 and Supplementary Table 5).
When we examined the models for the subclasses showing beneficial associations for diversity, associations for quantity of intake emerged with T2DM wherein participants at and above Q3 for flavones and Q4 for flavan-3-ols were at a lower risk (flavones Q3, 0.89 (0.80, 0.99); flavan-3-ols Q4, 0.85 (0.77, 0.95); model 5; Supplementary Table 6). No interactions were observed between quantity and diversity of intake of any subclass with any outcome (Pinteraction all >0.05 (model 5)).

Table 3 Diversity of intake of flavonoid-rich foods and individual flavonoid subclasses associate with risk of all-cause mortality and incidence of major chronic disease

Flavonoid-rich foods, all-cause mortality and chronic disease

Minimally adjusted (model 1) and multivariable-adjusted models (models 2–5) for diversity of flavonoid-rich foods are presented in Supplementary Table 5. Adjustment beyond demographic and lifestyle factors (model 3) for participant diet (model 4) and medical history (model 5) did not appreciably affect the associations. In the fully adjusted model, when holding the quantity of intake constant, the risk of all-cause mortality was progressively lower among those with a higher diversity of flavonoid-rich food intake; compared with participants consuming 1.3 effective servings, those consuming 2, 2.7, 3.4 and 4.5 different effective servings had an 8% (0.92 (0.85, 1.00)), 10% (0.91 (0.84, 0.99)), 13% (0.88 (0.81, 0.96)) and 16% (0.84 (0.76, 0.91)) lower risk of all-cause mortality, respectively (model 5; Table 3 and Supplementary Table 5). Holding the diversity of intake constant, there was no clear association for consuming a higher quantity of flavonoid-rich foods (model 5; Supplementary Table 6). Examination of chronic disease outcomes revealed that those with the highest (versus lowest) diversity of flavonoid-rich food intake had an 8% lower risk of respiratory disease (0.92 (0.87, 0.98)); there were no compelling associations with other endpoints (model 5; Table 3 and Supplementary Table 5). Holding diversity constant, a higher quantity of flavonoid-rich foods, beyond Q2 (Q2, 0.87 (0.78, 0.97)), was associated with a lower risk of T2DM; there were no compelling associations with other endpoints (model 5; Supplementary Table 6). No interactions (Pinteraction all >0.05 (model 5)) were observed between quantity and diversity of flavonoid-rich food consumption.

Sensitivity analyses

Neither removing energy intake nor adjusting for a healthy plant-based diet score substantively altered the HR (sensitivity analyses 1 and 2; Supplementary Tables 5 and 7). Excluding participants who had an event in the first two years of follow-up tended to marginally strengthen the relationships between our exposures and outcomes (sensitivity analysis 3; Supplementary Tables 5 and 7).

Discussion

In >120,000 UK Biobank participants, we observed that participants who consumed the widest diversity of dietary flavonoids, flavonoid-rich foods and/or specific flavonoid subclasses had a lower risk of all-cause mortality and incidence of cause-specific chronic disease, ranging from cardiometabolic disorders (including CVD and T2DM) to other major conditions, such as cancer, respiratory disease and neurodegenerative disease. We also found that both the quantity and diversity of total dietary flavonoids are independent predictors of mortality and several chronic diseases, suggesting that consuming a higher quantity and wider diversity is better for longer-term health than higher intakes of either component alone.

Our findings highlight the importance of consuming a diverse range of flavonoids for the management of chronic disease risk, which, from a public health perspective, provides support for consuming a variety of flavonoid-rich foods such as green and/or black tea, berries, apples, oranges and grapes25. This fits with our current understanding that different flavonoid compounds can exert different biological benefits1,26,27,28. For example, in the regulation of blood pressure alone, compounds from each subclass appear to act on a variety of different mechanisms, increasing nitric oxide bioavailability, reducing endothelial cell oxidative stress and modulating vascular ion channel activity29,30. Indeed, the health-promoting effects of flavonoids are wide ranging, with multiple flavonoid compounds implicated in multiple biological activities, including, among others, inhibiting platelet aggregation, lowering low-density lipoprotein oxidation, mitigating atherosclerotic lesion formation, improving insulin sensitivity indices, inducing antioxidant defences, and reducing inflammatory responses in addition to specific anticarcinogenic actions, such as an ability to induce apoptosis in tumour cells, inhibit cancer cell proliferation, and prevent angiogenesis and tumour cell invasion15,28. As a result, the collective actions of multiple flavonoids appear to lead to greater health protection compared with single subclasses or compounds.

We found that consuming both a higher quantity and wider diversity of dietary flavonoids appears better for longer-term health than higher intakes of either component alone. To date, epidemiological research has focused on the quantity of flavonoid intake, finding that higher consumption of several flavonoid subclasses is associated with a lower risk of several chronic diseases2,7,8,9,10,11,12,31,32. Indeed, the first proposed dietary guideline for flavonoids was released in 202233, and recommended consumption of 400–600 mg d−1 of flavan-3-ols for potential cardiometabolic health benefits. Our results suggest that future guidelines could be reframed to also consider recommending intake from a range of sources. Further studies are also ongoing to determine the environmental footprints of different flavonoid-rich foods to ensure their consumption also supports environmental sustainability and planetary health34. Our findings also align with our other recent work in which we propose a composite measure of flavonoid intake (termed the Flavodiet score), which is a sum of servings of flavonoid-rich foods6. We observed that those who had a better Flavodiet score had a lower risk of all-cause mortality6. Our current study on flavonoid diversity and health outcomes supports the Flavodiet score concept as a means to promote higher intakes of flavonoids from different sources. Our analysis of diversity also complements existing analyses that evaluate associations between specific flavonoid food sources and health outcomes, which enhance the evidence base for the health benefits of specific flavonoid-rich foods35,36. However, by studying diversity specifically, our results suggest that consuming a greater variety of such sources appears better than their intakes in isolation.

To estimate flavonoid diversity, we used Shannon’s equation with Hill’s numbers22,23,24. This provides an approach to explicitly separate out and study the independent benefits of flavonoid diversity, versus quantity, for health outcomes. A fundamental feature of the Shannon equation is that it considers the most diverse diets to consist of all flavonoids consumed in equal proportions. Although this reflects a technical definition of diversity, such an intake is unlikely to occur in the real world and may not be the pattern of consumption that offers the greatest health benefits. Shannon’s equation also only permits calculation of diversity among flavonoid consumers (omitting non-consumers), and results should be interpreted within this context (although <0.01% of participants in this cohort did not consume any flavonoids). We must also consider that calculating diversity within individual subclasses does not account for diversity of other subclasses (which appears important) and that calculating diversity by way of major flavonoid-rich foods does not account for other flavonoid sources (which may potentially be major sources for some individuals). While calculating flavonoid diversity by way of total flavonoid compound intake appears to overcome these limitations, this method relies on the precision of compound intake estimates, and these estimates, given the inherent limitations of dietary assessment methods and nutrient composition databases37, are likely to be relatively crude. Nevertheless, even with these constraints, we observed a significantly lower risk of all-cause mortality and cause-specific chronic disease among those with the most (compared with the least) diverse flavonoid intakes when using this method. Indeed, beyond flavonoids this method could be further used to estimate and evaluate diversity of other (poly)phenolics, or groups of bioactives, or potentially various food groups. 
Although there has been recent discussion, and some previous use, of various diversity indices in nutrition science38,39,40,41, Shannon’s equation (with Hill numbers) does not seem to have been used before to partition and study the independent roles of diversity and quantity. Hence, this work introduces a potential approach for studying these characteristics of other dietary components in the future.

No previous works appear to have reported on the human health benefits of a flavonoid-diverse diet. Consequently, replication of our findings in other cohorts and clinical trials will be critical, as will the exploration of flavonoid diversity with other disease outcomes. Interpretation, however, requires careful consideration. For the most part, we observed that both quantity and diversity were independent predictors, suggesting there is a benefit to consuming a higher diversity beyond that of simply consuming a high quantity (and vice versa), although this relationship did not interact such that the benefit together was even greater than the combination of the individual parts42. On other occasions we observed that quantity but not diversity was a predictor, which could suggest that consuming a higher amount of any type provides benefit. Alternatively, a wider diversity of intake within the population under study may be required before a role for diversity becomes observable, or the average compositional make-up of diversity within the population may not have been relevant to the disease in question. Certainly, the biological relevance of diversity within subclasses may be less important if at least some compounds have similar biological effects. Indeed, those with the lowest diversity could theoretically consume one flavonoid type alone; if this group were taken as the reference and compared with those with a wider diversity, then after adjustment for quantity the comparison would be between one flavonoid type and multiple different types at the same total quantity. If that single flavonoid type was highly protective against the disease in question, then there may be no benefit to consuming a wider diversity if the other flavonoids do not collectively provide a benefit larger than the reference. In other analyses we observed that only flavonoid diversity, but not quantity, predicted the outcomes.
This could be due to synergies between different flavonoids, whereas simply consuming higher amounts of less diverse compounds may afford no benefit. We also observed that the quantity and diversity of flavonoid compound intake were significantly associated with more outcomes than were servings of flavonoid-rich foods, suggesting that the absolute intake of flavonoids matters more than the servings of flavonoid-rich foods per se, potentially because different foods have varying flavonoid densities and serving sizes. Moreover, some combinations of foods will probably provide a greater diversity of flavonoids than others; for example, consuming red wine and grapes will probably be less diverse than consuming oranges and grapes because the flavonoid profiles of oranges and grapes overlap less.

The strengths of this study include the prospective design, large sample size, high number of cases and long follow-up time of ~10 years. Several limitations, however, should be noted. First, the observational design restricts our ability to infer causality or to exclude the possibility of residual confounding. To this end, we must consider whether the associations observed represent a benefit of higher diversity of flavonoids per se, or a signal that the various flavonoids act synergistically with other compounds found in flavonoid-rich foods, such as phenolic acids, lignans or other bioactives2. Indeed, the possibility of flavonoids being a marker of other unobserved and potential protective factors cannot be discounted. Second, although the Oxford WebQ has been validated against biomarkers and 24-h recalls for selected nutrients43,44, it does not capture data on certain types of flavonoid-rich foods (for example, specific types of berries), which potentially leads to imprecision in the assessment of diversity for certain subclasses (for example, anthocyanins), and as with all self-reported dietary assessments, common limitations and reporting biases apply2,45. Moreover, due to the limited number of dietary assessments, our analyses may have been affected by regression dilution with a probable underestimation of the strengths of associations46; this may be of specific importance when assessing diversity, assuming variation in intake is greater over longer timeframes. Third, incidence of T2DM was ascertained based on hospital and death records, which may not capture all cases, such as those diagnosed and treated in primary care. This may have introduced some degree of error, particularly if hospitalized individuals have different health-seeking behaviours or characteristics than those treated in primary care, highlighting the need for additional studies. 
Fourth, potential confounders were only assessed at baseline, and it is unclear how potential changes in their trajectories may have impacted upon the observed associations. Fifth, although we conducted extensive analysis showing that the associations of our exposures with the outcomes appear robust, we acknowledge that multiplicity issues should be considered when interpreting the results. Sixth, given our sample is not representative of all populations in terms of age, ethnicity, health status or socioeconomic standing, and so on, the generalizability of our results requires confirmation in other populations.

In conclusion, we found that a wider diversity of intake of total flavonoids, flavonoid-rich foods and/or specific flavonoid subclasses is associated with a lower risk of all-cause mortality and incidence of chronic disease, including CVD, T2DM, cancer, respiratory disease and neurodegenerative disease. We also observed that a higher quantity and wider diversity of dietary flavonoids, when consumed together, may represent the optimal approach for improving long-term health, compared with increasing either flavonoid quantity or diversity alone. Overall, our findings suggest simple and achievable dietary changes such as including several different daily servings of flavonoid-rich foods or beverages, such as tea, berries, apples, oranges or grapes, might have a major impact on population health, lowering the risk of all-cause mortality and major chronic disease.

Methods

Design

For the present investigation, we used data from the UK Biobank—a large, prospective, population-based cohort study47. Between 2006 and 2010, >500,000 male and female adults, aged 40–69 yr, were enrolled47. Participants attended one of 22 assessment centres located across England, Scotland and Wales, where they undertook a comprehensive baseline assessment, completing questionnaires and physical measures, and provided biological samples. The UK Biobank study received ethical approval from the NHS North West Multi-Centre Research Ethics Committee (reference 11/NW/0382) and all participants provided informed consent.

For the current analysis, we excluded participants who withdrew their consent during follow-up or who completed fewer than two 24-h dietary questionnaires (by first removing individual recalls without plausible energy intakes: 4,200 kcal d−1 for men and 3,500 kcal d−1 for women) (Supplementary Fig. 1). Additionally, for the respective outcomes of interest, we excluded participants with prevalent CVD, T2DM, cancer, respiratory disease or neurodegenerative disease, prior to the last date of dietary assessment (Supplementary Table 8). Lastly, because Shannon’s equation requires intake of at least one kind of flavonoid compound, those with zero total flavonoid intake were excluded, and then, depending on the exposure of interest (flavonoid-rich foods or intra-subclass diversity, and so on), participants with zero intake of flavonoid-rich foods or specific subclasses were excluded on a per-analysis basis, because the collective exclusion at the flavonoid-rich food or intra-subclass level would bias diversity of other levels (for example, compounds (Supplementary Fig. 1)).

Exposures

Dietary information was collected using the Oxford WebQ 24-h dietary questionnaire44, which participants completed on up to five separate occasions, between 2009 and 201248. Flavonoid intake was estimated from the Oxford WebQ 24-h dietary questionnaire using the US Department of Agriculture flavonoid and proanthocyanidin food content databases49,50, with food codes derived from the updated version of the nutrient calculations for the Oxford WebQ for food items and composite recipes13,51. Flavonoid intakes (mg d−1) from all completed questionnaires with plausible energy intakes were averaged. We derived intakes of several flavonoid subclasses as follows: flavonols (quercetin, kaempferol, myricetin and isorhamnetin), anthocyanins (cyanidin, delphinidin, malvidin, pelargonidin, petunidin and peonidin), flavan-3-ols ((+)-catechin, (+)-gallocatechin, (−)-epicatechin, (−)-epigallocatechin, (−)-epicatechin 3-gallate and (−)-epigallocatechin-3-gallate, plus dimers, trimers, 4–6-mers, 7–10-mers and polymers, plus theaflavin, theaflavin-3-gallate, theaflavin-3′-gallate, theaflavin-3,3′-digallate and thearubigins), flavanones (eriodictyol, hesperetin and naringenin) and flavones (luteolin and apigenin). Total flavonoid intake was calculated as the sum of all compounds. Intakes of isoflavones were not calculated due to the low consumption of isoflavone-containing foods in the general UK population52.
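The averaging and subclass summation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names, the abbreviated compound-to-subclass mapping and the example intakes are all hypothetical, and the recalls are assumed to have already passed the energy-plausibility screen.

```python
from collections import defaultdict

# Abbreviated, illustrative mapping; the study used all 31 compounds listed above.
SUBCLASS_OF = {
    "quercetin": "flavonols",
    "kaempferol": "flavonols",
    "cyanidin": "anthocyanins",
    "malvidin": "anthocyanins",
    "epicatechin": "flavan-3-ols",
    "thearubigins": "flavan-3-ols",
    "hesperetin": "flavanones",
    "naringenin": "flavanones",
    "luteolin": "flavones",
    "apigenin": "flavones",
}


def mean_compound_intake(recalls):
    """Average each compound (mg/d) across one participant's retained recalls.

    `recalls` is a list of dicts mapping compound name -> mg/d; a compound
    missing from a recall is treated as zero intake on that day.
    """
    if not recalls:
        raise ValueError("at least one dietary recall is required")
    compounds = set().union(*recalls)
    return {c: sum(r.get(c, 0.0) for r in recalls) / len(recalls) for c in compounds}


def subclass_totals(mean_intake):
    """Sum averaged compound intakes (mg/d) within each flavonoid subclass."""
    totals = defaultdict(float)
    for compound, mg in mean_intake.items():
        totals[SUBCLASS_OF[compound]] += mg
    return dict(totals)


# Hypothetical participant with two recalls (values are not study data).
recalls = [
    {"quercetin": 10.0, "thearubigins": 400.0},
    {"quercetin": 20.0, "thearubigins": 600.0, "hesperetin": 30.0},
]
averaged = mean_compound_intake(recalls)
by_subclass = subclass_totals(averaged)
total_flavonoids = sum(averaged.values())
print(by_subclass, total_flavonoids)
```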

Diversity of flavonoid intake was calculated using Shannon’s equation for entropy22 which was subsequently converted to Hill’s effective numbers23,24. Calculations of diversity were made for total flavonoid intake, which considered diversity of all 31 flavonoids as described above. In an exploratory analysis we examined (1) intra-subclass diversity, which considered diversity of intake within individual subclasses, and (2) servings of flavonoid-rich foods, which included the key contributors to each flavonoid subclass, including tea (black and green), red wine, apples, berries, grapes, oranges (including satsumas), grapefruit, sweet peppers, onions and dark chocolate. The key contributors were determined as the three foods that contributed the highest percentage to the intakes of each flavonoid subclass (excluding fruit juices), and dark chocolate was included as it is typically high in flavan-3-ols13. Shannon’s equation is as follows:

$$\mathrm{Shannon\ index}\ (H) = -\sum_{i=1}^{s} p_{i} \ln p_{i}$$

In Shannon’s equation, $p_i$ is the proportion of an individual flavonoid consumed per day (that is, the quantity of the compound (mg d−1) or flavonoid-rich food (servings per day)) relative to total intake (that is, the total quantity of flavonoids (mg d−1) or flavonoid-rich foods (servings per day)), and $s$ is the total number of individual flavonoid types (that is, compounds or flavonoid-rich foods) consumed. Diversity of flavonoid intake was calculated using the R package vegan53. Shannon’s score was converted into Hill’s effective numbers by exponentiating $H$ (refs. 23,24).
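The calculation described above can be illustrated with a minimal Python sketch. The analysis itself was run with the R package vegan; the Python functions below are a hypothetical re-implementation of the same formula for illustration only, and the example intakes are invented rather than taken from the study.

```python
import math


def shannon_index(intakes):
    """Shannon entropy H = -sum(p_i * ln p_i) over items with non-zero intake."""
    total = sum(intakes)
    if total <= 0:
        raise ValueError("Shannon's index is undefined when total intake is zero")
    proportions = [x / total for x in intakes if x > 0]
    return -sum(p * math.log(p) for p in proportions)


def hill_effective_number(intakes):
    """Hill's effective number of types: exp(H)."""
    return math.exp(shannon_index(intakes))


# Hypothetical daily intakes (mg/d) of four compounds, each diet totalling 1,000 mg/d.
tea_dominated = [900.0, 50.0, 30.0, 20.0]
evenly_spread = [250.0, 250.0, 250.0, 250.0]

print(round(hill_effective_number(tea_dominated), 2))  # well below 4
print(round(hill_effective_number(evenly_spread), 2))  # 4.0
```

Both hypothetical diets contain four compounds and the same total quantity, yet the one dominated by a single compound has a much lower effective number, which reflects the down-weighting of compounds consumed in relatively small amounts.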

The purpose of using effective numbers is to convert Shannon’s non-linear score into an interpretable metric that quantifies diversity23. The resulting output, termed effective numbers, is the number of different types of flavonoids that would need to be consumed in equal proportions to match the diversity of the diet from which it was calculated, with a higher value indicating wider diversity (a detailed explanation of effective numbers can be found in the Supplementary Methods). The Shannon equation and Hill numbers are based on relative proportions and therefore produce a measure of diversity that is independent of the quantity of flavonoid intake: two individuals can have exactly the same diversity score even though one of them consumes, for example, a threefold higher quantity of flavonoids. Therefore, following statistical adjustment for quantity of flavonoid consumption, it is possible to study the independent benefit of diversity of flavonoid intake.
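
A short worked example may make both properties concrete; the numbers are illustrative and are not taken from the study data. If s flavonoids are consumed in equal amounts, then p_i = 1/s for every i, and the effective number equals s:

$$H = -\sum_{i=1}^{s} \frac{1}{s} \ln \frac{1}{s} = \ln s, \qquad e^{H} = s$$

If instead two flavonoids are consumed in a 90:10 split, the effective number falls well below 2:

$$H = -\left(0.9 \ln 0.9 + 0.1 \ln 0.1\right) \approx 0.325, \qquad e^{H} \approx 1.38$$

Independence from quantity follows because scaling every intake x_i by a constant c > 0 leaves each proportion, and hence H, unchanged:

$$p_{i} = \frac{c\,x_{i}}{\sum_{j=1}^{s} c\,x_{j}} = \frac{x_{i}}{\sum_{j=1}^{s} x_{j}}$$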

Outcomes

The outcomes in the current study were all-cause mortality and incidence (first-time fatal or non-fatal events) of CVD, T2DM, total cancer, respiratory disease and neurodegenerative disease. Date of death was obtained from death certificates held by the National Health Service Information Centre (England and Wales) and the National Health Service Central Register Scotland (Scotland). Dates and causes of hospital admissions were identified via record linkage to Hospital Episode Statistics (England), the Patient Episode Database (Wales) and the Scottish Morbidity Records (Scotland) as well as the National Cancer Registries (England, Scotland and Wales). Incident outcomes were defined as a hospital admission or death identified through primary or secondary diagnosis codes using International Classification of Diseases, Tenth Revision (ICD-10) as follows: CVD (I20–I25, I63 and I70–I74), T2DM (E11), cancer (C00–C97, excluding non-melanoma skin cancer (C44)), respiratory disease (J09–J98, I26 and I27) and neurodegenerative disease (F00–F03, G12.2, G20, G21, G23.1–G23.3, G23.8, G23.9, G30 and G31). Hospital admission follow-up data for CVD, T2DM, respiratory disease and neurodegenerative disease were available until 31 October 2022 for England, 31 August 2022 for Scotland and 31 May 2022 for Wales. Follow-up data for cancer were available until 31 December 2016 for Wales, 31 December 2020 for England and 30 November 2021 for Scotland. Mortality data were available until 30 November 2022 for England, Scotland and Wales. We therefore censored outcome analyses on these dates.
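
The outcome definitions and censoring dates above can be expressed as a small lookup, sketched here in Python. The code ranges and dates are those listed in the text; the function names, the outcome labels and the handling of code formatting (dots removed, prefix matching on subcodes) are assumptions made for illustration and may differ from how the linked records were actually processed.

```python
from datetime import date

# Administrative censoring dates as stated in the text.
HOSPITAL_END = {"England": date(2022, 10, 31),
                "Scotland": date(2022, 8, 31),
                "Wales": date(2022, 5, 31)}
CANCER_END = {"England": date(2020, 12, 31),
              "Scotland": date(2021, 11, 30),
              "Wales": date(2016, 12, 31)}
MORTALITY_END = date(2022, 11, 30)

# Neurodegenerative codes, written without dots for prefix matching.
NEURO_PREFIXES = ("F00", "F01", "F02", "F03", "G122", "G20", "G21",
                  "G231", "G232", "G233", "G238", "G239", "G30", "G31")


def classify_icd10(code):
    """Map an ICD-10 code (with or without a dot) to an outcome label, or None."""
    c = code.replace(".", "").strip().upper()
    if len(c) < 3 or not c[1:3].isdigit():
        return None
    letter, num = c[0], int(c[1:3])
    if letter == "I" and (20 <= num <= 25 or num == 63 or 70 <= num <= 74):
        return "CVD"
    if letter == "E" and num == 11:
        return "T2DM"
    if letter == "C" and num <= 97 and num != 44:
        return "cancer"
    if (letter == "J" and 9 <= num <= 98) or (letter == "I" and num in (26, 27)):
        return "respiratory"
    if c.startswith(NEURO_PREFIXES):
        return "neurodegenerative"
    return None


def censoring_date(outcome, country):
    """End of follow-up for an outcome label in a given country."""
    if outcome == "mortality":
        return MORTALITY_END
    if outcome == "cancer":
        return CANCER_END[country]
    return HOSPITAL_END[country]
```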

Covariates

Information on demographics, lifestyle factors and medical history, including sex, age, ethnicity, anthropometry, physical activity, education, smoking and alcohol habits, was obtained from the baseline assessment. Anthropometric measurements (height and weight) were obtained by trained personnel. BMI was calculated as weight divided by height squared (kg m−2). Physical activity was derived using the International Physical Activity Questionnaire short form, and total physical activity was calculated as the sum of walking, moderate and vigorous activity measured as metabolic equivalents (MET-h per week). Area-based socioeconomic status was derived from postal code of residence using the Townsend deprivation score. History of hypertension and diabetes mellitus (type 1 or 2) was derived from self-reported physician diagnosis of disease or medication use at recruitment, and from ICD codes dated prior to the last date of dietary assessment (Supplementary Table 8). History of hypercholesterolaemia was identified by physician diagnosis (self-reported) or the taking of cholesterol-lowering medication (Supplementary Table 8). To identify other baseline comorbidities, self-reported physician-diagnosed CVD, cancer, neurodegenerative disease and respiratory disease at recruitment were combined with ICD codes dated prior to the last date of dietary assessment (Supplementary Table 8). The Oxford WebQ was used to calculate average daily intakes of foods, nutrients and energy using information recorded in the UK Nutrient Databank, as previously reported54. The healthful plant-based diet index was derived from 17 food groups55.
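
As a small illustration of the derived covariates, the sketch below computes BMI and total physical activity in Python. The function names are hypothetical, the BMI bands are those used in the statistical models, and placing missing values in an "unknown" group is an assumption based on how unknown responses are described for the models.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg per square metre."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Assign a BMI value to one of the bands used for adjustment."""
    if value is None:
        return "unknown"  # assumed handling of missing measurements
    if value < 18.5:
        return "<18.5"
    if value < 25:
        return "18.5-24.99"
    if value < 30:
        return "25-29.99"
    return ">=30"


def total_physical_activity(walking, moderate, vigorous):
    """Total activity (MET-h per week) as the sum of the three components."""
    return walking + moderate + vigorous
```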

Statistical analysis

Cox proportional-hazards models were used to investigate relationships between diversity of flavonoid consumption and all outcomes of interest. Participants were followed up from the completion of the last valid diet questionnaire until the first occurrence of the outcome event, death, loss to follow-up or the end of follow-up (as described above), whichever occurred first. Flavonoid diversity was modelled as quintiles with low flavonoid diversity (Q1) as the reference group. All models examining diversity were mutually adjusted for quantity (quintiles) of the same flavonoids that contributed to flavonoid diversity. All models used age as the underlying timescale56. Five models of adjustment were computed: model 1 minimally adjusted for sex, region of residence (entered as a strata variable: London, North West England, North East England, Yorkshire, West Midlands, East Midlands, South East England, South West England, Scotland and Wales) and number of dietary assessments completed with plausible energy intake (2, 3, 4 or 5); model 2 multivariable adjusted for covariates in model 1 plus demographic factors including: ethnicity (White, Black, Asian, mixed or other), BMI (<18.5, 18.5–24.99, 25–29.99, ≥30 kg m−2), education (low (CSEs/O levels/GCSEs or equivalent), medium (NVQ/HND/HNC/A levels/AS levels or equivalent), high (other professional qualifications, college/university degree)) and socioeconomic status (Townsend deprivation index in quintiles); model 3 multivariable adjusted for covariates in model 2 plus lifestyle factors including: smoking status (current, former, never), alcohol intake (<1 g d−1, 1–7 g d−1, 8–15 g d−1, 16+ g d−1) and physical activity (MET-h per week in quintiles); model 4 multivariable adjusted for covariates in model 3 plus dietary factors including: intakes of sugary drinks (0 d−1, >0–1 d−1, >1–2 d−1, 2+ d−1), cups of coffee (0 d−1, >0–1 d−1, >1–2 d−1, 2+ d−1), and red and processed meat, whole grains, refined grains, saturated fatty acids and sodium (all g d−1) and energy (kcal d−1) (all as quintiles); model 5 multivariable adjusted for covariates in model 4 plus medical history including history of diabetes type 1 or 2 (yes versus no), hypertension (yes versus no) and hypercholesterolaemia (yes versus no), and for analysis of all-cause mortality, further adjustments for prevalent CVD, cancer, respiratory disease and neurodegenerative disease at baseline. For variables where participants could select ‘do not know’ or ‘prefer not to answer’, or for those with missing data, responses were combined into an ‘unknown’ indicator group. The proportional-hazards assumption was confirmed using Schoenfeld residual plots. Absence of multicollinearity among predictors was verified using variance inflation factors. To address concerns that occult chronic diseases in the years preceding diagnosis may have influenced dietary patterns, we conducted a sensitivity analysis excluding participants who developed events within 2 years of follow-up. In a further sensitivity analysis, we adjusted for the healthful plant-based diet index in place of the other dietary factors in model 5. To assess the influence of flavonoid intakes irrespective of dietary energy, model 5 was rerun without calorie adjustment. To assess the potential independent benefits of quantity and diversity of flavonoid intake on the risk of our outcomes, we report the terms for quantity of flavonoid intake following adjustment for diversity of flavonoid consumption. To evaluate whether the joint effect of quantity and diversity of flavonoid intake was larger (or smaller) than the combination of their individual effects42, likelihood ratio tests were used to compare models with and without interaction terms. We interpreted the magnitude and direction of associations using estimated HRs and their 95% CIs, with an HR of 1 indicating no association. All analyses were undertaken using Stata/IC 14.2 (StataCorp) and R (v.4.2.1).
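
The follow-up rule and the age timescale described above can be sketched as follows. This Python is illustrative only (the analyses themselves were run in Stata and R): the function names, the use of 365.25 days per year and the treatment of an event that coincides with death as an event are assumptions, not details given in the text.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed conversion from days to years


def age_at(birth, when):
    """Age in years at a given date."""
    return (when - birth).days / DAYS_PER_YEAR


def follow_up(birth, last_diet, admin_censor, event=None, death=None, lost=None):
    """Return (entry_age, exit_age, had_event) with age as the timescale.

    Follow-up starts at the last valid diet questionnaire and ends at the
    earliest of the outcome event, death, loss to follow-up or the
    administrative censoring date. Returns None if there is no time at risk.
    """
    exit_date = min(d for d in (event, death, lost, admin_censor) if d is not None)
    if exit_date <= last_diet:
        return None
    had_event = event is not None and event == exit_date
    return age_at(birth, last_diet), age_at(birth, exit_date), had_event


def is_early_event(last_diet, event, lag_years=2):
    """Flag events within the first lag_years of follow-up (sensitivity analysis)."""
    return event is not None and (event - last_diet).days < lag_years * DAYS_PER_YEAR
```

Each participant then contributes an entry age, an exit age and an event indicator, which is the form a Cox model with delayed entry expects.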

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.