International Journal of Epidemiology

International Journal of Epidemiology - RSS feed of current issue










The UK Millennium Cohort Study (MCS) is an observational, multidisciplinary cohort study that was set up to follow the lives of children born at the turn of the new century. The MCS is nationally representative and 18 552 families (18 827 children) were recruited to the cohort in the first sweep. There have currently been five main sweeps of data collection, at ages 9 months and 3, 5, 7 and 11 years. A further sweep of data collection is planned for age 14 years. A range of health-related data have been collected as well as measures concerning child development, cognitive ability and educational attainment. The data also include a wealth of information describing the social, economic and demographic characteristics of the cohort members and their families. In addition, the MCS data have been linked to administrative data resources including health records. The MCS provides a unique and valuable resource for the analysis of health outcomes and health inequalities. The MCS data are freely available to bona fide researchers under standard access conditions via the UK Data Service (http://ukdataservice.ac.uk) and the MCS website provides detailed information on the study (http://www.cls.ioe.ac.uk/mcs).


Intergenerational longitudinal studies over the lifespan provide valuable information for understanding the contexts and dynamic relations among cognition, family and health in adults and the elderly. The Hawai‘i Family Study of Cognition (HFSC), initiated in the early 1970s, included a cohort of over 6500 individuals representing over 1800 families of parents and their offspring. The HFSC gathered data on cognitive, personality, biological and other psychosocial variables, and provided novel information on the nature of cognitive abilities, especially on family issues. Some families were reassessed with short-term retesting in the 1970s. A select sample of offspring and their siblings and spouses were re-measured in the 1980s. Decades later, a 40-year follow-up of the original HFSC cohort was facilitated by the availability of contemporary tracking and tracing methods and internet-based testing. A subgroup of the original HFSC participants was re-contacted and retested on contemporary cognitive as well as socio-demographic and health measures. In this paper, we describe the original HFSC cohort and the design and methodology of the re-contact and retest studies of the HFSC, plans for expanding the re-contact and retesting, as well as directions for future research and collaborations. The Principal Investigator may be contacted for more information regarding the application, review and approval process for data access requests from qualified individuals outside the project.


The lidA Cohort Study (German Cohort Study on Work, Age, Health and Work Participation) was set up to investigate and follow the effects of work and work context on the physical and psychological health of the ageing workforce in Germany and subsequently on work participation. Cohort participants are initially employed people subject to social security contributions and born in either 1959 (n = 2909) or 1965 (n = 3676). They were personally interviewed in their homes in 2011 and will be visited every 3 years. Data collection comprises socio-demographic data, work and private exposures, work ability, work and work participation attitudes, health, health-related behaviour, personality and attitudinal indicators. Employment biographies are assessed using register data. Subjective health reports and physical strength measures are complemented by health insurance claims data, where permission was given. A conceptual framework has been developed for the lidA Cohort Study within which three confirmatory sub-models assess the interdependencies of work and health considering age, gender and socioeconomic status. The first set of the data will be available to the scientific community by 2015. Access will be given by the Research Data Centre of the German Federal Employment Agency at the Institute for Employment Research (http://fdz.iab.de/en.aspx).


The Social Inequality in Cancer (SIC) cohort study was established to determine pathways through which socioeconomic position affects morbidity and mortality, in particular common subtypes of cancer. Data from seven well-established cohort studies from Denmark were pooled. Combining these cohorts provided a unique opportunity to generate a large study population with long follow-up and sufficient statistical power to develop and apply new methods for quantification of the two basic mechanisms underlying social inequalities in cancer—mediation and interaction. The SIC cohort included 83 006 participants aged 20–98 years at baseline. A wide range of behavioural and biological risk factors such as smoking, physical inactivity, alcohol intake, hormone replacement therapy, body mass index, blood pressure and serum cholesterol were assessed by self-administered questionnaires, physical examinations and blood samples. All participants were followed up in nationwide demographic and healthcare registries. For those interested in collaboration, further details can be obtained by contacting the Steering Committee at the Department of Public Health, University of Copenhagen, at inan@sund.ku.dk.


The aim of the Health and Memory Study (HMS) of Nord-Trøndelag, Norway, was primarily to establish a database suitable as a basis for a large number of studies on dementia. Data from the HMS were collected via questionnaires and examinations during the period from 1995 to 2011. The dementia panel consists of 620 participants residing in nursing homes and 920 participants referred to memory clinics of Nord-Trøndelag. Data from this dementia panel may be linked to the Nord-Trøndelag Health Study (the HUNT study), which comprises three large population-based health surveys that took place in 1984–86 (HUNT1), 1995–97 (HUNT2) and 2006–08 (HUNT3). Data collection is complete and the participation rate in HUNT1 for patients diagnosed with dementia was 86%. The sub-studies in the HMS are focused on examining risk factors, caregiver burden, healthcare consumption and economic consequences of treating and having dementia. Researchers interested in the HMS are invited to contact HUNT at hunt@medisin.ntnu.no.


The DHCS is a cohort of all HIV-infected individuals seen in one of the eight Danish HIV centres after 31 December 1994. Here we update the 2009 cohort profile emphasizing the development of the cohort. Every 12-24 months, DHCS is linked with the Danish Civil Registration System (CRS) in order to extract an age- and sex-matched comparison cohort from the general population, as well as cohorts of family members of the HIV-infected patients and of the comparison cohort. The combined cohort is linked with CRS, the Danish Cancer Registry, the Danish National Hospital Registry, the Danish Registry of Causes of Death, the Danish National Prescription Registry, the Attainment Register and the Integrated Database for Labour Market Research to get information on vital status, migration, cancer, hospital contacts, causes of death, dispensed prescriptions, education and employment. Using this design, rates of a range of outcomes have been compared between HIV-infected patients and the comparison cohort, as well as between families of these two cohorts in order to disaggregate the effects of HIV infection and familial/environmental factors. Data can be shared with foreign institutions following approval from the Danish Data Protection Agency. Potential collaborators can contact the study director, Niels Obel (e-mail: niels.obel@regionh.dk).


The Nahuche Health and Demographic Surveillance System (HDSS) study site, established in 2009 with 137 823 individuals, is located in Zamfara State, north-western Nigeria. North-western Nigeria has some of the worst maternal and child health indicators in the country. For example, the 2013 Nigeria Demographic and Health Survey estimated an under-five mortality rate of 185 deaths per 1000 live births for the north-west geo-political zone compared with a national average of 128 deaths per 1000 live births. The site comprises over 100 villages under the leadership of six district heads. Virtually all the residents of the catchment population are Hausa by ethnicity. After a baseline census in 2010, regular update rounds of data collection are conducted every 6 months. Data on births, deaths, migration events, pregnancies, marriages and marriage termination events are routinely collected. Verbal autopsy (VA) data are collected on all deaths reported during routine data collection. Annual update data on antenatal care and household characteristics are also collected. Opportunities for collaborations are available at Nahuche HDSS. The Director of Nahuche HDSS, M.O. Oche at [ochedr@hotmail.com] is the contact person for all forms of collaboration.


Background: Mendelian randomization studies have so far restricted attention to linear associations relating the genetic instrument to the exposure, and the exposure to the outcome. In some cases, however, observational data suggest a non-linear association between exposure and outcome. For example, alcohol consumption is consistently reported as having a U-shaped association with cardiovascular events. In principle, Mendelian randomization could address concerns that the apparent protective effect of light-to-moderate drinking might reflect ‘sick-quitters’ and confounding.

Methods: The Alcohol-ADH1B Consortium was established to study the causal effects of alcohol consumption on cardiovascular events and biomarkers, using the single nucleotide polymorphism rs1229984 in ADH1B as a genetic instrument. To assess non-linear causal effects in this study, we propose a novel method based on estimating local average treatment effects for discrete levels of the exposure range, then testing for a linear trend in those effects. Our method requires an assumption that the instrument has the same effect on exposure in all individuals. We conduct simulations examining the robustness of the method to violations of this assumption, and apply the method to the Alcohol-ADH1B Consortium data.

Results: Our method gave a conservative test for non-linearity under realistic violations of the key assumption. We found evidence for a non-linear causal effect of alcohol intake on several cardiovascular traits.

Conclusions: We believe our method is useful for inferring departure from linearity when only a binary instrument is available. We estimated non-linear causal effects of alcohol intake which could not have been estimated through standard instrumental variable approaches.


Background: The recently described interaction between smoking, human leukocyte antigen (HLA) DRB1*15 and absence of HLA-A*02 with regard to multiple sclerosis (MS) risk shows that the risk conveyed by smoking differs depending on genetic background. We aimed to investigate whether a similar interaction exists between passive smoking and HLA genotype.

Methods: We used one case-control study with incident cases of MS (736 cases, 1195 controls) and one with prevalent cases (575 cases, 373 controls). Never-smokers with different genotypes and passive smoking status were compared with regard to occurrence of MS, by calculating odds ratios (ORs) with 95% confidence intervals (CIs). The potential interaction between different genotypes and passive smoking was evaluated by calculating the attributable proportion (AP) due to interaction.

Results: An interaction was observed between passive smoking and carriage of HLA-DRB1*15 (AP 0.3, 95% CI 0.02–0.5 in the incident study, and AP 0.4, 95% CI 0.1–0.7 in the prevalent study), as well as between passive smoking and absence of HLA-A*02. Compared with non-smokers without either of these two genetic risk factors, non-exposed subjects with the two risk genotypes displayed an OR of 4.5 (95% CI 3.3–6.1), whereas subjects with the same genotypes who were exposed to passive smoking displayed an OR of 7.7 (95% CI 5.5–10.8).

Conclusions: The risk of developing MS associated with different HLA genotypes may be influenced by exposure to passive smoking. The finding supports our hypothesis that priming of the immune response in the lungs may subsequently lead to MS in people with a genetic susceptibility to the disease.
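The attributable proportion (AP) due to interaction used in the Methods above can be illustrated with a short calculation. The sketch below assumes the common additive-interaction definition, AP = (OR11 - OR10 - OR01 + 1) / OR11, with odds ratios standing in for relative risks. The function name and all input values are hypothetical and are not taken from the study.

```python
def attributable_proportion(or_both: float, or_a_only: float, or_b_only: float) -> float:
    """Attributable proportion due to additive interaction.

    or_both   : OR for subjects with both factors (vs. neither)
    or_a_only : OR for subjects with factor A only (vs. neither)
    or_b_only : OR for subjects with factor B only (vs. neither)
    """
    if or_both <= 0:
        raise ValueError("or_both must be positive")
    # Relative excess risk due to interaction (RERI) on the additive scale.
    reri = or_both - or_a_only - or_b_only + 1.0
    # AP is the share of the joint effect attributable to the interaction.
    return reri / or_both


# Hypothetical odds ratios, chosen only to illustrate the arithmetic.
ap = attributable_proportion(or_both=6.0, or_a_only=3.0, or_b_only=1.5)
print(f"AP due to interaction: {ap:.2f}")
```

An AP above 0 indicates that the joint effect exceeds the sum of the separate effects; confidence intervals, as reported in the abstract, would require a variance estimate that this sketch omits.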


Background: In eutherian mammals and in humans, the female fetus may be masculinized while sharing the intra-uterine environment with a male fetus. Telomere length (TL), as expressed in leukocytes, is heritable and is longer in women than in men. The main determinant of leukocyte TL (LTL) is LTL at birth. However, LTL is modified by age-dependent attrition.

Methods: We studied LTL dynamics (LTL and its attrition) in adult same-sex (monozygotic, n = 268; dizygotic, n = 308) twins and opposite-sex (n = 144) twins. LTL was measured by Southern blots of the terminal restriction fragments.

Results: We observed that in same-sex (both monozygotic and dizygotic) twins, as reported in singletons, LTL was longer in females than in males [estimate ± standard error (SE): 163 ± 63 bp, P < 0.01]. However, in opposite-sex twins, female LTL was indistinguishable from that of males (–31 ± 52 bp, P = 0.6), whereas male LTL was not affected. Findings were similar when the comparison was restricted to opposite-sex and same-sex dizygotic twins (females relative to males: same-sex: 188 ± 90 bp, P < 0.05; opposite-sex: –32 ± 64 bp, P = 0.6).

Conclusions: These findings are compatible with masculinization of the female fetus in opposite-sex twins. They suggest that the sex difference in LTL, seen in the general population, is largely determined in utero, perhaps by the intrauterine hormonal environment. Further studies in newborn twins are warranted to test this thesis.


Background: Animal models have suggested that undernutrition during gestation and the early postnatal period may adversely affect kidney development and compromise renal function. As a natural experiment, famines provide an opportunity to test such potential effects in humans. We assessed whether exposure to the Chinese famine of 1959–1961 during gestation and early postnatal life was associated with the levels of proteinuria among female adults three decades after exposure to the famine.

Methods: We measured famine intensity using the cohort size shrinkage index and we constructed a difference-in-difference model to compare the levels of proteinuria, measured with a dipstick test of random urine specimens, among Chinese women (n = 70 543) whose exposure status to the famine varied across birth cohorts (born before, during or after the famine) and counties of residence with different degrees of famine intensity.

Results: Famine exposure was associated with a greater risk [odds ratio (OR) = 1.54; 95% confidence interval (CI): 1.04, 2.28; P = 0.029] of having a higher level of proteinuria among women born during the famine years (1959–61) compared with the unexposed post-famine-born cohort (1964–65) in rural samples. No association was observed among urban samples. Results were robust to adjustment for covariates.

Conclusions: Severe undernutrition during gestation and the early postnatal period may have long-term effects on levels of proteinuria in humans, but the effect sizes may be small.
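The difference-in-difference idea in the Methods above can be sketched in its simplest two-by-two form: the cohort contrast in high-intensity counties minus the same contrast in low-intensity counties. This is only an illustration of the logic, not the regression model fitted in the study, and every number below is hypothetical.

```python
def difference_in_differences(
    high_exposed: float,
    high_unexposed: float,
    low_exposed: float,
    low_unexposed: float,
) -> float:
    """Two-by-two difference-in-differences on group-level outcomes.

    Each argument is an outcome summary (here, a proportion with
    proteinuria) for one combination of famine intensity (high/low
    county) and birth cohort (exposed/unexposed).
    """
    change_in_high = high_exposed - high_unexposed
    change_in_low = low_exposed - low_unexposed
    # The cohort difference in low-intensity counties serves as the
    # counterfactual trend for high-intensity counties.
    return change_in_high - change_in_low


# Hypothetical proportions, for illustration only.
estimate = difference_in_differences(
    high_exposed=0.060, high_unexposed=0.040,
    low_exposed=0.045, low_unexposed=0.040,
)
print(f"Difference-in-differences estimate: {estimate:.3f}")
```

The design choice is that any cohort effect shared by all counties cancels out, so the remaining contrast is attributed to famine intensity, provided the cohorts would otherwise have followed parallel trends.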


Background: Women who give birth at younger ages (e.g. teenage mothers) are more likely to have children who exhibit behaviour problems, such as attention-deficit/hyperactivity disorder (ADHD). However, it is not clear whether young maternal age is causally associated with poor offspring outcomes or confounded by familial factors.

Methods: The association between early maternal age at childbirth and offspring ADHD was studied using data from Swedish national registers. The sample included all children born in Sweden between 1988 and 2003 (N = 1 495 543), including 30 674 children with ADHD. We used sibling- and cousin-comparisons to control for unmeasured genetic and environmental confounding. Further, we used a children-of-siblings model to quantify the genetic and environmental contribution to the association between maternal age and offspring ADHD.

Results: Maternal age at first birth (MAFB) was associated with offspring ADHD. Teenage childbirth (<20 years) was associated with 78% increased risk of ADHD. The association attenuated in cousin-comparison, suggesting unmeasured familial confounding. The children-of-siblings model indicated that the association between MAFB and ADHD was mainly explained by genetic confounding.

Conclusions: All children born to mothers who bore their first child early in their reproductive lives were at increased risk of ADHD. The association was mainly explained by genetic factors transmitted from mothers to their offspring that contribute to both age at childbirth and ADHD in offspring. Our results highlight the importance of using family-based designs to understand how early life circumstances affect child development.


Background: Agent Orange (AO) was a mixture of phenoxy herbicides, containing several dioxin impurities including 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Various military herbicides, including AO, were sprayed by the US military and allied forces for military purposes during the Vietnam War. This study was performed to identify the associations between the AO exposure and mortality in Korean Vietnam veterans.

Methods: From 1 January 1992 to 31 December 2005, 180 639 Korean Vietnam veterans were followed up for vital status and cause of death. The AO exposure index was based on the proximity of the veteran’s unit to AO-sprayed areas, using a geographical information system-based model. The adjusted hazard ratios and 95% confidence intervals were calculated by Cox's proportional hazard model.

Results: The mortality from all causes of death was elevated with AO exposure. The deaths due to all sites of cancers combined and some specific cancers, including cancers of the stomach, small intestine, liver, larynx, lung, bladder and thyroid gland, as well as chronic myeloid leukaemia, were positively associated with AO exposure. The deaths from angina pectoris, chronic obstructive pulmonary disease and liver disease including liver cirrhosis were also increased with an increasing AO exposure.

Conclusions: Overall, this study suggests that AO/TCDD exposure may account for mortality from various diseases even several decades after exposure. Further research is needed to better understand the long-term effects of AO/TCDD exposure on human health.


Background: In many Western populations, blood pressure varies moderately with season and outdoor temperature. Relatively little is known about effects of seasonal changes in blood pressure on the detection and control of hypertension in general populations, especially in low- and middle-income countries.

Methods: We analysed cross-sectional data of 57 375 (42% men) participants aged 30–79 (mean 52.3) years who were enrolled during 2004–08, as part of the China Kadoorie Biobank, from a rural county in the south-east coastal Zhejiang Province. Analyses related daily mean outdoor temperature, obtained from the local Meteorological Bureau, to mean systolic (SBP) and diastolic blood pressure (DBP), rate of newly detected hypertension and, among those with self-reported physician-diagnosed hypertension, rate of adequate blood pressure control, using multiple linear and logistic regression models.

Results: The overall mean blood pressure was 135.9 mmHg for SBP and 80.5 mmHg for DBP. Daily outdoor temperature ranged between –2.9 and 33.7°C, with July being the hottest month (mean 29.4°C) and January the coldest (mean 4.0°C). Comparing January (the coldest month) with July (the warmest), the differences in the adjusted SBP/DBP were 19.2/7.7 mmHg. Each 10°C lower ambient temperature was associated with 6.9/2.9 mmHg higher SBP/DBP, 14.1% higher prevalence of newly detected hypertension and, among those with pre-diagnosed hypertension, 13.0% lower hypertension control rate.

Conclusion: In rural China, lower outdoor temperature is strongly associated with higher mean blood pressure and hypertension prevalence as well as poorer hypertension control, and should be considered when conducting population-based hypertension surveys and providing treatment for hypertensive patients.


Background: Immigrants to Westernized countries adopt the prevalence of allergic diseases of native populations, yet no data are available on immigrants to low-income or low-disease prevalence countries. We investigated these questions using data from the International Study of Asthma and Allergies in Childhood.

Methods: Standardized questionnaires were completed by 13–14-year-old adolescents and by the parent/guardians of 6–7-year-old children. Questions on the symptom prevalence of asthma, rhinoconjunctivitis and eczema, and a wide range of factors postulated to be associated with these conditions, including birth in or not in the country and age at immigration, were asked. Odds ratios for risk of the three diseases according to immigration status were calculated using generalized linear mixed models. These were adjusted for: world region; language and gross national income; and individual risk factors including gender, maternal education, antibiotic and paracetamol use, maternal smoking, and diet. Effect modification by gross national income and by prevalence was examined.

Results: There were 326 691 adolescents from 48 countries and 208 523 children from 31 countries. Immigration was associated with a lower prevalence of asthma, rhinoconjunctivitis and eczema in both age groups than among those born in the country studied, and this association was mainly confined to high-prevalence/affluent countries. This reduced risk was greater in those who had lived fewer years in the host country.

Conclusions: Recent migration to high prevalence/affluent countries is associated with a lower prevalence of allergic diseases. The protective pre-migration environment quickly decreases with increasing time in the host country.


Background: Incidence of contralateral breast cancer (CBC) is much less studied than primary breast cancer. We aimed to assess incidence rates of CBC in relation to age, calendar period and time since first breast cancer.

Methods: Using the nationwide Danish Cancer Registry, we identified 85 863 women with a first primary invasive breast cancer without distant metastases in Denmark during 1978–2009. Among these, 3120 women developed metachronous CBC. Crude incidence rates for CBC were calculated by age and calendar period at first breast cancer as well as time since first breast cancer. Mutual adjustments were made by use of Poisson regression models.

Results: The incidence of CBC decreased with increasing age at first breast cancer. Before 1998, incidence rates of CBC showed little variation. The rates decreased by period of first primary from 546 per 100 000 person-years in 1993–97 to 328 per 100 000 person-years in 2003–09. After adjustment for age and calendar period, no clear trend was observed in the overall incidence according to time since first breast cancer.

Conclusions: Occurrence of cancer in the contralateral breast seems to be rather independent of time passed since the first primary. The finding of a decreasing incidence of CBC after 1997 is likely to be due to more women receiving systemic adjuvant therapy such as tamoxifen and longer duration of this treatment as well as the introduction of aromatase inhibitors.
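The crude incidence rates quoted in the Results above are events divided by person-time at risk, scaled to 100 000 person-years. A minimal sketch of that arithmetic follows; the function name and the event and person-year counts are hypothetical, since the abstract does not report person-years.

```python
def incidence_rate(events: int, person_years: float, per: int = 100_000) -> float:
    """Crude incidence rate per `per` person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    # Multiply before dividing so the scaling stays exact for round inputs.
    return events * per / person_years


# Hypothetical counts, for illustration only.
rate = incidence_rate(events=164, person_years=50_000)
print(f"{rate:.0f} per 100 000 person-years")
```

The adjusted rates in the abstract come from Poisson regression, which this crude calculation does not attempt to reproduce.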


Background: Before their diagnosis, patients with cancer present in primary care more frequently than do matched controls. This has raised hopes that earlier investigation in primary care could lead to earlier stage at diagnosis.

Methods: We re-analysed primary care symptom data collected from 247 lung cancer cases and 1235 matched controls in Devon, UK. We identified the most sensitive and specific definition of symptoms, and estimated its incidence in cases and controls prior to diagnosis. We estimated the symptom lead time (SLT) distribution (the time between symptoms attributable to cancer and diagnosis), taking account of the investigations already carried out in primary care. The impact of route of diagnosis on stage at diagnosis was also examined.

Results: Symptom incidence in cases was higher than in controls 2 years before diagnosis, accelerating markedly in the last 6 months. The median SLT was under 3 months, with mean 5.3 months [95% credible interval (CrI) 4.5–6.1] and did not differ by stage at diagnosis. An earlier stage at diagnosis was observed in patients identified through chest X-ray originated in primary care.

Conclusions: Most symptoms preceded clinical diagnosis by only a few months. Symptom-based investigation would lengthen lead times and result in earlier stage at diagnosis in a small proportion of cases, but would be far less effective than standard screening targeted at smokers.


Background: Smoking, sedentary lifestyle and obesity are risk factors for mortality and dementia. However, their impact on cognitive impairment-free life expectancy (CIFLE) has not previously been estimated.

Methods: Data were drawn from the DYNOPTA dataset which was derived by harmonizing and pooling common measures from five longitudinal ageing studies. Participants for whom the Mini-Mental State Examination was available were included (N = 8111, 48.6% men). Data on education, sex, body mass index, smoking and sedentary lifestyle were collected and mortality data were obtained from Government Records via data linkage. Total life expectancy (LE), CIFLE and years spent with cognitive impairment (CILE) were estimated for each risk factor and total burden of risk factors.

Results: CILE was approximately 2 years for men and 3 years for women, regardless of age. For men and women respectively, reduced LE associated with smoking was 3.82 and 5.88 years, associated with obesity was 0.62 and 1.72 years and associated with being sedentary was 2.50 and 2.89 years. Absence of each risk factor was associated with longer LE and CIFLE, but also longer CILE for smoking in women and being sedentary in both sexes. Compared with participants with no risk factors, those with 2+ had shorter CIFLE of up to 3.5 years depending on gender and education level.

Conclusions: Population level reductions in smoking, sedentary lifestyle and obesity increase longevity and number of years lived without cognitive impairment. Years lived with cognitive impairment may also increase.


Background: There are conflicting findings regarding long- and short-term effects of income on health. Whereas higher average income is associated with better health, there is evidence that health behaviours worsen in the short-term following income receipt. Prior studies revealing such negative short-term effects of income receipt focus on specific subpopulations and examine a limited set of health outcomes.

Methods: The United States Earned Income Tax Credit (EITC) is an income supplement tied to work, and is the largest poverty reduction programme in the USA. We utilize the fact that EITC recipients typically receive large cash transfers in the months of February, March and April, in order to examine associated changes in health outcomes that can fluctuate on a monthly basis. We examine associations with 30 outcomes in the categories of diet, food security, health behaviours, cardiovascular biomarkers, metabolic biomarkers and infection and immunity among 6925 individuals from the U.S. National Health and Nutrition Survey. Our research design approximates a natural experiment, since whether individuals were sampled during treatment or non-treatment months is independent of social, demographic and health characteristics that do not vary with time.

Results: There are both beneficial and detrimental short-term impacts of income receipt. Although there are detrimental impacts on metabolic factors among women, most other impacts are beneficial, including those for food security, smoking and trying to lose weight.

Conclusions: The short-term impacts of EITC income receipt are not universally health promoting, but on balance there are more health benefits than detriments.


Background: Social capital is considered to be an important determinant of life expectancy and cardiovascular health. Evidence on the association between social capital and all-cause mortality, cardiovascular disease (CVD) and cancer was systematically reviewed.

Methods: Prospective studies examining the association of social capital with these outcomes were systematically sought in Medline, Embase and PsycInfo, all from inception to 8 October 2012. We categorized the findings from studies according to seven dimensions of social capital, including social participation, social network, civic participation, social support, trust, norm of reciprocity and sense of community, and pooled the estimates across studies to obtain summary relative risks of the health outcomes for each social capital dimension. We excluded studies focusing on children, refugees or immigrants and studies conducted in the former Soviet Union.

Results: Fourteen prospective studies were identified. The pooled estimates showed no association between most social capital dimensions and all-cause mortality, CVD or cancer. Limited evidence was found for association of increased mortality with social participation and civic participation when comparing the most extreme risk comparisons.

Conclusions: Evidence to support an association between social capital and health outcomes is limited. Lack of consensus on measurements for social capital hinders the comparability of studies and weakens the evidence base.
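Pooling relative risks across studies, as described in the Methods above, is often done by inverse-variance weighting on the log scale. The sketch below assumes a fixed-effect model and recovers each standard error from the reported 95% confidence interval; the abstract does not say which pooling model was used, and the study estimates listed here are hypothetical.

```python
import math
from typing import List, Tuple

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def pool_relative_risks(
    estimates: List[Tuple[float, float, float]]
) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance pooling of relative risks.

    estimates: (rr, ci_low, ci_high) per study, with 95% CIs.
    Returns the pooled (rr, ci_low, ci_high).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for rr, ci_low, ci_high in estimates:
        # Standard error of log(RR), recovered from the CI width.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(rr)
        weight_total += weight
    pooled_log = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )


# Hypothetical study results, for illustration only.
studies = [(0.90, 0.70, 1.16), (1.10, 0.85, 1.42), (0.95, 0.80, 1.13)]
rr, low, high = pool_relative_risks(studies)
print(f"Pooled RR {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

A random-effects model would add a between-study variance term to each weight, which matters when studies measure social capital in different ways.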


The international scientific literature reports no data on the prevalence and effectiveness of back protector devices (BPD). In Italy, no data have been collected on BPD because their use is not mandatory. To fill this gap, the National Institute of Health implemented a cross-sectional study in collaboration with the National Traffic Police. Accident cases were collected from 1 December 2011 to 25 October 2013. Overall, data from 2104 accidents involving 2319 injured subjects were analysed: 1821 (78.5%) of these were motorcyclists and 498 (21.5%) mopedists. The use of hard-shell BPD or jackets with airbags is higher in motorcyclists than in moped drivers (16.2% vs 1.3%, P < 0.001). Concerning level of protection, there are no differences between drivers and passengers. In the most severely injured motorcyclists (i.e. hospitalized or deceased), the percentage of injuries to the spine was lower (13.6%) among those who used a high-level safety device (hard-shell BPD and/or airbags) and rose to 27.3% among those who used only protective clothing (P = 0.022). For those not using a high-safety device, a bivariate analysis showed that the odds of serious spinal injury were 2.72 times greater (P = 0.049), and a multivariate analysis controlling for variables potentially affecting the results showed that they were 2.81 times greater (P = 0.012). This study points out that greater use of BPD could reduce the number of injuries to the spinal column resulting from road traffic accidents involving motorized two-wheeled vehicles.


Background: Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But the pooling of information from individuals in a central database that may be queried by researchers raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by recent debate and controversy relating to the UK’s proposed ‘care.data’ initiative, and these issues reflect important societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that can circumvent some of the most basic challenges in facilitating the access of researchers and other healthcare professionals to individual-level data.

Methods: Commands are sent from a central analysis computer (AC) to several data computers (DCs) storing the data to be co-analysed. The data sets are analysed simultaneously but in parallel. The separate parallelized analyses are linked by non-disclosive summary statistics and commands transmitted back and forth between the DCs and the AC. This paper describes the technical implementation of DataSHIELD using a modified R statistical environment linked to an Opal database deployed behind the computer firewall of each DC. Analysis is controlled through a standard R environment at the AC.
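
The exchange of non-disclosive summary statistics described above can be illustrated with a minimal Python sketch. This is not the Opal/R implementation the paper describes: the `DataComputer` class, the `pooled_mean_variance` function and the minimum record count of 5 are all hypothetical, chosen only to show how an AC could obtain a pooled mean and variance without ever receiving individual-level values.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DataComputer:
    """Hypothetical stand-in for one DC; raw values stay inside this object."""
    values: Sequence[float]
    min_records: int = 5  # assumed disclosure threshold, not taken from the paper

    def summary(self) -> Tuple[int, float, float]:
        """Return only aggregate statistics: count, sum and sum of squares."""
        n = len(self.values)
        if n < self.min_records:
            raise ValueError("too few records to release a summary")
        total = sum(self.values)
        total_sq = sum(v * v for v in self.values)
        return n, total, total_sq


def pooled_mean_variance(dcs: List[DataComputer]) -> Tuple[float, float]:
    """Analysis-computer side: combine per-DC summaries into pooled estimates."""
    summaries = [dc.summary() for dc in dcs]
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    # Sample variance reconstructed from the pooled sum and sum of squares.
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance
```

Because the count, sum and sum of squares are additive across sites, the pooled result matches what a single analysis of the combined data would give, which is the property that makes this kind of federated analysis attractive.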

Results: Based on this Opal/R implementation, DataSHIELD is currently used by the Healthy Obese Project and the Environmental Core Project (BioSHaRE-EU) for the federated analysis of 10 data sets across eight European countries, and this illustrates the opportunities and challenges presented by the DataSHIELD approach.

Conclusions: DataSHIELD facilitates important research in settings where: (i) a co-analysis of individual-level data from several studies is scientifically necessary but governance restrictions prohibit the release or sharing of some of the required data, and/or render data access unacceptably slow; (ii) a research group (e.g. in a developing nation) is particularly vulnerable to loss of intellectual property—the researchers want to fully share the information held in their data with national and international collaborators, but do not wish to hand over the physical data themselves; and (iii) a data set is to be included in an individual-level co-analysis but the physical size of the data precludes direct transfer to a new site for analysis.


Background: South African civil registration (CR) provides a key data source for local health decision making, and informs the levels and causes of mortality in data-lacking sub-Saharan African countries. We linked mortality data from CR and the Agincourt Health and Socio-demographic Surveillance System (Agincourt HDSS) to examine the quality of rural CR data.

Methods: Deterministic and probabilistic techniques were used to link death data from 2006 to 2009. Causes of death were aggregated into the WHO Mortality Tabulation List 1 and a locally relevant short list of 15 causes. The matching rate was compared with informant-reported death registration. Using the verbal autopsy (VA) diagnoses as reference, misclassification patterns, sensitivity, positive predictive values and cause-specific mortality fractions (CSMFs) were calculated for the short list.
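
The agreement measures named above follow standard definitions, sketched here in Python. The function names and the list-of-labels input format are assumptions made for illustration; the sketch does not reproduce the authors' analysis code or their confidence interval calculations. `reference` plays the role of the reference diagnoses and `candidate` the role of the registered causes.

```python
from collections import Counter
from typing import Dict, List


def cohen_kappa(reference: List[str], candidate: List[str]) -> float:
    """Chance-corrected agreement between two cause-of-death assignments."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    observed = sum(r == c for r, c in zip(reference, candidate)) / n
    ref_counts, cand_counts = Counter(reference), Counter(candidate)
    # Agreement expected by chance from the two marginal distributions.
    expected = sum(ref_counts[k] * cand_counts[k] for k in ref_counts) / (n * n)
    # Undefined (division by zero) if both sources use a single identical cause.
    return (observed - expected) / (1 - expected)


def per_cause_metrics(
    reference: List[str], candidate: List[str]
) -> Dict[str, Dict[str, float]]:
    """Sensitivity, PPV and CSMFs for each cause, with `reference` as truth."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    ref_counts, cand_counts = Counter(reference), Counter(candidate)
    true_pos = Counter(r for r, c in zip(reference, candidate) if r == c)
    nan = float("nan")
    metrics = {}
    for cause in sorted(set(reference) | set(candidate)):
        tp = true_pos[cause]
        metrics[cause] = {
            "sensitivity": tp / ref_counts[cause] if ref_counts[cause] else nan,
            "ppv": tp / cand_counts[cause] if cand_counts[cause] else nan,
            "csmf_reference": ref_counts[cause] / n,
            "csmf_candidate": cand_counts[cause] / n,
        }
    return metrics
```

Comparing `csmf_reference` with `csmf_candidate` for each cause shows how individual-level misclassification can partly cancel out at the population level, even when kappa is low.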

Results: A matching rate of 61% [95% confidence interval (CI): 59.2 to 62.3] was attained, lower than the informant-reported registration rate of 85% (CI: 83.4 to 85.8). For the 2264 matched cases, cause agreement was 15% (kappa 0.1083, CI: 0.0995 to 0.1171) for the WHO list, and 23% (kappa 0.1631, CI: 0.1511 to 0.1751) for the short list. CSMFs were significantly different for all but four (tuberculosis, cerebrovascular disease, other heart disease, and ill-defined natural) of the 15 causes evaluated.

Conclusion: Despite data limitations, it is feasible to link official CR and HDSS verbal autopsy data. Data linkage proved a promising method to provide empirical evidence about the quality and utility of rural CR mortality data. Agreement of individual causes of death was low but, at the population level, careful interpretation of the CR data can assist health prioritization and planning.


Background: Data on objectively measured physical activity are lacking in low- and middle-income countries. The aim of this study was to describe objectively measured overall physical activity and time spent in moderate-to-vigorous physical activity (MVPA) in individuals from the Pelotas (Brazil) birth cohorts, according to weight status, socioeconomic status (SES) and sex.

Methods: All children born in 1982, 1993 and 2004 in hospitals in the city of Pelotas, Brazil, constitute the sampling frame; of these 99% agreed to participate. The most recent follow-ups were conducted between 2010 and 2013. In total, 8974 individuals provided valid data derived from raw triaxial wrist accelerometry. The average acceleration is presented in milli-g (1 mg = 0.001g), and time (min/d) spent in MVPA (>100 mg) is presented in 5- and 10-min bouts.
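
To make the bout-based MVPA measure concrete, the following Python sketch counts minutes accumulated in bouts above the 100 mg threshold. It rests on simplifying assumptions that are not taken from the paper: the input is one average acceleration value per minute, and a bout is an unbroken run of minutes above the threshold. The function name `minutes_in_bouts` is hypothetical, and the study's actual processing of raw accelerometry may define bouts differently.

```python
from typing import Iterable


def minutes_in_bouts(
    acc_mg: Iterable[float],
    threshold_mg: float = 100.0,
    min_bout_min: int = 10,
) -> int:
    """Total minutes spent in unbroken runs above the threshold.

    acc_mg holds one average acceleration value (in milli-g) per minute.
    Only runs lasting at least min_bout_min minutes are counted.
    """
    total = 0
    run = 0
    for value in acc_mg:
        if value > threshold_mg:
            run += 1
        else:
            if run >= min_bout_min:
                total += run
            run = 0
    # Close a run that extends to the end of the recording.
    if run >= min_bout_min:
        total += run
    return total
```

Calling the function with `min_bout_min=5` on the same series can only return the same or a larger total than with `min_bout_min=10`, which is consistent with the higher MVPA estimates reported for 5-min bouts.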

Results: Mean acceleration in the 1982 (mean age 30.2 years), 1993 (mean age 18.4 years) and 2004 (mean age 6.7 years) cohorts was 35 mg, 39 mg and 60 mg, respectively. Time spent in MVPA was 26 [95% confidence interval (CI) 25; 27], 43 (95% CI 42; 44) and 45 (95% CI 43; 46) min/d in the three cohorts, respectively, using 10-min bouts. Mean MVPA was on average 42% higher when using 5-min bouts. Males were more active than females and physical activity was inversely associated with age of the cohort and SES. Normal-weight individuals were more active than underweight, overweight and obese participants.

Conclusions: Overall physical activity and time spent in MVPA differed by cohort (age), sex, weight status and SES. Higher levels of activity in low SES groups may be explained by incidental physical activity.


Quantitative bias analysis serves several objectives in epidemiological research. First, it provides a quantitative estimate of the direction, magnitude and uncertainty arising from systematic errors. Second, the acts of identifying sources of systematic error, writing down models to quantify them, assigning values to the bias parameters and interpreting the results combat the human tendency towards overconfidence in research results, syntheses and critiques and the inferences that rest upon them. Finally, by suggesting aspects that dominate uncertainty in a particular research result or topic area, bias analysis can guide efficient allocation of sparse research resources.

The fundamental methods of bias analyses have been known for decades, and there have been calls for more widespread use for nearly as long. There was a time when some believed that bias analyses were rarely undertaken because the methods were not widely known and because automated computing tools were not readily available to implement the methods. These shortcomings have been largely resolved. We must, therefore, contemplate other barriers to implementation. One possibility is that practitioners avoid the analyses because they lack confidence in the practice of bias analysis.

The purpose of this paper is therefore to describe what we view as good practices for applying quantitative bias analysis to epidemiological data, directed towards those familiar with the methods. We focus on answering questions often posed to those of us who advocate incorporation of bias analysis methods into teaching and research. These include the following. When is bias analysis practical and productive? How does one select the biases that ought to be addressed? How does one select a method to model biases? How does one assign values to the parameters of a bias model? How does one present and interpret a bias analysis?

We hope that our guide to good practices for conducting and presenting bias analyses will encourage more widespread use of bias analysis to estimate the potential magnitude and direction of biases, as well as the uncertainty in estimates potentially influenced by the biases.
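
As an illustration of what writing down a bias model and assigning values to its bias parameters can look like, the Python sketch below applies a simple correction for non-differential exposure misclassification in a case-control table. It is one standard textbook form of simple bias analysis, not a method prescribed by the paper. The function names, the cell labels and any sensitivity or specificity values passed in are assumptions supplied by the analyst.

```python
from typing import Tuple


def corrected_counts(
    exposed_obs: float, unexposed_obs: float, se: float, sp: float
) -> Tuple[float, float]:
    """Back-calculate true exposed/unexposed counts within one outcome group.

    se and sp are the assumed sensitivity and specificity of exposure
    classification (the bias parameters).
    """
    total = exposed_obs + unexposed_obs
    denom = se + sp - 1.0
    if denom <= 0:
        raise ValueError("se + sp must exceed 1 for the correction to apply")
    exposed_true = (exposed_obs - (1.0 - sp) * total) / denom
    unexposed_true = total - exposed_true
    if exposed_true < 0 or unexposed_true < 0:
        raise ValueError("bias parameters are incompatible with the observed data")
    return exposed_true, unexposed_true


def corrected_odds_ratio(
    a: float, b: float, c: float, d: float, se: float, sp: float
) -> float:
    """Odds ratio after correcting both groups with the same se and sp.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    a_true, b_true = corrected_counts(a, b, se, sp)
    c_true, d_true = corrected_counts(c, d, se, sp)
    return (a_true * d_true) / (b_true * c_true)
```

Repeating the calculation over a range of plausible sensitivity and specificity values, rather than a single pair, gives a sense of how strongly the result depends on the assumed bias parameters.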