Genome-wide association study reveals new genetic risk factor for prostate cancer
The post-genomic era has sparked a new trend in biomedical research that was scarcely imaginable a few years ago. Case in point: the NCI Cancer Genetic Markers of Susceptibility (CGEMS) project, a genome-wide association study that leverages knowledge from the Human Genome and HapMap projects, while capitalizing on the latest generation of DNA chip technologies.
Co-led by David Hunter, MD, professor of cancer prevention at HSPH and director of the DF/HCC High-Throughput Polymorphism Detection Core, CGEMS is a collaboration between the National Cancer Institute and several other institutions that are pooling their study populations to find common inherited genetic variants that increase the risk of prostate and breast cancer.
Unlike a candidate-gene study, which examines genes already suspected of playing a role, a genome-wide association study makes no assumptions about which genes contribute to disease. Instead, it combs the entire genome for variations that occur more frequently in people who have a disease (cases) than in a matched set of individuals who do not (controls). These data help researchers pinpoint common genetic variants that increase susceptibility to disease.
The CGEMS study analyzes the genetic variations known as SNPs (single nucleotide polymorphisms), which represent a one-letter change in the genetic sequence at a particular site, or locus, on a chromosome. (SNP generally refers to the site of variation, whereas allele refers to the particular nucleotide at that site, e.g., A or G.) While some alleles contribute to individual traits, such as eye color, or have no appreciable effect, others predispose to disease.
SNPs, chips, and the HapMap converge on cancer research
The human genome harbors an estimated 10 million “common” SNPs: those at which the less frequent variant, or minor allele, occurs at a frequency of at least 1 percent in a population. Moreover, SNPs that lie close to one another are often inherited together in blocks called haplotypes. The HapMap project, which mapped these ancestral segments of the genome, identified a subset of SNPs that represent, or tag, the haplotype to which they belong, acting as proxies for the other SNPs in that segment. Researchers can therefore capture variation across the whole genome more efficiently by genotyping the much smaller number of tag SNPs that reflect the diversity of the study population.
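To make the idea of tagging concrete, here is a minimal Python sketch of one common way to choose tags: a greedy search that keeps picking the SNP able to stand in for the most not-yet-covered SNPs above a correlation threshold. It is an illustration under stated assumptions, not the HapMap project's actual procedure. It uses the squared correlation (r²) of 0/1/2 genotype codes as a rough stand-in for linkage disequilibrium, and the 0.8 threshold, the function names, and the SNP names are all hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2.
    (A simplification: LD is usually measured on haplotypes, not genotypes.)"""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # a SNP with no variation cannot proxy another
        return 0.0
    return sxy * sxy / (sxx * syy)


def greedy_tags(genotypes, threshold=0.8):
    """Greedily choose tag SNPs until every SNP is covered by some tag at
    r^2 >= threshold (hypothetical threshold; each SNP covers itself)."""
    names = list(genotypes)
    order = {snp: i for i, snp in enumerate(names)}  # tie-break: earliest SNP
    covers = {
        a: {b for b in names
            if a == b or r_squared(genotypes[a], genotypes[b]) >= threshold}
        for a in names
    }
    untagged = set(names)
    tags = []
    while untagged:
        # Pick the SNP that proxies the most still-untagged SNPs.
        best = max(names, key=lambda s: (len(covers[s] & untagged), -order[s]))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy genotypes for six people (hypothetical data).
example_genotypes = {
    "snp1": [0, 1, 2, 1, 0, 2],
    "snp2": [0, 1, 2, 1, 0, 2],  # identical to snp1
    "snp3": [2, 1, 0, 1, 2, 0],  # mirror image of snp1, so r^2 = 1
    "snp4": [0, 0, 1, 2, 1, 0],  # uncorrelated with snp1
}
print(greedy_tags(example_genotypes))
```

Here snp1 alone can stand in for snp2 and snp3, so only two of the four SNPs would need to be genotyped.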
The initial scans in the CGEMS project analyze 550,000 tag SNPs at once, a feat made possible by new ultrahigh-volume DNA chips. The chips read the alleles at each SNP for every case and control; the readout indicates whether a person carries no copies of the minor allele, one copy, or two copies, coded as 0, 1, or 2, respectively. These genotypes are entered into the CGEMS database and analyzed to determine which SNPs show the largest differences in minor-allele frequency between cases and controls, that is, which alleles are more common in patients with cancer.
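A minimal sketch of the kind of comparison described above, assuming the 0/1/2 coding from the chips and a simple allelic chi-square test on a 2-by-2 table of minor- and major-allele counts in cases versus controls. The choice of test, the function names, and the toy genotypes are assumptions for illustration; the CGEMS analysis itself may use other statistics, such as trend tests adjusted for covariates.

```python
import math


def allele_counts(genotypes):
    """Each 0/1/2 code is the number of minor-allele copies a person carries;
    every person carries two alleles in total."""
    minor = sum(genotypes)
    major = 2 * len(genotypes) - minor
    return minor, major


def allelic_test(cases, controls):
    """Allelic chi-square test (1 degree of freedom, no continuity correction).
    Returns (minor-allele freq in cases, in controls, chi-square, P value)."""
    a, b = allele_counts(cases)     # cases: minor, major
    c, d = allele_counts(controls)  # controls: minor, major
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Upper tail of a chi-square with 1 df: P = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return a / (a + b), c / (c + d), chi2, p


# Toy genotypes at a single hypothetical SNP.
example_cases = [2, 1, 1, 0, 2, 1, 1, 2]
example_controls = [0, 1, 0, 0, 1, 0, 1, 0]
maf_cases, maf_controls, chi2, p = allelic_test(example_cases, example_controls)
print(f"MAF cases={maf_cases:.3f} controls={maf_controls:.3f} "
      f"chi2={chi2:.2f} P={p:.3g}")
```

Running this test at every SNP and sorting by P value gives the kind of ranked list the multistage design then whittles down.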
The multistage design of the CGEMS study resembles a funnel or inverted cone: the first scan winnows 550,000 tag SNPs down to the top 25,000; the second narrows these 25,000 to 1,500; and the third yields the final 25-50 candidate markers that may be implicated in cancer. Each successive scan is carried out in a different population of cases and controls. “This multistage design can realize genotyping cost savings without loss of statistical power,” says Peter Kraft, PhD, assistant professor of epidemiology and biostatistics at HSPH and a statistician for CGEMS, who helped design the study and determine sample size. Because some results could arise by chance, Kraft cautions, it is important that other research groups conduct replication studies to determine whether the final 25-50 SNPs continue to show an association.
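The funnel logic can be sketched in a few lines. This is a simplified illustration, not the CGEMS analysis plan. It assumes each stage genotypes only the surviving SNPs in a new set of cases and controls and ranks them by that stage's P values alone, whereas a real multistage design may combine evidence across stages. The helper names and toy P values are hypothetical, and the stage sizes (3, 2, 1) stand in for 25,000, 1,500, and 25-50.

```python
def keep_top(pvalues, k):
    """Return the k SNPs with the smallest P values (ties broken by name)."""
    return sorted(pvalues, key=lambda snp: (pvalues[snp], snp))[:k]


def multistage(snps, stages):
    """Run a funnel of scans. Each stage is (scan, k): scan(snps) tests only
    the SNPs it is given, in a fresh population, and returns {snp: P value};
    the top k advance. Returns the survivors after every stage."""
    survivors = list(snps)
    history = []
    for scan, k in stages:
        survivors = keep_top(scan(survivors), k)
        history.append(survivors)
    return history


def toy_scan(table):
    """Hypothetical stand-in for genotyping a new population: look up
    precomputed P values for just the SNPs passed in."""
    return lambda snps: {snp: table[snp] for snp in snps}


all_snps = ["snp1", "snp2", "snp3", "snp4", "snp5", "snp6"]
stage1 = {"snp1": 0.01, "snp2": 0.5, "snp3": 0.001,
          "snp4": 0.2, "snp5": 0.03, "snp6": 0.9}
stage2 = {"snp1": 0.02, "snp3": 0.004, "snp5": 0.6}
stage3 = {"snp1": 0.3, "snp3": 0.0005}
history = multistage(all_snps, [(toy_scan(stage1), 3),
                                (toy_scan(stage2), 2),
                                (toy_scan(stage3), 1)])
print(history)
```

The cost saving comes from the later stages: each new population is genotyped only for the survivors rather than the full SNP panel.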
A new way of doing science
The CGEMS initiative takes advantage of several existing study populations (including the NCI’s Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial [PLCO], the Nurses’ Health Study, the Physicians’ Health Study, and many others) for which researchers have collected not only DNA samples but also extensive lifestyle and biomarker data.
The three-year CGEMS project began in 2006 with a scan of DNA samples from the PLCO study, drawn from 1,150 men with prostate cancer and 1,150 controls, explains co-investigator and epidemiologist Ed Giovannucci, MD, professor at HSPH and member of DF/HCC’s Prostate Cancer Program. The first scan of 550,000 SNPs was completed late last year, and the results were posted on the CGEMS Web site (http://cgems.cancer.gov/). Other researchers can look up the rank order of each of the 550,000 SNPs, or query the data by gene name, chromosome region, range of base pairs, individual SNP, or P value.
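The kinds of queries listed above map onto simple filters over a table of per-SNP results. The sketch below is hypothetical: it does not reproduce the CGEMS Web site's interface or data format, and the record fields, function names, and toy records are assumptions chosen only to show rank ordering and filtering by gene, region, base-pair range, SNP, or P value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnpResult:
    """One row of hypothetical per-SNP association results."""
    snp: str
    gene: Optional[str]  # gene the SNP falls in or near, if any
    chromosome: str
    position: int        # base-pair coordinate
    p_value: float


def rank_order(results):
    """Rank 1 = smallest P value, i.e. the strongest association."""
    ordered = sorted(results, key=lambda r: r.p_value)
    return {r.snp: rank for rank, r in enumerate(ordered, start=1)}


def query(results, gene=None, chromosome=None, start=None, end=None,
          snp=None, max_p=None):
    """Return the results that match every criterion that is not None."""
    hits = []
    for r in results:
        if gene is not None and r.gene != gene:
            continue
        if chromosome is not None and r.chromosome != chromosome:
            continue
        if start is not None and r.position < start:
            continue
        if end is not None and r.position > end:
            continue
        if snp is not None and r.snp != snp:
            continue
        if max_p is not None and r.p_value > max_p:
            continue
        hits.append(r)
    return hits


# Toy records with made-up values, for illustration only.
example_results = [
    SnpResult("snp1", "GENE_A", "8", 1_000, 0.002),
    SnpResult("snp2", None, "8", 5_000, 0.4),
    SnpResult("snp3", "GENE_B", "10", 2_000, 0.00001),
]
print(rank_order(example_results))
print([r.snp for r in query(example_results, chromosome="8", start=0, end=2_000)])
```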
“This is a brand new way of doing science,” says Hunter. “We’re releasing our data six months before publication to try to drive the science faster. And we display the aggregate results in a flexible format, which is a huge leap forward in sharing scientific results. We encourage DF/HCC researchers to access these data to help advance their own prostate cancer work.”
While analyzing data from the first scan, CGEMS researchers decided to take a closer look at the 8q24 region of chromosome 8, which recent studies had suggested was associated with prostate cancer. This region rose to the top of the first scan, says Kraft, providing proof of principle that the CGEMS association study works. What’s more, the researchers discovered a new locus in the same region that had not previously been reported. The group conducted four follow-up scans of 8q24 in different populations, totaling 4,200 cases and 4,200 controls, which confirmed their findings (published online April 1 in Nature Genetics).
“We were very excited because we discovered a new genetic risk factor for prostate cancer,” says Hunter. “And because it’s in the same neighborhood as the previously discovered locus, this region looks like a ‘hot spot’ for prostate cancer.” The results of the second scan of 25,000 markers are expected in the next few months.
Hunting for genetic risk factors in breast cancer
Meanwhile, the breast cancer component of the project is also under way. The first scan genotyped DNA from 1,150 cases and 1,150 controls in the Nurses’ Health Study, a cohort that has been followed for decades. Hunter and his team selected women who were diagnosed with invasive breast cancer and were postmenopausal at diagnosis.
“We needed to carefully select where we thought we’d have the most impact, given that pre- and postmenopausal breast cancer may have different contributors,” says co-investigator Susan Hankinson, ScD, associate professor of medicine at HMS, co-leader of the DF/HCC Cancer Epidemiology Program, and principal investigator of the Nurses’ Health Study. “We focused in one area so that we’d have better statistical power.” This approach is intended to find common SNPs, which are much more frequent in the general population than the rare high-risk mutations in the BRCA1 and BRCA2 genes. Data from the scan will be released on the CGEMS Web site in the coming weeks.
“We’re hoping to identify new pathways that are important in breast cancer etiology,” adds Hankinson, “and that this study will spur additional collaborations with DF/HCC researchers.”
CGEMS has already spawned other projects: Rulla Tamimi, ScD, assistant professor of medicine at HMS, has proposed using SNP data from the first scan to hunt for genes that correlate with breast tissue density, a strong but poorly understood risk factor for breast cancer. Hankinson explains that because CGEMS draws on existing study populations, with a wealth of lifestyle and biomarker data already available, it offers unique opportunities for related research. “The mammographic density proposal is a good example of other projects that will emerge from CGEMS.”
An association study at this scale – involving tens of thousands of patients – would not have been possible a few years ago, nor would scientific findings have become public so quickly. “Discoveries are going to tumble out over the next two years,” says Hunter, “and it’s because of the genome project, the HapMap, and the magic of these new technologies.”