Prostate SPORE Cores
Core 2: Biostatistics and Computational Biology
Co-Principal Investigators: Meredith Regan, ScD; Pablo Tamayo, PhD
Participating Institution(s): Dana-Farber Cancer Institute; Massachusetts Institute of Technology (Broad Institute)
Core 2: Introduction
The Dana-Farber/Harvard Cancer Center (DF/HCC) Prostate Cancer SPORE Biostatistics and Computational Biology Core provides design and analytic support to SPORE Projects, Developmental Projects, Projects of the Career Development Awardees, and other SPORE Cores.
The Biostatistics and Computational Biology Core will combine and expand the existing Biostatistics Core and the computational aspects of the existing Genomics Core of the current DF/HCC Prostate Cancer SPORE into a single Core under the leadership of the Core Director, Dr. Meredith Regan, and Co-Director, Dr. Pablo Tamayo. The new integrated Biostatistics and Computational Biology Core reflects the growth of computational, data-intensive approaches to cancer research at cancer centers around the country. The Core personnel will draw on expertise at the Dana-Farber Cancer Institute (DFCI), the Harvard School of Public Health (HSPH), and the Broad Institute, a research collaboration of the Massachusetts Institute of Technology (MIT) and Harvard academic and medical communities that focuses on genomic medicine.
The DF/HCC Prostate Cancer SPORE Biostatistics and Computational Biology Core will provide consultation and collaboration on all research activities within the SPORE, including Projects, Developmental Projects, Projects of the Career Development Awardees, and the other SPORE Cores to ensure the highest standards of scientific rigor in all areas, including study design, data management and integrity, and data analysis and interpretation.
During the last 20 years, the development of new statistical and data analysis methodologies for cancer research has resulted in an expanded role for the statistician in the research process and a higher standard for what constitutes acceptable scientific evidence in a study. Areas of advancement include planning of laboratory and animal experiments, control of both systematic and random errors, development of statistical methods that are robust to outliers or misspecification of distributions, analysis of data sets with many variables and with semi-quantitative variables, adjusting for bias caused by missing longitudinal data or bias inherent in non-randomized screening and prevention studies, and the planned sequential analysis of studies. Statisticians and data analysts are professionally committed to staying on top of these developments, a challenge beyond what can reasonably be expected of clinical investigators. Yet the use of these methods has become a widely accepted requirement for clinical and epidemiological investigations, as well as for animal and laboratory investigations. The great growth in computing power and computing algorithms has permitted the use of complex statistical techniques, even on large data sets, but these computer programs are not always user-friendly to the occasional user and require statistical knowledge to check assumptions and interpret results and limitations.
In addition to classical statistical techniques that are used in observational studies, clinical trials, and small-scale laboratory studies, the translational research component of SPOREs makes use of a variety of data analysis methods from genomics and other fields of molecular and computational biology. In this context, and because of the high dimensionality of the datasets created by combining multiple sources of high-throughput data, it is necessary to develop and apply advanced data reduction, multi-step data analysis, and sophisticated visualization techniques. These methods include supervised and unsupervised machine learning techniques for finding subsets of genes that behave similarly in a malignancy or accurately predict outcome, permutation-based methods of inference that account for correlation among gene sets when calculating significance levels, and the use of false discovery rates as a more rational way to pick significant genes from long candidate lists.
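As an illustration of two of the ideas named above, permutation-based inference and false discovery rates, the following is a minimal Python sketch and not the Core's actual software. The function names, the simulated expression values, and the use of a difference in group means as the test statistic are all assumptions made for this example. The adjustment shown is the Benjamini-Hochberg step-up procedure, one common way to control the false discovery rate. The simple per-gene test here does not model correlation among gene sets, which the methods described above are designed to handle.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def abs_mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values reassigns group labels at random.
        rng.shuffle(pooled)
        if abs_mean_diff(pooled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    rng = random.Random(1)
    p_values = []
    # Hypothetical data: 20 genes, 8 samples per group; only the first
    # three genes are simulated with a true shift between groups.
    for gene in range(20):
        shift = 3.0 if gene < 3 else 0.0
        tumor = [rng.gauss(shift, 1.0) for _ in range(8)]
        normal = [rng.gauss(0.0, 1.0) for _ in range(8)]
        p_values.append(permutation_p_value(tumor, normal, seed=gene))
    q_values = benjamini_hochberg(p_values)
    for gene, (p, q) in enumerate(zip(p_values, q_values)):
        print(f"gene {gene:2d}  p={p:.4f}  q={q:.4f}")
```

Selecting genes whose adjusted value falls below a chosen threshold (for example 0.05) bounds the expected proportion of false positives among the selected genes, rather than the chance of any single false positive, which is why this approach suits long candidate lists.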
As new genomics technologies are developed, it is becoming increasingly clear that no single technology is likely to replace or dominate all others. Rather, it is expected that many of the new insights into normal and disease states brought about by the use of genomic technologies will be obtained by integrative analysis of information from multiple sources of genomic data, both integrating model systems with human tissues and integrating across a variety of genomic platforms. The analysis methodologies have to be commensurate with this level of multiplicity and be able to deal with large and complex datasets. The challenge is to make the methodologies robust, reproducible, and able to identify relevant biological processes at a high enough level of abstraction. This is particularly important when the data being generated are biologically and technically noisy, as is often the case. While advances in genomic data analysis methods have occurred, the field remains rather embryonic, with ongoing needs to develop new approaches and new software that is usable by both biologists and computational biologists.
The importance of the biostatisticians and computational biologists in the SPORE setting can be measured by their role in the scientific output of DF/HCC Prostate Cancer SPORE investigators and by the development of computational tools from which those investigators have benefited. Over the past five years, Dr. Regan and Ms. Judith Manola, members of the current DF/HCC Prostate Cancer SPORE Biostatistics Core, have collaborated with the SPORE investigators on about 50 manuscripts covering a broad area of research. Of particular note are the Nature Medicine publication (1) of Career Development Awardee Dr. PK Majumder on the identification of HIF1α activation in a model of AKT-induced PIN in mice and its reversion by the mTOR inhibitor RAD001; a publication on the prediction of prostate cancer outcome by gene expression profiling (2), which exemplifies collaboration between the Biostatistics and Genomics Cores; and a recent publication on the development of the Clinical Research Information System (CRIS), which is a cornerstone of the SPORE’s tissue bank (3). They have more broadly collaborated or consulted with the DF/HCC Prostate Cancer Program investigators on numerous presentations, monitoring of clinical trials, preparation of clinical, laboratory, and animal protocols, and preparation of more than a dozen external grant applications.
As described more fully below, the computational biologists joining the Core have contributed significantly to the conception, development, sharing, and application of computational methods, and it is expected that new methods will be required by the Projects as the science of this SPORE proceeds. Dr. Tamayo, who takes over leadership as the Core Co-Director, and Dr. Golub, who served as the Genomics Core Director and now serves as a consultant on this Core (and Co-PI of Project 4), have worked closely together since 1998 on the development of novel methods for class discovery and class prediction, marker selection, molecular signature definition and characterization, and gene set-based methods. They have co-authored more than a dozen collaborative manuscripts on gene expression analysis and its application to cancer research. Also new to the Core are Drs. X. Shirley Liu and Wei Li, who specialize in computational sequence motif finding and have developed a number of widely used algorithms for different biological applications. They have about a half-dozen manuscripts published, in press, or under review in collaboration with Dr. Myles Brown (Project 5 Co-PI), including a manuscript under review on transcription regulation in prostate cancer in collaboration with University of Michigan Prostate Cancer SPORE investigators (4).
Core 2: Specific Aims
The Dana-Farber/Harvard Cancer Center (DF/HCC) Prostate Cancer SPORE Biostatistics and Computational Biology Core provides design and analytic support to SPORE Projects, Developmental Projects, Projects of the Career Development Awardees, and other SPORE Cores. The specific aims are to:
1. Provide biostatistical and computational biology expertise for the planning and design, conduct, analysis, and reporting of laboratory, genomic, animal, translational, clinical (including associated correlative studies), and epidemiological studies for SPORE Projects, Developmental Projects, Projects of the Career Development Awardees, and other SPORE Cores.
2. Provide consultation on data collection, storage, and quality assurance; on statistics and computational biology software and programs; and on the coordination of laboratory results with parameters and outcomes from clinical studies or clinical/translational research databases.
3. Provide short-term biostatistics and computational biology consulting to the entire group of SPORE researchers.