UPMC’s ‘Big Data’ Technology Shows Promise in Breast Cancer Research

University of Pittsburgh and UPMC


Only eight months into its $100 million, five-year enterprise analytics effort, UPMC and its research partners at the University of Pittsburgh are starting to see the potential of this “big data” technology for accelerating scientific discoveries and the promise of personalized medicine.

With the foundational architecture of UPMC’s new enterprise data warehouse in place, Pitt researchers recently were able, for the first time, to electronically integrate clinical and genomic information on 140 patients previously treated for breast cancer.

“One of the first questions we asked was, ‘Is there a difference, a unique difference between pre-menopausal and post-menopausal breast cancer?’” said Adrian V. Lee, Ph.D., a renowned expert in the molecular and cellular biology of breast cancer and director of the Women’s Cancer Research Center at the University of Pittsburgh Cancer Institute and Magee-Womens Research Institute. “We are interested in this question from a research standpoint because we are moving toward personalized medicine, and personalized medicine is all about finding subgroups of patients who have a specific type of disease for which we could develop novel therapies.”

In this case, the researchers found intriguing molecular differences in the makeup of pre-menopausal vs. post-menopausal breast cancer. While understanding those differences will require more research, the findings eventually could provide a roadmap for developing targeted therapies, notes Dr. Lee.

This initial cancer question is just the start of UPMC’s and Pitt’s effort to mine massive amounts of data — clinical, genomic, proteomic, imaging and financial, to name a few — in the pursuit of smarter medicine. Traditionally, these data have resided in separate information systems, making it difficult, if not impossible, to integrate and analyze dozens of variables. “The integration of data, which is the goal of the enterprise data warehouse, allows us to ask questions that we just simply couldn’t ask before,” says Dr. Lee.

Having the foundation of the analytics system in place will now make it easier to explore other types of cancer and other diseases, he notes. And while the data warehouse started with only two types of breast cancer “omic” data (gene expression data and copy number variant data, the latter measuring changes in the amount of DNA), many more will be added.

The breast cancer research was chosen as a test of the enterprise data warehouse because of the rich genomics data available on these 140 patients. Their de-identified information previously had been submitted as part of a federally funded project called The Cancer Genome Atlas (TCGA), a multi-center effort to produce comprehensive genomic maps of the most common cancers. Pitt was the largest contributor of tissue to the TCGA.

In October, UPMC announced that it was working with technology partners Oracle, IBM, Informatica and dbMotion to create an enterprise data warehouse that would foster personalized medicine. With the help of these companies, UPMC is installing the hardware and software needed to bring together data from more than 200 sources of information across UPMC, UPMC Health Plan and outside entities, including labs and pharmacies. When the first phase of the multi-year project is completed in the spring of 2014, many researchers, clinicians and administrators will have secure, real-time access to data and analytic tools that fit their particular interests and needs.

A video of Pitt researchers discussing how they recently were able to electronically integrate clinical and genomic information on 140 patients previously treated for breast cancer is available on YouTube.

Trademark: Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.