A gene-set approach to analyze copy number alterations in breast cancer
Copy number alterations (CNAs) have been widely reported as oncogenic or tumor-suppressive features in cancers. Since CNAs simultaneously affect a large number of genes, previous single-gene-based methods are limited in revealing the landscape of CNAs. A systematic method to explore the influence of CNAs on cancer progression is needed. In the present study, a total of 1,045 genome-wide array comparative genomic hybridization (aCGH) data sets and 529 gene expression profiles of breast tumors were collected from The Cancer Genome Atlas (TCGA). We devised an algorithm (called Gene Set analysis for Copy number Alteration, or GSCA) that uses Fisher’s exact test to identify functional gene sets exhibiting significant enrichment in CNAs. Gene expression profiles of the enriched gene sets were analyzed to evaluate the influence of CNAs on gene expression changes. We also integrated survival analysis to pinpoint prognostic CNA-affected gene sets. Thirty-five and ten gene sets were identified with significant enrichment in copy number gains and losses, respectively. Forty-four of the 45 (98%) gene sets showed significant gene expression changes concordant with the CNAs. In addition, survival analysis identified 31 gene sets in which copy number enrichment was associated with patient survival, including several important transcription factor target gene sets, such as those of MYC. The results indicate that CNAs play essential roles in breast tumor progression and lead to differential clinical outcomes. In conclusion, we devised a novel method for analyzing and interpreting CNA data at the level of functional gene sets. We demonstrated its capability of identifying CNA-affected, as well as CNA-driven, biological functions and pathways in breast cancer. The analysis workflow can be widely applied to other cancers and provides biological insights into the complex mechanisms governing tumor progression.
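As an illustration of the enrichment step, the following minimal Python sketch computes a one-sided Fisher's exact test p-value for a single gene set. The layout of the 2x2 table (genes inside or outside the set versus genes with or without a CNA), the function names, and the toy gene identifiers are assumptions made for illustration; the abstract does not specify how GSCA constructs its contingency table or which test sidedness it uses.

```python
from math import comb


def fisher_enrichment_p(set_altered, set_size, total_altered, total_genes):
    """One-sided Fisher's exact test p-value for over-representation.

    Returns P(X >= set_altered), where X is hypergeometric: the number of
    altered genes obtained when drawing `set_size` genes from a background
    of `total_genes` genes, of which `total_altered` carry a CNA.
    """
    max_k = min(set_size, total_altered)
    denom = comb(total_genes, set_size)
    tail = sum(
        comb(total_altered, k) * comb(total_genes - total_altered, set_size - k)
        for k in range(set_altered, max_k + 1)
    )
    return tail / denom


def gene_set_enrichment(gene_set, altered_genes, background):
    """Hypothetical helper: build the counts from gene lists, then test."""
    background = set(background)
    gene_set = set(gene_set) & background       # ignore genes not measured
    altered = set(altered_genes) & background
    return fisher_enrichment_p(
        len(gene_set & altered), len(gene_set), len(altered), len(background)
    )


# Toy data (not from the study): 20 background genes, 5 with a copy number
# gain, and a 4-gene set in which 3 genes are gained.
background = [f"g{i}" for i in range(20)]
gained = ["g0", "g1", "g2", "g3", "g4"]
toy_gene_set = ["g0", "g1", "g2", "g10"]

p_value = gene_set_enrichment(toy_gene_set, gained, background)
print(f"enrichment p-value: {p_value:.4f}")
```

With these toy counts the p-value equals 155/4845, roughly 0.032. In a full analysis, such a test would be run once per gene set and separately for gains and losses, and the resulting p-values would need correction for multiple testing; the abstract does not state which correction, if any, was used.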