An integrated approach for identifying genes associated with chemotherapy resistance in high-grade serous epithelial ovarian cancer

Ahmed Hossain, Gias Uddin Ahsan, Hayatun Nabi


Chemotherapy is important in limiting the severity of serous epithelial ovarian cancer. However, not all patients are sensitive to platinum chemotherapy, where sensitivity corresponds to longer progression-free survival (PFS > 8 months). Koti et al.[1] reported a set of 204 discriminating genes whose expression levels could influence the differential chemotherapy response between platinum-resistant and platinum-sensitive patients. They used the Welch two-sample t-test and the non-parametric Mann-Whitney U test to identify differentially expressed genes. However, both statistical methods have proved unsuitable for microarray data. In this paper, we used three alternative statistical methods to select a combined list of genes and compared it with the genes proposed by Koti et al.[1]. We then recommend sparse principal component analysis (sparse PCA) to identify a final list of genes. Sparse PCA takes the correlation among genes into account and helps identify biologically important genes. We identified 77 differentially expressed genes, which include 11 new genes that can separate platinum-resistant from platinum-sensitive patients. The integrative approach may also be effective for two-group comparisons in other high-dimensional datasets.


ovarian cancer; chemotherapy resistance; gene expression; area under receiver operating characteristic curve; sparse principal component analysis
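
As an illustration of the final step described in the abstract, the sketch below shows how a sparse principal component can zero out the loadings of uninformative genes so that only a short list remains. It is not the authors' implementation: the abstract does not state which sparse PCA algorithm or software settings were used, so this is a simplified rank-one, soft-thresholded power iteration in the spirit of the penalized matrix decomposition[13], with a fixed threshold `lam` chosen by hand. The gene names, expression values, and the helper `sparse_first_component` are all hypothetical.

```python
import math

# Toy, invented log-expression matrix: rows are patients, columns are genes.
# Gene names and values are hypothetical and are not taken from the study.
genes = ["gene_A", "gene_B", "gene_C", "gene_D"]
X = [
    [8.0, 7.5, 5.1, 6.0],  # platinum-resistant patients
    [8.0, 8.0, 4.9, 6.1],
    [8.0, 8.5, 5.0, 5.9],
    [4.0, 4.5, 5.0, 6.1],  # platinum-sensitive patients
    [4.0, 4.0, 5.1, 5.9],
    [4.0, 3.5, 4.9, 6.0],
]


def center_columns(X):
    """Subtract each gene's mean so the component captures variation only."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned as is."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def soft_threshold(vec, lam):
    """Shrink every entry toward zero by lam; small entries become zero."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in vec]


def sparse_first_component(X, lam, n_iter=50):
    """Rank-one sparse loading vector via thresholded power iteration.

    Assumption: a fixed threshold lam is used instead of an L1-norm
    constraint tuned by search, which keeps the sketch short.
    """
    Xc = center_columns(X)
    n, p = len(Xc), len(Xc[0])
    v = normalize([1.0] * p)
    for _ in range(n_iter):
        # Sample scores for the current loadings, scaled to unit length.
        u = normalize([sum(Xc[i][j] * v[j] for j in range(p)) for i in range(n)])
        # Gene-wise covariance with the scores, then sparsify and rescale.
        w = [sum(Xc[i][j] * u[i] for i in range(n)) for j in range(p)]
        v = normalize(soft_threshold(w, lam))
    return v


loadings = sparse_first_component(X, lam=0.5)
selected = [g for g, v in zip(genes, loadings) if v != 0.0]
print(selected)  # ['gene_A', 'gene_B']
```

In this toy matrix the two genes that differ between the groups are also strongly correlated with each other, so they share the first component, while the two low-variance genes receive loadings of exactly zero. On real data the threshold would need to be tuned, for example by cross-validation, rather than fixed by hand.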




Koti M, Gooding RJ, Nuin P, Haslehurst A, Crane C, et al. Identification of the IGF1/PI3K/NFκB/ERK gene signalling networks associated with chemotherapy resistance and treatment response in high-grade serous epithelial ovarian cancer. BMC Cancer 2013; 13: 549. doi: 10.1186/1471-2407-13-549.

Cannistra SA. Cancer of the ovary. N Engl J Med 2004; 351: 2519–2529. doi: 10.1056/NEJMra041842.

Dressman HK, Berchuck A, Chan G, Zhai J, Bild A, et al. An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer. J Clin Oncol 2007; 25: 517–525. doi: 10.1200/JCO.2006.06.3743.

Hossain A, Khan HTA. Identification of genomic markers correlated with sensitivity in solid tumors to Dasatinib using sparse principal components. J Applied Statistics 2016; 43(14): 2538–2549. doi: 10.1080/02664763.2016.1142941.

Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 2001; 98(9): 5116–5121. doi: 10.1073/pnas.091062498.

Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004; 3(1): 1–25. doi: 10.2202/1544-6115.1027.

Smyth GK. Limma: Linear models for microarray data. In: Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S (editors). Bioinformatics and computational biology solutions using R and bioconductor. New York: Springer; 2005. p. 397–420. doi: 10.1007/0-387-29362-0_23.

Efron B, Tibshirani R, Storey JD, Tusher V. Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc 2001; 96(456): 1151–1160.

Pepe MS, Longton G, Anderson GL, Schummer M. Selecting differentially expressed genes from microarray experiments. Biometrics 2003; 59: 133–142. doi: 10.1111/1541-0420.00016.

Hossain A, Beyene J. An improved method on Wilcoxon rank sum test for gene selection from microarray experiments. Commun Stat Simul Comput 2013; 42(7): 1563–1577. doi: 10.1080/03610918.2012.667479.

Hossain A, Beyene J. Estimation of weighted log partial area under the ROC curve and its application to MicroRNA expression data. Stat Appl Genet Mol Biol 2013; 12(6): 743–755. doi: 10.1515/sagmb-2013-0035.

Zou H, Hastie T, Tibshirani R. Sparse principal component analysis. J Comput Graph Stat 2006; 15: 265–286. doi: 10.1198/106186006X113430.

Witten D, Tibshirani R, Hastie T. A penalized matrix decomposition, with application to sparse principal components and canonical correlation analysis. Biostatistics 2009; 10: 515–534. doi: 10.1093/biostatistics/kxp008.

Tibshirani R, Chu G, Narasimhan B, Li J. samr: SAM: Significance analysis of microarrays [Internet]. R package version 2.0: Stanford University; 2011 [cited 2016 Oct 24]. Available from:

Troyanskaya OG, Garber M, Brown P, Botstein D, Altman RB. Non-parametric methods for identifying differentially expressed genes in microarray data. Bioinformatics 2002; 18(11): 1454–1461. doi: 10.1093/bioinformatics/18.11.1454.

Raychaudhuri S, Stuart JM, Altman RB. Principal components analysis to summarize microarray experiments: Application to sporulation time series. Pac Symp Biocomput 2000; 5: 452–463.

Zou H, Hastie T. elasticnet: Elastic-net for sparse estimation and sparse PCA [Internet]. R package version 1.1: University of Minnesota; 2013 [cited 2016 Oct 24]. Available from:

Buja A, Cook D, Swayne DF. Interactive high-dimensional data visualization. J Comput Graph Stat 1996; 5(1): 78–99. doi: 10.1080/10618600.1996.10474696.

Michaels GS, Carr DB, Askenazi M, Fuhrman S, Wen X, et al. Cluster analysis and data visualization of large-scale gene expression data. Pac Symp Biocomput 1998; 3: 42–53.




Copyright (c) 2018 Ahmed Hossain, Gias Uddin Ahsan, Hayatun Nabi

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.