Integrative Molecular Biology and Biotechnology


Genomic and Transcriptomic Data Integration in American Patients with Thymoma

Author(s): Cristobal De Leon Garcia

Thymoma (THYM) is a very rare malignant cancer that forms on the outside surface of the thymus. Thymoma can spread to the lungs, chest wall, major vessels, esophagus, or the area around the heart. Cancer can spread through tissue, the lymph system, and the blood: through tissue by growing into nearby areas, and through the lymph system and the blood when cancer cells travel from the thymus through the lymph and blood vessels to other parts of the body, where they form a metastatic tumor. The metastatic tumor is the same type of cancer as the primary tumor. Thymomas are always aggressive, high-grade malignant tumors. Thymomas occur at all ages, but incidence is highest in patients between 35 and 70 years old. The prognosis for thymoma depends on the stage of the tumor. Thymomas tend to be slow-growing tumors, and the prognosis is good to excellent for patients with stage 1 or stage 2 thymoma. Even with stage 3 thymoma, 83% of patients are alive 10 years after diagnosis. The 10-year survival rate for stage 4 thymoma is approximately 47%. Fewer than one person per 1.5 million people will develop a thymoma, which corresponds to about 400 people per year in the U.S. Thymoma represents less than 1.5% of all malignancies. Thymoma is a cancer with a low frequency of mutations, specifically insertion and deletion polymorphisms such as Copy Number Alterations (CNA), which are linked to messenger ribonucleic acid (mRNA) transcripts. In this work, omic data integration was carried out using CNA genomic data and mRNA sequence counts from 124 American patients with different levels of THYM infiltration and invasiveness, analyzing 16383 genes and 60488 transcripts separately. CNA genes were analyzed with Principal Component Analysis (PCA), and mRNA sequence counts were analyzed with Differential Expression Analysis.
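The abstract does not give implementation details for the PCA step. As a minimal sketch, assuming a small made-up matrix of copy-number values (samples in rows, genes in columns), the first principal component can be found by power iteration on the covariance matrix, and genes can then be ranked by the size of their loadings. The data, gene labels, and function names below are illustrative assumptions, not the study's data or code.

```python
import math

def center_columns(rows):
    """Subtract each column mean so the covariance is computed on centered data."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def first_principal_component(rows, n_iter=200):
    """Return (loadings, explained_variance_ratio) of PC1 via power iteration."""
    x = center_columns(rows)
    n, p = len(x), len(x[0])
    # Sample covariance matrix (p x p).
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
            for b in range(p)] for a in range(p)]
    # Assumption: the start vector is not orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    return v, eigenvalue / total_variance

# Hypothetical copy-number values: 6 samples x 3 genes (not real patient data).
genes = ["gene_A", "gene_B", "gene_C"]
cna = [
    [2.0, 1.9, 0.1],
    [-2.0, -2.1, 0.0],
    [1.0, 1.1, -0.1],
    [-1.0, -0.9, 0.1],
    [3.0, 3.0, 0.0],
    [-3.0, -3.0, -0.1],
]

loadings, explained = first_principal_component(cna)
# Genes with the largest absolute loadings contribute most to PC1.
ranked = sorted(zip(genes, loadings), key=lambda gl: abs(gl[1]), reverse=True)
for name, loading in ranked:
    print(f"{name}: loading={loading:.3f}")
print(f"PC1 explained variance ratio: {explained:.3f}")
```

Ranking by absolute loading is only one possible way to shortlist genes from a PCA; the selection criterion actually used in the study is not stated in the abstract.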
After the separate CNA and mRNA analyses, 40 highly significant genes and 47 highly significant transcripts were found and used in the integration analysis. Integrative analysis was carried out with the sparse Partial Least Squares (sPLS) methodology using the mixOmics package in R. It was based on graphical analysis of two output plots. The sample plots for the RNAseq and CNA data show the clustering between samples. For RNAseq, samples of all tumor types clustered around the central zero, without clear separation between them. This indicates that the variance among samples is not explained by the transcripts (genes). Clusters above and below the central zero, especially tumors with less infiltration and invasiveness, explained the largest proportion of variance. For CNA genes, samples clustered clearly according to tumor type. Tumors with the most infiltration and invasiveness clustered most closely around the central zero. A few samples clustered further from the central zero, especially samples from tumors with less infiltration and invasiveness, indicating that some CNA genes have a weak influence on tumors with less infiltration and invasiveness. Samples from RNAseq and CNA genes showed a strong negative correlation with each other. This indicates that both mRNA transcripts and CNA genes are highly expressed in non-aggressive tumors. According to the gene plot, one important CNA gene was highly expressed: the pseudogene ENSG00000237847 (FAM231A), located on chromosome 1, which showed a direct positive correlation with the RNA gene ENSG00000273415 (LINC02725, long intergenic non-protein coding RNA).
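To illustrate the idea behind the sPLS integration step, the following is a minimal sketch of a single sparse PLS component. It alternates soft-thresholded updates of two loading vectors on the cross-product matrix of two standardized data blocks. It does not reproduce the mixOmics implementation or the study's data: the toy matrices, the feature labels, and the `keep_x` and `keep_y` parameters are hypothetical and chosen only for illustration.

```python
import math

def standardize(rows):
    """Center and scale each column (assumes no column is constant)."""
    n = len(rows)
    cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        cols.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*cols)]

def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def soft_threshold_keep(vec, keep):
    """Soft-threshold so that at most `keep` entries stay non-zero."""
    if keep >= len(vec):
        return list(vec)
    cutoff = sorted((abs(v) for v in vec), reverse=True)[keep]
    return [math.copysign(max(abs(v) - cutoff, 0.0), v) for v in vec]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def spls_first_component(x_rows, y_rows, keep_x, keep_y, n_iter=100):
    """Return sparse loadings (u, v) and sample scores for one component."""
    x, y = standardize(x_rows), standardize(y_rows)
    n, p, q = len(x), len(x[0]), len(y[0])
    # Cross-product matrix M = X^T Y (p x q).
    m = [[sum(x[i][a] * y[i][b] for i in range(n)) for b in range(q)]
         for a in range(p)]
    u, v = [0.0] * p, normalize([1.0] * q)
    for _ in range(n_iter):
        u = normalize(soft_threshold_keep(
            [sum(m[a][b] * v[b] for b in range(q)) for a in range(p)], keep_x))
        v = normalize(soft_threshold_keep(
            [sum(m[a][b] * u[a] for a in range(p)) for b in range(q)], keep_y))
    x_scores = [sum(x[i][a] * u[a] for a in range(p)) for i in range(n)]
    y_scores = [sum(y[i][b] * v[b] for b in range(q)) for i in range(n)]
    return u, v, x_scores, y_scores

# Hypothetical toy blocks: 6 samples, 3 CNA features and 3 transcripts.
# Only the first column of each block shares a common signal.
cna_block = [[1, 1, 1], [2, -1, 0], [3, 0, -1], [4, 0, -1], [5, -1, 0], [6, 1, 1]]
rna_block = [[2.1, 0, 1], [3.9, 1, -1], [6.2, -1, 1],
             [7.8, 1, -1], [10.1, -1, 1], [11.9, 0, -1]]

u, v, cna_scores, rna_scores = spls_first_component(
    cna_block, rna_block, keep_x=1, keep_y=1)
selected_cna = [i for i, w in enumerate(u) if w != 0]
selected_rna = [j for j, w in enumerate(v) if w != 0]
score_correlation = pearson(cna_scores, rna_scores)
print("selected CNA features:", selected_cna)
print("selected transcripts:", selected_rna)
print("correlation between block scores:", round(score_correlation, 3))
```

In this toy setting the sparse loadings retain only the one CNA feature and the one transcript that vary together, which mirrors the kind of feature pairing read from an sPLS variable plot. Real analyses involve more components, parameter tuning, and many more features.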

This work is licensed under a Creative Commons Attribution 4.0 International License. © 2018