Novel analysis of multi-species type 2 diabetes from gene expression data

  • Catherine Zheng

Western Sydney University thesis: Master's thesis

Abstract

Purpose: The incidence of type 2 diabetes is reaching epidemic levels. Today, type 2 diabetes is the most common form of diabetes, accounting for 85 to 90 percent of diabetes cases. The James Lab at the Garvan Institute of Medical Research is interested in gene expression in insulin resistance and diabetes, and has provided three gene expression data sets: a longitudinal mouse study comparing a high-fat diet with a standard diet, with gene expression measured in two tissues; a mouse cell line study; and a cross-sectional human study. The main goals of this research are to identify differentially expressed genes in both the mouse and human data, to compare genomic expression patterns across species (human and mouse), and to carry out pathway analysis that detects differential expression in predefined gene sets based on Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.
Methods: The three data sets are normalized to remove systematic experimental effects arising from the microarray technology. Linear models are then fitted to the normalized data using the limma package to identify differentially expressed genes. Each gene has its own expression profile, and genes with similar profiles can be grouped together; we also attempt to use the data sets jointly to cluster samples based on their gene expression profiles. Biological processes are complex, with many molecules working together, and the goal of genome annotation is to link all the information associated with gene products in order to understand how pathways function in biological systems. When long lists of genes are found to be differentially expressed, we focus on the analysis of gene sets, because it is more informative to investigate sets of genes that are functionally related according to prior biological knowledge or experiments.
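The abstract does not say which normalization method is applied to the microarray data. As a purely illustrative sketch, assuming quantile normalization (one common between-array method for microarray intensities, not necessarily the one used in this thesis) and a genes × arrays matrix of log intensities, the idea can be written in Python with NumPy. The function name `quantile_normalize` is hypothetical:

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Quantile-normalize a genes x arrays matrix (hypothetical helper).

    Assumption: quantile normalization is used here only as an example of
    between-array normalization. Every column (array) is mapped onto the
    same reference distribution, the mean of the sorted columns.
    """
    x = np.asarray(x, dtype=float)
    n_genes, n_arrays = x.shape

    # Rank of each value within its own column (0 = smallest).
    # Ties are ranked by order of appearance (a simplification).
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.empty_like(order)
    for j in range(n_arrays):
        ranks[order[:, j], j] = np.arange(n_genes)

    # Reference distribution: average across arrays of the sorted values.
    reference = np.sort(x, axis=0).mean(axis=1)

    # Replace each value by the reference value at its within-array rank.
    return reference[ranks]
```

After normalization every array has exactly the same set of values, so differences between arrays in overall intensity distribution are removed while each gene's within-array rank is preserved. Tied values are ranked by order of appearance here, a simplification; some implementations average the reference values across ties.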
We explore potentially interesting gene sets using the GO database and KEGG pathways. Differentially expressed genes detected in the mouse data are mapped to their corresponding GO terms and KEGG pathways. For each comparison in the human data, we perform a competitive gene set test (the mean-rank gene set test) and a self-contained gene set test (the rotation gene set test). The correlation-adjusted mean-rank gene set test is also used to test insulin- or glucose-related GO terms and KEGG pathways. To test whether any GO terms (Biological Process) or KEGG pathways are over-represented in a list of differentially expressed genes from the mouse or human data sets, we apply the hypergeometric test.
Results: We identify a large number of differentially expressed genes in muscle tissue from the longitudinal mouse study. The cross-species gene set tests reveal significant GO terms and KEGG pathways for each condition of obese patients relative to healthy controls, and we compare the results produced by the mean-rank and rotation gene set tests. Significant insulin- or glucose-related gene sets are found using the three gene set testing methods, and their results are compared. The FOXO gene set is found to be significantly up-regulated in two contrasts in the human data.
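The hypergeometric over-representation test can be sketched as follows, assuming the gene universe (all genes tested), the list of differentially expressed genes and each GO term or KEGG pathway are given as sets of gene identifiers. The function name `overrepresentation_p` is hypothetical, and SciPy's `hypergeom` distribution stands in for whatever software the thesis relied on:

```python
from scipy.stats import hypergeom


def overrepresentation_p(universe: set, de_genes: set, gene_set: set) -> float:
    """One-sided hypergeometric p-value (hypothetical helper).

    Tests whether gene_set (a GO term or KEGG pathway) contains more
    differentially expressed genes than expected by chance.
    """
    # Restrict everything to the genes actually measured.
    de = de_genes & universe
    gs = gene_set & universe

    M = len(universe)   # population size: all genes tested
    n = len(gs)         # "successes" in the population: genes in the set
    N = len(de)         # number of draws: differentially expressed genes
    k = len(de & gs)    # observed overlap

    # P(X >= k) is the survival function evaluated at k - 1.
    return float(hypergeom.sf(k - 1, M, n, N))
```

For example, with 100 genes tested, a 10-gene pathway and 10 differentially expressed genes, an overlap of 5 is far more than the expected overlap of 1 and gives a small p-value, while an overlap of 0 gives a p-value of 1. When many GO terms or pathways are tested, the resulting p-values would normally be adjusted for multiple testing.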
Date of Award: 2012
Original language: English

Keywords

  • diabetes
  • type 2
  • epidemiology
  • gene therapy
  • gene expression
