Contextual Analysis of Gene Expression Data
Description
18 years ago
Measurement of gene expression using microarrays has become a
standard high-throughput method in molecular biology, and the analysis
of gene expression data remains a very active area of research in
bioinformatics and statistics. Despite some issues with the quality and
reproducibility of microarray and derived data, microarrays are still
considered one of the most promising experimental techniques for
understanding complex molecular mechanisms. This work
approaches the problem of expression data analysis using contextual
information. While all analyses must be based on sound statistical
data processing, it is also important to include biological
knowledge to arrive at biologically interpretable results. After
an introduction and some biological background, chapter 2 reviews
some standard methods for the analysis of microarray data, including
normalization, computation of differentially expressed genes, and
clustering. The first source of context information
used to aid in the interpretation of the data is the
functional annotation of genes. Such information is often
represented using ontologies such as the Gene Ontology (GO). GO annotations
are provided by many gene and protein databases and have been used
to find functional groups that are significantly enriched in
differentially expressed or otherwise conspicuous genes. In gene
clustering approaches, functional annotations have been used to
find enriched functional classes within each cluster. In chapter 3,
a clustering method for the samples of an expression data set is
described that uses GO annotations during the clustering process in
order to find functional classes that imply a particularly strong
separation of the samples. The resulting clusters can be
interpreted more easily in terms of GO classes. The clustering
method was developed in joint work with Henning Redestig. More
complex biological information that covers interactions between
biological objects is contained in networks. Such networks can be
obtained from public databases of metabolic pathways, signaling
cascades, transcription factor binding sites, or high-throughput
measurements for the detection of protein-protein interactions such
as yeast two-hybrid experiments. Furthermore, networks can be
inferred using literature mining approaches or network inference
from expression data. The information contained in such networks is
very heterogeneous with respect to the type, quality, and
completeness of the contained data. ToPNet, a software tool for the
interactive analysis of networks and gene expression data, has been
developed in cooperation with Daniel Hanisch. The basic analysis
and visualization methods as well as some important concepts of
this tool are described in chapter 4. In order to access the
heterogeneous data represented as networks with annotated
experimental data and functions, it is important to provide
advanced querying functionality. Pathway queries allow the
formulation of network templates that can include functional
annotations as well as expression data. The pathway search
algorithm finds all instances of the template in a given network.
To do so, a special case of the well-known subgraph
isomorphism problem has to be solved. Although the algorithm has
exponential running time in the worst case, some implementation
tricks make it run fast enough for practical purposes. Often, a
pathway query has many matching instances, and it is important to
assess the statistical significance of the individual instances
with respect to expression data or other criteria. In chapter 5 the
pathway query language and the pathway search algorithm are
described in detail and some theoretical properties are derived.
Furthermore, some scoring methods that have been implemented are
described. Combining different scoring schemes
for different parts of the query yields very flexible scoring
capabilities. In chapter 6, some applications of the methods are
described, using public data sets as well as data sets from
research projects. On the basis of the well-studied public data
sets, it is demonstrated that the methods yield biologically
meaningful results. The other analyses show how new hypotheses can
be generated in more complex biological systems, but the validation
of these hypotheses can only be provided by new experiments.
Finally, an outlook is given on how the presented methods can
contribute to ongoing research efforts in the area of expression
data analysis, on their applicability to other types of data (such as
proteomics data), and on their possible extensions.
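
To make the idea of a pathway query more concrete, the following is a minimal sketch of template matching by backtracking search. It is not the ToPNet implementation or its query language; the function find_instances, the toy network, the annotation and log_ratio tables, and the template node names X, Y, Z are all hypothetical and chosen only for illustration. The template is a small directed graph whose nodes carry predicates on functional annotation or expression values, and an instance is an injective mapping of template nodes onto network nodes that preserves every template edge.

```python
from typing import Callable, Dict, List, Set, Tuple

Node = str
Predicate = Callable[[Node], bool]


def find_instances(
    template_edges: List[Tuple[str, str]],
    predicates: Dict[str, Predicate],
    network: Dict[Node, Set[Node]],
) -> List[Dict[str, Node]]:
    """Return every injective mapping of template nodes to network nodes
    that satisfies all node predicates and preserves all template edges."""
    order = list(predicates)  # template nodes, in declaration order
    results: List[Dict[str, Node]] = []

    def consistent(mapping: Dict[str, Node], t_node: str, candidate: Node) -> bool:
        # Check each template edge whose two endpoints are both assigned
        # once t_node is tentatively mapped to candidate.
        for a, b in template_edges:
            src = candidate if a == t_node else mapping.get(a)
            dst = candidate if b == t_node else mapping.get(b)
            if src is not None and dst is not None:
                if dst not in network.get(src, set()):
                    return False
        return True

    def extend(mapping: Dict[str, Node]) -> None:
        if len(mapping) == len(order):
            results.append(dict(mapping))
            return
        t_node = order[len(mapping)]
        for candidate in sorted(network):
            if candidate in mapping.values():
                continue  # keep the mapping injective
            if not predicates[t_node](candidate):
                continue  # prune early on annotation/expression constraints
            if consistent(mapping, t_node, candidate):
                mapping[t_node] = candidate
                extend(mapping)
                del mapping[t_node]  # backtrack

    extend({})
    return results


# Hypothetical toy data: a directed network with annotations and log ratios.
network = {
    "TF1": {"G1", "G2"},
    "TF2": {"G2"},
    "G1": {"G3"},
    "G2": {"G3"},
    "G3": set(),
}
annotation = {
    "TF1": "transcription factor",
    "TF2": "transcription factor",
    "G1": "kinase",
    "G2": "kinase",
    "G3": "transporter",
}
log_ratio = {"TF1": 0.2, "TF2": 1.5, "G1": 2.1, "G2": 0.4, "G3": 1.8}

# Template: transcription factor X -> kinase Y -> up-regulated gene Z.
template_edges = [("X", "Y"), ("Y", "Z")]
predicates = {
    "X": lambda n: annotation[n] == "transcription factor",
    "Y": lambda n: annotation[n] == "kinase",
    "Z": lambda n: log_ratio[n] > 1.0,
}

instances = find_instances(template_edges, predicates, network)
for inst in instances:
    print(inst)
```

The search is exponential in the worst case, as the abstract notes for the actual algorithm. The sketch's only speed-up is to test node predicates before edge consistency so that unsuitable candidates are discarded early; a practical tool would need more than this.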