Learning causal networks from systems biology time course data: an effective model selection procedure for the vector autoregressive process
Description
17 years ago
Background: Causal networks based on the vector autoregressive
(VAR) process are a promising statistical tool for modeling
regulatory interactions in a cell. However, learning these networks
is challenging due to the low sample size and high dimensionality
of genomic data. Results: We present a novel and highly efficient
approach to estimate a VAR network. This proceeds in two steps: (i)
improved estimation of VAR regression coefficients using an
analytic shrinkage approach, and (ii) subsequent model selection by
testing the associated partial correlations. In simulations with
small sample sizes, this approach outperformed all other approaches
considered in terms of true discovery rate (the number of correctly
identified edges relative to the number of significant edges). Moreover, the
analysis of expression time series data from Arabidopsis thaliana
resulted in a biologically sensible network. Conclusion:
Statistical learning of large-scale VAR causal models can be done
efficiently by the proposed procedure, even in the difficult data
situations prevalent in genomics and proteomics. Availability: The
method is implemented in R code that is available from the authors
on request.
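
The true discovery rate used as the evaluation criterion above can be illustrated with a small calculation. The following is a minimal sketch, not the authors' R implementation: the function name `true_discovery_rate`, the gene labels, and both edge sets are hypothetical and chosen only for illustration.

```python
def true_discovery_rate(true_edges, significant_edges):
    """Share of significant (selected) edges that belong to the true network."""
    significant = set(significant_edges)
    if not significant:
        # The ratio is undefined when no edge is declared significant.
        return float("nan")
    correct = significant & set(true_edges)
    return len(correct) / len(significant)


# Hypothetical directed edges (regulator, target) of a simulated VAR network.
true_edges = {("g1", "g2"), ("g2", "g3"), ("g3", "g1"), ("g1", "g4")}

# Hypothetical edges declared significant by some model selection procedure.
significant_edges = {("g1", "g2"), ("g2", "g3"), ("g4", "g2"), ("g1", "g4")}

tdr = true_discovery_rate(true_edges, significant_edges)
print(tdr)  # 3 of the 4 significant edges are true edges -> 0.75
```

Edges are treated as ordered pairs because a VAR network is directed: an edge from g4 to g2 does not count as correct when only the reverse direction, or neither, is present in the true network.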