High-quality, high-throughput measurement of protein-DNA binding using HiTS-FLIP
Description
8 years ago
To understand the behavior of transcription factors (TFs) in more depth
and on a genome-wide scale, novel quantitative high-throughput
experiments are needed. Recently, HiTS-FLIP (High-Throughput
Sequencing-Fluorescent Ligand Interaction Profiling) was invented by
the Burge lab at MIT (Nutiu et al. (2011)). Based on an Illumina GA-IIx
machine for next-generation sequencing, HiTS-FLIP measures the affinity
of fluorescently labeled proteins to millions of DNA clusters at
equilibrium in an unbiased and untargeted way, examining the entire
sequence space by determining dissociation constants (Kds) for all
12-mer DNA motifs. During my PhD I helped to improve the experimental design
of this method so that protein-DNA binding events can be measured at
equilibrium, without any washing step, by utilizing the TIRF (Total
Internal Reflection Fluorescence) based optics of the GA-IIx. In
addition, I developed the first versions of the XML-based control
software that automates the measurement procedure. To process the vast
amount of data produced by each run, I developed a high-performance
software pipeline that locates DNA clusters and normalizes and extracts
their fluorescent signals. Moreover, the k-mer motifs contained in the
clusters are ranked and their DNA binding affinities are quantified
with high accuracy. My
approach of applying phase correlation to estimate the relative
translational offset between the observed tile images and the template
images makes resequencing unnecessary and thus allows the flow cell to
be reused for several HiTS-FLIP experiments, which greatly reduces cost
and time. Instead of normalizing the cluster intensities with
information from the sequencing images as in Nutiu et al. (2011), which
introduces a nucleotide-specific bias, I estimate the cluster-related
normalization factors directly from the protein images, which captures
the uneven illumination more accurately and leads to an improved
correction for each tile image.
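The phase-correlation step mentioned above can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function names, the toy 8x8 images, and the naive DFT (chosen only to keep the example free of dependencies; a real tile image would call for an FFT library) are assumptions made for illustration. The idea is that a pure translation between the template and the observed image appears as a single peak in the inverse transform of the normalized cross-power spectrum.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2-D discrete Fourier transform of a list-of-lists image."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    phase = 2 * cmath.pi * (u * y / rows + v * x / cols)
                    acc += img[y][x] * cmath.exp(sign * 1j * phase)
            out[u][v] = acc / (rows * cols) if inverse else acc
    return out


def phase_correlation_offset(template, observed, eps=1e-12):
    """Return (dy, dx) such that observed is the template shifted by (dy, dx)."""
    rows, cols = len(template), len(template[0])
    f = dft2(template)
    g = dft2(observed)
    # Normalized cross-power spectrum: magnitudes cancel, only the phase
    # difference caused by the translation remains.
    cross = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            c = f[u][v].conjugate() * g[u][v]
            cross[u][v] = c / abs(c) if abs(c) > eps else 0j
    corr = dft2(cross, inverse=True)
    # The position of the correlation peak is the circular shift.
    _, dy, dx = max(
        (corr[y][x].real, y, x) for y in range(rows) for x in range(cols)
    )
    # Interpret shifts beyond half the image size as negative offsets.
    if dy > rows // 2:
        dy -= rows
    if dx > cols // 2:
        dx -= cols
    return dy, dx


def shift(img, dy, dx):
    """Circularly shift an image by (dy, dx); used here to fake an observation."""
    rows, cols = len(img), len(img[0])
    return [
        [img[(y - dy) % rows][(x - dx) % cols] for x in range(cols)]
        for y in range(rows)
    ]


# Toy "template" with four bright spots standing in for DNA clusters.
template = [[0.0] * 8 for _ in range(8)]
for (y, x), value in {(1, 1): 1.0, (2, 5): 0.7, (6, 3): 0.5, (4, 4): 0.9}.items():
    template[y][x] = value

observed = shift(template, 1, -2)
offset = phase_correlation_offset(template, observed)
print(offset)
```

In this toy case the recovered offset equals the shift that was applied. Sub-pixel refinement and noise handling, which real tile images would need, are left out.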
My analysis of the ranking algorithm by Nutiu et al. (2011) has
revealed that it is unable to rank all measured k-mers. Discarding all
clusters related to previously ranked k-mers has the side effect of
eliminating clusters that would be needed to rank other k-mers sharing
submotifs with the previously ranked ones. This shortcoming affects
even strongly binding k-mers that are only one mutation away from the
top-ranked k-mer. My findings show that omitting the cluster deletion
step in the ranking process overcomes this limitation and allows the
full spectrum of all possible k-mers to be ranked. In addition, this
insight reduces the run time of the ranking algorithm drastically, from
quadratic to linear. The experimental improvements combined with the
sophisticated processing of the data have led to a very high accuracy
of the HiTS-FLIP dissociation constants (Kds), comparable to the Kds
measured by the very sensitive HiP-FA assay (Jung et al. (2015)).
Experimentally, however, HiTS-FLIP is a very challenging assay. In
total, eight HiTS-FLIP experiments were performed, but only one showed
saturation; the others exhibited protein aggregation at the amplified
DNA clusters. This biochemical issue could not be remedied. As an example TF for studying
the details of HiTS-FLIP, GCN4 was chosen, a dimeric basic leucine
zipper TF that acts as the master regulator of the amino acid
starvation response in Saccharomyces cerevisiae (Natarajan et al.
(2001)). The fluorescent label was mOrange. The HiTS-FLIP Kds for GCN4
were validated with the HiP-FA assay, yielding a Pearson correlation
coefficient of R=0.99 and a relative error of delta=30.91%. Thus, a
unique and comprehensive data set of high quantitative precision was
obtained that made it possible to study the complex binding behavior of
GCN4 in a new way. My downstream analyses reveal that the known 7-mer
consensus motif of GCN4, TGACTCA, is modulated by its neighboring 2-mer
flanking regions, which span an affinity range of more than two orders
of magnitude, from Kd=1.56 nM to Kd=552.51 nM. These results suggest
that the common 9-mer PWM (Position Weight Matrix) for GCN4 is
insufficient to describe its binding behavior. Rather, one additional
flanking nucleotide on the left and on the right is required, extending
the 9-mer to an 11-mer. My analyses of mutations and the related delta
delta G values suggest long-range interdependencies between nucleotides
of the two half-sites of the GCN4 dimer. Consequently, models assuming
positional independence, such as a PWM, are insufficient to explain
these interdependencies.
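The delta delta G values mentioned above follow from measured Kds through the standard thermodynamic relation delta delta G = RT ln(Kd_mutant / Kd_reference). The sketch below is illustrative only: the function names and the temperature of 298.15 K are assumptions, and the single- and double-mutant Kds in the second half of the usage example are made-up numbers, not measurements. It also shows one way to quantify a deviation from additivity, which a positionally independent model such as a PWM would predict to be zero.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg(kd_mutant, kd_reference, temperature=298.15):
    """Binding free energy change (kcal/mol) of a mutant relative to a reference.

    Both Kds must be given in the same unit; the temperature (in kelvin)
    is an assumed default, not a value taken from the text.
    """
    return R_KCAL * temperature * math.log(kd_mutant / kd_reference)


def non_additivity(kd_ref, kd_a, kd_b, kd_ab, temperature=298.15):
    """ddG of the double mutant minus the sum of the two single-mutant ddGs.

    A value of zero is what a positionally independent model predicts;
    a non-zero value indicates coupling between the two positions.
    """
    return (
        ddg(kd_ab, kd_ref, temperature)
        - ddg(kd_a, kd_ref, temperature)
        - ddg(kd_b, kd_ref, temperature)
    )


# Free energy difference across the Kd range quoted in the text (in nM).
span = ddg(552.51, 1.56)
print(f"{span:.2f} kcal/mol")

# Hypothetical Kds (nM) for a reference, two single mutants and a double mutant.
additive = non_additivity(1.0, 10.0, 5.0, 50.0)  # effects multiply: no coupling
coupled = non_additivity(1.0, 10.0, 5.0, 20.0)   # double mutant binds better than predicted
print(additive, coupled)
```

At the assumed temperature, the quoted Kd range corresponds to a free energy difference of roughly 3.5 kcal/mol.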
Instead, the full spectrum of affinity values for all k-mers of
appropriate size should be measured and applied in further analyses, as
proposed by Nutiu et al. (2011). A further discovery was a set of new
binding motifs of GCN4, which can only be detected with a method like
HiTS-FLIP that examines the entire sequence space and allows for
unbiased, de novo motif discovery. All of these new motifs contain GTGT
as a submotif, and the collected data suggest that GCN4 binds to them
as a monomer. It might therefore even be possible to detect different
binding modes with HiTS-FLIP. My results emphasize the binding
complexity of GCN4 and demonstrate the advantage of HiTS-FLIP for
investigating the complexity of regulatory processes.
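The ranking without cluster deletion described earlier can be sketched as a single pass over the clusters. This is an illustrative sketch, not the thesis pipeline: the function name, the input format of (sequence, normalized intensity) pairs, the toy data, and the choice of the mean cluster intensity as the ranking score are all assumptions. Because no cluster is ever discarded, every k-mer that occurs on at least one cluster receives a score, including k-mers that share submotifs with higher-ranked ones.

```python
from collections import defaultdict


def rank_kmers(clusters, k):
    """Rank all k-mers by the mean intensity of the clusters containing them.

    clusters: iterable of (sequence, intensity) pairs (hypothetical format).
    Returns a list of (kmer, mean_intensity), strongest first.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for sequence, intensity in clusters:
        # A set, so a k-mer repeated within one cluster is counted once.
        kmers = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
        for kmer in kmers:
            total[kmer] += intensity
            count[kmer] += 1
    mean = {kmer: total[kmer] / count[kmer] for kmer in total}
    return sorted(mean.items(), key=lambda item: item[1], reverse=True)


# Made-up clusters: two carry the consensus TGACTCA, one carries a
# single-mutation variant (TGACTCT), one is an unrelated sequence.
clusters = [
    ("ATGACTCAT", 0.9),
    ("CTGACTCAG", 0.8),
    ("ATGACTCTT", 0.5),
    ("GGGGGGGGG", 0.1),
]
ranked = rank_kmers(clusters, k=7)
for kmer, score in ranked:
    print(kmer, round(score, 3))
```

The accumulation touches each cluster once, so its cost grows linearly with the number of clusters; only the final sort depends on the number of distinct k-mers.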