Implementation of Web Query Languages Reconsidered
Beschreibung
vor 16 Jahren
Visions of the next generation Web such as the "Semantic Web" or
the "Web 2.0" have triggered the emergence of a multitude of data
formats. These formats have different characteristics as far as the
shape of data is concerned (for example tree- vs. graph-shaped).
They are accompanied by a puzzlingly large number of query
languages each limited to one data format. Thus, a key feature of
the Web, namely to make it possible to access anything published by
anyone, is compromised. This thesis is devoted to versatile query
languages capable of accessing data in a variety of Web formats.
The issue is addressed from three angles: language design, common,
yet uniform semantics, and common, yet uniform evaluation. % Thus
it is divided in three parts: First, we consider the query language
Xcerpt as an example of the advocated class of versatile Web query
languages. Using this concrete exemplar allows us to clarify and
discuss the vision of versatility in detail. Second, a number of
query languages, XPath, XQuery, SPARQL, and Xcerpt, are translated
into a common intermediary language, CIQLog. This language has a
purely logical semantics, which makes it easily amenable to
optimizations. As a side effect, this provides the, to the best of
our knowledge, first logical semantics for XQuery and SPARQL. It is
a very useful tool for understanding the commonalities and
differences of the considered languages. Third, the intermediate
logical language is translated into a query algebra, CIQCAG. The
core feature of CIQCAG is that it scales from tree- to graph-shaped
data and queries without efficiency losses when tree-data and
-queries are considered: it is shown that, in these cases, optimal
complexities are achieved. CIQCAG is also shown to evaluate each of
the aforementioned query languages with a complexity at least as
good as the best known evaluation methods so far. For example,
navigational XPath is evaluated with space complexity O(q d) and
time complexity O(q n) where q is the query size, n the data size,
and d the depth of the (tree-shaped) data. CIQCAG is further shown
to provide linear time and space evaluation of tree-shaped queries
for a larger class of graph-shaped data than any method previously
proposed. This larger class of graph-shaped data, called
continuous-image graphs, short CIGs, is introduced for the first
time in this thesis. A (directed) graph is a CIG if its nodes can
be totally ordered in such a manner that, for this order, the
children of any node form a continuous interval. CIQCAG achieves
these properties by employing a novel data structure, called
sequence map, that allows an efficient evaluation of tree-shaped
queries, or of tree-shaped cores of graph-shaped queries on any
graph-shaped data. While being ideally suited to trees and CIGs,
the data structure gracefully degrades to unrestricted graphs. It
yields a remarkably efficient evaluation on graph-shaped data that
only a few edges prevent from being trees or CIGs.
the "Web 2.0" have triggered the emergence of a multitude of data
formats. These formats have different characteristics as far as the
shape of data is concerned (for example tree- vs. graph-shaped).
They are accompanied by a puzzlingly large number of query
languages each limited to one data format. Thus, a key feature of
the Web, namely to make it possible to access anything published by
anyone, is compromised. This thesis is devoted to versatile query
languages capable of accessing data in a variety of Web formats.
The issue is addressed from three angles: language design, common,
yet uniform semantics, and common, yet uniform evaluation. % Thus
it is divided in three parts: First, we consider the query language
Xcerpt as an example of the advocated class of versatile Web query
languages. Using this concrete exemplar allows us to clarify and
discuss the vision of versatility in detail. Second, a number of
query languages, XPath, XQuery, SPARQL, and Xcerpt, are translated
into a common intermediary language, CIQLog. This language has a
purely logical semantics, which makes it easily amenable to
optimizations. As a side effect, this provides the, to the best of
our knowledge, first logical semantics for XQuery and SPARQL. It is
a very useful tool for understanding the commonalities and
differences of the considered languages. Third, the intermediate
logical language is translated into a query algebra, CIQCAG. The
core feature of CIQCAG is that it scales from tree- to graph-shaped
data and queries without efficiency losses when tree-data and
-queries are considered: it is shown that, in these cases, optimal
complexities are achieved. CIQCAG is also shown to evaluate each of
the aforementioned query languages with a complexity at least as
good as the best known evaluation methods so far. For example,
navigational XPath is evaluated with space complexity O(q d) and
time complexity O(q n) where q is the query size, n the data size,
and d the depth of the (tree-shaped) data. CIQCAG is further shown
to provide linear time and space evaluation of tree-shaped queries
for a larger class of graph-shaped data than any method previously
proposed. This larger class of graph-shaped data, called
continuous-image graphs, short CIGs, is introduced for the first
time in this thesis. A (directed) graph is a CIG if its nodes can
be totally ordered in such a manner that, for this order, the
children of any node form a continuous interval. CIQCAG achieves
these properties by employing a novel data structure, called
sequence map, that allows an efficient evaluation of tree-shaped
queries, or of tree-shaped cores of graph-shaped queries on any
graph-shaped data. While being ideally suited to trees and CIGs,
the data structure gracefully degrades to unrestricted graphs. It
yields a remarkably efficient evaluation on graph-shaped data that
only a few edges prevent from being trees or CIGs.
Weitere Episoden
vor 11 Jahren
vor 11 Jahren
vor 11 Jahren
In Podcasts werben
Kommentare (0)