Data Integration on the (Semantic) Web with Rules and Rich Unification
Beschreibung
vor 14 Jahren
For the last decade a multitude of new data formats for the World
Wide Web have been developed, and a huge amount of heterogeneous
semi-structured data is flourishing online. With the ever
increasing number of documents on the Web, rules have been
identified as the means of choice for reasoning about this data,
transforming and integrating it. Query languages such as SPARQL and
rule languages such as Xcerpt use compound queries that are matched
or unified with semi-structured data. This notion of unification is
different from the one that is known from logic programming engines
in that it (i) provides constructs that allow queries to be
incomplete in several ways (ii) in that variables may have
different types, (iii) in that it results in sets of substitutions
for the variables in the query instead of a single substitution and
(iv) in that subsumption between queries is much harder to decide
than in logic programming. This thesis abstracts from Xcerpt query
term simulation, SPARQL graph pattern matching and XPath XML
document matching, and shows that all of them can be considered as
a form of rich unification. Given a set of mappings between
substitution sets of different languages, this abstraction opens up
the possibility for format-versatile querying, i.e. combination of
queries in different formats, or transformation of one format into
another format within a single rule. To show the superiority of
this approach, this thesis introduces an extension of Xcerpt called
Xcrdf, and describes use-cases for the combined querying and
integration of RDF and XML data. With XML being the predominant Web
format, and RDF the predominant Semantic Web format, Xcrdf extends
Xcerpt by a set of RDF query terms and construct terms, including
query primitives for RDF containers collections and reifications.
Moreover, Xcrdf includes an RDF path query language called RPL that
is more expressive than previously proposed polynomial-time RDF
path query languages, but can still be evaluated in polynomial time
combined complexity. Besides the introduction of this framework for
data integration based on rich unification, this thesis extends the
theoretical knowledge about Xcerpt in several ways: We show that
Xcerpt simulation unification is decidable, and give complexity
bounds for subsumption in several fragments of Xcerpt query terms.
The proof is based on a set of subsumption monotone query term
transformations, and is only feasible because of the injectivity
requirement on subterms of Xcerpt queries. The proof gives rise to
an algorithm for deciding Xcerpt query term simulation. Moreover,
we give a semantics to locally and weakly stratified Xcerpt
programs, but this semantics is applicable not only to Xcerpt, but
to any rule language with rich unification, including multi-rule
SPARQL programs. Finally, we show how Xcerpt grouping
stratification can be reduced to Xcerpt negation stratification,
thereby also introducing the notion of local grouping
stratification and weak grouping stratification.
Wide Web have been developed, and a huge amount of heterogeneous
semi-structured data is flourishing online. With the ever
increasing number of documents on the Web, rules have been
identified as the means of choice for reasoning about this data,
transforming and integrating it. Query languages such as SPARQL and
rule languages such as Xcerpt use compound queries that are matched
or unified with semi-structured data. This notion of unification is
different from the one that is known from logic programming engines
in that it (i) provides constructs that allow queries to be
incomplete in several ways (ii) in that variables may have
different types, (iii) in that it results in sets of substitutions
for the variables in the query instead of a single substitution and
(iv) in that subsumption between queries is much harder to decide
than in logic programming. This thesis abstracts from Xcerpt query
term simulation, SPARQL graph pattern matching and XPath XML
document matching, and shows that all of them can be considered as
a form of rich unification. Given a set of mappings between
substitution sets of different languages, this abstraction opens up
the possibility for format-versatile querying, i.e. combination of
queries in different formats, or transformation of one format into
another format within a single rule. To show the superiority of
this approach, this thesis introduces an extension of Xcerpt called
Xcrdf, and describes use-cases for the combined querying and
integration of RDF and XML data. With XML being the predominant Web
format, and RDF the predominant Semantic Web format, Xcrdf extends
Xcerpt by a set of RDF query terms and construct terms, including
query primitives for RDF containers collections and reifications.
Moreover, Xcrdf includes an RDF path query language called RPL that
is more expressive than previously proposed polynomial-time RDF
path query languages, but can still be evaluated in polynomial time
combined complexity. Besides the introduction of this framework for
data integration based on rich unification, this thesis extends the
theoretical knowledge about Xcerpt in several ways: We show that
Xcerpt simulation unification is decidable, and give complexity
bounds for subsumption in several fragments of Xcerpt query terms.
The proof is based on a set of subsumption monotone query term
transformations, and is only feasible because of the injectivity
requirement on subterms of Xcerpt queries. The proof gives rise to
an algorithm for deciding Xcerpt query term simulation. Moreover,
we give a semantics to locally and weakly stratified Xcerpt
programs, but this semantics is applicable not only to Xcerpt, but
to any rule language with rich unification, including multi-rule
SPARQL programs. Finally, we show how Xcerpt grouping
stratification can be reduced to Xcerpt negation stratification,
thereby also introducing the notion of local grouping
stratification and weak grouping stratification.
Weitere Episoden
vor 11 Jahren
vor 11 Jahren
vor 11 Jahren
In Podcasts werben
Kommentare (0)