Description

Relational learning is concerned with learning from data where
information is primarily represented in the form of relations between
entities. In recent years, this branch of machine learning has become
increasingly important, as relational data is generated in
unprecedented amounts and has become ubiquitous in many fields of
application such as bioinformatics, artificial intelligence, and
social network analysis. However, relational learning is a very
challenging task, due to the network structure and the high
dimensionality of relational data. In this thesis, we propose that
tensor factorization can be the basis for scalable solutions for
learning from relational data, and we present novel tensor
factorization algorithms that are particularly suited for this task.

In the first part of the thesis, we present the RESCAL model, a novel
tensor factorization for relational learning, and discuss its
capabilities for exploiting the idiosyncratic properties of
relational data. In particular, we show that, unlike existing tensor
factorizations, our proposed method is capable of exploiting
contextual information that is more distant in the relational graph.
Furthermore, we present an efficient algorithm for computing the
factorization. We show that, compared to current state-of-the-art
relational learning methods, our method achieves better or on-par
results on common benchmark data sets while being significantly
faster to compute.
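
To make the model concrete: RESCAL represents a relational data set
as an n x n x m adjacency tensor with one n x n frontal slice X_k per
relation, and factorizes each slice as X_k ~ A R_k A^T, where A holds
the latent representations of the n entities and R_k models the
interactions of the latent components for relation k. A minimal NumPy
sketch of this scoring rule, with random data and illustrative
dimensions of our own choosing:

    import numpy as np

    n, m, r = 5, 2, 3                     # entities, relations, latent components
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, r))       # latent representations of the entities
    R = rng.standard_normal((m, r, r))    # latent interactions, one slice per relation

    # Reconstruct every frontal slice at once: X_hat[k] = A @ R[k] @ A.T
    X_hat = np.einsum('ia,kab,jb->kij', A, R, A)

    # Predicted confidence that entity 0 is related to entity 2 via relation 1
    score = A[0] @ R[1] @ A[2]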

In the second part of the thesis, we focus on large-scale relational
learning and its applications to Linked Data. By exploiting the
inherent sparsity of relational data, an efficient computation of
RESCAL can scale up to the size of large knowledge bases, consisting
of millions of entities, hundreds of relations, and billions of known
facts. We show this analytically, via a thorough analysis of the
runtime and memory complexity of the algorithm, as well as
experimentally, via the factorization of the YAGO2 core ontology and
the prediction of relationships in this large knowledge base on a
single desktop computer.
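
To illustrate where the sparsity enters: in an alternating
least-squares update of the entity factor A, the data tensor appears
only in sparse-times-dense matrix products, so the cost grows with
the number of known facts rather than with n^2 per relation. A sketch
of one such update under our own simplifications (a plain ALS step
with ridge regularization, not necessarily the exact update used in
the thesis):

    import numpy as np
    from scipy.sparse import random as sparse_random

    n, m, r, lam = 1000, 4, 10, 1e-2
    rng = np.random.default_rng(0)
    X = [sparse_random(n, n, density=1e-3, format='csr', random_state=k)
         for k in range(m)]                # sparse frontal slices of the data tensor
    A = rng.standard_normal((n, r))
    R = [rng.standard_normal((r, r)) for _ in range(m)]

    # One ALS-style update of A: X_k only appears in sparse @ dense products.
    F = np.zeros((n, r))
    E = lam * np.eye(r)
    AtA = A.T @ A                          # r x r Gram matrix, computed once
    for k in range(m):
        F += X[k] @ (A @ R[k].T) + X[k].T @ (A @ R[k])
        E += R[k] @ AtA @ R[k].T + R[k].T @ AtA @ R[k]
    A = np.linalg.solve(E.T, F.T).T        # A = F @ inv(E), without forming inv(E)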

Furthermore, by exploiting special properties of the factorization,
we derive a new procedure that reduces the runtime complexity of
regularized factorizations from O(r^5) to O(r^3), where r denotes the
number of latent components of the factorization.
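
One way such a reduction can be obtained, sketched here from our own
derivation rather than taken verbatim from the thesis: the
regularized update of a single slice R_k is ridge regression whose
design matrix is the Kronecker product A (x) A, and a thin SVD
A = U S V^T diagonalizes the normal equations, so the r^2 x r^2
system collapses to an entrywise solve:

    import numpy as np

    def update_R(X, A, lam):
        """argmin_R ||X - A R A^T||_F^2 + lam * ||R||_F^2 in closed form."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD, U: n x r
        S = np.outer(s, s)                                # entries sigma_i * sigma_j
        Xt = U.T @ (X @ U)                                # project data into the r x r core
        return Vt.T @ ((S / (S**2 + lam)) * Xt) @ Vt      # entrywise shrinkage, R = V M V^T

    rng = np.random.default_rng(0)
    X, A = rng.standard_normal((20, 20)), rng.standard_normal((20, 3))
    R = update_R(X, A, lam=0.1)                           # one 3 x 3 relational slice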

We also present an efficient method for including attributes of
entities in the factorization through a novel coupled tensor-matrix
factorization. Experimentally, we show that RESCAL allows us to
approach several relational learning tasks that are important to
Linked Data.
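
The coupling works by letting the attribute data share the entity
factor with the tensor: attributes are collected in an
entity-by-attribute matrix, here called D, which is factorized as
D ~ A V with the same A that factorizes the relational tensor. A
sketch of the resulting joint objective (the names D and V are our
choice, not the thesis's notation):

    import numpy as np

    def coupled_loss(X, D, A, R, V, lam):
        # Relational part: every frontal slice is explained by the entity factor A.
        rel = sum(np.linalg.norm(X[k] - A @ R[k] @ A.T) ** 2 for k in range(len(X)))
        # Attribute part: the same A must also explain the attribute matrix D,
        # which couples attribute and relational information in one factorization.
        attr = np.linalg.norm(D - A @ V) ** 2
        reg = lam * (np.linalg.norm(A) ** 2 + np.linalg.norm(V) ** 2
                     + sum(np.linalg.norm(R[k]) ** 2 for k in range(len(X))))
        return rel + attr + reg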

In the third part of this thesis, we focus on the theoretical
analysis of learning with tensor factorizations. Although tensor
factorizations have become increasingly popular for solving machine
learning tasks on various forms of structured data, only very few
theoretical results exist on the generalization abilities of these
methods. Here, we present the first known generalization error bounds
for tensor factorizations. To derive these bounds, we extend known
bounds for matrix factorizations to the tensor case. Furthermore, we
analyze how these bounds behave for learning on over- and
under-structured representations, for instance, when matrix
factorizations are applied to tensor data. In the course of deriving
generalization bounds, we also discuss the tensor product as a
principled way to represent structured data in vector spaces for
machine learning tasks. In addition, we support our theoretical
analysis with experiments on synthetic data.
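
As a toy illustration of this representation: over finite sets of n
entities and m relations, a triple (subject, predicate, object) can
be mapped to the tensor (Kronecker) product of one-hot vectors, so
every possible triple corresponds to exactly one basis vector of an
n*m*n-dimensional space (the dimensions below are our own choice):

    import numpy as np

    n, m = 4, 2                                  # entities, relations

    def one_hot(i, d):
        v = np.zeros(d)
        v[i] = 1.0
        return v

    # The triple (subject 0, relation 1, object 3) becomes one basis vector
    # of the n*m*n-dimensional tensor-product space.
    phi = np.kron(one_hot(0, n), np.kron(one_hot(1, m), one_hot(3, n)))
    assert phi.shape == (n * m * n,) and phi.sum() == 1.0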
