Representation learning has become an invaluable approach in machine learning and artificial intelligence. For instance, word embeddings such as word2vec, GloVe and fastText are widely used for tasks ranging from machine translation to sentiment analysis. Similarly, embeddings of (multi-)graphs such as RESCAL and node2vec have found important applications for learning in semantic and social networks.

In this project, we study a fundamental aspect of
representation learning, namely the **influence of the underlying geometry on
embedding structured data**. This spans research areas such as *machine learning in
non-Euclidean geometries*, *Riemannian geometry and optimization*, *graph theory*, as well as
*representation learning for structured data*.

For instance, representing hierarchies explicitly in an embedding
space can be very beneficial for solving complex tasks in artificial intelligence
such as reasoning, lexical entailment, or few- and zero-shot learning. Moreover,
many complex symbolic datasets (e.g., text corpora and networks) are characterized
by *latent hierarchies*. However, modeling such hierarchical structures in
Euclidean space requires large embedding dimensions which, in turn, cause
significant problems with regard to the computational complexity and the
representation capacity of such embeddings.

In our work on hyperbolic embeddings, we
introduce a novel approach for **learning hierarchical representations** by
embedding entities into hyperbolic space. Due to its geometry, hyperbolic space
can be thought of as a *continuous version of trees*, which allows us to learn
parsimonious representations that simultaneously capture hierarchy and
similarity. This leads to significant improvements in terms of representation
capacity and generalization ability on data with latent hierarchies. For
instance, Figure 2 illustrates how efficiently large hierarchies such
as the WordNet noun taxonomy can be embedded in hyperbolic space.
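Distances in the Poincaré ball model of hyperbolic space have a simple closed form, which is what such embeddings are trained against. The sketch below is only illustrative (the function name and the `eps` stabilizer are our own, not part of the released code):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance between two points in the Poincaré ball (||u||, ||v|| < 1).

    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    sq_u = np.sum(u * u)
    sq_v = np.sum(v * v)
    sq_diff = np.sum((u - v) ** 2)
    # eps guards against division by zero for points on the boundary
    x = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v) + eps)
    return np.arccosh(x)
```

Note how the denominator shrinks as points approach the boundary of the ball, so distances grow roughly exponentially there. This is what gives hyperbolic space its tree-like capacity: leaves of a large hierarchy can be placed near the boundary while remaining far apart from each other.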

Moreover, due to the hierarchical nature of hyperbolic space, we can identify hierarchical relationships directly from the embedding. In our ICML’18 paper, we used this property to discover hierarchies from similarity measurements. For instance, the image below shows an embedding of language similarities in hyperbolic space. The embedding not only reflects the different language clusters nicely, but also captures the historical relationships between languages (older languages lie closer to the center of the disc).
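Because the norm of a point in the Poincaré ball tends to encode its depth in the hierarchy (general entities sit near the origin, specific ones near the boundary), a rough hierarchy can be read off simply by sorting entities by norm. A minimal sketch with hypothetical toy vectors (the function name and data are our own, not from the paper):

```python
import numpy as np

def hierarchy_rank(embeddings):
    """Order entities from most general to most specific by their
    distance from the origin of the Poincaré ball."""
    norms = {name: np.linalg.norm(vec) for name, vec in embeddings.items()}
    return sorted(norms, key=norms.get)  # smallest norm (most general) first

# Hypothetical toy embedding in the 2D Poincaré disc
emb = {
    "animal": np.array([0.05, 0.0]),   # near the center: general concept
    "mammal": np.array([0.40, 0.10]),
    "dog":    np.array([0.80, 0.30]),  # near the boundary: specific concept
}
```

This only recovers the *levels* of a hierarchy; which parent a node attaches to is determined by combining the norm with the hyperbolic distance to candidate parents.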

Code to compute Poincaré embeddings is available on GitHub. Code to compute embeddings in the Lorentz model, as proposed in our recent paper, will follow soon.