Interactive Visualisation

Sets of Embeddings

The Embedding object only supports matplotlib, whereas the EmbeddingSet also supports interactive tools and is more convenient to work with. You can create an EmbeddingSet directly or via a language backend.

Direct Creation

You can create these objects directly.

import spacy
from whatlies.embedding import Embedding
from whatlies.embeddingset import EmbeddingSet

nlp = spacy.load("en_core_web_md")

words = ["prince", "princess", "nurse", "doctor", "banker", "man", "woman",
         "cousin", "niece", "king", "queen", "dude", "guy", "gal", "fire",
         "dog", "cat", "mouse", "red", "blue", "green", "yellow", "water",
         "person", "family", "brother", "sister"]

emb = EmbeddingSet({t.text: Embedding(t.text, t.vector) for t in nlp.pipe(words)})

This can be especially useful if you're creating your own embeddings.

Via Languages

Odds are, though, that you just want to use an existing language model. The library ships with language backends, which are a convenient (and typically faster) way of getting sets of embeddings.

from whatlies.language import SpacyLanguage

words = ["prince", "princess", "nurse", "doctor", "banker", "man", "woman",
         "cousin", "niece", "king", "queen", "dude", "guy", "gal", "fire",
         "dog", "cat", "mouse", "red", "blue", "green", "yellow", "water",
         "person", "family", "brother", "sister"]

lang = SpacyLanguage("en_core_web_md")
emb = lang[words]

Plotting

Either way, with an EmbeddingSet you can create meaningful interactive charts.

emb.plot_interactive('man', 'woman')
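To make the axes concrete, here is a minimal sketch of how such a chart could place each word. It assumes that the position along a named axis is the scalar projection of the word's vector onto that axis's vector; that is our reading of the behaviour, not a copy of the library's code. The three-dimensional toy vectors and the helper `scalar_projection` are made up for illustration.

```python
# Toy stand-ins for word vectors; real spaCy vectors have hundreds of dimensions.
toy_vectors = {
    "man": [1.0, 0.0, 0.0],
    "woman": [0.0, 2.0, 0.0],
    "king": [3.0, 1.0, 5.0],
}


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def scalar_projection(vec, axis):
    """How far `vec` reaches along `axis`, measured in units of `axis`.

    Assumption: this is the metric used for a named axis.
    """
    return dot(vec, axis) / dot(axis, axis)


# One (x, y) pair per word: x is relative to "man", y is relative to "woman".
coords = {
    word: (
        scalar_projection(vec, toy_vectors["man"]),
        scalar_projection(vec, toy_vectors["woman"]),
    )
    for word, vec in toy_vectors.items()
}
```

Under this assumption the axis words land at 1 on their own axis, which makes the chart easy to read: everything else is positioned relative to them.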

We can also retrieve embeddings from the EmbeddingSet.

emb['king']

Remember the operations we did before? We can also do that on these sets!

new_emb = emb | (emb['king'] - emb['queen'])
new_emb.plot_interactive('man', 'woman')
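As a reminder of what that expression does, the sketch below applies the same idea to toy vectors. It assumes that `|` removes from every embedding its component along the direction on the right-hand side, leaving each vector orthogonal to that direction. The two-dimensional vectors and the helper `remove_direction` are hypothetical.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_direction(vec, direction):
    """Subtract from `vec` its projection onto `direction`."""
    scale = dot(vec, direction) / dot(direction, direction)
    return [v - scale * d for v, d in zip(vec, direction)]


# Made-up vectors standing in for emb['king'] and emb['queen'].
king = [2.0, 1.0]
queen = [0.0, 1.0]
direction = [k - q for k, q in zip(king, queen)]

toy = {"man": [3.0, 4.0], "woman": [-1.0, 2.0]}

# Equivalent in spirit to: emb | (emb['king'] - emb['queen'])
projected = {word: remove_direction(vec, direction) for word, vec in toy.items()}
```

After this step no vector has any extent left along the king-minus-queen direction, which is why the chart changes.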

Combining Charts

Often you'd like to compare the effect of a mapping. Since we make our interactive charts with Altair, we get a nice API for placing charts next to each other.

orig_chart = emb.plot_interactive('man', 'woman')
new_chart = new_emb.plot_interactive('man', 'woman')
orig_chart | new_chart

You may have noticed that these charts appear in the documentation, fully interactive. This is another nice feature of Altair: the charts can be serialized to JSON and hosted on the web.
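A minimal sketch of what that serialization amounts to: an Altair chart is described by a Vega-Lite specification, which is plain JSON. The spec below is hand-written for illustration with made-up data values; it is not the output of the charts above. With Altair itself, `chart.to_json()` and `chart.save("chart.html")` produce the real thing.

```python
import json

# A hand-written, minimal Vega-Lite scatter plot specification.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {
        "values": [
            {"name": "king", "x": 0.8, "y": 0.3},
            {"name": "queen", "x": 0.4, "y": 0.7},
        ]
    },
    "mark": "circle",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
        "tooltip": {"field": "name", "type": "nominal"},
    },
}

# Because the chart is only data, it survives a round trip through text.
serialized = json.dumps(spec, indent=2)
restored = json.loads(serialized)
```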

More Transformations

But there are more transformations that we might visualise. Let's demonstrate two here.

from whatlies.transformers import Pca, Umap

orig_chart = emb.plot_interactive('man', 'woman')
pca_emb = emb.transform(Pca(2))
umap_emb = emb.transform(Umap(2))

The transform method takes a transformation, say Pca(2), and applies it to the embeddings in the set. It might also create new embeddings. In the case of Pca(2), it also adds two embeddings that represent the principal components. This is nice because it means we can plot along those axes.
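To show why a principal component can itself be treated as an embedding, here is a small sketch in plain Python. It finds the first component with power iteration, which is a stand-in we chose for brevity and not how the library computes PCA. The toy points and the key `pca_0` are hypothetical names.

```python
import math

# Four made-up 2D points with zero mean.
points = {"a": [2.0, 1.0], "b": [-2.0, -1.0], "c": [1.0, 2.0], "d": [-1.0, -2.0]}


def covariance(rows):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(dim)]
    centred = [[r[i] - means[i] for i in range(dim)] for r in rows]
    return [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


def first_component(matrix, steps=100):
    """Dominant eigenvector of `matrix` via power iteration (unit length)."""
    vec = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(steps):
        vec = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


component = first_component(covariance(list(points.values())))

# The component is just another vector, so it can sit in the set next to the words.
with_component = dict(points, pca_0=component)

# Plotting "along" the component means projecting every point onto it.
# (No centring is needed here because the toy points already have zero mean.)
scores = {w: sum(x * c for x, c in zip(v, component)) for w, v in points.items()}
```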

plot_pca = pca_emb.plot_interactive()
plot_umap = umap_emb.plot_interactive()
plot_pca | plot_umap

Operators

Note that the operators that we've seen before can also be added to a transformation pipeline.

emb.transform(lambda e: e | (e["man"] - e["woman"]))
# (Emb | (Emb[man] - Emb[woman])).pca_2()
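The idea of treating an operator as a pipeline step can be sketched without the library. The class `TinySet` below is a hypothetical stand-in, not the real EmbeddingSet. It only assumes that `transform` applies a callable to the set and returns the result, so that steps can be chained.

```python
class TinySet:
    """Hypothetical stand-in for an embedding set: a dict of word -> vector."""

    def __init__(self, vectors):
        self.vectors = dict(vectors)

    def __getitem__(self, word):
        return self.vectors[word]

    def transform(self, step):
        # Assumption: a step is any callable that takes a set and returns a set.
        return step(self)


def keep_dims(n):
    """Build a step that keeps only the first `n` dimensions of every vector."""

    def step(s):
        return TinySet({w: v[:n] for w, v in s.vectors.items()})

    return step


def shift_by_difference(s):
    """Subtract (man - woman) from every vector in the set."""
    diff = [a - b for a, b in zip(s["man"], s["woman"])]
    return TinySet(
        {w: [x - d for x, d in zip(v, diff)] for w, v in s.vectors.items()}
    )


toy = TinySet({"man": [1.0, 0.0, 2.0], "woman": [0.0, 1.0, 2.0]})

# A custom step and a dimension-reducing step, chained like a pipeline.
result = toy.transform(shift_by_difference).transform(keep_dims(2))
```

Because every step returns a new set, the original `toy` is left untouched and the steps can be reordered or reused freely.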

More Components

Suppose now that we'd like to visualise three principal components. We could do this.

pca_emb = emb.transform(Pca(3))
p1 = pca_emb.plot_interactive()
p2 = pca_emb.plot_interactive(2, 1)
p1 | p2

More Charts

Rather than drawing two components at a time, let's draw all of them.

pca_emb.plot_interactive_matrix(0, 1, 2)
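A rough way to think about the layout, assuming the matrix shows one scatter panel for every combination of the given axes (our assumption about the grid, not something stated above): the panels can be enumerated as follows.

```python
from itertools import product

axes = [0, 1, 2]

# One panel per (row axis, column axis) combination.
panels = list(product(axes, repeat=2))

# Panels on the diagonal plot a component against itself.
diagonal = [(row, col) for row, col in panels if row == col]
```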

Zoom in on that chart. Don't forget to click and drag. Can we interpret the components?