This site lets you create a variety of simple word embeddings, as a way to compare how these embeddings behave with small texts (a single book or so). The comparison is made through a series of visualizations, both of the word vectors themselves and of the results of "most similar" and "analogy" comparisons. Visualizing these comparisons is not typical, but I find it useful for getting a better sense of how the embeddings behave. The "analogy" comparisons also let you choose among different ways of making the comparison and what is included.
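The page does not say which similarity measure its "most similar" comparison uses. Cosine similarity is the usual choice for word vectors, so the sketch below assumes it. `cosine` and `mostSimilar` are hypothetical helpers written for illustration, not this site's code. The toy vectors are made up.

```javascript
// Hypothetical helpers for illustration; not this site's actual code.

// Cosine similarity between two equal-length numeric arrays.
// Returns NaN if either vector is all zeros.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / Math.sqrt(na * nb);
}

// The n words whose vectors have the highest cosine similarity to `word`,
// excluding `word` itself. `vectors` is a Map from word to vector.
function mostSimilar(vectors, word, n = 10) {
  const target = vectors.get(word);
  if (!target) throw new Error(`unknown word: ${word}`);
  const scored = [];
  for (const [other, vec] of vectors) {
    if (other !== word) scored.push([other, cosine(target, vec)]);
  }
  scored.sort((x, y) => y[1] - x[1]); // highest similarity first
  return scored.slice(0, n);
}

// Made-up 3-dimensional vectors, just to show the call.
const toy = new Map([
  ["cat", [1, 0.9, 0]],
  ["dog", [0.9, 1, 0]],
  ["car", [0, 0.1, 1]],
]);
console.log(mostSimilar(toy, "cat", 2)); // "dog" ranks above "car"
```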
For both creating and visualizing word embeddings, several examples are provided so you can get started right away. The visualizations can also be used with word embeddings calculated in other programs; in fact, one of the examples is from the Stanford GloVe embeddings.
All the calculations are done in JavaScript on your computer; no files are uploaded. This means that creating word vectors should probably be limited to small or medium texts, such as one or a few books.
The SVD option is recommended, but it is fairly slow. For example, with a minimum word count of 5, the Wizard of Oz books took about 5 minutes using Firefox on my laptop, Frankenstein took about 45 minutes, and The Three Musketeers (not included here) took about 4 hours. Note that Safari is 3 or more times as fast as Firefox at calculating SVD, while Chrome is somewhere in between.
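The page does not say how its SVD is computed. To show what "reducing to k dimensions with SVD" means, here is a minimal sketch that finds the top k singular vectors by power iteration with deflation. Real SVD libraries use faster and more stable algorithms. `truncatedSvd` and its helpers are hypothetical. Returning U_k·S_k as the word vectors is one common convention, but it is an assumption here. The sign of each output column is arbitrary.

```javascript
// Hypothetical sketch of truncated SVD; not this site's actual code.
// M is an array of rows (e.g. one PPMI row per word).

function matVec(M, v) { // M v
  return M.map(row => row.reduce((s, x, j) => s + x * v[j], 0));
}
function matTVec(M, u) { // M^T u
  const out = new Array(M[0].length).fill(0);
  M.forEach((row, i) => row.forEach((x, j) => { out[j] += x * u[i]; }));
  return out;
}
function normalize(v) {
  const len = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map(x => x / len);
}

// Returns one k-dimensional vector per row of M: row i of U_k * S_k.
// Assumes k <= rank(M); converges slowly if singular values are close.
function truncatedSvd(M, k, iters = 200) {
  const R = M.map(row => row.slice()); // residual, deflated after each step
  const out = M.map(() => []);
  for (let c = 0; c < k; c++) {
    let v = normalize(R[0].map(() => Math.random() - 0.5));
    for (let t = 0; t < iters; t++) {
      v = normalize(matTVec(R, matVec(R, v))); // v <- R^T R v
    }
    const u = matVec(R, v); // sigma_c * u_c
    u.forEach((x, i) => out[i].push(x));
    // Deflate: remove this component, R <- R - (sigma_c u_c) v_c^T
    R.forEach((row, i) => row.forEach((_, j) => { row[j] -= u[i] * v[j]; }));
  }
  return out;
}

console.log(truncatedSvd([[3, 0], [0, 1]], 2));
```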
To speed up the dimensionality reduction, this tool lets you run random projection first as a quick initial reduction (say to 5-10 times the final number of dimensions, but fewer than the number of vocabulary items). SVD is then run in a second step to get the desired number of dimensions. Doing random projection before SVD gives similar, though not identical, results to doing SVD alone; the advantage is speed. For example, the Frankenstein example took 10 minutes with random projection instead of 45 (and only 3 minutes in Safari instead of Firefox). See the blog post (TBD) for more information.
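Random projection multiplies the word-by-context matrix by a random matrix with far fewer columns. The page does not say which random matrix it uses. This sketch assumes Gaussian entries scaled by 1/sqrt(d), which preserves squared lengths in expectation; sparse ±1 matrices are another common choice. `randomProjection` and `randn` are hypothetical helpers. The projected rows would then go to SVD as the second step.

```javascript
// Hypothetical sketch of random projection; not this site's actual code.

// Standard normal sample via the Box-Muller transform.
function randn() {
  let u = 0;
  while (u === 0) u = Math.random(); // avoid log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Project each row of M (rows x cols) down to d columns using a
// cols x d Gaussian matrix with entries N(0, 1/d).
function randomProjection(M, d) {
  const cols = M[0].length;
  const scale = 1 / Math.sqrt(d);
  const P = Array.from({ length: cols }, () =>
    Array.from({ length: d }, () => randn() * scale));
  return M.map(row => {
    const out = new Array(d).fill(0);
    for (let j = 0; j < cols; j++) {
      if (row[j] === 0) continue; // co-occurrence rows are mostly zeros
      for (let t = 0; t < d; t++) out[t] += row[j] * P[j][t];
    }
    return out;
  });
}
```

For example, with a 10,000-word vocabulary and a 100-dimension target, you might project to 500-1,000 columns first, following the 5-10x rule of thumb above, and run SVD on that much smaller matrix.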
Loading pre-calculated vectors is a better option: loading the included 100,000 GloVe vectors takes just a few seconds. However, memory will be an issue with large numbers of vectors and/or dimensions.
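Pre-calculated GloVe vectors are distributed as plain text: each line holds a word followed by its space-separated vector components. The page does not say which loader or file formats it accepts. The sketch below assumes that GloVe text layout. `parseGloveText` is a hypothetical helper. Storing each vector as a `Float32Array` uses half the memory of ordinary JavaScript number arrays, which matters given the memory concern above.

```javascript
// Hypothetical loader sketch; not this site's actual code.
// Each non-empty line: word v1 v2 ... vd
function parseGloveText(text) {
  const vectors = new Map();
  let dim = null;
  for (const line of text.split("\n")) {
    const parts = line.trim().split(/\s+/); // trim also drops "\r"
    if (parts.length < 2) continue; // skip blank or malformed lines
    const vec = Float32Array.from(parts.slice(1), Number);
    if (dim === null) dim = vec.length;
    if (vec.length !== dim) {
      throw new Error(`"${parts[0]}" has ${vec.length} values, expected ${dim}`);
    }
    vectors.set(parts[0], vec);
  }
  return { vectors, dim };
}
```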
Text
Select one or more text files:
OR: select a sample text:
Tokenization: case-insensitive, omit punctuation
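The two tokenization options can be illustrated with a small tokenizer. The page does not describe its actual rules. This sketch assumes that words are runs of Unicode letters and digits, that an apostrophe between letters stays inside a word, and that, when punctuation is kept, each mark becomes its own token. Curly apostrophes are not handled. `tokenize` is a hypothetical helper.

```javascript
// Hypothetical tokenizer sketch; not this site's actual code.
// The two options mirror the checkboxes; the regexes are assumptions.
function tokenize(text, { caseInsensitive = true, omitPunctuation = true } = {}) {
  if (caseInsensitive) text = text.toLowerCase();
  const word = "[\\p{L}\\p{N}]+(?:'[\\p{L}\\p{N}]+)*"; // e.g. wizard's
  const pattern = omitPunctuation
    ? new RegExp(word, "gu")
    : new RegExp(`${word}|[^\\s\\p{L}\\p{N}]`, "gu"); // punctuation as tokens
  return text.match(pattern) ?? [];
}

console.log(tokenize("The Wizard's hat, Dorothy!"));
```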
Matrix options: matrix type, window, minimum number of word occurrences, pointwise mutual information, non-negative, smoothing
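The matrix options fit the standard count-based recipe: count co-occurrences within a window, drop rare words, and reweight with PMI, clipped at zero (positive PMI). Here PMI(w, c) = log(P(w, c) / (P(w)·P(c))). The page does not define its "smoothing" option. This sketch assumes context-distribution smoothing, which raises context counts to a power `alpha` (0.75 is a commonly used value). `ppmiMatrix` and its parameter names are hypothetical. Rare words here still occupy their positions in the text when windows are counted; the real tool may remove them first.

```javascript
// Hypothetical PPMI sketch; not this site's actual code.
// windowSize: tokens on each side counted as context.
// minCount: words occurring fewer times are dropped from the vocabulary.
// alpha: assumed context-distribution smoothing exponent (1 = none).
function ppmiMatrix(tokens, { windowSize = 2, minCount = 1, alpha = 0.75 } = {}) {
  const freq = new Map();
  for (const t of tokens) freq.set(t, (freq.get(t) ?? 0) + 1);
  const vocab = [...freq.keys()].filter(w => freq.get(w) >= minCount);
  const index = new Map(vocab.map((w, i) => [w, i]));
  const n = vocab.length;

  // Symmetric-window co-occurrence counts.
  const counts = Array.from({ length: n }, () => new Float64Array(n));
  for (let i = 0; i < tokens.length; i++) {
    const wi = index.get(tokens[i]);
    if (wi === undefined) continue;
    const lo = Math.max(0, i - windowSize);
    const hi = Math.min(tokens.length - 1, i + windowSize);
    for (let j = lo; j <= hi; j++) {
      const cj = index.get(tokens[j]);
      if (j !== i && cj !== undefined) counts[wi][cj] += 1;
    }
  }

  const rowSums = counts.map(row => row.reduce((s, x) => s + x, 0));
  const total = rowSums.reduce((s, x) => s + x, 0);
  const colSums = new Float64Array(n);
  for (const row of counts) row.forEach((x, j) => { colSums[j] += x; });
  const smoothed = Array.from(colSums, c => c ** alpha);
  const smoothedTotal = smoothed.reduce((s, x) => s + x, 0);

  // PPMI = max(0, log(P(w,c) / (P(w) * P_alpha(c)))); zero counts stay 0.
  const ppmi = counts.map((row, i) => Array.from(row, (x, j) => {
    if (x === 0) return 0;
    const pmi = Math.log((x / total) /
      ((rowSums[i] / total) * (smoothed[j] / smoothedTotal)));
    return Math.max(0, pmi);
  }));
  return { vocab, ppmi };
}
```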
Dimensionality reduction
Random projection:
Name:
Step 1: Load one or more embeddings, either your own or one of the samples.
Load embedding:
AND/OR: select one or more sample embeddings:
Step 2: Check the embeddings you actually want to visualize.
Step 3: Choose one of the visualizations and fill in the forms, then click show.
For example, in the "items" field you might put asked,answered.
Analogy: A is to B as C is to X
For example, using the provided GloVe vectors: italy is to rome as france is to X.
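A common way to answer "A is to B as C is to X" is 3CosAdd. It returns the words whose vectors are closest, by cosine similarity, to B − A + C, usually excluding A, B, and C from the candidates. The page says its analogy view offers several comparison methods without naming them. This sketch shows only 3CosAdd, with the exclusion of the input words as an option. `analogy` is a hypothetical helper, and the toy vectors are made up.

```javascript
// Hypothetical 3CosAdd sketch; not this site's actual code.

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / Math.sqrt(na * nb);
}

// A is to B as C is to X: rank words by cos(X, B - A + C).
function analogy(vectors, a, b, c, { excludeInputs = true, n = 5 } = {}) {
  const [va, vb, vc] = [a, b, c].map(w => {
    const v = vectors.get(w);
    if (!v) throw new Error(`unknown word: ${w}`);
    return v;
  });
  const target = Array.from(va, (_, i) => vb[i] - va[i] + vc[i]);
  const skip = new Set(excludeInputs ? [a, b, c] : []);
  const scored = [];
  for (const [w, v] of vectors) {
    if (!skip.has(w)) scored.push([w, cosine(target, v)]);
  }
  scored.sort((x, y) => y[1] - x[1]); // best match first
  return scored.slice(0, n);
}

// Made-up vectors in which rome - italy + france lands exactly on paris.
const capitals = new Map([
  ["italy", [1, 0, 0]], ["rome", [1, 1, 0]],
  ["france", [0, 0, 1]], ["paris", [0, 1, 1]], ["berlin", [0.5, 1, -1]],
]);
console.log(analogy(capitals, "italy", "rome", "france"));
```

With real embeddings, B − A + C is often closest to B or C themselves, so whether the input words are excluded can change the answer noticeably.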