Cosine Similarity

This function computes the cosine similarity between the word embeddings of two words or two vectors of words.

cs(a, b, word_embeddings)

Arguments

a, b	single words or character vectors of words that appear in `word_embeddings`.
word_embeddings	named list of word embeddings. See `formatWordEmbeddings`.

Value

A matrix of cosine similarities, with one row per word in `b` and one column per word in `a`.

Details

Consider two words with word embedding representations $a$ and $b$. Their cosine similarity is defined as $$\mathrm{sim}_{\cos}(a,b)=\frac{a \cdot b}{\| a \|_2 \, \| b \|_2}$$

If $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_m)$, then the result is an $m \times n$ matrix whose entry in cell $(i, j)$ is $\mathrm{sim}_{\cos}(a_j, b_i)$.
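The definition above can be illustrated with a minimal sketch in base R. It is not the package's implementation of `cs`: the helper `cosine_matrix` and the toy embeddings in `toy` are hypothetical, and the sketch assumes that `word_embeddings` is a named list in which every element is a numeric vector of the same length.

```r
# Hypothetical helper for illustration only; not the package's cs().
# Assumes word_embeddings is a named list of equal-length numeric vectors.
cosine_matrix <- function(a, b, word_embeddings) {
  # Stack the embeddings as rows: A is n x d, B is m x d.
  A <- do.call(rbind, word_embeddings[a])
  B <- do.call(rbind, word_embeddings[b])
  # Scale every row to unit Euclidean (L2) length.
  A <- A / sqrt(rowSums(A^2))
  B <- B / sqrt(rowSums(B^2))
  # Entry (i, j) is the dot product of normalized b_i and a_j.
  sims <- B %*% t(A)
  dimnames(sims) <- list(b, a)
  sims
}

# Made-up two-dimensional embeddings, chosen so the result is easy to check.
toy <- list(
  home  = c(1, 0),
  house = c(1, 1),
  dog   = c(0, 2)
)

cosine_matrix(c("home", "house"), c("dog", "home", "house"), toy)
```

With these toy vectors the result has three rows (one per word in `b`) and two columns (one per word in `a`). Orthogonal vectors such as `dog` and `home` give a similarity of 0, and a word compared with itself gives 1.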

References

Goldberg, Y. (2017) Neural Network Methods for Natural Language Processing. San Rafael, CA: Morgan & Claypool Publishers.

Examples

if (FALSE) {

word_embeddings <- formatWordEmbeddings(embedding_matrix_example, normalize = TRUE)

a <- "home"
b <- "house"
cs(a, b, word_embeddings)

a <- c("home", "apartment", "mansion")
b <- c("my", "dog", "sleeps", "in", "her", "dog", "house")
cs(a, b, word_embeddings)
}
