Computes the cosine similarity between the word embeddings of two words or two vectors of words.

cs(a, b, word_embeddings)

Arguments

a, b

character strings or character vectors of words, each of which appears in word_embeddings.

word_embeddings

named list of word embeddings. See formatWordEmbeddings.

Value

a matrix of cosine similarities, with one row per word in b and one column per word in a.

Details

Consider two words with word embedding representations \(a\) and \(b\). Their cosine similarity is defined as $$\mathrm{sim}_{\cos}(a,b)=\frac{a \cdot b}{\| a \|_2 \, \| b \|_2}$$

If \(A = (a_1,\ldots,a_n)\) and \(B = (b_1,\ldots,b_m)\) are vectors of words, then the result is an \(m \times n\) matrix whose entry \((i, j)\) is \(\mathrm{sim}_{\cos}(a_j, b_i)\).
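
The definition can be illustrated with a minimal base R sketch. This is not the package's implementation of cs: the helpers cos_sim and cs_sketch and the toy_embeddings list are hypothetical names with made-up values, and the sketch assumes only that the embeddings are a named list of numeric vectors. It shows how entry (i, j) of the result pairs the j-th word of a with the i-th word of b.

```r
# Toy embeddings with made-up values, for illustration only
toy_embeddings <- list(
  home  = c(1, 0, 1),
  house = c(1, 1, 1),
  dog   = c(0, 1, 0)
)

# Hypothetical helper (not the package's cs()): cosine similarity of two numeric vectors
cos_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Hypothetical helper: rows index the words in b, columns index the words in a
cs_sketch <- function(a, b, word_embeddings) {
  out <- matrix(NA_real_, nrow = length(b), ncol = length(a),
                dimnames = list(b, a))
  for (i in seq_along(b)) {
    for (j in seq_along(a)) {
      out[i, j] <- cos_sim(word_embeddings[[a[j]]], word_embeddings[[b[i]]])
    }
  }
  out
}

sims <- cs_sketch(c("home", "house"), c("dog", "house", "home"), toy_embeddings)
dim(sims)  # 3 rows (words in b) by 2 columns (words in a)
```

In this toy setup, "home" and "dog" have orthogonal vectors, so their similarity is 0, while any word compared with itself has similarity 1 up to floating-point rounding.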

References

Goldberg, Y. (2017) Neural Network Methods for Natural Language Processing. San Rafael, CA: Morgan & Claypool Publishers.

Examples

if (FALSE) {
  word_embeddings <- formatWordEmbeddings(embedding_matrix_example, normalize = TRUE)

  a <- "home"
  b <- "house"
  cs(a, b, word_embeddings)

  a <- c("home", "apartment", "mansion")
  b <- c("my", "dog", "sleeps", "in", "her", "dog", "house")
  cs(a, b, word_embeddings)
}