This function formats the word embeddings.
formatWordEmbeddings(embedding_matrix, normalize = TRUE, verbose = TRUE)
embedding_matrix | word embedding matrix. For a matrix containing information on \(n\) words, with each word being represented by a \(d\)-dimensional vector, |
---|---|
normalize | logical; should the word embeddings be normalized. |
verbose | logical; should the function report on progress. |
A named list of word embeddings.
This function formats pre-trained word embeddings such as GloVe (https://nlp.stanford.edu/projects/glove/)
once they have been read into embedding_matrix; the download itself is a separate step (see examples).
The result is a named list of word embeddings. Each entry in the list is a numeric vector of
length \(d\) representing the word embedding for that entry's name (see examples).
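To make the shape of the result concrete, here is a minimal sketch of how a table of embeddings could be turned into such a named list. It is not the package's implementation. The helper name sketch_format_embeddings and the toy data are hypothetical, the input is assumed to be a data frame whose first column holds the words (as read.table() returns for a GloVe text file), and "normalize" is assumed to mean scaling each vector to unit Euclidean length.

```r
# Hypothetical helper for illustration only; not the package's own code.
# Assumes a data frame: column 1 = words, remaining d columns = numeric values.
sketch_format_embeddings <- function(embedding_matrix, normalize = TRUE) {
  words <- as.character(embedding_matrix[[1]])
  values <- as.matrix(embedding_matrix[, -1, drop = FALSE])

  if (normalize) {
    # Assumption: normalizing means dividing each row by its Euclidean norm.
    # Rows with zero norm would produce NaN and are not handled here.
    norms <- sqrt(rowSums(values^2))
    values <- values / norms
  }

  # One numeric vector of length d per word, named by that word
  embeddings <- lapply(seq_len(nrow(values)), function(i) unname(values[i, ]))
  names(embeddings) <- words
  embeddings
}

# Toy input with n = 2 words and d = 2 dimensions
toy <- data.frame(
  word = c("the", "cat"),
  V1 = c(3, 0),
  V2 = c(4, 2),
  stringsAsFactors = FALSE
)

toy_embeddings <- sketch_format_embeddings(toy, normalize = TRUE)
toy_embeddings[["the"]]
#> [1] 0.6 0.8
```

A list keyed by word allows direct lookup with [[ ]], which is the access pattern used in the examples.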
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. https://nlp.stanford.edu/projects/glove/.
if (FALSE) {
# temp <- tempfile()
# download.file("http://nlp.stanford.edu/data/wordvecs/glove.6B.zip", temp)
# embedding_matrix <- read.table(unz(temp, "glove.6B.300d.txt"), quote = "",
#                                comment.char = "", stringsAsFactors = FALSE)
word_embeddings <- formatWordEmbeddings(embedding_matrix_example, normalize = TRUE, verbose = TRUE)

# Extract the word embedding for "the"
word_embeddings[["the"]]
}