Formats a word embedding matrix as a named list of word embeddings.

formatWordEmbeddings(embedding_matrix, normalize = TRUE, verbose = TRUE)

Arguments

embedding_matrix

word embedding matrix. For \(n\) words, each represented by a \(d\)-dimensional vector, embedding_matrix should have \(n\) rows and \(d+1\) columns, where the first column contains the words.

normalize

logical; should the word embeddings be normalized?

verbose

logical; should the function report on progress?
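
As an illustration of the layout expected for embedding_matrix, the sketch below builds a toy input by hand. The object name toy_embedding_matrix, the column names, and the values are made up for this example. A data frame is used on the assumption that the input usually comes from read.table, as in the Examples.

```r
# Toy input with n = 3 words and d = 2 dimensions (values are made up).
toy_embedding_matrix <- data.frame(
  word = c("the", "cat", "sat"),   # first column: the words
  V1   = c(0.1, 0.4, -0.3),        # remaining d columns: the embedding values
  V2   = c(0.2, -0.5, 0.6),
  stringsAsFactors = FALSE
)

dim(toy_embedding_matrix)  # n = 3 rows, d + 1 = 3 columns
```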

Value

A named list of word embeddings.

Details

This function formats word embeddings, such as GloVe (https://nlp.stanford.edu/projects/glove/), that have already been read into embedding_matrix. The result is a named list of word embeddings: each entry is a numeric vector of length \(d\) holding the embedding for the word that names the entry (see Examples).
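
To make the shape of the result concrete, here is a minimal sketch of the kind of conversion described above. It is not the package's implementation: format_embeddings_sketch is a hypothetical helper, and treating "normalize" as scaling each vector to unit Euclidean length is an assumption, since the documentation does not say which normalization is applied.

```r
# Hypothetical stand-in for the formatting step; NOT the package's own code.
format_embeddings_sketch <- function(embedding_matrix, normalize = TRUE) {
  # The first column holds the words.
  words <- as.character(embedding_matrix[, 1])
  # Drop the word column and coerce the remaining d columns to numeric.
  values <- as.matrix(embedding_matrix[, -1, drop = FALSE])
  storage.mode(values) <- "double"
  # One numeric vector of length d per word.
  embeddings <- lapply(seq_len(nrow(values)), function(i) unname(values[i, ]))
  if (normalize) {
    # Assumption: "normalize" means scaling to unit Euclidean (L2) length.
    embeddings <- lapply(embeddings, function(v) v / sqrt(sum(v^2)))
  }
  names(embeddings) <- words
  embeddings
}

# Toy input: 2 words, d = 2 (values are made up).
toy <- data.frame(word = c("the", "cat"),
                  V1 = c(3, 0), V2 = c(4, 2),
                  stringsAsFactors = FALSE)
sketch <- format_embeddings_sketch(toy)
sketch[["the"]]  # 0.6 0.8
```

Storing the embeddings as a named list lets a single word's vector be looked up by name with [[ ]], which is the access pattern shown in the Examples.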

References

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. https://nlp.stanford.edu/projects/glove/.

Examples

if (FALSE) {
# temp <- tempfile()
# download.file("http://nlp.stanford.edu/data/wordvecs/glove.6B.zip", temp)
# embedding_matrix <- read.table(unz(temp, "glove.6B.300d.txt"), quote = "",
#                                comment.char = "", stringsAsFactors = FALSE)

word_embeddings <- formatWordEmbeddings(embedding_matrix_example,
                                        normalize = TRUE, verbose = TRUE)

# Extract the word embedding for "the"
word_embeddings[["the"]]
}