Match cheaters

catch_em(flist, n_grams = 10, time_lim = 1L, progress_bar = TRUE)

Arguments

flist

A list of documents (.doc, .docx, or .pdf) to compare. Each document must be given as a full or relative file path.

n_grams

The length (in words) of the n-grams used to match documents; see the ngram package.
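To illustrate what an n-gram of length n_grams is, here is a minimal base-R sketch that splits a text into overlapping word n-grams. The helper word_ngrams() is hypothetical and is not part of this package; the ngram package may tokenize text differently (for example in how it treats case and punctuation).

```r
# Hypothetical helper (not part of this package): split a text into
# overlapping word n-grams of length n, using base R only.
word_ngrams <- function(text, n = 10) {
  words <- strsplit(tolower(text), "\\s+")[[1]]  # split on runs of whitespace
  words <- words[words != ""]                   # drop empty leading tokens
  if (length(words) < n) return(character(0))   # too short for a single n-gram
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

word_ngrams("the quick brown fox jumps", n = 3)
# "the quick brown" "quick brown fox" "brown fox jumps"
```

A larger n_grams makes a match stricter, because two documents must share longer identical runs of words before they count as overlapping.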

time_lim

Maximum time, in seconds, allowed for each pairwise comparison. Default is 1 second, which has been sufficient for comparing documents of 50K words.

progress_bar

Should a progress bar be printed to the console?

Value

A correlation matrix of class chtrs, in which each cell gives the degree of match (from 0 to 1) between a pair of documents.
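The shape of the returned matrix can be sketched in base R as follows. This is an illustration under stated assumptions, not this package's implementation: match_matrix() and word_ngrams() are hypothetical helpers, texts are passed as strings rather than file paths, and the score (shared unique n-grams divided by the unique n-grams of the smaller document) is an assumed measure that the package may not use. The sketch returns a plain matrix, not an object of class chtrs.

```r
# Hypothetical sketch (not this package's implementation).
# Assumed score: shared unique n-grams / unique n-grams in the smaller text.
word_ngrams <- function(text, n) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  words <- words[words != ""]
  if (length(words) < n) return(character(0))
  unique(vapply(seq_len(length(words) - n + 1),
                function(i) paste(words[i:(i + n - 1)], collapse = " "),
                character(1)))
}

match_matrix <- function(texts, n = 10) {
  grams <- lapply(texts, word_ngrams, n = n)
  k <- length(texts)
  m <- diag(1, k)                          # every document fully matches itself
  dimnames(m) <- list(names(texts), names(texts))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      shared <- length(intersect(grams[[i]], grams[[j]]))
      denom <- min(length(grams[[i]]), length(grams[[j]]))
      # fill both cells so the matrix stays symmetric
      m[i, j] <- m[j, i] <- if (denom == 0) 0 else shared / denom
    }
  }
  m
}

texts <- c(a = "the quick brown fox jumps over the lazy dog",
           b = "a quick brown fox jumps over a sleeping cat",
           c = "completely unrelated words appear here today")
match_matrix(texts, n = 3)
# a and b share 3 of their 7 trigrams (about 0.43); c matches neither
```

As in the documented return value, each off-diagonal cell lies between 0 (no shared n-grams) and 1 (complete overlap), and cell [i, j] equals cell [j, i].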

Examples

if (interactive()) {
  files <- choose.files()
  catch_em(files)
}