Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may choose between a Python back end using 'spaCy' and a Java back end using 'Stanford CoreNLP'. A minimal back end with no external dependencies is also provided. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and word embeddings. Summary statistics on token unigram, part-of-speech tag, and dependency type frequencies are also included to assist with analyses.
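As a rough illustration of the normalized-table output, the sketch below annotates a short text with the dependency-free back end and retrieves the token table. The function names (init_tokenizers(), run_annotators(), get_token()) follow the package's 1.x documentation as best recalled and should be verified against the package manual:

library(cleanNLP)

# use the minimal back end, which needs no Python or Java installation
init_tokenizers()

# annotate a character vector directly; as_strings = TRUE treats the input
# as raw text rather than file paths (argument name assumed from the 1.x API)
annotation <- run_annotators("The quick brown fox jumps over the lazy dog.",
                             as_strings = TRUE)

# one row per token, keyed by document, sentence, and token identifiers
tokens <- get_token(annotation)
head(tokens)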

Maintainer: Taylor B. Arnold <taylor.arnold at acm.org>

Author(s): Taylor B. Arnold

Install the package and any missing dependencies by running this line in your R console:

install.packages("cleanNLP")
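After installation, a back end is initialized before annotating. The sketch below selects the Python/spaCy back end, which additionally requires the Python dependencies listed under SystemRequirements; the file paths are hypothetical and the function names follow the 1.x API, so check them against the current manual:

library(cleanNLP)

# initialize the Python back end; assumes Python and the spaCy module are installed
init_spaCy()

# annotate documents stored on disk (hypothetical file names) and pull the
# part-of-speech tagged token table
annotation <- run_annotators(c("doc1.txt", "doc2.txt"))
tokens <- get_token(annotation)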

Depends: R (>= 2.10)
Imports: dplyr (>= 0.5.0), readr (>= 1.1.0), Matrix (>= 1.2), stringi, stats, methods, utils
Suggests: reticulate (>= 0.7), rJava (>= 0.9-8), tokenizers (>= 0.1.4), RCurl (>= 1.95), knitr (>= 1.15), rmarkdown (>= 1.4), testthat (>= 1.0.1), covr (>= 2.2.2)

Package: cleanNLP
Version: 1.10.0
Published: 2017-07-01
License: LGPL-2
URL: https://statsmaths.github.io/cleanNLP/
BugReports: http://github.com/statsmaths/cleanNLP/issues
SystemRequirements: Python (>= 2.7.0); spaCy (>= 1.8); Java (>= 7.0); Stanford CoreNLP (>= 3.7.0)
NeedsCompilation: no
CRAN checks: cleanNLP check results
Package source: cleanNLP_1.10.0.tar.gz