String Comparison in R
Reuben McCreanor
Stat 521 - Data Mining and Predictive Modeling
Thursday, September 2, 2015
Motivation
R stringdist
An example
References
Motivation: Why would you want to compare strings?
"No one should ever claim to be a data analyst until he or she has done string manipulation" - Gaston Sanchez
Built-in string comparisons in R (==, <, >) are largely lexicographic: they test exact equality or dictionary order, not similarity
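A minimal sketch of what lexicographic comparison means in practice, using only base R. The variable names are illustrative; the point is that the built-in operators have no notion of two strings being "almost" equal.

```r
# Base R relational operators applied to character values
x <- "apple"
y <- "banana"

exact_match <- x == y               # FALSE: the strings differ
ordered     <- x < y                # TRUE: "apple" sorts before "banana"
near_miss   <- "color" == "colour"  # FALSE: one extra letter is enough to fail
```

Ordering of mixed-case strings can depend on the locale's collating sequence, so results for pairs such as "B" and "a" may differ between systems.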
String comparisons can be used for:
Cleaning dirty data
Web search
Biomedical research
Matching in data frames
R stringdist: How do you compare strings?
stringdist is a package that calculates distances between strings
Adds functionality to R by allowing approximate string matching
Very flexible - allows the user to set what should be considered a match
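As a rough illustration of that flexibility, the sketch below scores the same pair of strings under two distance definitions and then changes the matching threshold. It assumes the stringdist package is installed; the example strings are made up.

```r
library(stringdist)

# The same pair of strings scored under two distance definitions
d_osa <- stringdist("abc", "acb", method = "osa")  # adjacent swap counts as one edit
d_lv  <- stringdist("abc", "acb", method = "lv")   # plain Levenshtein needs two edits

# maxDist sets how far apart two strings may be and still count as a match
strict  <- amatch("hello", c("hallo", "world"), maxDist = 0)  # NA: no exact match
lenient <- amatch("hello", c("hallo", "world"), maxDist = 1)  # 1: "hallo" is one edit away
```

Choosing the method and maxDist together is how the user decides what counts as a match for a given data set.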
Key Functions
amatch returns the position of the closest string match
ain indicates whether an element approximately matches any entry in a lookup table
stringdist computes distances between different strings
phonetic translates text into phonetic codes
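The four functions can be tried side by side. This is a minimal sketch assuming the stringdist package is installed; the lookup vector is hypothetical.

```r
library(stringdist)

# Hypothetical lookup table containing one misspelling of "apple"
lookup <- c("aple", "banana", "cherry")

pos   <- amatch("apple", lookup, maxDist = 2)         # position of the closest match
found <- ain("apple", lookup, maxDist = 2)            # TRUE if any entry is close enough
dists <- stringdist("apple", lookup, method = "lv")   # one distance per lookup entry
codes <- phonetic(c("Robert", "Rupert"))              # Soundex codes by default
```

Here pos is 1 because "aple" is a single deletion away from "apple", and the two names receive the same Soundex code, which is why phonetic codes are useful for matching names that sound alike.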
An example: Using stringdist to match similar words
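A minimal sketch of this kind of matching, assuming the stringdist package is installed. The dirty and clean vectors are hypothetical data invented for illustration and are not taken from the original slides.

```r
library(stringdist)

# Hypothetical data: misspelled entries and the clean values they should map to
dirty <- c("Untied States", "Cananda", "Mexcio", "Atlantis")
clean <- c("United States", "Canada", "Mexico")

# For each dirty entry, find the closest clean value within two edits
idx <- amatch(dirty, clean, method = "osa", maxDist = 2)

# Entries with no clean value within maxDist are left as NA
matched <- data.frame(dirty = dirty, clean = clean[idx])
print(matched)
```

The first three entries are each one edit away from their intended value, while "Atlantis" has no candidate within the threshold and stays unmatched, which is usually preferable to forcing a wrong match.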
References and further reading
Want to know more?
Handling and Processing Strings in R by Gaston Sanchez http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf
References
Relational Operators in R https://stat.ethz.ch/R-manual/R-devel/library/base/html/Comparison.html
R Tutorial - Characters http://www.r-tutor.com/r-introduction/basic-data-types/character
Package stringdist https://cran.r-project.org/web/packages/stringdist/stringdist.pdf