CASE STUDY
Train Machine Learning Models
The Company
Protecting Children from Online Bullying
Today’s kids are more plugged into the
Internet than ever before and are exposed
to real dangers online. Artimys Language
Technologies helps parents protect their
children from bullies and sexual predators
and warns parents if it detects signs of
suicidal behavior from their children. With
Artimys’s services, parents have the peace
of mind that their children can safely be
online.
The Challenge
Machine Learning is Difficult with Highly Subjective and Evolving Language
Artimys monitors the Internet in real time
using algorithms to detect language that
poses a threat to a child online. The
language patterns associated with suicide
risk and sexual predation are relatively easy
to detect in online messages. However,
bullying language is highly subjective
and continuously evolving. The company
was having difficulty building algorithms
that could accurately and systematically
identify true online bullying. For example,
profanity can produce false positives:
messages the algorithm flags as bullying
that, on review, are not genuine
indicators of bullying language.
Artimys needed pristine ground-truth
data around bullying language to train its
machine learning models. That is to say,
it needed pre-labeled data that required
human analysis and interpretation – at
a massive scale – so the machine could
learn and create a model to detect
patterns within certain ambiguous bullying
phrases.
The Solution
A Platform and On-demand Workforce that Evaluates Massive Data Sets to Train Algorithms
Artimys’s training-data creation process
starts with 2 million messages drawn
from Twitter and other online message
boards. Next, Artimys feeds 40,000 high-likelihood conversation snippets directly
into CrowdFlower’s platform, allowing
for the rapid creation of accurate labels
by thousands of online workers. In a
matter of hours, CrowdFlower workers
provide more than 150,000 responses in
qualifying the 40,000-message dataset.
Artimys uses the resulting dataset to
train its algorithmic bully-detection model.
The company looked at other solutions,
but chose CrowdFlower because it
could produce results on a larger scale,
faster and with greater ease than the
alternatives.
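The figures above work out to roughly four worker responses per snippet, so the responses have to be combined into a single label before a model can be trained on them. The case study does not say how CrowdFlower aggregates responses; the sketch below assumes a plain majority vote with an agreement threshold, and every name in it (`aggregate_judgments`, `min_agreement`, the label strings, the sample data) is hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def aggregate_judgments(judgments, min_agreement=0.6):
    """Combine per-worker labels into one label per snippet by majority vote.

    judgments: iterable of (snippet_id, label) pairs, one pair per worker response.
    min_agreement: assumed threshold; snippets whose top label falls below this
    share of the votes get label None so they can be sent for further review.
    """
    by_snippet = defaultdict(list)
    for snippet_id, label in judgments:
        by_snippet[snippet_id].append(label)

    results = {}
    for snippet_id, labels in by_snippet.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        agreement = top_count / len(labels)
        results[snippet_id] = {
            "label": top_label if agreement >= min_agreement else None,
            "agreement": agreement,
            "n_judgments": len(labels),
        }
    return results


# Made-up sample responses, not data from the case study.
sample = [
    ("s1", "bullying"), ("s1", "bullying"), ("s1", "not_bullying"), ("s1", "bullying"),
    ("s2", "not_bullying"), ("s2", "bullying"),
]
aggregated = aggregate_judgments(sample)
for snippet_id, result in aggregated.items():
    print(snippet_id, result)
```

Snippets that come back with a `None` label are the ambiguous cases where workers disagreed; under this assumed scheme they would be held out of the training set or routed to additional reviewers rather than forced into a class.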
“The results are
fantastic. It’s
intoxicating to watch
150,000 judgments
finish in a matter of
hours. CrowdFlower
is to labeling data, as
Microsoft Word is to
document processing
and Excel is to
financial analysis.
The complexity of
loading, hosting
and serving up the
data, and gathering
and aggregating
responses is simple
using CrowdFlower’s
platform.”
- Bob Dillon, CEO