Dr. Janeja will guest lecture

DINAMIC
Data Analytics for Big Data
Vandana P. Janeja
Information Systems Department,
University of Maryland,
Baltimore County, MD, USA
Big Data
• What is Big Data?
• Recently much good science, whether physical, biological, or social, has been forced to confront, and has often benefited from, the Big Data phenomenon.
• Big Data refers to the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology. (p. 115)
Diebold, F.X. (2003), "Big Data Dynamic Factor Models for Macroeconomic Measurement and Forecasting: A Discussion of the Papers by Reichlin and Watson," in M. Dewatripont, L.P. Hansen and S. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress of the Econometric Society, Cambridge University Press, 115-122.
Big data spans four dimensions: Volume, Velocity, Variety, and Veracity
• Volume: Enterprises are awash with ever-growing data of all types:
– Terabytes, petabytes, even exabytes of information.
– Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
– Convert 350 billion annual meter readings to better predict power consumption
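To put these volumes in perspective, a quick back-of-envelope conversion turns the daily and annual totals into sustained per-second rates. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a 365-day year. The helper names are illustrative only.

```python
# Back-of-envelope rates for the Volume examples above.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes)
# and a 365-day year.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_YEAR = 365


def bytes_per_day_to_mb_per_second(bytes_per_day: float) -> float:
    """Convert a daily data volume into a sustained MB/s ingest rate."""
    return bytes_per_day / SECONDS_PER_DAY / 10**6


def per_year_to_per_second(count_per_year: float) -> float:
    """Convert an annual event count into an average per-second rate."""
    return count_per_year / (DAYS_PER_YEAR * SECONDS_PER_DAY)


tweet_rate = bytes_per_day_to_mb_per_second(12 * 10**12)  # 12 TB of Tweets per day
meter_rate = per_year_to_per_second(350 * 10**9)          # 350 billion readings per year

print(f"Tweets: about {tweet_rate:.1f} MB/s")                 # Tweets: about 138.9 MB/s
print(f"Meter readings: about {meter_rate:,.0f} per second")  # Meter readings: about 11,098 per second
```

Under those assumptions, 12 TB per day means roughly 139 MB arriving every second, around the clock. Likewise, 350 billion readings a year averages out to about 11,000 readings per second.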
• Velocity: Sometimes 2 minutes is too late.
– For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.
– Scrutinize 5 million trade events created each day to identify potential fraud
– Analyze 500 million daily call detail records in real time to predict customer churn faster
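Using data "as it streams into your enterprise" means scoring each event when it arrives rather than waiting for a later batch job. The sketch below illustrates that idea with a simple z-score rule over trade amounts, using Welford's online algorithm so only a running count, mean, and sum of squares are kept. It is a minimal illustration, not a real fraud model. The threshold `k`, the `warmup` length, the sample data, and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class RunningStats:
    """Welford's online mean/variance: constant memory, one update per event."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        """Sample standard deviation; 0.0 until two values have been seen."""
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


def flag_suspicious(amounts: Iterable[float], k: float = 3.0,
                    warmup: int = 30) -> Iterator[Tuple[int, float]]:
    """Score each event on arrival against statistics of earlier events only.

    Yields (index, amount) when the amount lies more than k sample standard
    deviations from the running mean. Hypothetical rule for illustration.
    """
    stats = RunningStats()
    for i, amount in enumerate(amounts):
        if stats.count >= warmup:
            sd = stats.std()
            if sd > 0 and abs(amount - stats.mean) > k * sd:
                yield i, amount
        stats.update(amount)  # every event, flagged or not, updates the baseline


# Hypothetical trade amounts: 50 ordinary trades, one outlier, then more ordinary ones.
trades = [100.0 + (i % 5) for i in range(50)] + [5000.0] + [101.0] * 5
print(list(flag_suspicious(trades)))  # [(50, 5000.0)]
```

Only three numbers are kept per stream, so memory stays constant however many of the 5 million daily events pass through. A production system would rely on far richer features and models.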
• Variety: Big data is any type of data, structured and unstructured:
– text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.
– Monitor hundreds of live video feeds from surveillance cameras to target points of interest
– Exploit the 80% data growth in images, video and documents to improve customer satisfaction
• Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions.
– How can you act upon information if you don’t trust it?
– Establishing trust in big data presents a huge challenge as the variety and number of sources grow.
Analytics
Is it all about algorithms?
Will it make a difference if some of this data is from France and some from Maryland?
Will it make a difference if some of this data is from LA and some from Baltimore?
Will it make a difference if some of this data is from Maryland and some from D.C.?
Will it make a difference if some of this data is from Howard County, MD and some from Montgomery County, MD?
US HIGHWAYS
• 42,000 Americans Are Killed On Highways Each Year
• Nearly one-third of all fatal crashes each year are caused by substandard road conditions and roadside hazards.
• Motor vehicle crashes cost the United States $231 billion annually, including $21 billion from Federal and State tax revenue.
• Americans Waste $67 Billion Each Year Due To Congestion
According to 2001 statistics, NJ ranks 12th in intersection fatalities, with 32.1% of all state highway fatalities, and 12th in pedestrian fatalities, with 17.7% of all state highway fatalities (USDOT).
Ref: http://www.house.gov/transportation/press/press2005/release9.html
LA Times 4/27/09 12pm
CDC Officials Confirm Swine Flu Cases Up to 40; Outbreak May Worsen: ABC News 4/27/09 1pm
Dr. William Schaffner, chairman of Preventive Medicine at Vanderbilt University Medical Center in Nashville, Tenn., said doctors like him have been advised by the CDC and state health department to set up a system that would test patients with flu-like symptoms and help define how widespread this outbreak is. He said the severity of the virus is hard to gauge because of the wide discrepancy in how it has affected Mexicans and Americans, and because it is occurring in places that are warm, which is very unusual.
"The genetic makeup of this virus has influenza experts scratching their heads," he said. "One of the things that has us worried is that could this be a virus that could continue to make mischief during the warmest parts of the year. That would be a big thing. For a respiratory virus to be active during the summer months" would be very unique.
Knowledge Discovery in Databases (KDD) Process
– Data mining: the core of the knowledge discovery process
[Figure: the KDD process. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Data Mining: Concepts and Techniques, May 22, 2017
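To make the flow concrete, here is a toy end-to-end sketch of the stages in Python. It is a hypothetical illustration, not the lecture's own method. The two in-memory sources (`crm`, `sales`), the `customer_id` join key, the use of frequent item-pair counting as the data mining step, and the `min_support` threshold are all assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def clean(records: List[Record]) -> List[Record]:
    """Data cleaning: drop records that have any missing (None) field."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Data integration: inner-join two sources on a shared key."""
    index = {r[key]: r for r in right}
    return [{**r, **index[r[key]]} for r in left if r[key] in index]


def select(warehouse: List[Record], field: str) -> List[frozenset]:
    """Selection: keep only the task-relevant attribute (here, item baskets)."""
    return [frozenset(r[field]) for r in warehouse if field in r]


def mine_pairs(baskets: List[frozenset]) -> Counter:
    """Data mining: count how often each pair of items is bought together."""
    counts: Counter = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(counts: Counter, n: int, min_support: float) -> List[Tuple[Tuple[str, str], float]]:
    """Pattern evaluation: keep pairs whose support (fraction of baskets) meets the threshold."""
    kept = [(pair, c / n) for pair, c in counts.items() if c / n >= min_support]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def run_kdd(crm: List[Record], sales: List[Record], min_support: float = 0.6):
    warehouse = integrate(clean(crm), clean(sales), key="customer_id")  # the integrated store
    baskets = select(warehouse, "items")
    return evaluate(mine_pairs(baskets), len(baskets), min_support)


# Hypothetical sources; customer 4 has a missing region and is removed during cleaning.
crm = [{"customer_id": 1, "region": "MD"}, {"customer_id": 2, "region": "MD"},
       {"customer_id": 3, "region": "DC"}, {"customer_id": 4, "region": None}]
sales = [{"customer_id": 1, "items": ["bread", "milk"]},
         {"customer_id": 2, "items": ["bread", "milk", "eggs"]},
         {"customer_id": 3, "items": ["milk", "eggs"]},
         {"customer_id": 4, "items": ["bread", "milk"]}]

for pair, support in run_kdd(crm, sales):
    print(pair, round(support, 2))  # ('bread', 'milk') 0.67, then ('eggs', 'milk') 0.67
```

Each function corresponds to one box in the figure. This shows why mining is only one step: cleaning changes what gets mined, because customer 4's purchases never reach the pattern counts.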
Big Data Framework
• Automatic parallelization
• Run-time:
– Data partitioning
– Task scheduling
– Handling machine failures
– Managing inter-machine communication
• Completely transparent to the programmer/analyst/user
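These run-time duties are the ones a MapReduce-style system takes on. The following is a minimal single-machine sketch, not the API of Hadoop or any other real framework, and it rests on several stated assumptions:

- `run_mapreduce`, `run_with_retries`, and their parameters are hypothetical.
- Threads stand in for machines.
- An in-process retry stands in for re-executing a failed task on another machine.

The analyst writes only `word_count_map` and `word_count_reduce`. Partitioning, scheduling, retries, and the shuffle stay hidden inside the run-time.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List, Tuple


def run_with_retries(task: Callable[[], Any], attempts: int = 3) -> Any:
    """Failure handling: re-run a task that raised, up to `attempts` times."""
    for attempt in range(attempts):
        try:
            return task()
        except Exception:
            if attempt == attempts - 1:
                raise
    return None  # only reached when attempts < 1


def run_mapreduce(records: List[str],
                  map_fn: Callable[[str], Iterable[Tuple[Any, Any]]],
                  reduce_fn: Callable[[Any, List[Any]], Any],
                  n_splits: int = 4, n_reducers: int = 2, workers: int = 4) -> Dict[Any, Any]:
    """Hypothetical single-machine stand-in for a parallel run-time.

    Threads play the role of machines; the caller never sees any of this.
    """
    # Data partitioning: deal the input records into n_splits chunks.
    splits = [records[i::n_splits] for i in range(n_splits)]

    def map_task(split: List[str]) -> List[Tuple[Any, Any]]:
        return run_with_retries(lambda: [kv for r in split for kv in map_fn(r)])

    def reduce_task(bucket: Dict[Any, List[Any]]) -> Dict[Any, Any]:
        return run_with_retries(lambda: {k: reduce_fn(k, vs) for k, vs in bucket.items()})

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Task scheduling: the pool assigns one map task per split to free workers.
        map_outputs = list(pool.map(map_task, splits))
        # Communication (shuffle): route every key to one reducer by hashing it.
        buckets: List[Dict[Any, List[Any]]] = [defaultdict(list) for _ in range(n_reducers)]
        for output in map_outputs:
            for key, value in output:
                buckets[hash(key) % n_reducers][key].append(value)
        reduced = list(pool.map(reduce_task, buckets))

    result: Dict[Any, Any] = {}
    for part in reduced:
        result.update(part)
    return result


# What the programmer/analyst actually writes: just map and reduce.
def word_count_map(line: str):
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)


lines = ["big data big value", "data velocity", "data variety and veracity"]
counts = run_mapreduce(lines, word_count_map, word_count_reduce)
print(counts["data"], counts["big"])  # 3 2
```

Routing each key by `hash(key) % n_reducers` sends every value for a given key to the same reducer. That is what lets each reduce task run independently of the others.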
Relevant IS Courses
• IS 410 Introduction to Database Design
• IS 420 Database Application Development
• IS 427 Introduction to Artificial Intelligence: Concepts and Applications
• IS 428 Data Mining Techniques and Applications
• IS 498 Special Topics
• Independent studies