International Distinguished Scholar Lecture
Topic
Machine Learning Approach to Visualization
and Classification of Big Data
Speaker
Prof. Sun-Yuan Kung (貢三元)
Department of Electrical Engineering, Princeton University, USA
Time
2015/12/24 (Thursday), 14:00-16:00
Venue
Room EC6011, Electrical and Computer Engineering Building, National Sun Yat-sen University
Abstract
In the big data era, we are experiencing a phenomenon of “digital everything”:
• Massive amounts of data are being rapidly captured in digital format, including digital books, voice, images, video, and commerce.
• They come from divergent sources, ranging from physical (sensor/IoT) to social and cyber (web) types.
An implicit principle behind big data analysis is to make use of ALL of the available data, vectorial or non-vectorial (i.e., the variety), whether they are messy, imprecise, or incomplete (i.e., the veracity). Given these quantitative (volume and velocity) and qualitative (variety) challenges, it is imperative to address the various computational aspects of big data. For big data, the curse of high feature dimensionality raises grave concerns about computational complexity and over-training. In this talk, we shall explore various projection methods for dimension reduction, a prelude to visualization of big data. Note that visualization tools are meant to supplement (rather than replace) domain expertise (e.g., that of a cardiologist) and to provide a big picture that helps users formulate critical questions and subsequently postulate heuristic and insightful answers. A popular visualization tool for unsupervised learning is Principal Component Analysis (PCA), which aims at the best recoverability of the original data in the Euclidean Vector Space (EVS). Discriminant Component Analysis (DCA) can be viewed as a supervised PCA in a Canonical Vector Space (CVS), aiming to maximize the discriminant distance in the CVS. For supervised application scenarios, our simulations have confirmed that DCA far outperforms PCA, both numerically and visually.
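As a rough illustration of the contrast drawn above, the following sketch projects a small labeled dataset onto two components, once with unsupervised PCA and once with a supervised discriminant projection. It uses scikit-learn's LinearDiscriminantAnalysis purely as a stand-in for a supervised projection; the speaker's DCA is a distinct formulation and is not implemented here.

```python
# Minimal sketch: unsupervised PCA vs. a supervised discriminant projection.
# LinearDiscriminantAnalysis is used only as a stand-in for a supervised
# projection; it is NOT the speaker's DCA formulation.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)          # 13-dimensional labeled data
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize features

# Unsupervised: maximize recoverability (variance) of the original data.
Z_pca = PCA(n_components=2).fit_transform(X)

# Supervised: maximize between-class separation relative to within-class scatter.
Z_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

for name, Z in [("PCA", Z_pca), ("LDA (supervised)", Z_lda)]:
    # Crude separability proxy: ratio of between-class to within-class spread.
    centroids = np.array([Z[y == c].mean(axis=0) for c in np.unique(y)])
    between = np.var(centroids, axis=0).sum()
    within = np.mean([Z[y == c].var(axis=0).sum() for c in np.unique(y)])
    print(f"{name:18s} between/within spread ratio: {between / within:.2f}")
```

The between/within ratio printed at the end is only a crude numerical proxy for the visual class separation one would inspect in a 2-D scatter plot.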
Big data analysis will require novel machine learning tools to seamlessly integrate information from
diversified sources, in order to unravel information hidden in the big data. It usually involves nonlinear
data analysis, for which kernel machine learning (KML) and deep machine learning (DML) represent the two most
promising approaches. KML can be formulated in either the primal domain (in the original Euclidean
space) or in the dual domain (in the induced kernel space). Even though the two formulations are
mathematically equivalent, they incur very different computational costs. This talk will address how to exploit such a dual-domain cost strategy to gain computational savings by orders of magnitude, in both the learning and classification phases. The choice between the two domains could also have different ramifications for privacy protection, a topic that will be deferred to the second talk.
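To make the primal/dual cost trade-off concrete, here is a minimal sketch (an illustration under my own assumptions, not the speaker's method) using ridge regression with a linear kernel: the two formulations return the same predictor, but the primal solves a d x d system in the original feature space while the dual solves an N x N system in the induced Gram-matrix space, so the cheaper domain depends on whether the sample count N or the feature dimension d dominates.

```python
# Minimal sketch of the primal/dual cost trade-off for linear ridge regression.
# Both formulations yield the same predictor, but the primal solves a d x d
# system while the dual solves an N x N system, so the cheaper domain depends
# on whether N (samples) or d (feature dimension) is larger.
import numpy as np

rng = np.random.default_rng(0)
N, d, lam = 2000, 50, 1.0            # many samples, few features -> primal is cheaper
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

# Primal: w = (X^T X + lam I_d)^(-1) X^T y      -- cost dominated by a d x d solve
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual:   alpha = (X X^T + lam I_N)^(-1) y      -- cost dominated by an N x N solve
#         w = X^T alpha
alpha = np.linalg.solve(X @ X.T + lam * np.eye(N), y)
w_dual = X.T @ alpha

print("max |w_primal - w_dual| =", np.max(np.abs(w_primal - w_dual)))  # tiny: equivalent
print("primal system size:", (d, d), " dual system size:", (N, N))
```

Swapping the roles (d much larger than N) reverses the conclusion, which is exactly the kind of domain choice the abstract refers to.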
Ultimately, big data analysis will call for special hardware and software technologies with architectural platforms based on machine learning systems. KML and DML differ in that KML is based on pairwise quantification between any pair of targeted objects, while DML extracts a feature vector for each targeted object using a cascade of feature extraction layers. In this sense, the two learning approaches
complement each other very well. This points to a potentially fruitful system integration combining both
KML and DML technologies.
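As a purely structural sketch of this complementarity (illustrative only; the layer sizes, random weights, and RBF kernel below are my own assumptions, not any system described in the talk), the snippet passes data through a small cascade of feature-extraction layers, the DML side, and then forms a pairwise kernel (Gram) matrix on the resulting per-object feature vectors, the KML side.

```python
# Structural sketch: a DML-style cascade of feature-extraction layers feeding
# a KML-style pairwise kernel. Weights are random (untrained); the point is
# only to contrast per-object feature extraction with pairwise quantification.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))        # 100 objects, 20 raw features

# DML side: cascade of layers, each producing a per-object feature vector.
def feature_cascade(X, layer_dims=(32, 16, 8)):
    H = X
    for dim in layer_dims:
        W = rng.standard_normal((H.shape[1], dim)) / np.sqrt(H.shape[1])
        H = np.tanh(H @ W)                # one feature-extraction layer
    return H                              # one feature vector per object

# KML side: pairwise (Gram) quantification between every pair of objects.
def rbf_kernel(A, B, gamma=0.5):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

F = feature_cascade(X)                    # per-object features (DML)
K = rbf_kernel(F, F)                      # pairwise similarities (KML)
print("feature matrix:", F.shape, " kernel (Gram) matrix:", K.shape)
```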
Ministry of Education "Cross-University Teaching Alliance for Advanced Mobile Broadband Technologies: Mobile Broadband Networks and Applications - Small Cell Base Station Project"
Contact: Ms. Ko (柯小姐), Computer Communication Networks Laboratory, Department of Electrical Engineering, National Sun Yat-sen University, (07) 525-2000 Ext. 4148