International Distinguished Scholar Lecture

Title: Machine Learning Approach to Visualization and Classification of Big Data
Speaker: Prof. 貢三元 (Sun-Yuan Kung), Department of Electrical Engineering, Princeton University, USA
Time: 2015/12/24 (Thursday) 14:00-16:00
Venue: Room EC6011, Electrical and Computer Engineering Building, National Sun Yat-sen University

Abstract

In the big data era, we are experiencing a phenomenon of "digital everything":
• Massive data are being rapidly captured in digital format, including digital books, voice, images, video, and commerce.
• They come from divergent types of sources, from physical (sensor/IoT) to social and cyber (web) types.

An implicit principle behind big data analysis is to make use of ALL of the available data, vectorial or non-vectorial (i.e. the variety), whether messy, imprecise, or incomplete (i.e. the veracity). Because of its quantitative (volume and velocity) and qualitative (variety) challenges, it is imperative to address the various computational aspects of big data. For big data, the curse of high feature dimensionality raises grave concerns about computational complexity and over-training. In this talk, we shall explore various projection methods for dimension reduction, a prelude to the visualization of big data. Note that visualization tools are meant to supplement (not replace) domain expertise (e.g. that of a cardiologist) and to provide a big picture that helps users formulate critical questions and subsequently postulate heuristic and insightful answers.

A popular visualization tool for unsupervised learning is Principal Component Analysis (PCA), which aims at the best recoverability of the original data in the Euclidean Vector Space (EVS). Discriminant Component Analysis (DCA) can be viewed as a supervised PCA in a Canonical Vector Space (CVS), which aims at maximizing the discriminant distance in the CVS. For supervised application scenarios, our simulations have confirmed that DCA far outperforms PCA, both numerically and visually. Big data analysis will require novel machine learning tools that seamlessly integrate information from diversified sources, in order to unravel the information hidden in the big data.
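As a concrete illustration of the projection-based dimension reduction the abstract describes, the following is a minimal PCA sketch in Python/NumPy (the function name and the toy data are illustrative, not from the talk): it centers the data and projects each sample onto the top-k principal directions, yielding low-dimensional coordinates suitable for visualization.

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto the top-k principal components.

    PCA chooses the orthogonal directions that best preserve
    (recover) the original data in the Euclidean vector space.
    """
    Xc = X - X.mean(axis=0)                       # center the data
    # SVD of the centered data gives the principal directions in Vt
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # k-D coordinates

# Usage: reduce 100 samples in 50-D down to 2-D for plotting
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
Y = pca_project(X, k=2)
print(Y.shape)  # (100, 2)
```

The first projected coordinate captures the largest variance, the second the next largest, which is why a 2-D PCA scatter plot gives the "big picture" view mentioned above; DCA differs by choosing directions using class labels rather than variance alone.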
It usually involves nonlinear data analysis, for which kernel machine learning (KML) and deep learning (DML) represent the two most promising approaches. KML can be formulated either in the primal domain (the original Euclidean space) or in the dual domain (the induced kernel space). Even though the two formulations are mathematically equivalent, they incur very different computational costs. This talk will address how to exploit such a dual-cost strategy to gain computational savings by orders of magnitude, in both the learning and classification phases. The choice between the two domains can also have different ramifications for privacy protection, a topic deferred to the second talk. Ultimately, big data analysis will call for special hardware and software technologies with architectural platforms based on machine learning systems. KML and DML differ in that KML is based on pairwise quantification of any pair of targeted objects, while DML extracts a feature vector for each targeted object using a cascade of feature-extraction layers. In this sense, the two learning approaches complement each other very well. This points to a potentially fruitful system integration combining both KML and DML technologies.

Ministry of Education "Inter-University Teaching Alliance for Advanced Mobile Broadband Technologies: Mobile Broadband Networks and Applications - Small Cell Base Station Project"
Contact: Ms. Ko (柯小姐), Computer Communication Networks Laboratory, Department of Electrical Engineering, National Sun Yat-sen University, (07) 525-2000 Ext. 4148
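The primal/dual equivalence and cost trade-off mentioned in the abstract can be sketched with linear ridge regression, the simplest kernel machine (the numbers and variable names here are illustrative, not from the talk). The primal solve works in the d-dimensional feature space (roughly O(d^3)); the dual solve works with the N x N Gram matrix (roughly O(N^3)); both yield the same model, so one picks whichever domain is cheaper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, lam = 200, 5, 0.1        # many samples, few features: primal is cheaper
X = rng.normal(size=(N, d))
y = rng.normal(size=N)

# Primal domain: solve a d x d system in the original Euclidean space
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual domain: solve an N x N system over the (linear) Gram matrix X X^T
alpha = np.linalg.solve(X @ X.T + lam * np.eye(N), y)
w_dual = X.T @ alpha           # recover the same weight vector

# Mathematically equivalent, but very different computational costs
print(np.allclose(w_primal, w_dual))  # True
```

With a nonlinear kernel, only the dual form is available (the induced feature space may be infinite-dimensional), which is exactly where the choice of domain, and its cost and privacy ramifications, becomes interesting.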