2015 IEEE/ACIS 14th International Conference
on Computer and Information Science (ICIS)
June 28 – July 1, 2015
Las Vegas, USA
Editors:
Takayuki Ito
Yanggon Kim
Naoki Fukuta
Sponsored by
IEEE Computer Society
URL: http://www.computer.org
International Association for Computer & Information Science (ACIS)
URL: www.acisinternational.org
IEEE Catalog Number: CFP15CIS-USB
ISBN: 978-1-4799-8678-1
Copyright and Reprint Permission: Abstracting is permitted with credit to the source. Libraries are
permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in
this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the
code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For
other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Operations
Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved. Copyright ©2015 by IEEE.
Published by ACIS International
735 Meadowbrook
Mt Pleasant, MI 48858
Phone: 989-774-3811
Email: [email protected]
Web Site: www.acisinternational.org
IEEE Catalog Number: CFP15CIS-ART
ISBN: 978-1-4799-8679-8
Table of Contents
KEYNOTE
MOOCs, MOOE and MOOR In China
Wenai Song
COMMUNICATION SYSTEMS & NETWORKS
SONET over RPR
Ammar Hamad, Michel Kadoch
Quantifying Security Risk by Measuring Network Risk Conditions
Candace Suh-Lee, Juyeon Jo
Feasibility Analysis for Incorporating/Deploying SIEM for Forensics Evidence Collection in Cloud Environment
Muhammad Irfan, Haider Abbas, Waseem Iqbal
Mining Information Assurance Data with a Hybrid Intelligence/Multi-agent System
Charles Fowler, Robert Hammell II
Secure in-vehicle Systems against Trojan Attacks
Masaya Yoshikawa
Social Routing: A Novel Routing Protocol for Delay Tolerant Network based on Dynamic Connectivity
Viet Quoc Nguyen, Van Phuoc Pham, Quoc Son Trinh, Lung Vu Duc
A Big Data Approach to Enhance the Integration of Access Control Policies for Web Services
Mohammed Alodib, Zaki Malik
COPE: Cooperative Power and Energy-efficient Routing Protocol for Wireless Sensor Networks
Saima Jamil, Saqib Jamil, Sheeraz Ahmed
Comparative Analysis on the Signature Algorithms to Validate AS Paths in BGPsec
Kyoungha Kim, Yanggon Kim
Opportunistic Wireless Network Coding Based On Small-time Scale Traffic Prediction
Rui Zhang, Jie Li, Quan Qian, Wei Feng
Web-based Motion Detection System for Health Care
Ruiling Gao, Minghuan Zhao, Zhihui Qiu, Yingzhou Yu, C. Hwa Chang
A Novel Contention Window Backoff Algorithm for IEEE 802.11 Wireless Networks
Ikram Syed, Byeong-Hee Roh
A Method for Secure RESTful Web Service
Sungchul Lee, Ju-Yeon Jo, Yoohwan Kim
Deploying Agents in the Network to Detect Intrusions
Shankar Banik, Luis Pena
Empirical Evaluation of Designing Multicasting Network with Minimum Delay Variation
Nicklaus Rhodes, Shankar Banik
An Autonomous Model to Enforce Security Policies Based on User’s Behavior
Kambiz Ghazinour, Mehdi Ghayoumi
CONTROL SYSTEMS, INTELLIGENT SYSTEMS
Utilizing NFC to Secure Identification
Robert Gripentog, Yoohwan Kim
A Kernel based Atanassov’s Intuitionistic Fuzzy Clustering for Network Forensics and Intrusion Detection
Anupam Panwar
Regimdroid: Framework for Customize Android Platform to act as a Brain for Telepresence Robot
Nouha Ghribi
Communication System Based on Chaotic Delayed Feedback Oscillator with Switched Delay
Mikhail Prokhorov, Danil Kulminskiy, Anatoly Karavaev, Vladimir Ponomarenko
Effective Gaze Writing with Support of Text Copy and Paste
Reo Kishi, Takahiro Hayashi
A Review of Multimodal Biometric Systems: Fusion Methods and Their Applications
Mehdi Ghayoumi
An Adaptive Fuzzy Multimodal Biometric System for Identification and Verification
Mehdi Ghayoumi, Kambiz Ghazinour
Evaluating a GA-based Approach to Dynamic Query Approximation on an Inference-enabled SPARQL Endpoint
Yuji Yamagata, Naoki Fukuta
COMPUTER ARCHITECTURE AND VLSI
Fully Pipelined VLSI Architecture of a Real-Time Block-Based Object Detector for Intelligent Video Surveillance Systems
Min-Chun Tuan, Shih-Lun Chen
A Compact Design of n-Bit Ripple Carry Adder Circuit using QCA Architecture
Nusrat Jahan Lisa, Tania Sultana Rimy, Rajon Bardhan, Tangina Firoz Bithee, Zinia Tabassum
Using SPIN to Check Simulink Stateflow Models
Chikatoshi Yamada, Michael Miller
Fast Bootstrapping Method for the Memory-Disk Integrated Memory System
Sangjae Nam, Su-Kyung Yoon, Shin-Dug Kim
Selective Data Buffering Module for Unified Hybrid Storage System
Kihyun Park, Shin-Dug Kim
DATA MINING, DATA WAREHOUSING & DATABASE
Automated Generation of Hierarchic Image Database with Hybrid Method of Ontology and GMM-based Image Clustering
Ryosuke Yamanishi, Ryoya Fujimoto, Yuji Iwahori, Robert J Woodham
Multivariate Temporal Link Prediction in Evolving Social Networks
Alper Ozcan, Şule Gunduz Ögüdücü
A Graph Model Based Author Attribution Technique for Single-Class Email Classification
A Novino Nirmal, Kyung-Ah Sohn, Tae-Sun Chung
Introducing the Concept of “Always-Welcome Recommendations”
Edson B. dos Santos Junior, Rafael M. D’Addio, Arthur F. da Costa, Marcelo G. Manzato, Rudinei Goularte
CBDIR: Fast and Effective Content Based Document Information Retrieval System
Moon Soo Cha, So Yeon Kim, Jae Hee Ha, Min-June Lee, Young-June Choi, Kyung-Ah Sohn
Mobile Phone Spam Image Detection based on Graph Partitioning with Pyramid Histogram of Visual Words Image Descriptor
So Yeon Kim, Kyung-Ah Sohn
Multi-purpose Adaptable Business Tier Components Based on Call Level Interfaces
Óscar Mortágua Pereira, Rui Aguiar
Implementation of Modified Overload Detection Technique with VM Selection Strategies Based on Heuristics and Migration Control
Mohammad Rashedur Rahman
A Feature Selection Method for Comparison of Each Concept in Big Data
Takafumi Nakanishi
Searching Human Actions based on a Multi-dimensional Time Series Similarity Calculation Method
Yu Fang, Kosuke Sugano, Kenta Oku, Hung-Hsuan Huang, Kyoji Kawagoe
A Human Behavior Processes Database Prototype System for Surgery Support
Zhang Zuo, Kenta Oku, Hung-Hsuan Huang, Kyoji Kawagoe
Clustered Based VM Placement Strategies
Mohammad Rashedur Rahman
A Personalized Music Discovery Service based on Data Mining
Mahfuzur Rahman Siddiquee, Saifur Rahman, Naimul Haider, Shahnewaz Ul Islam Chowdhury, Mohammad Rashedur Rahman, Sharnendu Banik
Generalized Entropy based Semi-Supervised Learning
Taocheng Hu, Jinhui Yu
Investigation of Localized Sentiment for a Given Product by Analyzing Tweets
Syed Akib Anwar Hridoy, Faysal Ahmed, Mohammad Samiul Islam, M. Tahmid Ekram, Mohammad Rashedur Rahman
KNOWLEDGE DISCOVERY, NEURAL NETWORKS AND GENETIC ALGORITHMS
Enhancing the Impact of Science Data: Toward Data Discovery and Reuse
Alan Chappell, Jesse Weaver, Sumit Purohit, William Smith, Karen Schuchardt, Patrick West, Benno Lee, Peter Fox
A Pruning Algorithm for Reverse Nearest Neighbors in Directed Road Networks
Rizwan Qamar, Muhammad Attique, Tae-Sun Chung
A Semantic Approach for Transforming XML Data to RDF triples
Mohamed Kharrat, Anis Jedidi, Faiez Gargouri
SPEECH AND SIGNAL PROCESSING
Syllable-based Myanmar Language Model for Speech Recognition
Wunna Soe, Yadana Thein
In-house Alert Sounds Detection and Direction of Arrival Estimation to Assist People with Hearing Difficulties
Mohammad Daoud, Mahmoud Al-Ashi, Fares Abawi, Ala Khalifeh
Nearest Multi-Prototype Based Music Mood Classification
Babu Baniya, Joonwhoan Lee, Choong Seon Hong
IMAGE PROCESSING & PATTERN RECOGNITION
Decomposition of Partly Occluded Objects Based on Evaluation of Figural Goodness
Takahiro Hayashi, Tatsuya Ooi, Motoki Sasaki
Using Ant’s Colony Algorithm for Improved Segmentation for Number Plate Recognition
Shantanu Prakash, Sanchay Dewan, Shreyansh Bajaj
FPGA Implementation of a Low Complexity Steganographic System for Digital Images
Williams Antonio Pantoja Laces, Jose Juan Garcia-Hernandez
Recognition of Offline Handwritten Hindi Text Using Middle Zone of the Words
Naresh Garg, Lakhwinder Kaur, Mansih Jindal
Automatic Extraction of Text Regions from Document Images by Multilevel Thresholding and K-means Clustering
Vu Hoai Nam, Tran Tuan Anh, Na In Seop, Kim Soo-Hyung
Automated Thresholding of Lung CT Scan for Artificial Neural Network based Classification of Nodules
Sheeraz Akram, Muhammad Younus Javed
INTELLIGENT AGENT TECHNOLOGY, AGENT BASED SYSTEMS
Single-Object Resource Allocation in Multiple Bid Declaration with Preferential Order
Kengo Saito, Toshiharu Sugawara
HMM-Based Vietnamese Speech Synthesis
Son Trinh
A Scoring Rule-based Truthful Demand Response Mechanism
Keisuke Hara, Takayuki Ito
Multiagent-based Distributed Backup System for Individuals
Takahiro Uchiya, Motohiro Shibakawa, Tetsuo Kinoshita, Ichi Takumi
INTERNET TECHNOLOGY AND APPLICATIONS, E-COMMERCE
eMedicalHelp: A Customized Medical Diagnostic Application: Is a single questionnaire enough to measure stress?
Hedieh Ranjbartabar, Amir Maddah, Manolya Kavakli
Development of Mobile Voice Navigation System Using User-Based Mobile Maps Annotations
Tomohiro Yanagi, Daisuke Yamamoto, Naohisa Takahashi
3D Web Applications in E-Commerce: A Secondary Study on the Impact of 3D Product Presentations Created with HTML5 and WebGL
Jens Geelhaar, Gabriel Rausch
SQUED: A Novel Crowd-sourced System for Detection and Localization of Unexpected Events from Smartphone Sensor Data
Taishi Yamamoto, Kenta Oku, Hung-Hsuan Huang, Kyoji Kawagoe
Some Observations On Online Advertising: A New Advertising System
Dapeng Liu, Simon Xu
MANAGEMENT INFORMATION SYSTEMS
Objective Framework for Early-Stage Comparison of Software Development Project Types
Donghwoon Kwon, Robert Hammell
MIDDLEWARE ARCHITECTURES & TECHNIQUES
Dual RAID Techniques for Ensuring High Reliability and Performance in SSD
Sohyun Koo, Sunsoo Kim, Tae-Sun Chung
A Novel Architecture for Learner’s Profiles Interoperability
Leila Ghorbel, Corinne Amel Zayani, Ikram Amous
A Multimedia-Oriented Digital Ecosystem: a New Collaborative Environment
Solomon Asres Kidanu, Yudith Cardinale
Dynamic Binary Translation in a Type-II Hypervisor for CAVIUM MIPS64 Based Systems
Qurrat Ulain, Usama Anwar, Asad Raza, Abdul Qadeer, Ghulam Mustafa, Abdul Waheed
MOBILE/WIRELESS COMPUTING
An Energy-saving Task Scheduler for Mobile Devices
Hao Qian
Challenges and Implementation of ad-hoc Water Gauge System for the Grasp of Internal Water Damage
Takanobu Otsuka, Yoshitaka Torii, Takayuki Ito
Energy-Efficient Distributed Computing Solutions for Internet of Things with ZigBee Devices
Grzegorz Chmaj, Henry Selvaraj
PARALLEL AND DISTRIBUTED COMPUTING & SYSTEMS
Emerald: Enhance Scientific Workflow Performance with Computation Offloading to the Cloud
Hao Qian, Daniel Andresen
PROGRAMMING LANGUAGES, COMPILERS, & OPERATING SYSTEMS
Accelerating Storage Access by Combining Block Storage with Memory Storage
Shuichi Oikawa
SOFTWARE SPECIFICATION TECHNIQUES
Formal Specification and Reasoning for Situated Multi-agent System
Zhuang Li, Huaikou Miao
WEB ENGINEERING & APPLICATIONS
Automatic Generation of Programming Exercises for Learning Programming Language
Akiyoshi Wakatani, Toshiyuki Maeda
SPECIAL SESSION 1
Robust Location Tracking Method for Mixed Reality Robots using a Rotation Search Method
Masahiro Yamamoto, Kazuhiro Suzuki, Ryosuke Ogawa, Nobuhiro Ito, Yoshinobu Kawabe
Verifying Ignition Timing of Gasoline Direct Injection Engine’s PCM
Masato Yamauchi, Nobuhiro Ito, Yoshinobu Kawabe
Analysis of Driving Behaviors based on GMM by using Driving Simulator with Navigation Plugin
Naoto Mukai
Analyzing Relationship between the Number of Errors in Review Processes for Embedded Software Development Projects
Toyoshiro Nakashima, Kazunori Iwata, Yoshiyuki Anani, Naohiro Ishii
Classification on Nonlinear Mapping of Reducts Based on Nearest Neighbor Relation
Naohiro Ishii, Ippei Torii, Naoto Mukai, Kazunori Iwata, Toyoshiro Nakashima
WORKSHOP I
Proposal of Programming Creation Application Using Road Signs by Smartphones
Reiko Kuwabara, Eigo Ito, Takayuki Fujimoto
Proposal of Multiple Travel Scheduling System based on Inverse Operation Method
Murata Kazuya, Takayuki Fujimoto
A proposal of Programming Education System using Mechanical Calculator Mechanism
Eigo Ito, Takayuki Fujimoto
SDSS: Proposal on Feeding Support Application Software which Enables the User to Create a State of “Mental Alertness”
Motoichi Adachi, Takayuki Fujimoto
A Proposal of the System to Stop a Decline of the Interest of Great East Japan Earthquake
Koji Fujita, Takayuki Fujimoto
WORKSHOP II
Singing Voice Detection of Popular Music Using Beat Tracking and SVM
Fengyan Wu, Shutao Sun, Jianglong Zhang, Yongbin Wang
Solving the Supermarket Shopping Route Planning Problem Based on Genetic Algorithm
Xiajia Chen, Ying Li, Tao Hu
Online Advertising Demand-side Platform Business System Design Exploration
Tao Lei, Junpeng Gong, Yujun Wen
A Feature Selection Algorithm of Music Genre Classification Based on ReliefF and SFS
Meimei Wu, Yongbin Wang
Research and Implementation of Four-prime RSA Digital Signature Algorithm
Zhenjiu Xiao, Yongbin Wang, Zhengtao Jiang
The Design and Implementation of Personalized News Recommendation System
Xuejiao Han, Wenqian Shang, Shuchao Feng
A Logic Model of Interest in Information Network
Shiping Zhou, Wei Zhang, Xiangrong Tong
Research on Public Opinion Based on Big Data
Songtao Shang, Minyong Shi, Wenqian Shang, Zhiguo Hong
An Automatic Semantic Web Service Composition Method Based on Ontology
Ying Li, Yulong Li, Tao Hu, Zhisheng Lv
The Maximal Operator Classifier
Yuqi Wang, Wenqian Shang, Shuchao Feng
Visualization in Media Big Data Analysis
Yingjian Qi, Xinyan Yu, Guoliang Shi, Ying Li
Personalized two party key exchange protocol
Tong Yi, Minyong Shi, Wenqian Shang
Personalized News Recommendation Based on Links of Web
Zhenzhong Li, Wenqian Shang
Interactive Virtual Theater Display System
Min Feng, Huaichang Du
An Improved Algorithm for Active Contour Extraction Based on Greedy Snake
Hui Ren, Zhibin Su, Chaohui Lv, Fangjv Zou
Spread Influence Algorithm Of News Website Based on PageRank
GuoWei Chen, Fei Xie, Tao Lei, Yu Su
SERA 2015
Petri Nets-based Design of Real-Time Reconfigurable Networks on Chips
Hela Ben Salah, Adel Benzina, Mohamed Khalgui
Real-Time Reconfigurable Scheduling of Multiprocessor Embedded Systems Using Hybrid Genetic Based Approach
Hamza Gharsellaoui, Ismail Ktata, Naoufel Kharroubi, Mohamed Khalgui
New Adaptive Middleware for Real-Time Embedded Operating Systems
Fethi Jarray, Hamza Chniter, Mohamed Khalgui
SABPEL: Creating Self-Adaptive Business Processes
Sihem Cherif, Raoudha Ben Djemaa, Ikram Amous
A Learning Semantic Web Service for Generating Learning Paths
Chaker Ben Mahmoud, Ikbel Azaiez, Fathia Bettahar, Marie-Hélène Abel, Faïez Gargouri
2LPA-RTDW: A Two-Level Data Partitioning Approach for Real-time Data Warehouse
Issam Hamdi, Emna Bouazizi, Saleh Alshomrani, Jamel Feki
Opus Framework: A Proof-of-Concept Implementation
Nahla Haddar, Mohamed Tmar, Faiez Gargouri
A Service-Oriented Architecture (SOA) Framework for Choreography Verification
Sirine Rebai, Hatem Hadj Kacem, Mohamed Karaa, Saul E. Pomares, Ahmed Hadj Kacem
Adaptive Security for Cloud Data Warehouse as a Service
Emna Guermazi, Mounir Ben Ayed, Hanêne Ben-Abdallah
Enriching User Model Ontology for Handicraft domain by FOAF
Maha Maalej, Achraf Mtibaa, Faïez Gargouri
Integrating semantics and structural information for BPMN model refactoring
Wiem Khlif, Hanêne Ben-Abdallah
Context Aware Criteria For The Evaluation Of Mobile Decision Support Systems
Emna Ben Ayed, Mounir Ben Ayed, Christophe Kolski, Houcine Ezzedine, Faiez Gargouri
Ensemble Feature Selection of microRNAs and Human Cancer Classifications
Minghao Piao, Hyoung Woon Song, Keun Ho Ryu
Author Index
Ensemble Feature Selection of microRNAs and Human Cancer Classification
Minghao Piao1, Hyoung Woon Song2, Keun Ho Ryu*
1,* Database/Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, South Korea
2 Plant Engineering Center/Clean Energy Team, Institute for Advanced Engineering, Yongin, Korea
{1bluemhp, *khryu}@dblab.chungbuk.ac.kr, [email protected]
Abstract—Traditional feature selection methods, such as the
filter, embedded, and wrapper approaches, are widely used to
select the most significant microRNAs for human cancer
classification. However, some studies report that these methods
can reduce the stability of the selected biomarkers. Ensemble
feature selection methods have recently become popular in
bioinformatics as a way to improve biomarker stability. In this
study, we describe a data-diversity ensemble feature selection
method for microRNA-based human cancer classification. The
results show that our approach selects the most significant
microRNAs while maintaining high classification quality.
Keywords—Ensemble feature selection, Cascading-and-Sharing, data diversity, microRNAs, Human cancer classification
I. INTRODUCTION
Many data mining techniques [1-3] have been applied to
microRNA expression data for human cancer classification,
since abnormal expression of microRNAs, a class of small
non-coding RNAs, has been shown to indicate human cancer
[4, 5]. However, several issues [6] remain to be solved:
(1) Curse of dimensionality: because of the large number of
genes, it is difficult to focus on the informative ones. High
dimensionality can cause a series of problems for cancer
classification, such as added noise, reduced accuracy, and
increased complexity. (2) Choosing the most appropriate
small subset of genes is extremely difficult.
To address these problems, feature selection or feature
extraction methods can be used to reduce the dimensionality.
Feature selection is better suited than feature extraction to
microRNA expression data analysis, because it simply chooses
a number of appropriate microRNAs and so preserves their
original characteristics. Feature extraction instead creates new
features by applying transform functions to the original
microRNAs, and these new features may not have a physical
interpretation.
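The contrast between the two strategies can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy expression values, the microRNA names, the class-mean-gap score, and the weights of the extracted feature do not come from the paper. Selection keeps original, named columns, while extraction returns a derived value that no longer corresponds to a single microRNA.

```python
# Toy expression matrix: rows are samples, columns are hypothetical microRNAs.
names = ["miR-a", "miR-b", "miR-c", "miR-d"]
X = [
    [5.1, 0.2, 3.3, 1.0],
    [4.9, 0.1, 3.1, 1.1],
    [1.2, 0.2, 3.2, 4.0],
    [1.0, 0.3, 3.0, 4.2],
]
labels = [0, 0, 1, 1]  # two hypothetical classes, e.g. normal vs. tumor


def mean(values):
    return sum(values) / len(values)


def class_mean_gap(column, labels):
    """Illustrative filter score: absolute gap between the two class means."""
    a = [v for v, y in zip(column, labels) if y == 0]
    b = [v for v, y in zip(column, labels) if y == 1]
    return abs(mean(a) - mean(b))


def select_features(X, labels, names, k):
    """Feature selection: keep the k best original columns, names intact."""
    columns = list(zip(*X))
    scores = [class_mean_gap(col, labels) for col in columns]
    keep = sorted(range(len(names)), key=lambda j: scores[j], reverse=True)[:k]
    keep.sort()  # restore the original column order
    return [names[j] for j in keep], [[row[j] for j in keep] for row in X]


def extract_feature(X, weights):
    """Feature extraction: one new feature as a weighted sum of all columns."""
    return [sum(w * v for w, v in zip(weights, row)) for row in X]


selected_names, X_selected = select_features(X, labels, names, k=2)
X_extracted = extract_feature(X, weights=[0.5, 0.0, 0.0, -0.5])
print(selected_names)  # still names of measured microRNAs
print(X_extracted)     # a derived quantity with no single-microRNA meaning
```

On this toy data the selection step keeps miR-a and miR-d, which can still be read as measured microRNAs, whereas the extracted values mix several columns.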
Lu et al. [7] used a bead-based flow cytometric microRNA
expression profiling method to analyze 217 mammalian
microRNAs from 334 samples, including human cancers. Their
results showed the potential of microRNA profiling in cancer
diagnosis. Based on this data resource, many studies have
applied different feature selection and classification methods
to cancer classification [8, 9, 10, 34, 35]. For most feature
selection methods, it is difficult to define an appropriate
number of high-ranked microRNAs. The most widely used
approach is to compare the classification performance obtained
with different numbers of high-ranked microRNAs and keep the
number that performs best. However, this approach is not
useful when there are numerous high-ranked microRNAs, or
when the calculated measure changes linearly, which makes it
difficult to decide where the high-ranked microRNAs end. To
improve the stability of feature selection in bioinformatics,
researchers have proposed new frameworks such as ensemble
feature selection methods.
* Corresponding author
In this study, we propose a data diversity based ensemble
feature selection method, TOp-K Significant features from
Cross VAlidation (TOSCVA). The advantage of the proposed
method is that it can effectively address the singleton and
fragmentation problems in classification. The experimental
results show that our method can find the most significant
features and is suitable for its intended use in classification.
II. ENSEMBLE FEATURE SELECTION
When looking for biomarkers in DNA, RNA and microRNA
expression data, only a small subset of biomarkers related to
specific diseases is selected. One of the most common
approaches is to order or rank the genes by their importance.
Ordering genes by importance is very similar to feature
selection, which is a preprocessing step in data mining.
Feature selection methods return the set of features that are
most important to the problem at hand. They can be applied
to several problems in biology and genetics: distinguishing
between healthy and diseased tissue [11, 12, 13, 33];
identifying and classifying different types of cancer
[14, 15]; predicting drug treatment outcomes [16, 17]; etc.
In bioinformatics, data sets usually contain few samples
(often fewer than a hundred) and thousands of different genes
(the curse of dimensionality). This decreases the stability of
feature rankers, which may produce different results after a
slight change to the data set [18].
In order to improve the stability of feature selection
techniques, researchers have proposed new frameworks such
as ensemble feature selection methods [19-25]. The idea of
ensemble feature selection is derived from ensemble learning
methods, wherein different classifiers are applied to a dataset
and their results are aggregated. Ensemble feature selection
techniques apply feature selection algorithms multiple times
and combine the results into the final decision. Because
multiple results are combined, the features that are frequently
chosen as the best performers are marked as top-ranked
features, while features with poor performance become
low-ranked features; thus, the final top-ranked features are
more stable. There are three main types of ensemble feature
selection techniques [26]. (1) Data diversity applies a single
feature selection method to a number of differently sampled
versions of the same dataset and then uses an aggregation
technique to combine the results. (2) Functional diversity
applies a set of different feature selection techniques to the
same dataset. (3) Hybrid ensembles use both, applying
different feature selection techniques to differently sampled
versions.
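The data diversity variant can be illustrated with a minimal Python sketch. It is not the method proposed in this paper: the ranker (absolute difference of class means), the use of bootstrap resampling, the mean-rank aggregation, and all function names and toy values are assumptions chosen only to keep the example short.

```python
import random
from collections import defaultdict


def score_feature(rows, labels, j):
    """Toy ranker (assumption): absolute difference of class means, labels 0/1."""
    a = [r[j] for r, y in zip(rows, labels) if y == 0]
    b = [r[j] for r, y in zip(rows, labels) if y == 1]
    if not a or not b:  # a resample may contain only one class
        return 0.0
    return abs(sum(a) / len(a) - sum(b) / len(b))


def rank_features(rows, labels):
    """Feature indices ordered from most to least discriminative."""
    n_features = len(rows[0])
    return sorted(range(n_features),
                  key=lambda j: score_feature(rows, labels, j),
                  reverse=True)


def data_diversity_ensemble(rows, labels, n_rounds=25, seed=0):
    """Apply one ranker to resampled versions of the data, aggregate by rank sum."""
    rng = random.Random(seed)
    rank_sum = defaultdict(int)
    n = len(rows)
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        for position, j in enumerate(rank_features(sample_rows, sample_labels)):
            rank_sum[j] += position
    # A smaller rank sum means the feature was ranked high more consistently.
    return sorted(rank_sum, key=lambda j: rank_sum[j])


# Hypothetical toy data: feature 0 separates the classes, features 1-2 do not.
rows = [[0.1, 5.0, 1.0], [0.2, 4.0, 1.2], [0.0, 6.0, 0.9], [0.3, 5.5, 1.1],
        [9.8, 5.2, 1.6], [10.1, 4.4, 1.4], [9.9, 5.9, 1.7], [10.2, 5.1, 1.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
ranking = data_diversity_ensemble(rows, labels)
print(ranking)
```

A functional diversity ensemble would instead keep the data fixed and loop over several different rankers; a hybrid ensemble would vary both.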
III. DECISION TREE ENSEMBLE BASED FEATURE SELECTION
A. Problem Definition
In data mining, ensemble methods are used to improve
classifier accuracy. An ensemble method constructs a set of
base classifiers from the training data set and classifies by
voting on the predictions made by each classifier. Since the
idea of ensemble feature selection is derived from ensemble
learning methods, it is possible to apply an ensemble learning
method to feature selection, provided the method itself
decides which features are used to construct the set of
classifiers.
An ensemble of classifiers can be constructed in many
ways [27]; the most widely used is to manipulate the
training set, as in bagging and boosting. Three interesting
observations, based on a study of many ensemble methods, are
described in [28]: (1) Many ensembles constructed by the
Boosting method were singletons. Because of this constraint,
rule derivation is limited: decision trees are not encouraged
to derive many significant rules, and the rules they derive
are mutually exclusive and cover every training sample
exactly once. (2) Many top-ranked features possess similar
discriminating merits with little difference for
classification. This indicates that it is worthwhile to employ
different top-ranked features as the root nodes for building
multiple decision trees. (3) Those ensemble methods also
suffer from the fragmentation problem: less and less training
data is available to search for the root nodes of sub-trees.
Based on these observations, if we want to apply an ensemble
learning method to feature selection in bioinformatics, for
example to select the most useful microRNAs, we need a method
that can break the singleton coverage constraint and solve the
fragmentation problem. Our previous study [34] reported that
the microRNAs selected by traditional feature selection
methods are not the top-ranked features. Therefore, in this
study, we introduce a method that can produce the top-ranked
features.
B. TOSCVA
Decision trees are commonly used in classification for
decision making. They are attractive for three reasons:
(1) A decision tree generalizes well to unobserved instances,
provided the instances are described in terms of features
that are correlated with the target concept. (2) The methods
are computationally efficient, with a cost proportional to the
number of observed training instances. (3) The resulting
decision tree provides a representation of the concept that is
explainable to humans. A decision tree can also be used as a
feature selection method, since the algorithm itself decides
which features are used to construct the tree structure.
Bagging and boosting are the first approaches; they construct
multiple base trees, each time using a bootstrapped replicate
of the original training data. Bagging [30] generates multiple
decision trees and uses them to obtain an aggregated
predictor. The trees are formed by bootstrap aggregating,
which repeatedly samples from a data set with replacement. As
a result, some instances may appear several times in the same
training set, while others may be omitted from it. Unlike
bagging, boosting [31] assigns a weight to each training
example and may adaptively change the weight at the end of
each boosting round.
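The effect of sampling with replacement can be seen in a short Python sketch. It covers only the bootstrap step of bagging (not tree induction or boosting); the function name, the seed, and the choice of 103 samples (the size of the tumor data set used later in the paper) are illustrative assumptions.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n indices from range(n) with replacement (one bootstrap replicate)."""
    return [rng.randrange(n) for _ in range(n)]


rng = random.Random(42)      # fixed seed so the run is repeatable
n_samples = 103              # assumed data set size
replicate = bootstrap_indices(n_samples, rng)

counts = Counter(replicate)
repeated = sorted(i for i, c in counts.items() if c > 1)   # drawn more than once
omitted = sorted(set(range(n_samples)) - set(counts))      # never drawn

print("replicate size:", len(replicate))
print("instances drawn more than once:", len(repeated))
print("instances omitted:", len(omitted))
```

Each base tree in a bagged ensemble would be trained on one such replicate, so different trees see different subsets of the original instances.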
However, bagging and boosting cannot by themselves select the
most significant features, since the multiple decision trees
may use different sets of features. In this study, we propose
an ensemble feature selection method named TOp-K Significant
features from Cross VAlidation (TOSCVA) in order to
efficiently select the most significant features. TOSCVA
consists of three phases. First, using the mechanism of cross
validation, it creates K different training data sets
according to the user-given parameter K. Second, according to
the given parameter N, N decision trees are constructed from
each of the K training data sets; the parameter N determines
the number of top-ranked features that are forced to be the
root of a decision tree [28, 29]. Finally, the distinguishing
power of the K decision tree committees is evaluated to
determine the final N most significant top-ranked features.
Input
    K: number of training data sets
    N: number of top-ranked features
    M: the given data set
Output
    N significant features

Data_sampling(K, M)
{
    Produce K equal training data sets from M;
    Return TrainingData[1..K];
}

Ensemble_Feature_Selection(TrainingData[1..K], N)
{
    For each training data set in TrainingData[1..K]
    {
        Select the best N features;
        Tree_Construction()
            { force each best feature to be the root; }
        Significance_Evaluation()
            { evaluate the significance of each feature; }
    }
    Return the N most significant features;
}
Algorithm 1. The skeleton of TOSCVA
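The skeleton above leaves the ranker, the tree learner and the significance measure unspecified. The following Python sketch fills them in with stand-ins so that the control flow can be run end to end; it should not be read as the authors' implementation. Assumptions: features are ranked by the information gain of their best threshold split, each "tree" is a one-level tree (stump) whose root is the forced feature, significance is the summed accuracy on the held-out fold, and folds are assigned by index modulo K. All names and toy values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def fit_stump(rows, labels, j):
    """Best threshold split on feature j: (gain, threshold, left_label, right_label)."""
    majority = Counter(labels).most_common(1)[0][0]
    best = (0.0, None, majority, majority)
    values = sorted({r[j] for r in rows})
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[j] <= t]
        right = [y for r, y in zip(rows, labels) if r[j] > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = entropy(labels) - remainder
        if gain > best[0]:
            best = (gain, t, Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    return best


def stump_accuracy(stump, rows, labels, j):
    """Accuracy of a fitted stump rooted at feature j on (rows, labels)."""
    _, t, left_label, right_label = stump
    hits = 0
    for r, y in zip(rows, labels):
        pred = left_label if t is None or r[j] <= t else right_label
        hits += pred == y
    return hits / len(labels)


def toscva_sketch(rows, labels, k, n):
    """Return n feature indices, most significant first (simplified stand-in)."""
    significance = defaultdict(float)
    indices = range(len(rows))
    for fold in range(k):                       # phase 1: K training sets
        train = [i for i in indices if i % k != fold]
        held_out = [i for i in indices if i % k == fold]
        tr_rows, tr_labels = [rows[i] for i in train], [labels[i] for i in train]
        ho_rows, ho_labels = [rows[i] for i in held_out], [labels[i] for i in held_out]
        stumps = {j: fit_stump(tr_rows, tr_labels, j) for j in range(len(rows[0]))}
        top_n = sorted(stumps, key=lambda j: stumps[j][0], reverse=True)[:n]
        for j in top_n:                         # phase 2: each top feature is a forced root
            significance[j] += stump_accuracy(stumps[j], ho_rows, ho_labels, j)
    # phase 3: keep the n features with the largest accumulated significance
    return sorted(significance, key=significance.get, reverse=True)[:n]


# Hypothetical toy data: feature 0 separates classes A and B perfectly.
rows = [[1.0, 2.0, 7.0], [5.0, 3.0, 7.1], [1.2, 2.1, 3.0], [5.1, 2.2, 3.1],
        [0.9, 2.3, 5.0], [4.8, 3.1, 5.1], [1.1, 3.2, 1.0], [5.3, 3.3, 1.1],
        [1.3, 2.4, 9.0], [4.9, 3.4, 9.1], [0.8, 2.5, 4.0], [5.2, 3.5, 4.1]]
labels = ["A", "B"] * 6
selected = toscva_sketch(rows, labels, k=3, n=2)
print(selected)
```

Replacing the stump with a full decision tree whose root is fixed to the forced feature would bring the sketch closer to the cascading trees of [28, 29].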
IV. EXPERIMENTAL RESULTS
A. microRNA Dataset
The microRNA expression dataset was first published in
[32]. The authors used a bead-based method to carry out a
systematic expression analysis of 217 mammalian microRNAs
from 186 samples, including multiple human cancers. The data
set used in this paper is described in Table I.
TABLE I. THE NUMBER OF SAMPLES FOR EACH CANCER TYPE
Cancer Name                     No. of Tumor Samples
Colon                           10
Pancreas                        9
Uterus                          10
Mesothelioma                    8
Breast                          6
B Cell ALL                      26
T Cell ALL                      18
Follicular Cleaved Lymphoma     8
Large B Cell Lymphoma           8
SUM                             103
B. Feature Selection and Classification
Figure 1 shows the classification accuracy of our approach
on 10 ~ 217 microRNAs with an interval of 10. When the
number of given top-ranked microRNAs is in the interval
10 ~ 60, the accuracy increases with the number of
microRNAs. The classifier shows the highest accuracy when
the given number of microRNAs is 60 or 80. Therefore, we
choose the most significant top-ranked microRNAs from the
interval 50 ~ 80.
Fig. 1. Classification accuracy for different numbers of top-ranked
microRNAs.
Figure 2 shows the classification accuracy of the method
on 50 ~ 80 microRNAs. The classifier shows the highest
accuracy when the number of given microRNAs is in the
intervals 55 ~ 64 and 73 ~ 80. When the given number of
top-ranked microRNAs is 65 ~ 72, the accuracy decreases. This
means that, once the number exceeds 64, some of the added
microRNAs are not suitable for building decision tree
committees, even though the highest accuracy is reached again
in the interval 73 ~ 80. Also, decision tree induction becomes
expensive when too many features are considered during the
process. In other words, we are looking for the smallest
number of features with the highest performance. From Table II
and Table III, we can see that the performance on 55 ~ 64 and
73 ~ 80 microRNAs is the same. Therefore, it is better to
choose the 55 top-ranked microRNAs as the most significant
microRNAs in human cancer classification.
Fig. 2. Classification accuracy on 50~80 top-ranked microRNAs.
Figure 3 shows the running time cost for different numbers
of top-ranked features over three test runs. Since the tree
committee has to be built according to the number of
top-ranked features, the running time cost increases as the
number of top-ranked features increases.
Fig. 3. Running time cost for different numbers of top-ranked
features.
After checking the microRNAs that were actually used to
construct the decision trees, we found the most common
microRNAs, which are often used in decision tree induction
independently of the given number of top-ranked microRNAs:
- no-miR151*:UCGAGGAGCUCACAGUCUAGUA:bead_160-C
- hsa-miR-125b:UCCCUGAGACCCUAACUUGUGA:bead_102-A
- hsa-miR-18:UAAGGUGCAUCUAGUGCAGAUA:bead_129-A
- mmu-miR-155:UUAAUGCUAAUUGUGAUAGGGG:bead_126-A
- hsa-miR-10a:UACCCUGUAGAUCCGAAUUUGUG:bead_120-A
TABLE II. DETAILED CLASSIFICATION PERFORMANCE
No. microRNAs   Classes (No. of instances)        Precision   Recall   F-measure
55 ~ 64         Colon (10)                        0.778       0.7      0.737
                Pancreas (9)                      0.778       0.778    0.778
                Uterus (10)                       0.7         0.7      0.7
                Mesothelioma (8)                  0.833       0.625    0.714
                Breast (6)                        0.75        1        0.857
                B Cell ALL (26)                   0.926       0.962    0.943
                T Cell ALL (18)                   0.947       1        0.973
                Follicular Cleaved Lymphoma (8)   0.714       0.625    0.667
                Large B Cell Lymphoma (8)         0.625       0.625    0.625
73 ~ 80         Colon (10)                        0.778       0.7      0.737
                Pancreas (9)                      0.778       0.778    0.778
                Uterus (10)                       0.7         0.7      0.7
                Mesothelioma (8)                  0.833       0.625    0.714
                Breast (6)                        0.75        1        0.857
                B Cell ALL (26)                   0.926       0.962    0.943
                T Cell ALL (18)                   0.947       1        0.973
                Follicular Cleaved Lymphoma (8)   0.714       0.625    0.667
                Large B Cell Lymphoma (8)         0.625       0.625    0.625
TABLE III. CONFUSION MATRIX
No. microRNAs   Classes (classified as)           a   b   c   d   e   f   g   h   i
55 ~ 64         Colon (a)                         7   1   2   0   0   0   0   0   0
                Pancreas (b)                      1   7   1   0   0   0   0   0   0
                Uterus (c)                        1   1   7   1   0   0   0   0   0
                Mesothelioma (d)                  0   0   0   5   0   2   0   1   0
                Breast (e)                        0   0   0   0   6   0   0   0   0
                B Cell ALL (f)                    0   0   0   0   0   25  1   0   0
                T Cell ALL (g)                    0   0   0   0   0   0   18  0   0
                Follicular Cleaved Lymphoma (h)   0   0   0   0   0   0   0   5   3
                Large B Cell Lymphoma (i)         0   0   0   0   2   0   0   1   5
73 ~ 80         Colon (a)                         7   1   2   0   0   0   0   0   0
                Pancreas (b)                      1   7   1   0   0   0   0   0   0
                Uterus (c)                        1   1   7   1   0   0   0   0   0
                Mesothelioma (d)                  0   0   0   5   0   2   0   1   0
                Breast (e)                        0   0   0   0   6   0   0   0   0
                B Cell ALL (f)                    0   0   0   0   0   25  1   0   0
                T Cell ALL (g)                    0   0   0   0   0   0   18  0   0
                Follicular Cleaved Lymphoma (h)   0   0   0   0   0   0   0   5   3
                Large B Cell Lymphoma (i)         0   0   0   0   2   0   0   1   5
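As a consistency check, the per-class precision, recall and F-measure in Table II can be recomputed from the confusion matrix in Table III, reading rows as actual classes and columns as predicted classes. The Python sketch below does this; the function and variable names are illustrative, and the overall accuracy it prints is derived from the matrix here rather than quoted from the paper.

```python
CLASSES = ["Colon", "Pancreas", "Uterus", "Mesothelioma", "Breast",
           "B Cell ALL", "T Cell ALL", "Follicular Cleaved Lymphoma",
           "Large B Cell Lymphoma"]

# Rows: actual class; columns: predicted class (a..i), copied from Table III.
MATRIX = [
    [7, 1, 2, 0, 0, 0, 0, 0, 0],
    [1, 7, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 7, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 5, 0, 2, 0, 1, 0],
    [0, 0, 0, 0, 6, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 25, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 18, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 5, 3],
    [0, 0, 0, 0, 2, 0, 0, 1, 5],
]


def per_class_metrics(matrix):
    """Return (precision, recall, F-measure) per class.

    Assumes every class has at least one actual and one predicted instance,
    which holds for the matrix above.
    """
    n = len(matrix)
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    result = []
    for i in range(n):
        tp = matrix[i][i]
        precision = tp / col_sums[i]       # correct / all predicted as class i
        recall = tp / sum(matrix[i])       # correct / all actual class i
        f_measure = 2 * precision * recall / (precision + recall)
        result.append((precision, recall, f_measure))
    return result


table = {name: tuple(round(v, 3) for v in m)
         for name, m in zip(CLASSES, per_class_metrics(MATRIX))}
total = sum(sum(row) for row in MATRIX)
accuracy = sum(MATRIX[i][i] for i in range(len(MATRIX))) / total

for name in CLASSES:
    print(name, table[name])
print("samples:", total, "accuracy:", round(accuracy, 3))
```

The row sums of the matrix reproduce the per-class sample counts of Table I (103 samples in total), and the rounded metrics agree with Table II.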
V. CONCLUSION
In data mining, traditional feature selection methods can be
divided into filter, embedded and wrapper approaches. In
bioinformatics, the weakness of these methods is that they
decrease the stability of the selected biomarkers. Recently,
ensemble feature selection methods have become very popular
in bioinformatics as a way to improve the stability of
biomarkers. Ensemble feature selection methods can be divided
into three types: Data diversity, Functional diversity and
Hybrid ensemble. However, there are still no studies on the
application of ensemble feature selection to microRNA-based
human cancer classification. In this study, we described an
ensemble feature selection method, TOSCVA, for microRNA-based
human cancer classification. The experimental results show
that our approach is useful for determining the most
significant microRNAs by evaluating the classification
performance of top-ranked features. We also found several
common microRNAs that are most often used in decision tree
induction across different numbers of top-ranked features.
Based on the experimental results, we believe that our
method can be used in various research areas that need to
solve the curse of dimensionality problem. Also, by using
different feature ranking methods, our method can produce
different sets of top-ranked features. This indicates that our
method can be applied to different data sets with different
characteristics.
Our future work will focus on applying Functional diversity
and Hybrid ensemble methods to microRNAs and on designing a
new ensemble feature selection method that incorporates
different feature ranking methods into our mechanism.
Acknowledgment
This research was supported by the MSIP (Ministry of
Science, ICT and Future Planning), Korea, under the ITRC
(Information Technology Research Center) support program
(2014-H0301-14-1022) supervised by the NIPA (National IT
Industry Promotion Agency), and by Basic Science Research
Program through the National Research Foundation of Korea
(NRF) funded by the Ministry of Science, ICT & Future
Planning (No.2013R1A2A2A01068923), and by Export
Promotion Technology Development Program, Ministry of
Agriculture, Food and Rural Affairs (No.114083-3).
References
[1] X. Wang, Robust two-gene classifiers for cancer prediction, Genomics,
2011, pp. 90-95.
[2] L. Li, W. Jiang, X. Li, K. L. Moser, Z. Guo, L. Du, Q. Wang, E. J. Topol,
Q. Wang, S. Rao, A robust hybrid between genetic algorithm and
support vector machine for extracting an optimal feature gene subset,
Genomics, 2005, pp. 16-23.
[3] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, Y. Saeys, Robust
biomarker identification for cancer diagnosis with ensemble feature
selection methods, Bioinformatics, 2010, pp. 392-398.
[4] L. He, J. M. Thomson, M. T. Hemann, E. Hernando-Monge, D. Mu, S.
Goodson, S. Powers, C. Cordon-Cardo, S. W. Lowe, G. J. Hannon,
S. M. Hammond, A microRNA polycistron as a potential human
oncogene, Nature, 2005, 435, pp. 828-833.
[5] M. Mraz, S. Pospisilova, K. Malinova, I. Slapak, J. Mayer, MicroRNAs
in chronic lymphocytic leukemia pathogenesis and disease subtypes,
Leuk Lymphoma, 2009, pp. 506-509.
[6] N. Rosenfeld, R. Aharonov, E. Meiri, S. Rosenwald, Y. Spector, M.
Zepeniuk, H. Benjamin, N. Shabes, S. Tabak, A. Levy, MicroRNAs
accurately identify cancer tissue origin, Nat. Biotechnol, 2008,
pp. 462-469.
[7] J. Lu, G. Getz, E. A. Miska, E. Alvarez-Saavedra, J. Lamb, D. Peck, A.
Sweet-Cordero, B. L. Ebert, R. H. Mak, A. A. Ferrando, J. R. Downing,
T. Jacks, H. R. Horvitz, T. R. Golub, MicroRNA expression profiles
classify human cancers, Nature, 2005, 435, pp. 834-838.
[8] N. Rosenfeld, R. Aharonov, E. Meiri, S. Rosenwald, Y. Spector, M.
Zepeniuk, H. Benjamin, N. Shabes, S. Tabak, A. Levy, D. Lebanony, Y.
Goren, E. Silberschein, N. Targan, A. Ben-Ari, S. Gilad, N. Sion-Vardy,
A. Tobar, M. Feinmesser, O. Kharenko, O. Nativ, D. Nass, M.
Perelman, A. Yosepovich, B. Shalmon, S. Polak-Charcon, E. Fridman,
A. Avniel, I. Bentwich, Z. Bentwich, D. Cohen, A. Chajut, I. Barshack,
MicroRNAs accurately identify cancer tissue origin, Nat Biotechnol,
2008, 26, pp. 462-469.
[9] R. Xu, J. Xu, D. C. Wunsch II, MicroRNA expression profile based
cancer classification using Default ARTMAP, Neural Networks, 2009,
22, pp. 774-780.
[10] Kyung-Joong Kim, Sung-Bae Cho, Exploring features and classifiers to
classify microRNA expression profiles of human cancer, Neural
Information Processing, 2010, 6444, pp. 234-241.
[11] U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and
A. J. Levine, “Broad patterns of gene expression revealed by clustering
analysis of tumor and normal colon tissues probed by oligonucleotide
arrays,” Proceedings of the National Academy of Sciences, 1999, vol.
96, no. 12, pp. 6745-6750.
[12] S. Dudoit, J. Fridlyand, and T. P. Speed, Comparison of discrimination
methods for the classification of tumors using gene expression data,
Journal of the American Statistical Association, 2002, vol. 97, no. 457,
pp. 77-87.
[13] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, Gene selection for
cancer classification using support vector machines, Machine Learning,
2002, vol. 46, pp. 389-422.
[14] A. Ben-Dor, L. Bruhn, N. Friedman, I. Nachman, M. Schummer, and Z.
Yakhini, Tissue classification with gene expression profiles, Journal of
Computational Biology, 2000, vol. 7, no. 3-4, pp. 559-583.
[15] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P.
Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D.
Bloomfield, and E. S. Lander, Molecular classification of cancer: Class
discovery and class prediction by gene expression monitoring, Science,
1999, vol. 286, no. 5439, pp. 531-537.
[16] D. Dittman, T. Khoshgoftaar, R. Wald, and A. Napolitano, “Random
forest: A reliable tool for patient response prediction,” Proceedings of
the IEEE International Conference on Bioinformatics and Biomedicine
(BIBM) Workshops. BIBM, 2011, pp. 289-296.
[17] G. Mulligan, C. Mitsiades, B. Bryant, F. Zhan, W. J. Chng, S. Roels, E.
Koenig, A. Fergus, Y. Huang, P. Richardson, W. L. Trepicchio, A.
Broyl, P. Sonneveld, J. D. Shaughnessy, P. L. Bergsagel, D.
Schenkein, D.-L. Esseltine, A. Boral, and K. C. Anderson, Gene
expression profiling and correlation with outcome in clinical trials of
the proteasome inhibitor bortezomib, Blood, 2007, pp. 3177-3188.
[18] A. Kalousis, J. Prados, and M. Hilario, Stability of feature selection
algorithms: a study on high-dimensional spaces, Knowledge and
Information Systems, 2006, vol. 12, no. 1, pp. 95-116.
[19] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys,
Robust biomarker identification for cancer diagnosis with ensemble
feature selection methods, Bioinformatics, 2010, vol. 26, no. 3,
pp. 392-398.
[20] A. C. Haury, P. Gestraud, and J. P. Vert, The influence of feature
selection methods on accuracy, stability and interpretability of
molecular signatures, PLoS ONE, 2011, vol. 6, no. 12, pp. e28210.
[21] H. Liu, L. Liu, and H. Zhang, Ensemble gene selection by grouping for
microarray data classification, Journal of Biomedical Informatics, 2010,
vol. 43, no. 1, pp. 81-87.
[22] Y. Saeys, T. Abeel, and Y. Peer, “Robust feature selection using
ensemble feature selection techniques,” Proceedings of the European
conference on Machine Learning and Knowledge Discovery in
Databases - Part II. Berlin, Heidelberg: Springer-Verlag, 2008,
pp. 313-325.
[23] P. Yang, J. Ho, Y. Yang, and B. Zhou, Gene-gene interaction filtering
with ensemble of filters, BMC Bioinformatics, 2011, vol. 12, no. Suppl
1, pp. S10.
[24] P. Yang, Y. Hwa Yang, B. B Zhou, and A. Y Zomaya, A review of
ensemble methods in bioinformatics, Current Bioinformatics, 2010, vol.
5, no. 4, pp. 296-308.
[25] L. Yu, Y. Han, and M. E. Berens, Stable gene selection from
microarray data via sample weighting, IEEE/ACM Trans. Comput. Biol.
Bioinformatics, 2012, vol. 9, no. 1, pp. 262-272.
[26] W. Awada et al., “A review of the stability of feature selection
techniques for bioinformatics data”, 2012 IEEE 13th International
Conference on Information Reuse and Integration (IRI), 2012,
pp. 356-363.
[27] P. N. Tan, M. Steinbach, V. Kumar, Ensemble methods. Introduction to
data mining, Addison Wesley, 2006, pp. 278-280.
[28] J. Y. Li, H. A. Liu, See-Kiong Ng, Limsoon Wong, Discovery of
significant rules for classifying cancer diagnosis data, Bioinformatics,
2003, vol. 19, pp. 93-102.
[29] J. Li, H. Liu, “Ensembles of cascading trees”, Proceedings of Third
IEEE international conference on data mining, 2003, pp. 585-588.
[30] L. Breiman, Bagging predictors, Machine Learning, 1996, vol. 24, pp.
123-140.
[31] Y. Freund, R. E. Schapire, “Experiments with a New Boosting
Algorithm”, Proceedings of the Thirteenth International Conference on
Machine Learning, 1996, pp. 148-156.
[32] E. Fridman, Z. Dotan, I. Barshack, M. B. David, A. Dov, S. Tabak, O.
Zion, S. Benjamin, H. Benjamin, H. Kuker, Accurate molecular
classification of renal tumors using microRNA expression, The Journal
of molecular diagnostics, 2010, pp. 687-696.
[33] Y. J. Piao, H. W. Park, C. H. Jin and K. H. Ryu, “Ensemble Method for
Classification of High Dimensional Data”, Proceedings of the
International Conference on Big Data and Smart Computing, 2014, pp.
245-249.
[34] F. F. Li, Y. J. Piao, M. J. Li, M. H. Piao and K. H. Ryu, “Positive
Impression of Low-Ranking microRNAs in Human Cancer
Classification”, Proceedings of The Third International conference on
Parallel, Distributed Computing technologies and Applications
(PDCTA-2014), 2014.
[35] Y. J. Piao, N. H. Choi, M. J. Li, M. H. Piao and K. H. Ryu, “Ensemble
Method for Prediction of Prostate Cancer from RNA-Seq Data”, 6th
International Joint Conference on Knowledge Discovery, Knowledge
Engineering and Knowledge Management, 2014, pp. 51-56.