An Approach to Find Missing Values in Medical Datasets
B. Mathura Bai
Faculty of Information Technology, VNR VJIET, Hyderabad, INDIA
[email protected]
N. Mangathayaru
Professor, Dept. of Information Technology, VNR VJIET, Hyderabad, INDIA
[email protected]
B. Padmaja Rani
Professor, Computer Science & Engg Dept., JNT University, Hyderabad, INDIA
[email protected]
ABSTRACT
Mining medical datasets is a challenging problem for
data mining researchers, as these datasets pose several
hidden challenges compared to conventional datasets.
From the collection of samples through field
experiments and clinical trials to performing classification,
there are numerous challenges at every stage of the mining
process. The preprocessing phase is itself challenging
when we work on medical datasets. One of the prime
challenges in mining medical datasets is handling missing
values, which is part of the preprocessing phase. In this
paper, we address the issue of handling missing values in
medical datasets consisting of categorical attribute values.
The main contribution of this research is an imputation
measure that is used to estimate and fix the missing values.
We discuss a case study to demonstrate the working of the
proposed measure.
Categories and Subject Descriptors:
I.2.6 Learning –Knowledge acquisition. H.2.8 Database
applications: Data mining I.2.6 Artificial Intelligence:
Learning - parameter learning
General Terms
analysis, dataset, record, attribute, noise, algorithm
Keywords
medical record, missing values, prediction, classification
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. Copyrights for
components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to
post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from [email protected].
ICEMIS '15, September 24-26, 2015, Istanbul, Turkey
© 2015 ACM. ISBN 978-1-4503-3418-1/15/09…$15.00
DOI: http://dx.doi.org/10.1145/2832987.2833083
1. INTRODUCTION
Disease Prediction and Classification is evolving as an
important research problem in the fields of medical and
health informatics.
Medical data has several hidden challenges when compared
to conventional datasets. The first step is sample data
collection, which requires defining the attributes that
functionally define the target specification. This phase of
sample data collection is not free from challenges. The
possible presence of missing values for some attributes is
the first challenge that data mining researchers face.
One simple method to handle missing values is to
eliminate every medical record that contains missing
values. But this does not solve the problem, because such
a record might contain values of attributes which are the
deciding factors in predicting the disease in some
situations. If we do not eliminate the records containing
missing values, then the next strategy is to consider those
records for the purpose of training. In such cases, we must
fix the missing values. Researchers have done this in
different ways.
The most common way is to use a distance measure to
impute the missing values and replace them with suitable
values. This requires a proper estimation to be made,
because any failure to do so would give faulty outcomes.
The problem of mining medical records may be divided
into handling the missing values, performing
classification, and clustering. Classification requires that
the class label of each record is available; it is a
supervised learning process. Clustering medical data to
group similar medical records or patients, however, is an
unsupervised learning process.
Most works in the literature make use of the Euclidean
distance measure to fix the missing values. Some
researchers cluster the medical data using the k-means
algorithm and then fix the missing values. Our findings
indicate that there is considerable scope for research
towards fixing missing values by designing new distance
measures which satisfy the basic properties of distance
measures. If this is done, then we can estimate the missing
values of the records containing the class label. In this
paper, Section 2 discusses the related works. Section 3
points out the research issues in medical data mining.
Section 4 introduces the problem definition and a
generalized approach for handling medical datasets.
Section 5 presents the proposed imputation algorithm and
measure, Section 6 discusses a case study, and Section 7
concludes.
2. RELATED WORKS
Missing values are most common when various field
experiments and clinical trials are carried out. These
missing values pose hidden challenges to the data miner
and analyst. In this section, we discuss various research
works carried out recently in healthcare and medical data
mining. In [1], the authors discuss the top 10 data mining
algorithms which are most promising. In [2, 3, 4, 5] the
authors' work involves predicting heart disease using
data mining approaches.
The work in [6] deals with the knowledge discovery issues
in data mining. The work in [7] deals with evolutionary
approach in data mining and [8] deals with finding
interesting rules from the dataset. In [9], the authors discuss
the application of frequent items finding algorithm to find
the periodical frequent patterns from transaction data.
The authors of [10, 14, 15] propose an approach for
handling the missing values when the feature values are
continuous in nature, using a non-parametric discretization
technique. They try to impute the missing values. They use
the concept of z-score to handle the missing values in the
medical records. This requires the underlying data to be
continuous.
In [11, 13] the authors use the concept of imputation to
handle the missing values, considering a dengue fever
dataset. They design a procedure to impute the missing
attribute values for continuous attributes which may be
numeric and categorical. They also use a decision tree for
generating efficient decision rules for classification.
In [12], the authors compare the various approaches
available in the literature for handling missing values
on benchmark datasets. In [17, 18] the authors
perform classification of medical records. The research in
[26-40] includes contributions of various researchers
towards finding missing values, classification, prediction,
and clustering of medical records.
3. Research Challenges
3.1 Dimensionality Reduction
Dimensionality reduction w.r.t. medical datasets requires
handling the data without any loss. In the conventional
approach, we reduce the dataset by eliminating irrelevant
or least important features through preprocessing, which
for text data includes stop word elimination and stemming.
This approach does not hold good for medical datasets.
Dimensionality reduction is a major concern when
dealing with large datasets and very large databases.
However, in mining medical records this problem becomes
much more complex. This is because we must perform
dimensionality reduction and at the same time make sure
that we retain all the information without any loss, which
is not as simple as it appears.
Dimensionality reduction techniques have been discussed
extensively in various research works in the literature. The
preprocessing phase also requires normalizing the data to
make it feasible for handling efficiently. Once the data is
normalized, the next problem is with the missing values.
3.2 Missing Values
A peculiar problem which arises when handling medical
datasets is the presence of missing values in the dataset,
which is very natural. This is because the data may have
been collected or sampled from various clinical trials,
field experiments or any other traditional mechanism.
Proper care must be taken to handle missing values
suitably and accurately.
A simple approach to handle missing values would be to
discard the whole record which contains the missing value
of an attribute. But this does not solve the problem, as we
lose the record. There is a chance that this record may be
an important and deciding factor when performing
prediction and classification.
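The cost of discarding records can be illustrated with a
minimal sketch. It uses the nine-record sample dataset of the
case study (Table 1) and marks a missing value with None; the
names RECORDS, COMPLETE and DROPPED are illustrative and do not
come from the paper.

```python
# Sample dataset in the style of Table 1; None marks a missing value.
RECORDS = {
    "R1": ["a11", 5, "a31", 10],
    "R2": ["a13", 7, "a31", 5],
    "R3": ["a11", 7, None, 7],
    "R4": ["a12", 5, "a31", 10],
    "R5": ["a13", 3, "a32", None],
    "R6": ["a12", 9, "a31", 10],
    "R7": ["a11", 5, "a32", 3],
    "R8": ["a13", 6, "a32", 7],
    "R9": ["a12", 6, "a32", 10],
}

# Record elimination: keep only records in which every attribute is present.
COMPLETE = {name: rec for name, rec in RECORDS.items() if None not in rec}

# Records that are lost, together with the attribute values they still carried.
DROPPED = sorted(set(RECORDS) - set(COMPLETE))

print(f"kept {len(COMPLETE)} of {len(RECORDS)} records; dropped {DROPPED}")
for name in DROPPED:
    known = [value for value in RECORDS[name] if value is not None]
    print(f"{name} is discarded although it has {len(known)} known values")
```

Two of the nine records are lost even though each of them has
three of its four attribute values recorded.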
The problem of handling missing values has been widely
addressed by data mining practitioners and researchers,
who have explored various approaches [26-33]. Handling
missing values requires designing suitable imputation
measures or using the existing measures. If we design a
new measure, it should satisfy the typical properties of a
distance measure. There is scope for research in this
direction. Researchers may come up with suitable measures
or alternatively validate the use of mathematical functions
for fixing and filling missing values.
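Whether a candidate measure behaves like a distance can be
checked empirically on a finite set of points. The sketch below
is an assumption-laden illustration, not part of the paper: the
helper check_distance_properties is a hypothetical name, the
Hamming distance and the squared difference are stand-in
measures, and passing the check on a sample is only evidence,
not a proof.

```python
from itertools import product

def check_distance_properties(dist, points, tol=1e-12):
    """Test non-negativity, d(x, x) = 0, symmetry and the triangle
    inequality for every combination of the given points."""
    report = {"non_negative": True, "identity": True,
              "symmetric": True, "triangle": True}
    for x, y in product(points, repeat=2):
        if dist(x, y) < -tol:
            report["non_negative"] = False
        if x == y and abs(dist(x, y)) > tol:
            report["identity"] = False
        if abs(dist(x, y) - dist(y, x)) > tol:
            report["symmetric"] = False
    for x, y, z in product(points, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:
            report["triangle"] = False
    return report

def hamming(a, b):
    """Number of positions at which two categorical records differ."""
    return sum(u != v for u, v in zip(a, b))

def squared_difference(a, b):
    """A simple function that is NOT a metric: it breaks the triangle inequality."""
    return (a - b) ** 2

CATEGORICAL_POINTS = [("a11", "a31"), ("a13", "a31"), ("a12", "a32"), ("a11", "a32")]
HAMMING_REPORT = check_distance_properties(hamming, CATEGORICAL_POINTS)
SQUARED_REPORT = check_distance_properties(squared_difference, [0, 1, 2])

print("hamming:", HAMMING_REPORT)
print("squared difference:", SQUARED_REPORT)
```

The squared difference is included only to show that the check
detects a violation: for the points 0, 1 and 2 the direct
distance 4 exceeds the sum 1 + 1 of the two intermediate steps.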
3.3 Functional Relation among Attributes
Another important challenge which cannot be left
unexplored is discovering the functional relation among
the attributes of the medical datasets. This must be handled
very carefully, as the features or attributes are the major
elements.
3.4 Imputation Measure
The choice of a suitable imputation mechanism, approach
and measure is critical for handling medical data.
Estimation of missing values using an effective imputation
measure is the deciding stage when performing
classification and prediction. The training set and the
number of samples are also critical when handling medical
datasets.
One of the approaches to handle missing values is the use
of a proper imputation measure which can efficiently fix the
missing values. Alternatively, we may design a suitable
imputation measure which is used to estimate the
missing value and replace it by an equivalent value
computed using this imputation measure.
Some of the works in this direction include [10, 14].
3.5 Deciding on the Number of Attributes
One of the challenging problems when performing data
mining is the number of attributes. As the number of
attributes increases, the complexity of handling the dataset
also increases. This is because of the increasing number of
attribute combinations, which makes the problem analysis
NP-Hard. The complexity grows non-linearly.
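The growth in attribute combinations can be made concrete with
a short sketch. It assumes that "attribute combination" refers
to the non-empty subsets of the attribute set, of which there
are 2^n - 1 for n attributes; the function names are
illustrative.

```python
from itertools import combinations

def count_attribute_subsets(n_attributes):
    """Number of non-empty subsets of n attributes: 2^n - 1."""
    return 2 ** n_attributes - 1

def enumerate_subsets(attributes):
    """Yield every non-empty subset, smallest subsets first."""
    for size in range(1, len(attributes) + 1):
        yield from combinations(attributes, size)

# The four attributes of a small dataset can still be enumerated exhaustively.
SUBSETS = list(enumerate_subsets(["A1", "A2", "A3", "A4"]))
print(len(SUBSETS), "subsets for 4 attributes")

# The count doubles with each added attribute, so enumeration quickly becomes infeasible.
for n in (10, 20, 30):
    print(n, "attributes ->", count_attribute_subsets(n), "subsets")
```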
Dimensionality reduction techniques such as feature
selection, feature reduction, PCA and SVD are usually
applied when the number of attributes is very large. In some
situations, domain experts and domain knowledge may help
reduce the attributes. The domain knowledge may also be
useful to obtain the dependency relation among the
attributes. Parameter tuning is also one of the problems
when handling medical databases and datasets.
3.6 Update Anomaly
Medical data is not static. It is time variant and
multidimensional. Since the medical data keeps updating
with the readings arriving from laboratory tests, the data
mining approaches, methods and algorithms designed
should be able to dynamically learn, build and update the
knowledge database incrementally.
Incremental data mining approaches are better suited for
handling medical databases than static methods and
approaches. Techniques such as incremental frequent
pattern mining algorithms, incremental clustering and
dynamic itemset finding algorithms handle the time-varying
nature of medical datasets.
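As one small illustration of incremental updating, the sketch
below maintains the standard deviation of an attribute as new
readings arrive, without storing or rescanning earlier
readings. It uses Welford's online algorithm, which is a
standard technique and an assumption here, not a method from
the paper; the class name RunningStd is illustrative. The
readings are the A2 values of the seven complete records of the
case study.

```python
import math
from statistics import stdev

class RunningStd:
    """Welford's online algorithm for the sample standard deviation."""

    def __init__(self):
        self.n = 0        # number of readings seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new reading into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        """Sample standard deviation (0.0 until two readings are available)."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

READINGS = [5, 7, 5, 9, 5, 6, 6]
running = RunningStd()
for value in READINGS:
    running.update(value)
    print(f"after {running.n} readings: std = {running.std:.5f}")

# The incremental result agrees with a batch computation over all readings.
print("batch std:", round(stdev(READINGS), 5))
```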
3.7 Problem with Data Interpretation
Medical data usually comes from various homogeneous
and heterogeneous sources, which brings problems such as
data inconsistency, data incompleteness and data
interpretation. This is because the various sources may
use different terminologies, standards and attributes. The
attributes may be continuous or not.
3.8 Imbalanced Nature of Medical Datasets
Medical datasets are usually imbalanced. In such a case, the
dataset must be preprocessed before it is used. The
available preprocessing techniques must be applied
suitably to handle such anomalies.
4. Problem Definition
Given a dataset of medical records, the problem is to
classify each medical record into one of the labeled classes.
Applying data mining techniques to medical datasets helps
in identifying hidden knowledge, which in turn helps in
prediction and classification. There are various approaches
for classifying and clustering datasets, and each
algorithm has its own pros and cons. Also, the dataset over
which we apply these techniques may not be free from
missing values, because not every attribute value of every
record in the schema may have been recorded. The first step
in handling the dataset may therefore itself be challenging,
because it requires handling missing values.
There is scope for research to design algorithms for
handling missing values in medical records. Once we
handle the missing values in the medical dataset, the dataset
M is transformed to M'. This M' must then be
normalized to make the medical dataset suitable for
classification using the existing algorithms. The
third challenge is identifying the outlier attributes or
features and then reducing M' to M''. This process is
called dimensionality reduction.
Finally, we may classify a record into one of the
available class labels. To handle missing values, we must
know the class label of each record; otherwise the process
becomes much more complex. All the existing algorithms
for handling missing values impose the constraint that the
class labels be known a priori, and the missing values are
filled using them. If the class labels are missing, we must
first identify the nearest class label and then fill the
missing values in the medical records.
5. PROPOSED MEASURE
5.1 IMPUTATION ALGORITHM
Step-1:
Place the dataset in record and attribute matrix format, with
records as rows and attributes as columns.
Step-2:
Decompose the medical dataset into two parts. The first
part consists of records with no missing values and the
second part consists of records with missing values.
Step-3:
Normalize the datasets obtained in Step-2 suitably. This
involves mapping the attribute values to a domain on which
the proposed measure can be applied to find missing values,
i.e., we fix the domain of attribute values.
Step-4:
Choose one record with a missing attribute value (or
values) at a time, to fix the missing values of that record.
Step-5:
From the record chosen in Step-4 (second part of the
dataset), discard the attribute column containing the
missing value. Similarly, discard this attribute column from
the records with no missing values (first part of the
dataset).
Step-6:
Apply the proposed measure to find the similarity between
the new record and the existing records without missing
values.
Step-7:
The class label of the new test record is the label of the
record to which the new test record shows maximum
similarity.
Step-8:
Fill the missing attribute value of the test record with the
corresponding attribute value of the record to which it has
maximum similarity. We may also consider the frequency
of occurrence in case of ambiguity.
Step-9:
Repeat the above steps for each record having missing
values.
5.2 IMPUTATION MEASURE
In this section, we discuss the proposed imputation measure,
which is used to fix the missing attribute values present in
the medical records.
The proposed imputation measure, IMMV, is defined as given
by Equation 1:
$$ \mathrm{IMMV}(R_i, R_j) = \frac{1}{n} \ast \sum_{k=1}^{n} \mathrm{Sim}(R_{ik}, R_{jk}) \tag{1} $$
where
$$ \mathrm{Sim}(R_{ik}, R_{jk}) = 0.5 \ast \left[ 1 + \exp\left( -\left( \frac{V_{ik} - V_{jk}}{\sigma_k} \right)^{2} \right) \right] \tag{2} $$
The parameters V_ik, V_jk denote the values of the kth
attribute w.r.t. records R_i, R_j, and n is the number of
attributes that remain after discarding the attribute with
the missing value. Similarly, \sigma_k denotes the standard
deviation of the kth attribute w.r.t. all records of the
dataset, discarding missing value attributes. With these
definitions, the measure yields the similarity values
reported in Table 6 and Table 7 of the case study.
The distance function is given by Equation 3:
$$ \mathrm{Distance}(R_i, R_j) = 1 - \mathrm{IMMV}(R_i, R_j) \tag{3} $$
6. CASE STUDY
Consider Table 1, which has nine medical records, two of
which have missing values, as shown below.
Table 1. Sample Dataset, MD
      A1    A2   A3    A4
R1    a11   5    a31   10
R2    a13   7    a31   5
R3    a11   7    ?     7
R4    a12   5    a31   10
R5    a13   3    a32   ?
R6    a12   9    a31   10
R7    a11   5    a32   3
R8    a13   6    a32   7
R9    a12   6    a32   10
Table 2 shows the records R1, R2, R4, R6, R7, R8, R9, which
have all the attribute values defined. We can observe that
there are no missing values.
Table 2. Records with no missing values
      A1    A2   A3    A4
R1    a11   5    a31   10
R2    a13   7    a31   5
R4    a12   5    a31   10
R6    a12   9    a31   10
R7    a11   5    a32   3
R8    a13   6    a32   7
R9    a12   6    a32   10
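The algorithm of Section 5.1 and the measure of Section 5.2
can be sketched in Python as follows. This is a minimal sketch
under stated assumptions, not the authors' implementation. It
assumes that the measure is the mean, over the attributes
retained in Step-5, of Sim = 0.5 * [1 + exp(-((Vik - Vjk) /
sigma_k)^2)], with sigma_k taken as the sample standard
deviation over the complete records; this reading reproduces
the similarity values reported in Table 6 and Table 7. The
sample dataset of Table 1 is used with A1 and A3 already mapped
to integers as in Table 5. The names RECORDS, split_records,
attribute_stdevs, immv and impute are illustrative. Class
labels (Step-7) are omitted because the sample dataset has
none, and ties are not broken by frequency of occurrence.

```python
import math
from statistics import stdev

# Table 1 with A1 and A3 mapped to integers (a11, a12, a13 -> 1, 2, 3 and
# a31, a32 -> 1, 2), as in Table 5. None marks a missing value.
RECORDS = {
    "R1": [1, 5, 1, 10],
    "R2": [3, 7, 1, 5],
    "R3": [1, 7, None, 7],
    "R4": [2, 5, 1, 10],
    "R5": [3, 3, 2, None],
    "R6": [2, 9, 1, 10],
    "R7": [1, 5, 2, 3],
    "R8": [3, 6, 2, 7],
    "R9": [2, 6, 2, 10],
}

def split_records(records):
    """Step-2: separate complete records from records with missing values."""
    complete = {k: v for k, v in records.items() if None not in v}
    incomplete = {k: v for k, v in records.items() if None in v}
    return complete, incomplete

def attribute_stdevs(complete):
    """Sample standard deviation of every attribute over the complete records."""
    return [stdev(column) for column in zip(*complete.values())]

def immv(rec_i, rec_j, sigmas, keep):
    """Assumed form of the measure: mean of Sim over the retained attributes.
    Requires a non-zero standard deviation for every retained attribute."""
    sims = [
        0.5 * (1.0 + math.exp(-((rec_i[k] - rec_j[k]) / sigmas[k]) ** 2))
        for k in keep
    ]
    return sum(sims) / len(sims)

def impute(records):
    """Steps 4 to 9: fill each missing value from the most similar complete record."""
    complete, incomplete = split_records(records)
    sigmas = attribute_stdevs(complete)
    filled, similarities = {}, {}
    for name, rec in incomplete.items():
        # Step-5: retain only the attributes that are present in this record.
        keep = [k for k, v in enumerate(rec) if v is not None]
        # Step-6: similarity with every complete record.
        sims = {c: immv(rec, crec, sigmas, keep) for c, crec in complete.items()}
        # Step-8: copy the missing value from the most similar record.
        best = max(sims, key=sims.get)
        filled[name] = [complete[best][k] if v is None else v
                        for k, v in enumerate(rec)]
        similarities[name] = sims
    return filled, similarities

FILLED, SIMILARITIES = impute(RECORDS)
for name, rec in FILLED.items():
    best = max(SIMILARITIES[name], key=SIMILARITIES[name].get)
    print(name, "->", rec, "most similar:", best,
          round(SIMILARITIES[name][best], 6))
```

Under these assumptions both R3 and R5 are most similar to R8,
so the missing A3 value of R3 is filled with 2 (a32) and the
missing A4 value of R5 with 7, in agreement with the case study.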
Table 3 shows the records R3 and R5. We can observe that
these records have missing values w.r.t. attributes A3
and A4 respectively.
Table 3. Records with Missing Values
      A1    A2   A3    A4
R3    a11   7    ?     7
R5    a13   3    a32   ?
Table 4 records the standard deviation of each attribute,
computed over the records with no missing values (Table 2).
The attribute values of A1 and A3 are mapped to a numeric
domain (a11, a12, a13 to 1, 2, 3 and a31, a32 to 1, 2) to
make them suitable for processing by the algorithm and the
proposed measure, as shown in Table 5. Similarly, Table 8
shows the mapping of attribute values for records R3 and R5.
Table 4. Standard deviation of attributes
A1         A2        A3         A4
0.816497   1.46385   0.534522   2.91139
Table 5. Mapping A1 and A3 attribute values
          A1       A2       A3       A4
R1        1        5        1        10
R2        3        7        1        5
R4        2        5        1        10
R6        2        9        1        10
R7        1        5        2        3
R8        3        6        2        7
R9        2        6        2        10
Std.dev   0.8164   1.4638   0.5345   2.9113
Table 8. Mapping Records with Missing Values
      A1   A2   A3   A4
R3    1    7    ?    7
R5    3    3    2    ?
The similarity values of records R3 and R5 w.r.t. records
R1, R2, R4, R6, R7, R8, R9 are shown in Table 6 and Table 7
respectively.
Table 6. Similarity of Record R3 with Records in Table 2
      Similarity with R3
R1    0.750079
R2    0.771048
R4    0.6206
R6    0.6206
R7    0.717678
R8    0.771595
R9    0.699342
Table 7. Similarity of Record R5 with Records in Table 2
      Similarity with R5
R1    0.531219
R2    0.671795
R4    0.567994
R6    0.542221
R7    0.692853
R8    0.835833
R9    0.706354
In Table 6 and Table 7, the records R3 and R5 both show
maximum similarity with record R8. Hence we fix the missing
attribute values of these records w.r.t. record R8. Table 9
shows the mapped records after fixing, and Table 10 shows
the attribute values of the records after fixing missing
values using the proposed measure.
Table 9. Records with Mapping Values fixed
      A1   A2   A3   A4
R3    1    7    2    7
R5    3    3    2    7
Table 10. Records with Missing Values fixed
      A1   A2   A3    A4
R3    1    7    a32   7
R5    3    3    2     7
7. CONCLUSIONS
Missing values are very common in medical data. Handling
missing values is a challenging and interesting area of
research from the perspective of data mining and information
retrieval. In this work, we propose an imputation measure
which is used to fix missing attribute values in the
records of a dataset. This is done by finding similarity
values between records with missing values and records
without missing values. A case study is discussed which
shows the procedure of fixing missing values.
8. ACKNOWLEDGEMENTS
We are thankful to the anonymous reviewers who gave their
valuable suggestions for improving this work.
This work is not funded by any public, private or
government agencies and has been carried out as part of
research work. We are also thankful to the Head of the
Department, IT and the Principal, VNR VJIET for motivating
us towards research.
References
[1] Xindong Wu, Vipin Kumar, J. Ross Quinlan et al. 2007. Top
10 algorithms in data mining. Knowledge Inf. Syst. 14, 1, Pg
1-37. DOI=10.1007/s10115-007-0114-2.
http://dx.doi.org/10.1007/s10115-007-0114-2
[2] M. A. Jabbar et al. An Evolutionary algorithm for Heart
Disease Prediction. Communications in Computer and
Information Science. Springer Verlag. Volume 292, pp 378-389.
http://dx.doi.org/10.1007/978-3-642-31686-9_44
[3] M. A. Jabbar, B. L. Deekshatulu, Priti Chandra. Graph Based
Approach for Heart Disease Prediction. Proceedings of the
Third International Conference on Trends in Information,
Telecommunication and Computing, Volume 150 of the
series Lecture Notes in Electrical Engineering pp 465-474
[4] V. Krishnaiah, G. Narsimha, N. Subhash Chandra. Heart
Disease Prediction System Using Data Mining Technique by
Fuzzy K-NN Approach. Emerging ICT for Bridging the
Future - Proceedings of the 49th Annual Convention of the
Computer Society of India (CSI). Series Advances in
Intelligent Systems and Computing, Volume 337, pp 371-384
[5] Sujata Joshi, Mydhili K. Nair. Prediction of Heart Disease
Using Classification Based Data Mining Techniques.
Computational Intelligence in Data Mining - Volume 2,
Volume 32 of the series Smart Innovation, Systems and
Technologies pp 503-511
[6] John F. Roddick, Peter Fule, and Warwick J. Graco. 2003.
Exploratory medical knowledge discovery: experiences and
issues. SIGKDD Explor. Newsl. 5, 1 (July 2003), 94-99.
DOI=10.1145/959242.959243
http://doi.acm.org/10.1145/959242.959243
[7] Michele Sebag, Jerome Aze, Noel Lucas. ROC-Based
Evolutionary Learning: Application to Medical Data Mining.
Artificial Evolution Volume 2936 of the series Lecture Notes
in Computer Science pp 384-396
[8] Miho Ohsaki et al. Investigation of Rule Interestingness in
Medical Data Mining. Active Mining Volume 3430 of the
series Lecture Notes in Computer Science pp 174-189
[9] Mohammed Abdul Khaleel, G. N. Dash, K. S. Choudhury,
Mohiuddin Ali Khan. Medical Data Mining for Discovering
Periodically Frequent Diseases from Transactional
Databases. Computational Intelligence in Data Mining -
Volume 1, Volume 31 of the series Smart Innovation,
Systems and Technologies pp 87-96
[10] G. Madhu et.al. A Non-Parametric Discretization Based
Imputation Algorithm for Continuous Attributes with
Missing Data Values. International Journal of Information
Processing, 8(1), 64-72, 2014
[11] Sreehari Rao et.al A New Intelligence-Based Approach for
Computer-Aided Diagnosis of Dengue Fever. IEEE
Transactions on Information Technology in Biomedicine,
Jan. 2012, 112 – 118.
[12] M. Naresh Kumar. Performance comparison of State-of-the
art Missing Value Imputation Algorithms on Some Bench
mark Datasets.
[13] V. Sree Hari Rao et.al .Novel Approaches for Predicting
Risk Factors of Atherosclerosis. IEEE Journal of Biomedical
and Health Informatics. Jan 2013, Pg 183 – 189
[14] G.Madhu et.al. A novel index measure imputation algorithm
for missing data values: A machine learning approach. IEEE
International Conference on Computational Intelligence &
Computing Research, Dec 2012.
[15] A Novel Discretization Method for Continuous Attributes:
A Machine Learning Approach. International Journal of Data
Mining and Emerging Technologies 4 (1), 34-43.
[16] G.Madhu et.al. Improve the Classifier Accuracy for
Continuous Attributes in Biomedical Datasets Using a New
Discretization Method. Journal Procedia Computer Science,
2014, Vol 31, 671-79.
[17] A. Casillas et.al. First approaches on Spanish medical record
classification using Diagnostic Term to class transduction.
Proceedings of the 10th International Workshop on Finite
State Methods and Natural Language Processing, pages 60–
64.
[18] Tomasz Orczyk. Influence of Missing data imputation
method on the classification accuracy of medical data.
Journal of Medical Informatics & Technologies
Vol. 22/2013, ISSN 1642-6037.
[19] Krzysztof J. Cios and G. William Moore. 2002. Uniqueness
of medical data mining. Artif. Intell. Med. 26, 1-2 (September
2002), 1-24.
[20] George Tzanis. 2014. Biological and Medical Big Data
Mining. Int. J. Knowl. Disc. Bioinfo. 4, 1 (January 2014),
42-56. DOI=10.4018/ijkdb.2014010104
[21] Ashkan Sami. 2006. Obstacles and misunderstandings facing
medical data mining. In Proceedings of the Second
international conference on Advanced Data Mining and
Applications(ADMA'06), Xue Li, Osmar R. Zaïane, and
Zhan-huai Li (Eds.). Springer-Verlag, Berlin, Heidelberg,
856-863
[22] Sean N. Ghazavi and Thunshun W. Liao. 2008. Medical data
mining by fuzzy modeling with selected features. Artif.
Intell. Med. 43, 3 (July 2008), 195-206.
[23] Tamas Sandor Gal. 2011. Privacy Preserving Data Mining
for Medical Data. Ph.D. Dissertation. University of
Maryland at Baltimore County, Catonsville, MD, USA.
Advisor(s) Zhiyuan Chen. AAI3490994.
[24] Andrea L. Houston, Hsinchun Chen, Susan M. Hubbard,
Bruce R. Schatz, Tobun D. Ng, Robin R. Sewell, and Kristin
M. Tolle. 1999. Medical Data Mining on the Internet:
Research on a Cancer Information System. Artif. Intell.
Rev. 13, 5-6 (December 1999), 437-466.
[25] M. Akhil Jabbar, B.L. Deekshatulu, Priti Chandra,
Classification of Heart Disease Using K- Nearest Neighbor
and Genetic Algorithm, Procedia Technology, Volume 10,
2013, Pages 85-94, ISSN 2212-0173
[26] Martti Juhola and Jorma Laurikkala. 2013. Missing values:
how many can they be to preserve classification
reliability? Artif. Intell. Rev. 40, 3 (October 2013), 231-245.
DOI=10.1007/s10462-011-9282-2
http://dx.doi.org/10.1007/s10462-011-9282-2
[27] Geaur Rahman and Zahidul Islam. 2011. A decision
tree-based missing value imputation technique for data
pre-processing. In Proceedings of the Ninth Australasian Data
Mining Conference - Volume 121 (AusDM '11), Peter
Vamplew, Andrew Stranieri, Kok-Leong Ong, Peter
Christen, and Paul J. Kennedy (Eds.), Vol. 121. Australian
Computer Society, Inc., Darlinghurst, Australia, Australia,
41-50.
[28] Pilsung Kang. 2013. Locally linear reconstruction based
missing value imputation for supervised learning.
Neurocomput. 118, 65-78.
[29] Krzysztof Simiński. 2013. Clustering with Missing
Values. Fundam. Inf. 123, 3 (July 2013), 331-350.
DOI=10.3233/FI-2013-814
http://dx.doi.org/10.3233/FI-2013-814
[30] Archana Purwar and Sandeep Kumar Singh. 2015. Hybrid
prediction model with missing value imputation for medical
data. Expert Syst. Appl. 42, 13 (August 2015), 5621-5631.
DOI=10.1016/j.eswa.2015.02.050
http://dx.doi.org/10.1016/j.eswa.2015.02.050
[31] Alireza Farhangfar, Lukasz Kurgan, and Jennifer Dy. 2008.
Impact of imputation of missing values on classification error
for discrete data. Pattern Recogn. 41, 12 (December 2008),
3692-3705.
[32] Shariq Bashir, Saad Razzaq, Umer Maqbool, Sonya Tahir,
and A. Rauf Baig. 2006. Using association rules for better
treatment of missing values. In Proceedings of the 10th
WSEAS international conference on Computers (ICCOMP'06),
Zoran S. Bojkovic (Ed.). World
Scientific and Engineering Academy and Society (WSEAS),
Stevens Point, Wisconsin, USA, 1133-1138.
[33] Hai Wang and Shouhong Wang. 2007. A data mining
approach to assessing the extent of damage of missing values
in survey. Int. J. Bus. Intell. Data Min. 2, 3 (October 2007),
249-260. DOI=10.1504/IJBIDM.2007.015484
[34] Leila Ben Othman and Sadok Ben Yahia. 2006. Yet another
approach for completing missing values. In Proceedings of
the 4th international conference on Concept lattices and
their applications (CLA'06), Sadok Ben Yahia, Engelbert
Mephu Nguifo, and Radim Belohlavek (Eds.).
Springer-Verlag, Berlin, Heidelberg, 155-169.
[35] Heiko Timm, Christian Döring, and Rudolf Kruse. 2003.
Differentiated treatment of missing values in fuzzy
clustering. In Proceedings of the 10th international fuzzy
systems association World Congress conference on Fuzzy
sets and systems (IFSA'03), Taner Bilgiç, Bernard De Baets,
and Okyay Kaynak (Eds.). Springer-Verlag, Berlin,
Heidelberg, 354-361.
[36] Estevam R. Hruschka, Jr. and Nelson F. F. Ebecken. 2002.
Missing values prediction with K2. Intell. Data Anal. 6, 6
(December 2002), 557-566.
[37] Selma T. Milagre, Carlos Dias Maciel, Jose Carlos Pereira,
and Adriano A. Pereira. 2009. Fuzzy cluster stability analysis
with missing values using resampling. Int. J. Bioinformatics
Res. Appl. 5, 2 (March 2009), 207-223.
[38] Hai Wang and Shouhong Wang. 2007. Visualization of the
critical patterns of missing values in classification data.
In Proceedings of the 9th international conference on
Advances in visual information systems (VISUAL'07),
Guoping Qiu, Clement Leung, Xiangyang Xue, and Robert
Laurini (Eds.). Springer-Verlag, Berlin, Heidelberg, 267-274.
[39] Chi-Chun Huang and Hahn-Ming Lee. 2004. A Grey-Based
Nearest Neighbor Approach for Missing Attribute Value
Prediction. Applied Intelligence 20, 3 (May 2004), 239-252.
[40] Xiaodong Feng, Sen Wu, and Yanchi Liu. 2011. Imputing
missing values for mixed numeric and categorical attributes
based on incomplete data hierarchical clustering.
In Proceedings of the 5th international conference on
Knowledge Science, Engineering and Management (KSEM'11),
Springer-Verlag, Heidelberg, 414-424.
[41] Federico Cismondi, André S. Fialho, Susana M. Vieira,
Shane R. Reti, João M. C. Sousa, and Stan N. Finkelstein.
2013. Missing data in medical databases: Impute, delete or
classify?. Artif. Intell. Med. 58, 1 (May 2013), 63-72.
DOI=10.1016/j.artmed.2013.01.003
http://dx.doi.org/10.1016/j.artmed.2013.01.003
[42] Atif Khan, John A. Doucette, and Robin Cohen. 2013.
Validation of an ontological medical decision support system
for patient treatment using a repository of patient data:
Insights into the value of machine learning. ACM Trans.
Intell. Syst. Technol. 4, 4, Article 68 (October 2013), 31
pages. DOI=10.1145/2508037.2508049
http://doi.acm.org/10.1145/2508037.2508049
[43] Jau-Huei Lin and Peter J. Haug. 2008. Exploiting missing
clinical data in Bayesian network modeling for predicting
medical problems. J. of Biomedical Informatics 41, 1
(February 2008), 1-14. DOI=10.1016/j.jbi.2007.06.001
http://dx.doi.org/10.1016/j.jbi.2007.06.001
[44] Karla L. Caballero Barajas and Ram Akella. 2015.
Dynamically Modeling Patient's Health State from Electronic
Medical Records: A Time Series Approach. In Proceedings
of the 21th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '15). 69-78.
DOI=http://dx.doi.org/10.1145/2783258.2783289
[45] Zhenxing Qin, Shichao Zhang, and Chengqi Zhang. 2006.
Missing or absent? A Question in Cost-sensitive Decision
Tree. In Proceedings of the 2006 conference on Advances in
Intelligent IT: Active Media Technology 2006, Yuefeng Li,
Mark Looi, and Ning Zhong (Eds.). IOS Press, 118-125.
[46] Shobeir Fakhraei, Hamid Soltanian-Zadeh, Farshad Fotouhi,
and Kost Elisevich. 2010. Effect of classifiers in consensus
feature ranking for biomedical datasets. In Proceedings of
the ACM fourth international workshop on Data and text
mining in biomedical informatics (DTMBIO '10). 67-68.
[47] Bogdan Gabrys. 2009. Learning with Missing or Incomplete
Data. In Proceedings of the 15th International Conference on
Image Analysis and Processing (ICIAP '09), Pasquale
Foggia, Carlo Sansone, and Mario Vento (Eds.).
Springer-Verlag, Berlin, Heidelberg, 1-4.
DOI=10.1007/978-3-642-04146-4_1
http://dx.doi.org/10.1007/978-3-642-04146-4_1