Analyzing Start-up Success Possibility Using Data Mining Technique
Shreyansh Kakadiya
Computer Engineering
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Krish Moodbidri
Computer Engineering
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Prof. Sindhu Nair
Computer Engineering
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
ABSTRACT:
A large number of new ventures are emerging in the market these days, and it has become difficult to judge the reliability and scope of their products. This growth, though it appears to reflect national development, has multiplied the problems venture capitalists face in finding a potential startup that will lead to a profit. This paper uses data mining techniques to predict the possibility of success of a startup. The technique discussed in the paper can be used in business plan seminars, where startups discuss their idea and approach with the aim of finding potential investors. Such a prediction will help a large group of people make wise decisions. The paper proposes a method that analyzes the attributes of successful startups and finds the most important attributes using data mining techniques. The attributes obtained as a result can then be compared with those of a new venture to estimate its success possibility.
Keywords:
Data Mining, Start-up, BDM, Algorithms, Ventures, Classification.
1. INTRODUCTION
Millions of ventures are launched every year across the globe, and new ideas take shape every year in different areas. A successful startup benefits a large group of people, including:
• Venture capitalists who fund the startup
• Founders and employees
• People whom the startup serves
• New ventures that follow in its footsteps
A successful startup owes its success to a variety of factors. These factors are the attributes of the startup, and they are associated with the domains that lead the startup to success. Some attributes are associated with the company and the product, and others with the members of the startup. Environmental attributes also play a major role in predicting startup success. Identifying these attributes across all successful startups therefore gives a holistic view of the major factors responsible for startup success.
This paper discusses a method that can help predict the success possibility of a startup:
1. All the attributes associated with the various domains of successful startups are identified. This is known as data collection.
2. After the data is collected, various algorithms are applied to clean it. This removes incomplete and inconsistent data from the data set.
3. Finally, the attributes that most affect the success of a startup are listed. This list serves as the ideal attribute set for success in a venture.
4. The attributes of a new venture are then gathered, and balancing algorithms are applied to them. These data sets are the raw data sets.
5. The degree of similarity between the ideal attribute set and the raw data set indicates the possibility that the new venture will succeed.
Such an analysis of the ideal attribute set and the raw data can help predict success. It also reduces business risk by pointing out the factors that the venture lacks. The technique can therefore be considered a Business Data Mining (BDM) technique.
2. METHODOLOGY
The method for finding the possibility of startup success can be divided into two phases, as shown in Fig. 1:
2.1 Formation of ideal attribute set: In this phase, the ideal attribute set is formed by identifying the data sets of successful startups and applying algorithms to them.
2.2 Comparison of ideal attribute set with the raw data set: The attributes of a new venture are gathered as the raw data set and compared with the ideal attribute set. The degree of similarity between the two sets gives the possibility of startup success.
The ideal attribute set need not be created every time. However, it has to be updated periodically.
Fig. 1 Process Flow
2.1 FORMATION OF IDEAL ATTRIBUTE SET
The formation of the ideal attribute set includes the following steps, as shown in Fig. 2:
A. Data collection
In this step, the attributes of successful startups are collected. These attributes may be associated with the company as a whole, with its members, or with the environment of the company. The attributes of successful startups are not clean when they are collected. Unclean attributes may be inconsistent or incomplete. Many of these attributes may also be completely irrelevant to the domain [5].
A.1. Company attributes: These are attributes associated with the company as a whole. When studying these attributes, aspects of the company as a whole are considered, excluding attributes such as member motivation and competition [2]. These attributes are shown in Table 1.
Attributes                  Description
Uniqueness_of_Idea          0 to 1 rating.
Usability                   Demand in the market.
Company_marketing_effect    How popular the product is.
Price and need              Is the price greater than the average salary of the targeted people?
Table 1. Company Attributes
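The company attributes of Table 1 could be held in a small record type. The sketch below is only illustrative: the field names, the 0 to 1 scale for Usability and Company_marketing_effect, and the `price` and `average_salary` fields are assumptions, since the paper specifies a 0 to 1 rating only for Uniqueness_of_Idea.

```python
from dataclasses import dataclass


@dataclass
class CompanyAttributes:
    """Hypothetical container for the company attributes of Table 1."""
    uniqueness_of_idea: float        # 0 to 1 rating (from Table 1)
    usability: float                 # demand in the market; 0 to 1 scale assumed
    company_marketing_effect: float  # product popularity; 0 to 1 scale assumed
    price: float                     # product price (assumed field)
    average_salary: float            # average salary of targeted people (assumed field)

    def __post_init__(self):
        # Reject inconsistent ratings early, before any mining step sees them.
        for name in ("uniqueness_of_idea", "usability", "company_marketing_effect"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    @property
    def price_exceeds_salary(self) -> bool:
        """The 'Price and need' question: is the price above the average salary?"""
        return self.price > self.average_salary


example = CompanyAttributes(0.8, 0.6, 0.4, price=500.0, average_salary=30000.0)
print(example.price_exceeds_salary)
```

Validating the ratings in the constructor means an out-of-range value is caught at data collection time rather than during pre-processing.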
Fig. 2 Flowchart for Attribute Set Formation
A.2. Member attributes: These are attributes associated with the members of the startup. The members include the owner and the employees working in the company. Many abstract attributes are associated with members, such as motivation, hard work, persistence and interest. However, such attributes cannot be observed directly and hence cannot be calculated individually. They can be grouped under the term “Abstract attribute” and given a rating.
Apart from these, a few major attributes are seen in the members of a successful startup:
1. Experience: Experienced members can help solve a great many problems in the startup. The success rate also depends on the experience of the members. Statistically, the success rate for each experience level is given in Table 2.
Experience       Success Percentage
First-timer      18%
Repeat-Player    20%
Veteran          30%
Table 2. Effect of members' experience on startup
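As a rough illustration, Table 2 can be used as a lookup table. The percentages below are taken from Table 2; averaging them over the members of a team is an assumption made for this sketch and is not a rule given in the paper.

```python
# Success rates by experience level, taken from Table 2.
SUCCESS_RATE = {
    "First-timer": 0.18,
    "Repeat-Player": 0.20,
    "Veteran": 0.30,
}


def team_experience_rate(members):
    """Average the Table 2 rate over a team (averaging is an assumption)."""
    if not members:
        raise ValueError("at least one member is required")
    rates = [SUCCESS_RATE[level] for level in members]
    return sum(rates) / len(rates)


print(team_experience_rate(["First-timer", "Veteran"]))
```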
2. Familiarity and cognition level: Unlike established companies, a startup comes across new problems every day. A problem becomes severe if familiarity with it is low. Thus, a lack of familiarity with new situations can diminish the rate of startup success. A fuzzy function S can be used to find the familiarity F(t) [3].
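The paper does not define S or F(t). As one possible reading, the sketch below uses Zadeh's standard S-shaped membership function and treats t as the amount of prior exposure to a kind of problem. The function names, the meaning of t and the bounds `t_low` and `t_high` are all assumptions.

```python
def s_function(x, a, c):
    """Zadeh's S-shaped membership function rising from 0 at a to 1 at c."""
    if a >= c:
        raise ValueError("a must be smaller than c")
    b = (a + c) / 2.0  # crossover point, where membership is 0.5
    if x <= a:
        return 0.0
    if x <= b:
        return 2.0 * ((x - a) / (c - a)) ** 2
    if x < c:
        return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2
    return 1.0


def familiarity(t, t_low=0.0, t_high=10.0):
    """Hypothetical F(t): familiarity after t units of exposure."""
    return s_function(t, t_low, t_high)


for t in (0, 2.5, 5, 7.5, 10):
    print(t, familiarity(t))
```

An S-shaped curve is a common choice here because familiarity grows slowly at first, quickly in the middle and then saturates.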
A.3. Environmental attributes: These attributes
have an indirect effect on the company’s success.
These are attributes that are not directly associated
with the company but can affect the success rate.
Table 3 shows examples of environmental
attributes.
Attributes         Description
Location           Location where the product is developed, as well as the selling location, is considered.
Competition        Other companies with a similar product in the same location.
Years in Market    Number of years since the company was established.
Table 3. Environmental Attributes
B. Data Pre-processing
The data, that is, the attributes obtained as variables in the data collection phase, is not clean: it may be inconsistent or incomplete [1], or completely irrelevant. Inconsistent data is not suitable for data mining, and cleaning it yields data that gives better mining results. Apart from cleaning, feature selection and variable transformation are also performed in this phase.
Data pre-processing is a crucial step because the quality of the data affects the result. Mining techniques such as clustering and classification give their best results when the data has been cleaned by pre-processing. Conventionally, data collection meant entering data into a file or into Excel sheets, after which the data was pre-processed. Initially, pre-processing consists of removing incomplete and irrelevant data from the file or sheet.
While analyzing different startups, new attributes may appear, which can leave the collected data unbalanced: one data set may have one or more attributes more than another. To improve the results, the data has to be balanced before the data mining step is applied. Various data mining tools can perform this balancing. For instance, Weka offers a SMOTE filter, which rebalances a data set by generating synthetic samples of the under-represented class.
After pre-processing is complete, algorithms can be applied to the processed data. These algorithms help in selecting and ranking the attributes initially identified in the startups. A table of these attributes is then made according to the frequency of occurrence of each attribute, as shown in Table 4.
Attributes            Frequency
Uniqueness_of_idea    15
Location              12
Competition           9
Table 4. Attribute Frequency
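The cleaning and frequency-ranking steps described for the pre-processing phase can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the sample values and the rule that numeric ratings must lie in the 0 to 1 range are hypothetical and do not come from the paper's data.

```python
from collections import Counter


def is_clean(record):
    """Keep a record only if no value is missing and numeric ratings are in [0, 1]."""
    for value in record.values():
        if value is None:  # incomplete
            return False
        if isinstance(value, (int, float)) and not 0.0 <= value <= 1.0:
            return False   # inconsistent rating
    return True


def attribute_frequency(records):
    """Rank attributes by how many clean records mention them."""
    cleaned = [r for r in records if is_clean(r)]
    counts = Counter(name for r in cleaned for name in r)
    return counts.most_common()


# Hypothetical attribute records of successful startups.
startups = [
    {"Uniqueness_of_idea": 0.9, "Location": "Mumbai"},
    {"Uniqueness_of_idea": 0.7, "Competition": 0.2},
    {"Uniqueness_of_idea": None, "Location": "Pune"},  # incomplete, dropped
    {"Uniqueness_of_idea": 1.4, "Competition": 0.5},   # inconsistent, dropped
    {"Uniqueness_of_idea": 0.8, "Location": "Delhi", "Competition": 0.4},
]

ranking = attribute_frequency(startups)
for name, count in ranking:
    print(name, count)
```

The resulting list of (attribute, frequency) pairs has the same shape as the frequency table built in this phase.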
C. Data Mining
The technique of gaining knowledge from a set of data is known as data mining [4][8]. Mining techniques involve various types of algorithms, such as clustering, association and classification [1]. A classification algorithm suits this case best, as the attributes need to be classified. The primary objective is to obtain a prediction model over the identified attributes. Data mining tools such as Weka [1] can be used to apply the algorithms to the data sets. The algorithms can be applied to the previously gathered data sets in three ways:
1. The classification algorithms are applied directly to the attributes identified in the data gathering step.
2. Selection algorithms are applied to the attributes first, and the classification algorithms are then applied to the selected set.
3. The attribute set is balanced first, and the classification algorithms are then applied.
After applying these algorithms, the accuracy and cost effectiveness of the three cases can be calculated to find which gives the best results. The classification algorithms are as follows:
C.1. Decision-tree algorithm: A decision tree is a classical way of representing the information learned by a classification algorithm. Data mining tools provide built-in decision-tree algorithms [1].
C.2. Rule based algorithm: These classifier algorithms use a set of IF-THEN rules for classification [1]. A rule can be written as:
IF <condition>
THEN <conclusion>
The IF part is the precondition, or rule antecedent, while the THEN part is the rule consequent. The IF part consists of a condition made up of one or more attribute tests that are ANDed together, while the class prediction is made in the consequent part.
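A rule of this form can be sketched as a list of attribute tests plus a conclusion. The rules, thresholds and class labels below are hypothetical examples chosen for illustration. They are not rules induced from real startup data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass
class Rule:
    conditions: List[Callable[[Record], bool]]  # antecedent: tests that are ANDed
    conclusion: str                             # consequent: predicted class

    def matches(self, record: Record) -> bool:
        return all(test(record) for test in self.conditions)


def classify(record: Record, rules: List[Rule], default: str = "unknown") -> str:
    """Return the conclusion of the first rule whose antecedent holds."""
    for rule in rules:
        if rule.matches(record):
            return rule.conclusion
    return default


# Hypothetical rules; the thresholds are assumptions.
rules = [
    Rule([lambda r: r["Uniqueness_of_idea"] >= 0.7,
          lambda r: r["Competition"] <= 0.3], "success"),
    Rule([lambda r: r["Uniqueness_of_idea"] < 0.3], "failure"),
]

venture = {"Uniqueness_of_idea": 0.8, "Competition": 0.2}
print(classify(venture, rules))
```

Rules are tried in order, so the list behaves like a decision list. A record that satisfies no antecedent falls through to the default class.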
Such classification provides induction rules and decision trees that determine the prediction model.
By interpreting the decision trees and induction rules, the attributes that influence the classification most are identified, and those that do not affect the prediction much are discarded. The attributes obtained after data mining are thus obtained in a cost-efficient way, and they form the ideal attribute set.
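One common way to measure how much an attribute influences a classification is information gain, the criterion used by decision-tree learners such as ID3. The paper does not name a specific criterion, so its use here is an assumption, and the attribute names and rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, label="outcome"):
    """Reduction in label entropy obtained by splitting rows on one attribute."""
    base = entropy([row[label] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical, already discretized training rows.
rows = [
    {"unique_idea": "high", "metro_location": "yes", "outcome": "success"},
    {"unique_idea": "high", "metro_location": "no", "outcome": "success"},
    {"unique_idea": "low", "metro_location": "yes", "outcome": "failure"},
    {"unique_idea": "low", "metro_location": "no", "outcome": "failure"},
]

attributes = ["unique_idea", "metro_location"]
ranked = sorted(attributes, key=lambda a: information_gain(rows, a), reverse=True)
print(ranked)
```

In this toy data, `unique_idea` separates the outcomes perfectly while `metro_location` carries no information, so only the former would be kept for the ideal attribute set.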
2.2. Comparison of ideal attribute set with raw attribute set
In this step, the attributes of newly registered ventures are identified. This is similar to the data collection in the first phase. Data from the new ventures is classified into three groups: company attributes, member attributes and environmental attributes. The attributes are then identified in all three groups. These identified attributes are not clean, and they have the same problems of inconsistency and incompleteness. They are therefore pre-processed before being compared with the ideal attribute set.
The attributes collected from the new ventures may be unbalanced. That is, there may be a few attributes more or fewer than in the ideal attribute set. The attributes are balanced, after which they are ready for the comparison study. This set forms the raw data set.
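Balancing a new venture's attributes against the ideal attribute set can be sketched as aligning the two sets of attribute names. The sketch assumes that extra attributes are dropped and that missing ones receive a default rating of 0.0. The paper does not say how missing attributes should be handled, so both choices are assumptions.

```python
def balance_to_ideal(raw, ideal_attributes, fill_value=0.0):
    """Return the raw attributes restricted to, and completed for, the ideal set."""
    return {name: raw.get(name, fill_value) for name in ideal_attributes}


# Hypothetical attribute names and ratings.
ideal_attributes = ["Uniqueness_of_idea", "Location", "Competition"]
raw_venture = {"Uniqueness_of_idea": 0.6, "Location": 0.9, "Office_size": 0.4}

balanced = balance_to_ideal(raw_venture, ideal_attributes)
print(balanced)
```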
Thus, two data sets are available: the ideal attribute set and the raw data set. Any data set comparison technique used in statistics can be applied to compare them, and the result determines the possibility of startup success. During comparison, the ideal data may be adjusted as required by the statistical comparison method.
Standard error method: Statistics offers many algorithms for comparison. One such algorithm is the standard error method. Here the attributes of the ideal attribute set are assumed to lie in the 0 to 1 range, and the raw data is given a 0 to 1 rating for the same attributes. Taking the ideal attribute set as the mean, the standard deviation or variance of the raw set from the ideal set is calculated, and the prediction is made from it. The smaller the standard deviation, the higher the success possibility [9].
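A minimal sketch of this comparison follows. It takes the ideal rating of each attribute as the reference value and computes the root-mean-square deviation of the raw ratings from it. Mapping the deviation to a score as 1 minus the deviation is an assumption made for illustration, as are the attribute names and ratings.

```python
import math


def deviation_from_ideal(raw, ideal):
    """Root-mean-square deviation of raw ratings from the ideal ratings."""
    squared = [(raw[name] - ideal[name]) ** 2 for name in ideal]
    return math.sqrt(sum(squared) / len(squared))


def success_possibility(raw, ideal):
    """Hypothetical score in [0, 1]: a smaller deviation gives a higher score."""
    return 1.0 - deviation_from_ideal(raw, ideal)


# Hypothetical 0 to 1 ratings.
ideal = {"Uniqueness_of_idea": 1.0, "Usability": 1.0}
raw = {"Uniqueness_of_idea": 0.6, "Usability": 0.8}

print(round(deviation_from_ideal(raw, ideal), 4))
print(round(success_possibility(raw, ideal), 4))
```

Because all ratings lie between 0 and 1, the deviation also lies between 0 and 1, so the score stays in the same range.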
T-test: The t-test is another such method. It compares the means of two data sets, E and O, and so helps in comparing the data [10].
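The t statistic for two data sets E and O can be computed with the standard library alone. The sketch uses Welch's unequal-variance form, which is an assumption because the paper does not say which variant of the t-test is meant. The sample ratings are hypothetical, and no p-value is computed.

```python
import math
import statistics


def welch_t_statistic(e, o):
    """Welch's t statistic for the difference between the means of e and o."""
    if len(e) < 2 or len(o) < 2:
        raise ValueError("each data set needs at least two values")
    standard_error = math.sqrt(statistics.variance(e) / len(e)
                               + statistics.variance(o) / len(o))
    if standard_error == 0:
        raise ValueError("both data sets have zero variance")
    return (statistics.mean(e) - statistics.mean(o)) / standard_error


# Hypothetical 0 to 1 ratings for the two data sets.
E = [1.0, 0.8, 0.9]
O = [0.5, 0.3, 0.4]
print(round(welch_t_statistic(E, O), 4))
```

A t statistic close to zero indicates that the two means are similar, while a large absolute value indicates that they differ by much more than the spread of the ratings would explain.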
Thus, by using this technique, the success possibility of a startup can be determined.
3. CONCLUSION
Business and statistics are very dynamic in nature, and hence predicting the possibility of startup success is a very difficult task. The process is also long and time-consuming. Data has to be obtained by studying various successful startups across the globe. This data is then cleaned and its attributes are balanced. Selection algorithms are applied in the pre-processing phase to identify accurate comparative data. After the ideal attribute set is identified, data from the new venture is collected. This data also goes through pre-processing to form the raw attribute set. Once the two data sets are available, a statistical comparison of the two is carried out to predict the possibility of success of the new venture.
4. FUTURE SCOPE AND DISCUSSIONS
1. This method may be made more efficient by identifying more accurately the attributes pertaining to startups. This can be done by clustering startups with high similarity and then comparing them.
2. Machine learning techniques can be used to learn from previous data before the comparison techniques are applied. This would further reduce work and increase efficiency.
3. The efficiency of the classification algorithms can be improved by tuning the data.
5. REFERENCES
[1] Hina Gulati, “Predictive analysis using data mining concepts”, Institute of Electrical and Electronics Engineers.
[2] Annabella Habinka Ejiri, Henk G. Sol, “Decision Enhancement Services for Small and Medium Enterprise Start-ups in Uganda”, IEEE 2012 45th Hawaii International Conference on System Sciences.
[3] Gancho Vachkov, Hidenori Ishihara, “Classification of Process Data and Images by Human Assisted Fuzzy Similarity Analysis”, ICROS-SICE International Joint Conference 2009, August 18-21, 2009, Fukuoka International Congress Center, Japan.
[4] J. Refonaa, Dr. M. Lakshmi, V. Vivek, “Analysis And Prediction Of Natural Disaster Using Spatial Data Mining Technique”, 2015 International Conference on Circuit, Power and Computing Technologies.
[5] W. Yathongchai, C. Yathongchai, K. Kerdprasop, N. Kerdprasop, “Factor Analysis with Data Mining Technique in Higher Educational Student Drop Out”, Latest Advances in Educational Technologies, 2003.
[6] E. Yom-Tov, G.F. Inbar, “Feature Selection for the Classification of Movements From Single Movement-Related Potentials”, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 10, no. 3, September 2002, pp. 170-177.
[7] E. Gharavi, M. J. Tarokh, “Predicting customers' future demand using data mining analysis: A case study of wireless communication customer”, IEEE, 5th Conference on Information and Knowledge Technology, 2013, pp. 338-343.
[8] M. S. Chen, J. Han, P. S. Yu, “Data Mining: An Overview from a Database Perspective”, IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, December 1996.
[9] http://www.surveystar.com/startips/jan2013.pdf
[10] http://www.statisticallysignificantconsulting.com/Ttest.html