Analyzing Start-up Success Possibility Using Data Mining Technique

Shreyansh Kakadiya, Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
Krish Moodbidri, Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India
Prof. Sindhu Nair, Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India

ABSTRACT: A large number of new ventures are emerging in the market these days, and it has become difficult to judge the reliability and scope of their products. Although this growth looks like national development, it has multiplied the problems venture capitalists face in finding startups that will turn a profit. This paper uses data mining techniques to predict the possibility of success of a startup. The technique discussed in the paper can be used in business plan seminars, where startups present their idea and approach with the aim of finding potential investors. Such a prediction will help a large group of people make wise decisions. The paper proposes a method that analyzes the attributes of successful startups and finds the most important attributes using data mining techniques. The attributes obtained as a result can then be compared with those of a new venture to estimate its possibility of success.

Keywords: Data Mining, Start-up, Ventures, Classification, BDM, Algorithms

1. INTRODUCTION
Millions of ventures are launched every year across the globe, and new ideas are structured every year in different areas. A successful startup benefits a large group of people. Those who benefit include the venture capitalists who fund the startup, the founder and employees, the people whom the startup serves, and new ventures that follow in its footsteps. A variety of factors make a startup successful. These factors are the attributes of the startup, and they are associated with the domains that lead the startup to success.
Thus, some attributes are associated with the company and the product, and others with the members of the startup. Environmental attributes also play a major role in predicting startup success. Identifying these attributes across all the successful startups therefore gives a holistic view of the major factors responsible for startup success.

In this paper a method is discussed that can help to predict the success possibility of a startup.
1. We find all the attributes associated with the various domains of a successful startup. This is known as data collection.
2. After the data is collected, various algorithms are applied to clean it. This removes incomplete and inconsistent data from the data set.
3. Finally, the set of attributes that most affect the success of the startup is listed. This serves as the ideal attribute set for success in a venture.
4. Next, the attributes of the new ventures are gathered and balancing algorithms are applied to them. These data sets are the raw data sets.
5. The amount of similarity between the ideal attribute set and a raw data set indicates the possibility that the new venture will succeed.
Such analysis of the ideal attributes and the raw data can help predict success. It also reduces business risk by pointing out the factors that the venture lacks. Such a technique can therefore be considered a Business Data Mining (BDM) technique.

2. METHODOLOGY
The method for finding the possibility of startup success can be divided into two phases, as shown in Fig. 1:
2.1 Formation of ideal attribute set: In this phase the ideal attribute set is formed by identifying the data sets of successful startups and applying algorithms to them.
2.2 Comparison of ideal attribute set with the raw data set: The attributes of a new venture are gathered as a raw data set and compared with the ideal attribute set. The degree of similarity between the two sets gives the possibility of startup success.
The ideal attribute set need not be created every time. However, it has to be updated after a certain period of time.

Fig. 1 Process Flow

2.1 FORMATION OF IDEAL ATTRIBUTE SET
Formation of the ideal attribute set includes the following steps, as shown in Fig. 2.

Fig. 2 Flowchart for Attribute Set Formation

A. Data collection
In this step, the attributes of successful startups are collected. These attributes may be associated with the company as a whole, with its members, or with the environment of the company. The attributes are not clean when they are collected: they may be inconsistent or incomplete, and many of them may be completely irrelevant to the domain as well [5].

A.1. Company attributes: These are attributes associated with the company as a whole. While studying them, aspects of the company as a whole are considered, excluding attributes such as member motivation and competition [2]. These attributes are shown in Table 1.

Table 1. Company Attributes
Attributes | Description
Uniqueness_of_Idea | 0 to 1 rating.
Usability | Demand in the market.
Company_marketing_effect | How popular the product is.
Price and need | Is the price greater than the average salary of the targeted people?

A.2. Member attributes: These are attributes associated with the members of the startup. The members include the owner and the employees working in the company. Many abstract attributes are associated with members, such as motivation, hard work, persistence and interest. However, such attributes cannot be observed directly and hence cannot be calculated individually. They can be grouped under the term “Abstract attribute” and rated. Apart from these, a few major attributes are seen in the members of a successful startup.
1. Experience: Experienced members can help to solve a great many problems in the startup. The success rate also depends on the experience of the members.
Statistically, the success rate can be estimated as given in Table 2.

Table 2. Effect of members' experience on startup
Experience | Success Percentage
First-timer | 18%
Repeat-Player | 20%
Veteran | 30%

2. Familiarity and cognition level: Unlike an established company, a startup comes across new problems every day. The problems become severe if familiarity with them is low. Thus, low familiarity with unknown situations can diminish the rate of startup success. A fuzzy function S can be used to find the familiarity F(t) [3].

A.3. Environmental attributes: These attributes have an indirect effect on the company’s success. They are not directly associated with the company but can affect the success rate. Table 3 shows examples of environmental attributes.

Table 3. Environmental Attributes
Attributes | Description
Location | Location where the product is developed, as well as the selling location.
Competition | Other companies with a similar product in the same location.
Years in Market | Number of years since the company was established.

Data pre-processing is a crucial step because the quality of the data affects the result. Mining techniques such as clustering and classification give their best results when the data has been cleaned by pre-processing. Conventionally, data collection meant entering data into a file or into Excel sheets, after which the data was pre-processed. Initially, pre-processing consists of removing incomplete and irrelevant data from the file or the sheet.

While analyzing different startups, new attributes may appear, which can leave the collected data unbalanced. In unbalanced data, one data set has one or more attributes more than another. To improve the performance of the mining step, balancing has to be done before the data mining technique is applied. Various data mining tools can balance data sets. For instance, the Weka tool provides a SMOTE filter, which rebalances a data set by generating synthetic instances of the under-represented class.
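The cleaning step described above can be illustrated with a short sketch. It is only a minimal example under stated assumptions, not the procedure used in the paper: the attribute names in REQUIRED, the 0 to 1 rating scale for every attribute, and the helper functions is_clean and preprocess are all hypothetical. Records with a missing value are treated as incomplete, ratings outside the range as inconsistent, and any attribute not in REQUIRED as irrelevant.

```python
# Hypothetical attribute names, loosely following Tables 1 and 3.
REQUIRED = ("uniqueness_of_idea", "usability", "competition")


def is_clean(record: dict) -> bool:
    """Return True if every required attribute is present and rated in [0, 1]."""
    for name in REQUIRED:
        value = record.get(name)
        if value is None:
            return False  # incomplete record: attribute missing or empty
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            return False  # inconsistent record: rating outside the assumed scale
    return True


def preprocess(records: list) -> list:
    """Keep only clean records and drop attributes that are not in REQUIRED."""
    return [
        {name: float(record[name]) for name in REQUIRED}
        for record in records
        if is_clean(record)
    ]


# Made-up example records, for illustration only.
raw_records = [
    {"uniqueness_of_idea": 0.9, "usability": 0.7, "competition": 0.4,
     "office_colour": "blue"},                                           # irrelevant extra attribute
    {"uniqueness_of_idea": 0.6, "usability": None, "competition": 0.2},  # incomplete
    {"uniqueness_of_idea": 1.8, "usability": 0.5, "competition": 0.3},   # inconsistent rating
]

cleaned = preprocess(raw_records)
print(cleaned)
```

Only the first record survives, with its irrelevant attribute removed. Balancing with a tool such as Weka would be a separate step applied to the cleaned records.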
B. Data Pre-processing
The data, that is, the attributes obtained as variables in the data collection phase, are not clean. This means the data is inconsistent or incomplete [1], and some of it may be completely irrelevant. Inconsistent data is not suitable for data mining. Cleaning yields data that gives efficient results on mining. Apart from cleaning, feature selection and variable transformation are other activities performed in this phase.

After pre-processing is complete, algorithms can be applied to the processed data. These algorithms help in selecting and ranking the attributes initially identified in the startups. A table of these attributes is then made according to the frequency of occurrence of each attribute, as shown in Table 4.

Table 4. Attribute Frequency
Attributes | Frequency
Uniqueness_of_idea | 15
Location | 12
Competition | 9

C. Data Mining
The technique of gaining knowledge from a set of data is known as data mining [4][8]. Mining techniques involve various types of algorithms, such as clustering, association and classification [1]. A classification algorithm suits this case best, because the attributes need to be classified. The primary objective is to obtain a prediction model over the identified attributes. Data mining tools such as Weka [1] can be used to apply the algorithms to the data sets gathered previously. The algorithms can be applied in three ways:
1. Classification algorithms are applied directly to the attributes identified during data gathering.
2. Selection algorithms are applied first, and the selected set is then passed to the classification algorithm.
3. The set is balanced first, and classification algorithms are then applied to the balanced attributes.
After applying these algorithms, the accuracy and cost-effectiveness of the three cases above can be calculated to find the best result. The classification algorithms are as follows:
C.1.
Decision-tree algorithm: Decision trees are the classical way of representing the information learned by a classification algorithm. Data mining tools provide built-in decision-tree algorithms [1].
C.2. Rule based algorithm: These classifiers use a set of IF-THEN rules for classification [1]. A rule can be written as:
IF <condition> THEN <conclusion>
The IF part is the precondition, or antecedent, while the THEN part is the rule consequent. The IF part consists of one or more attribute tests that are ANDed together, while the class prediction is made in the consequent.

Such classification provides induction rules and decision trees that determine the prediction model. By interpreting the decision trees and induction rules, the attributes that influence the classification most are found, and the attributes that do not affect the prediction much are set aside. The attributes obtained after data mining are thus obtained in a cost-efficient way, and they form the ideal attribute set.

2.2. Comparison of ideal attribute set with raw attribute set
In this step, the attributes of the newly registered ventures are identified. This is similar to the data collection in step one. Data from the new ventures is classified into three groups: company attributes, member attributes and environmental attributes. The attributes are then identified in all three groups. These identified attributes are not clean; they have similar problems of inconsistency and incompleteness. They are therefore pre-processed before being compared with the ideal attribute set. The attributes collected from the new ventures may also be unbalanced, that is, a venture may have a few attributes more or fewer than the ideal attribute set. The attributes are balanced and are then ready for the comparison study. This set forms the raw data set. Thus, two data sets are available: one is the ideal attribute set and the other is the raw data set.
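Once the raw data set has been brought to the same attributes as the ideal attribute set, the degree of similarity between the two can be scored numerically. The sketch below is only an illustration under stated assumptions: the attribute names and ratings are made up, the functions align and deviation_from_ideal are hypothetical, a missing attribute is filled with a rating of 0.0, and the score is the root-mean-square deviation of the raw ratings from the ideal ratings. A smaller deviation indicates a venture closer to the ideal attribute set.

```python
import math


def align(raw: dict, ideal: dict, default: float = 0.0) -> dict:
    """Restrict a raw attribute set to the ideal attributes.

    Extra attributes are dropped; missing ones get `default`
    (an assumption of this sketch, not a rule from the paper).
    """
    return {name: raw.get(name, default) for name in ideal}


def deviation_from_ideal(raw: dict, ideal: dict) -> float:
    """Root-mean-square deviation of the raw ratings from the ideal ratings."""
    aligned = align(raw, ideal)
    squared = [(aligned[name] - ideal[name]) ** 2 for name in ideal]
    return math.sqrt(sum(squared) / len(squared))


# Made-up ratings on a 0 to 1 scale, for illustration only.
ideal_set = {"uniqueness_of_idea": 1.0, "location": 0.8, "competition": 0.6}
venture_a = {"uniqueness_of_idea": 0.9, "location": 0.8, "competition": 0.5, "logo": 1.0}
venture_b = {"uniqueness_of_idea": 0.3, "competition": 0.2}  # "location" is missing

score_a = deviation_from_ideal(venture_a, ideal_set)
score_b = deviation_from_ideal(venture_b, ideal_set)
print(f"venture_a deviation: {score_a:.3f}")
print(f"venture_b deviation: {score_b:.3f}")
```

In this example venture_a deviates much less from the ideal set than venture_b, so it would be rated as the more promising of the two. How a missing attribute should be filled is a design choice; filling with 0.0 penalizes ventures for which an attribute could not be collected.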
Any of the data set comparison techniques used in statistics can be applied to compare the two sets. The comparison determines the possibility of startup success. During comparison, the ideal data may be transformed as required by the statistical comparison method.

Standard error method: Statistics offers many methods for comparison. One of them is the standard error method. Here we assume that the attributes of the ideal attribute set lie in the range 0 to 1 and that the raw data is given a 0 to 1 rating for each attribute. Taking the ideal attribute set as the mean, the standard deviation or variance of the raw set from the ideal set is calculated, and the prediction is made from it. The standard deviation is inversely proportional to the success possibility [9].

T-test: The t-test is another such test; it works with the means of two data sets, E and O, and helps in comparing them [10].

Thus, by using this technique the success possibility of a startup can be determined.

3. CONCLUSION
Business and statistics are very dynamic in nature, and hence it is very difficult to predict the possibility of startup success. The process is also long and time-consuming. Data has to be obtained by studying various successful startups across the globe. This data is then cleaned and the attributes are balanced. Selection algorithms are applied in the pre-processing phase to identify accurate comparative data. After the ideal attribute set is identified, data from a new venture is collected. This data also goes through pre-processing to form the raw attribute set. Once the two data sets are available, they are compared statistically to predict the possibility of success of the new venture.

4. FUTURE SCOPE AND DISCUSSIONS
1. This method may be made more efficient by identifying the accurate attributes pertaining to startups. This can be done by clustering startups with high similarity and then comparing them.
2.
Machine learning techniques can be used to learn from the previous data, after which comparison techniques can be applied. This would further reduce the work and increase efficiency.
3. The efficiency of the classification algorithms can be improved by tuning the data.

5. REFERENCES
[1] Hina Gulati, “Predictive analysis using data mining concepts”, Institute of Electrical and Electronics Engineers.
[2] Annabella Habinka Ejiri, Henk G. Sol, “Decision Enhancement Services for Small and Medium Enterprise Start-ups in Uganda”, IEEE 2012 45th Hawaii International Conference on System Sciences.
[3] Gancho Vachkov, Hidenori Ishihara, “Classification of Process Data and Images by Human Assisted Fuzzy Similarity Analysis”, ICROS-SICE International Joint Conference 2009, August 18-21, 2009, Fukuoka International Congress Center, Japan.
[4] J. Refonaa, Dr. M. Lakshmi, V. Vivek, “Analysis And Prediction Of Natural Disaster Using Spatial Data Mining Technique”, 2015 International Conference on Circuit, Power and Computing Technologies.
[5] W. Yathongchai, C. Yathongchai, K. Kerdprasop, N. Kerdprasop, “Factor Analysis with Data Mining Technique in Higher Educational Student Drop Out”, Latest Advances in Educational Technologies, 2003.
[6] E. Yom-Tov, G. F. Inbar, “Feature Selection for the Classification of Movements From Single Movement-Related Potentials”, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 10, no. 3, September 2002, pp. 170-177.
[7] E. Gharavi, M. J. Tarokh, “Predicting customers' future demand using data mining analysis: A case study of wireless communication customer”, IEEE, 5th Conference on Information and Knowledge Technology, 2013, pp. 338-343.
[8] M. S. Chen, J. Han, P. S. Yu, “Data Mining: An Overview from a Database Perspective”, IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, December 1996.
[9] http://www.surveystar.com/startips/jan2013.pdf
[10] http://www.statisticallysignificantconsulting.com/T test.html