Survey
Journal of Intellectual Property Rights
Vol 21, July 2016, pp 211-225

Landscape Analysis of Patent Dataset

Deepti Mehrotra, Sai Sabitha†, Renuka Nagpal and Nisha Mattas
Amity University, Sector 125, Noida, Uttar Pradesh - 201 313, India

Received 04 January 2016; accepted 15 June 2016

With the advancement of technology in almost all sectors of industry and shrinking product life spans, every industry needs a steady supply of new ideas. These ideas need to be properly guarded through patents to give the inventor due economic reward and the right to control his creation. Patents are stored in large databases. Analysing these databases gives insight into the technology sector, competitors and the chronological development of a field of technology. It also helps the inventor to understand how his invention can cater to the needs of the current market, so that viable industry collaboration can be arranged. Landscape analysis of patents is done to get a comprehensive view of all this information. Various computational approaches are used to analyse patent datasets. These approaches and their objectives are discussed in this paper. Converging all of them on a single platform provides complete insight at a single point, which will aid the inventor and the business investor. In this research work the extensive literature on existing approaches is discussed. A framework for landscape analysis is proposed, along with tools and techniques that can be suitably used for a complete analysis of technological growth and patent data.

Keywords: Technology road-mapping analysis, research and development (R&D), data set attributes, preprocessing, mining techniques

In today's knowledge economy (k-economy), a business organization survives only by using the latest and most innovative technology, which gives it an edge over its competitors. This in turn encourages the creation of innovative ideas and their protection under intellectual property rights (IPR).1,64
A patent represents an innovation in a technological domain and records detailed information about the invention, including the inventors, the area to which it belongs, publication details, assignee information and the international patent classification code (IPC code).2,3 Today scientists and technocrats are filing a large number of patents. These patents are stored in various databases such as the United States Patent and Trademark Office (USPTO), European Patent Office (EPO), World Intellectual Property Organization (WIPO), Japan Patent Office (JPO) and State Intellectual Property Office (SIPO).4,5 These databases contain terabytes of information in unstructured form. This information can be text, images, numbers, equations, special characters, etc., which hinders the extraction of meaningful information. With each passing day, the growing volume of information and the different data formats of patent documents make it harder for users to take full advantage of it.

One of the prime objectives of designing these databases is to check the originality of an innovation and to assign the research to the name of a researcher or organization. Apart from the novelty check, a researcher is keenly interested in retrieving the patents that have high industrial relevance. A patent landscape is a comprehensive analysis that focuses on identifying IP trends, technology leaders and the technological positioning of companies. Thus, there is a need to maintain and analyse these databases so that one can assess competitiveness and identify the domains where organizations can invest their Research and Development (R&D) capital to advance technologically.6 An organization needs to perform landscape analysis of a patent dataset before starting any research-oriented project. This analysis gives a comprehensive view of the project.

† Corresponding author: Email: [email protected]
Since technology is changing at a fast pace, technological trends must be forecast to gain quick insights into research areas. Business units today focus on technology-driven business and plan on the basis of their technology capabilities. Analysing valid patents from the repository can save a lot of time and effort and keeps organizations abreast of technology-driven business opportunities, and thus more competitive. Researchers have analysed patent datasets widely to find technological trends, classify patents, cluster similar patents, compare technical trends with market analysis, etc. To analyse and visualize patent information effectively, researchers extensively use text and data mining techniques and tools.7,9 The novel and previously unknown patterns obtained from these techniques provide the decision support that organizations need to improve innovation and efficiency. Patent information can provide an approximate description of the innovation activity occurring in most fields of technology in developing countries. It is accepted as the only viable quantitative measure, since it accumulates information over an extended period of time.10 Monitoring competitors, analysing technological trends, assessing an organization's share in a technology and detecting patent infringement all call for new ideas and time-efficient patent analysis. Analysis of patent data can be broadly divided into four types:
• Technology Roadmap Analysis11
• Technology Classification Analysis
• Originality/R&D Support Analysis
• Technology Distribution Analysis
Any of the above analyses needs a suitable dataset. As these repositories store information for many attributes, the suitable attributes, the mining technique and the appropriate preprocessing tasks need to be identified.
Data mining techniques such as text mining, together with visualization approaches, can be suitably used to get the desired result.12 This research work is organized as follows: (a) a detailed account of the various types of analysis of patent databases; (b) patent analysis, which includes data collection, preprocessing and dataset attribute selection for a specific analysis; and (c) a comparative analysis of the various approaches to patent analysis. This study provides useful information for exploring the hidden and helpful knowledge contained in patents.

Analysis on Patents
Patent analysis has attracted researchers over the last decade, and ample literature is available in which a patent data set is used to obtain the desired analysis. In this study, more than 100 papers were referred to and approximately 71 papers were reviewed. The discussion is broadly based on four patent analysis approaches: technology roadmap, classification, originality/R&D and distribution analysis.

Technology Road-Mapping Analysis
Patented technology is an emerging basis for establishing the relationship between a company's plans and current technology. For strategic planning, a company must identify and invest in the right technology. With reduced product life cycles, it is becoming necessary for companies to develop new products to survive. This analysis gives a commercial perspective to patent analysis. It helps business units, researchers and educational institutions find new areas and trends in patents. This information is vital for understanding the prospects of a technology and its prior and current use. Technology roadmap analysis can be retrospective or prospective, i.e., it covers patents filed in previous decades, patents filed at present and future directions, through time series analysis, patent trend analysis and citation analysis.
To perform road map analysis, researchers have used different techniques such as statistical analysis and data mining; text mining is currently one of the popular approaches.2,4

Time Series Analysis
The pull of the market encourages a push for innovation, as most technological innovation is driven by competitive need. In time series analysis, the past growth of a technology and its market implementation are studied. Based on current market needs, future innovations are forecast for a particular technology. The number of patents is plotted against time in years.10 This helps in understanding the growth or decline rate of various technologies over time.13 It also requires observing the frequency and quality of R&D activities in the particular geographical area.14

Technology Lifecycle/Patent Trend Analysis
Companies analyse patent documents to ascertain the latest trend in a given technology. A technology-driven roadmap for an industry, designed using a portfolio affinity map, establishes the relationship between technology and industry.11 The characteristics of patent distribution as a technology develops15,16 can be understood. It helps in the development of data processing technologies and also in evaluating the novelty performance of different countries.17 Social impact and cross-impact analysis is performed to judge the impact of innovation on society.18-20

Citation Analysis
Citation refers to the number of times the research works of different assignees get cited. Citations are of two kinds: forward and backward. The citations that a patent receives from later patents are its forward citations, and the references that a patent makes to earlier patents are its backward citations. Countries owning patents with more forward citations have a strong technological impact, whereas a high number of backward citations indicates countries with mature technologies.
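The forward and backward counts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patent identifiers and citation links are made up, the function name `citation_counts` is hypothetical, and the sketch follows the common convention that forward citations are those a patent receives and backward citations are those it makes.

```python
from collections import Counter

# Hypothetical citation links as (citing_patent, cited_patent) pairs.
# The identifiers are invented for illustration only.
citations = [
    ("P3", "P1"),
    ("P3", "P2"),
    ("P4", "P1"),
    ("P5", "P1"),
    ("P5", "P3"),
]

def citation_counts(links):
    """Return (forward, backward) citation counts per patent.

    forward[p]  = number of patents that cite p
    backward[p] = number of patents that p cites
    """
    forward = Counter(cited for _, cited in links)
    backward = Counter(citing for citing, _ in links)
    return forward, backward

forward, backward = citation_counts(citations)

# The patent with the most forward citations in this toy data set.
most_cited = forward.most_common(1)[0]
print(most_cited)
```

Aggregating the same counts by assignee or country, rather than by patent, would give the country-level comparison discussed in the text.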
Citation counts further indicate the R&D ability of assignees. Yujin and Byungum21 designed a patent citation network in which patents are the nodes and each arc represents a citation link between a citing patent and a cited patent. Citation analysis indicates the relative importance of the domain and of the patent, and has been proposed as a basis for the technical assessment of a patent.62

Technology Classification Analysis
Classifying patents into groups helps streamline the search for the required patent. The IPC (International Patent Classification) code is commonly used to classify patents by technology domain.2,22 Data mining tools are used to classify patents based on similar characteristics. Classification helps in analysing the prospects of key technologies, research and development trends and changes, with a particular focus on the technical field concerned.10

IPC Analysis
The IPC standard is a hierarchical classification system developed by WIPO for organizing the subject matter of published patents.17 It can help enterprises understand the major categories of technological classification when planning R&D activities.23 Besides the IPC code, different research works use other codes such as the Derwent Assignee Code (merging the same assignee),24 the Cooperative Patent Classification (CPC code)14,25 and the Unified Patent Court (UPC).14

Family Analysis
To protect their invention, inventors file their patent in different patent offices, thus creating a family of the same patent, known as a patent family. Analysing the patent family helps to organize the database so as to avoid the double counting problem, to restrict the R&D budget to novel work only, and to estimate patent filing in different offices. Authorized members assign priority to the members, usually by the date of filing of the patent. The patent portfolio is then built using the relationship between priority claiming and priority claimed in the patent family.26
This concept is further refined in the extended family, which finds possible links between two patent documents and brings them into a single family.27

Cluster Analysis
Clustering is the task of grouping objects that have several attributes into clusters, such that objects belonging to the same cluster are highly similar to each other compared with objects in other clusters.28 This approach is especially used to reduce the size of the dataset by grouping it into subsets of similar objects, which helps in searching for an object within its class.70 Text mining techniques are used to create clusters based on the similarity score between patents. This approach has been extended by designing a patent map, which is created from the semantic network of keywords retrieved from a cluster.61 Fuzzy clustering is also used to identify overlapping patent documents and interdisciplinary patents.20

Originality/R&D Support Analysis
An invention must be statutory, useful and non-obvious. Apart from these conditions, the most basic condition under patent law is that a patent can be granted only if the invented technology is novel. This analysis compares a new patent with existing patents in closely related domains. Further, the patents are ranked based on their quality and market usage.6 The prior art contains the information against which a patent's claims of originality are checked. Patent conflicts are identified using TRIZ-led patent mapping techniques.7

Infringement Analysis
Patents have existed for a long time and continue to populate databases with exceptional works. As the size of the databases grows, it becomes necessary to check for infringement. This analysis is useful in determining any similarities among patents, to ensure that a patent has not been copied in any sense.
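As a minimal sketch of how a similarity score between two patent texts might be computed, the following Python example uses cosine similarity over simple term-frequency vectors. The abstracts, the tokenizer and the function names are illustrative assumptions, not the method of any study cited here; practical systems usually add stop-word removal and term weighting.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical abstracts, invented for illustration only.
abstracts = {
    "A": "solar cell with silicon wafer and coating",
    "B": "silicon solar cell coating method",
    "C": "dental implant with titanium screw",
}
vectors = {key: term_vector(text) for key, text in abstracts.items()}

sim_ab = cosine_similarity(vectors["A"], vectors["B"])
sim_ac = cosine_similarity(vectors["A"], vectors["C"])
print(round(sim_ab, 3), round(sim_ac, 3))
```

A high score flags a pair of documents for closer inspection; the same pairwise scores can also feed a clustering algorithm.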
Also, it is important to check for infringement before allocating the R&D budget.2

Novelty Analysis
Work that has not been done before is regarded as novel and, in technical terms, patentable. Patents are a huge repository of new technology, which provides new development directions in any domain. Thus, the novelty of work in a field of technology needs to be analysed. Novelty is determined by generating patent maps and measuring the technological distances among patents through a semantic matrix, to identify the inventions in patents that are highly novel.2

White Spot Analysis
It provides direction for research work by identifying gaps, or white spots: problems for which no solution exists or a better solution is required.2

Technology Distribution Analysis
This analysis provides information about a technology with regard to the prevailing domain area, competitors, assignees, etc. It helps in identifying countries with good R&D support.25 It also provides information about the proper allocation of resources in a particular technology domain, to strengthen both the niche technology and the country.16

Patent Trend of Technology Distribution by Country
This is the distribution of patents on the same technology across different countries. It helps in identifying the countries with the most patents registered in a particular technical field.14,15,25 Along with citation analysis, it helps to identify the knowledge flow between countries.

Association Rule Analysis
It analyses objects that are associated with each other, based on performance measures such as support, confidence and distance.29 It is used to find interesting hidden patterns in patent information, thus identifying the relationships among technologies and patentees in a certain technical field.10 Apriori is a common association rule algorithm; C4.5 decision trees are also used to derive rules.
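To make the support and confidence measures concrete, here is a minimal Python sketch. It assumes each patent is represented as the set of IPC classes assigned to it; the data are invented for illustration, the helper names `support` and `confidence` are hypothetical, and the distance measure mentioned above is not covered.

```python
# Hypothetical "transactions": the set of IPC classes assigned to each patent.
# The class symbols are used for illustration only.
patents = [
    {"H01L", "H02S"},
    {"H01L", "H02S", "C23C"},
    {"H01L", "C23C"},
    {"A61C"},
    {"H01L", "H02S"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so no confidence can be claimed
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base

# Rule H02S -> H01L: how often a patent in H02S is also classified in H01L.
rule_support = support({"H01L", "H02S"}, patents)
rule_confidence = confidence({"H02S"}, {"H01L"}, patents)
reverse_confidence = confidence({"H01L"}, {"H02S"}, patents)
print(rule_support, rule_confidence, reverse_confidence)
```

Apriori builds on exactly these quantities: it keeps only itemsets whose support exceeds a threshold and then reports the rules whose confidence is high enough.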
Rivals/Competitor Analysis
This analysis searches all patents of the competitors to analyse the status and scope of their technology and their R&D competence, in order to prepare guidelines for research and development.30

Assignee Analysis
This analysis evaluates the technological innovation capability of different organizations. It ranks the assignees based on their patent quantity.31

Fig. 1 indicates the number of papers reviewed for each type of analysis and its subcategories. Technology road map analysis is the most discussed in the literature. Cluster analysis is also widely used for subdividing the dataset based on similarity.

Fig. 1 - Papers reviewed - Patent analysis approaches

Methodology for Patent Analysis
The methodology adopted consists of the sequence of steps shown in Figure 2. The first step involves finding relevant patent data. For this purpose, data are collected from commercial or public databases. The patent data are then pre-processed to remove irrelevant and incorrect data. The resulting data are in structured form. Mining techniques are applied to specific data set attributes. The results help in visualizing R&D innovation and in intelligent business decision making.

Fig. 2 - Methodology for landscape patent analysis

Data Collection from Different Sources
Data collection is the initial stage of the analysis process, in which patent data are collected from commercial or public databases such as USPTO, WIPO, JPO and EPO, as shown in Table 1. The task of searching the patent databases for relevant patents is supported by various data and text mining tools.2 Methods such as keyword search, multi-agent technology for information extraction based on XML32 and patent classification by IPC or CPC code25 are commonly used for patent information search or patent categorization.
A detailed study of patent databases and their importance in prior art documentation and patent search is given by Singh et al.65 Commercial databases are very expensive, and public databases often do not provide complete information. Once a database is selected, the search starts with a broad area, which is later refined as the search requires.

Data Preprocessing
Patent documents contain a variety of information in the form of images, text, diagrams, numbers, dates, equations and special symbols; hence directly using a collection of patents as input to a data mining tool or machine learning algorithm may not be suitable. As in any information retrieval system, this information must first be compiled into a text dataset for data mining analysis. Preprocessing approaches are used to clean and transform the data; without them, processing the collected data is hard and time-consuming. The steps involved in preprocessing are:
1. Removal of duplicate patents
2. Cleaning - removing irrelevant and incorrect data
3. Keyword extraction
4. Text representation1
5. Data transformation and data reduction
Many authors have combined preprocessing and text mining techniques for the categorization of patents. The popular text mining methods used are correlation analysis, neural networks and clustering,31 which convert unstructured text to numerical data or to structured text data for further analysis.34 Because preprocessing reduces unwanted and redundant information, it also increases the accuracy, scalability and reliability of the classification and clustering algorithms.35

Data Sets Attributes
Different attributes are considered in selecting relevant data sets for the appropriate analysis. The chosen attributes support trend analysis, analysis of technological contribution by country and organization, citation analysis and many other types of analysis.
Attributes are chosen based on the needs of the organization. Table 2 shows the attributes, the analysis based on the data sets, the research area, and the methods and tools used in different case studies.

Table 1 - Commercial and public databases
Commercial: Thomson Reuters; LexisNexis; Minesoft PatBase; PatSeer; ProQuest; Questel Orbit; SciFinder
Public databases (Universal / Country): USPTO (United States Patent and Trademark Office); Google Patents; CPD (Canadian Patent Office); JPO (Japan Patent Office); INPAIRS (India); WIPO (World Intellectual Property Organisation); EAPO (Eurasian Patent Organisation); Europe patent organization (EPO); Patent.com; IP.com; IPONZ (New Zealand patent); IPOS (Singapore); AusPAT (Australia); TIPO (Taiwan Intellectual Property Office); SIPO (Chinese Patent Office)

Data Mining Techniques
These patent datasets are represented as structured and unstructured data. The techniques used for the analysis fall broadly into two categories:
1. Text mining techniques, which extract information from structured or unstructured text.
2. Visualization techniques, which provide a visual analysis of the patent that decision makers or technology experts can easily interpret.43
For data analysis and the extraction of useful information, some commonly used techniques are natural language processing, text mining and data mining. Table 3 presents some popular techniques used for processing data sets. Approximately 20 papers in which mining techniques have been used were referred to.

Table 2 - Attributes and analysis based on them in different research fields
S. No. | Attributes of data set chosen | Analysis based on chosen data sets | Research work | Method & Tool/Model used
1 | Search period, Item, Country, No. of patents25 | Trend analysis, Technology distribution by country, organization, Level of patent | Green car trend analysis | Keyword, Y-code, IPC code classification
2 | Bibliography, assignee, inventor, abstract, annotation, patent family, citation, citation assignee, and citation inventor4,36 | Trend analysis, Technology distribution by country, organization, Level of patent, Citation analysis | Web mining based patent analysis and citation visualization | Keyword, Text mining approach. Tool - Patent Spider (to get original pages of patents), VantagePoint, Aureka and Omniviz
3 | Summary, Title, Claim2,15 | Trend analysis, Technology distribution by country, organization, Level of patent, Citation analysis | Technological trend analysis of Silicon Solar Cell | Keyword, TrendPerceptor (text mining approach based on property-function technique)
4 | Claims, Title of invention, Title of document, Technical field, Background art, Summary of invention, Problem, Solution to problem, Effects of invention, Industrial application37 | Technology specific patents retrieved | Extraction of the effect and the technology terms from a patent document | Extraction of technology specific keywords, String matching
5 | Abstract, Title, Claim, Inventors and applicants names, citation etc.37,38 | Clustering of similar patents | Multilingual text mining approach | Hierarchical clustering, Self-organizing maps (SOM) methods. Model - Unified space vector (patents mapped into a document). Tool - PatViz38
6 | Applicant name, Number of patent applications34 | Assignees analysis | Forecasting emerging technologies of Low Emission Vehicle | SVM classifier applied on data sets. Tool - RapidMiner for text processing
7 | Patent no., Citation, Publication date39,40 | Patents having same text are grouped into one class related to same technology | SIMPLE: A strategic information mining platform. Analyzing linkage between industry and technologies | Nearest Neighbour (NN) analysis. Model - SIMPLE analytics. Network analysis using graphical and matricial methods
8 | Patent no., Search period, assignee, etc.41,42 | Analyzes important patents by calculating citation weight of each patent | Analysis of patents in MEMS-related technologies | Payek's Search Path Count algorithm for calculating weight of patent41. Tool - software 'Pajek'42
9 | Patent assignee, Application date, Abstract, Publication number, Claims, Detailed problems, etc.9 | White spot analysis; Extracting bibliographic data and text information (in form of problems and their solutions) using keywords or short phrases | Software-based patent analysis | Tool - Patent Skill Cartridge Luxid
10 | Assignee, Filing date, Claims | Infringement analysis | DNA chip technology domain by Lee et al.43 | Tool - WordNet using MDS and Clustering algorithm. Hierarchical keyword based approach (Tree-matching algorithm)
11 | Patent no., Applicant name, Filing date, IPC, Citation, etc.2 | Novelty analysis | Automotive industry | Tool - Knowledgist software42
12 | Patent Quantity (PQ), Revealed Patent Advantage (RPA), Patent Activity (PA), Be Cited Rate (BCA), and Relative Citation Index (RCI) | Technological analysis using Association Rule Mining. Analysis using inference rule based technique | Patent analysis-based fuzzy inference system by Yu and Lo44 | Kohonen learning algorithm45 and first nearest neighbor heuristic46
13 | Citation index, Originality, generality, and technology cycle time | Trend analysis using association rule mining | Mining changes in patent trends by Shih et al.47 | Patent Trend Change Mining (PTCM)
14 | Filing date, Assignee, IPC codes, Titles, Abstracts, Claims, and Description of invention | Technological forecasting/Trend analysis | Patent analysis by Wang and Cheung22 | Semantic Intellectual Property Management System (SIPMS) uses NLP based on semantic analysis. Naïve Bayesian algorithm. Back propagation neural network algorithm53

Table 3 - Mining techniques and associated analysis
Mining Technique | Purpose
MDS (Multidimensional Scaling) in NLP2,43 | To discover similarities and dissimilarities in data
Association Rule Mining using apriori algorithm | Forming meaningful associations among structures extracted from patent documents
Classification (k-Nearest neighbor/Naïve Bayes/etc.)2,48 | Classifies patents based on similar characteristics and thus helps in patent class identification
Clustering (k-means/Hierarchical)43,49 | Groups instances into clusters, reducing search time
Tree matching algorithm2,43 | Analysis of claims made in patents
Back propagation neural network algorithm2,50 | To determine quality of patents
Self-Organization Map (SOM) technique51 | To identify new research directions
Applicability for analysis (entries in source order): Time-Series Analysis, Trend Analysis; Technology Distribution, Citation Analysis; White spot analysis, Infringement analysis; Association Analysis, Family Analysis, Trend Analysis; Assignee Analysis; Technology Distribution by country/domain; Trend Analysis; IPC Analysis, Infringement Analysis; Cluster Analysis; Infringement Analysis, Novelty Analysis; White spot Analysis; Patent Level Analysis, Novelty Analysis; Classification Analysis; Patent Trend Analysis

Patent Analysis
Patent analysis is based on:
• Technology Roadmap
• Technology Classification
• Originality/R&D Support
• Technology Distribution
Different tools and methodologies help in analysing and visualizing patent trends and technological innovation among countries, forecasting the development of technology, performing citation analysis, etc. Analysing the information hidden in patents can provide a clear view of the current trends of a specific technological-scientific innovation.52-54 It helps in exploiting potentially useful knowledge in which an organization is interested and in giving R&D the right direction to improve research activities. Summative tables for each type of analysis are discussed in this section.
These tables help to understand how a particular approach is used in the literature, on which data set the technique is applied, what attributes were considered for that particular analysis and which computational approach was used to analyse the collected data. Each of the techniques is classified into one of the four categories. To upgrade any technology, it is essential to understand how the technologies of the past improved and evolved into the current state of technology. Road map analysis involves arranging the patents related to the core technology in chronological order, to visualize its evolution. Table 4 discusses the different approaches used for road map analysis. Twelve of the papers cited in the literature performed road map analysis, for different domains and technologies.

Table 4 - Technology road-mapping analysis
Type of analysis | Result of analysis | Purpose served to organization | Research areas
Time series analysis | Analysing the distribution of the patent quantity changing over time | The analysers can learn the future trend of various technologies by analysing historical data related to the number of patent applications during different time periods. This information reflects the degree of technological development with time60 | Patent analysis for technology forecasting5; Technology field of health13; New Energy, Auto Industry23; Technology forecasting in Bio-industry55
Technology life cycle/Patent trend analysis | The graph represents a comparison among patent applications and granted patent publications in respect of time distribution | It indicates the rate of growth/decline in patent deployment. It helps in judging the frequency and quality of R&D activities | Web Mining based Patent Analysis4; Green Car Trend Analysis25; Analysis of MEMS-related technologies41; Analysis of China versus US patents56; Patent analysis Biotech Industry57
Citation analysis41,33 | It indicates countries with more forward citation have a stronger technological impact. Having more backward citation depicts that countries have mature technologies | Patent citation determines intensity of technology, linkage between technology and industry20 | Patent analysis for technology forecasting5; Silicon Cell Trend Analysis15; Inventory management with Patent analysis17; Integrating patent family and patent citation27; The linkage between industries and technologies40; Biotech industry analysis57; Analysing industry convergence60
Attributes / Database chosen (entries in source order): Community innovation survey (CIS) and patent data from the United States Patent, Trade Organization (USPTO); Attributes chosen: year of filing, patent no, assignee etc.; USPTO and pub MED dataset; Technology field and investment analysis; IPC code; Bibliography, assignee, inventor, abstract, annotation, patent family, citation, citation assignee, and citation inventor; Search Period, Item, Country, No. of patents; Patent no, search period, assignee etc.; Citation graph structure; The year of filing, assignee, plan no, pub_appln no etc.; Number of patents issued, patents filed, assignee etc.; Patent no, IPC code, citation etc.; Summary, title, claim; No. of patents issued, patents filed, assignee, IPC no, citation, patent no.; Title, abstract, citation, publishing authority, pub. date; Patent no, Citation, Publication date; USPTO database; USPTO database
Mining technique (entries in source order): Non-linear regression (Bass Model); Statistical analysis; Neural Network; Statistical analysis; Clustering; Statistical analysis; Statistical analysis; Statistical analysis; Nonlinear Regression, Correlation, t-test; Statistical analysis done; Statistical analysis done; TRIZ parameter chart method; Statistical analysis done; Statistical analysis done; Statistical analysis

It is very important to identify the particular technologies that are of commercial interest. This analysis helps to evaluate patent portfolios using time and magnitude indicators.63 The relevance of a particular patent from the current perspective can be interpreted with citation analysis. Table 4 shows that researchers have widely used statistical analysis; however, because data mining techniques (Table 5) reveal more of the hidden patterns that exist in a dataset, choosing a suitable data mining technique enhances the overall quality of the analysis. The more advanced approaches apply nonlinear models based mainly on artificial neural networks (NNs), support vector machines (SVM) and other machine learning methods.25,32,36,39 It is reported that NNs are nonlinear structures capable of taking into account more complex relations among the analysed data, thus making prediction more accurate.13,25 The use of natural language processing and other text mining approaches further reduces the effort involved in searching for the required patent. Almost all datasets classify patents, as this reduces the search task and makes a patent easier to understand. Discussing the patent dataset after classification reduces the complexity of the analysis and makes it easier for an inventor to understand a patent in its domain. Inventors file a patent with different agencies for various reasons, which may lead to duplication in the database; patent family analysis may help to handle these redundant records.
Table 5—Technology Classification Analysis
Columns: Type of analysis; Result of analysis; Purpose served to organization; Research area where analysis technique is applied; Attributes / Database chosen; Mining technique

IPC Analysis
Result of analysis: Organizes groups or subgroups into categories, making it easy for identification.
Purpose served to organization: It can help enterprises understand the major categories for technological classification for planning R&D activities.

Family Analysis57
Result of analysis: It helps in removing redundant patents in a specific technology domain.
Purpose served to organization: It is vital for organizations doing R&D work. It helps in knowing redundant research works and thus provides decision support while allocating the R&D budget.

Cluster Analysis
Result of analysis: On the basis of certain attributes such as IPC category, title and abstract of the patent document, classification of patents is done to identify interesting correlations.
Purpose served to organization: Time reduced in finding patents that are similar in context to particular technologies.

Remaining cells (research area, attributes / database chosen, mining technique), in source order:
Inventory management with Patent analysis17
Patent no., Direct citation, indirect citation.
Classify the research paper according to IPC classification relevance score to understand the research trend in particular technology67
IPC Code
Analysis of innovative rehabilitation technologies52
IPC code, patent no
A Model for Measuring the R&D Projects Similarity Using patent information18
Patent no, abstract, IPC no
Statistical analysis done
Patent analysis by integrating patent family and patent citation27
Citation, title, abstract, IPC code, assignee/applicant name
LexisNexis Database
Text mining
Multilingual Patent Text-Mining Approach37
Patent no, type of patent, the patent age, number of claims.
IPC code, abstract, claim
Ontology-based patent network analysis48
Patent based analysis of technologies52
Technology forecasting in Bio-industry55
Naïve Bayes Classifier, self-organizing mapping algorithm
Hierarchical clustering (single link) and self-organizing maps (SOM) methods (for text mining)
Statistical analysis
Patent no, IPC code
Hierarchical clustering
Patent no, IPC code, title, abstract
Patent no, IPC code
Statistical analysis
Patent priority network26
Forecasting Dental Implant Technologies Using Patent Analysis14
K-NN based for classification of document and SVM for retrieval
SLINK hierarchical clustering
KNN (K-Nearest neighbors)
Statistical analysis

In a recent study, Choi et al.66 discussed technology convergence towards a common unity of technology. This is the need of the day, as most technologies are heterogeneous in nature and cover a wide range of interdisciplinary domains. The originality analysis (Table 6) is a complex mechanism, as it requires a high level of technical knowledge to justify how a new invention differs from previous ones. Such a study depends on the dataset, and many databases do not store older inventions, which may create unnecessary conflicts. A patent search is widely conducted before starting any R&D-based project.64 The objective of landscape analysis is to provide complete innovation management. An effort is required to understand the social and economic factors and the requirements of industrial partners for any newly emergent technology, so that the new technology can reach the masses. Understanding the emergent technology with respect to countries, and understanding how new technologies differ from existing patents, are the objectives of the study.
Also, to make an emergent technology practical rather than just theoretical research, analysis of competitive business rivals and assignee evaluation are required.

Conclusion
The patent dataset is a large source of information which has both technical and commercial value. Apart from using patent records merely to check the novelty of new research work, patent mining is highly advantageous for performing a complete landscape analysis of the patent dataset. This analysis helps to analyse the patents across a particular technology field, including the study of scientific literature and its changing importance with respect to time and market, and thus contributes to forecasting industry requirements, which is a very useful input to business intelligence. This study may help to identify hotspots and to get an opinion about validity and other legal issues related to a patent. There are various approaches in which the patent data are searched and the whole reservoir is analysed using suitable data mining and statistical techniques.

This paper presents the different ways in which patent datasets are analysed in the literature. The analyses were grouped into four main categories, namely technology road-mapping analysis, technology classification analysis, originality/R&D support analysis and technology distribution analysis (Table 7). Further analysis was done to review the attributes and techniques needed to complete each analysis. One of the major concerns with landscape analysis is to design a visualization tool that projects the multiple analyses discussed in the paper. Developing such a tool will give technocrats and inventors an overview of the need for invention in a particular technology; it will also help to develop a business model that supports the inventor in getting commercial value for their creation. This business model will assist collaborators in identifying the inventor, and vice versa, for joint development of a product or for obtaining suitable licences etc.

Table 6—Originality/R&D Support Analysis
Columns: Type of analysis; Result of analysis; Purpose served to organization; Research area where analysis technique is applied; Attributes / Database chosen; Mining technique

Novelty Analysis57
Result of analysis: It identifies novel work in order to determine the quality of patents.
Purpose served to organization: Organizations can easily identify new research works done in particular domains. This directly enhances the economic position of organizations and country in a broader sense.
Research area where analysis technique is applied: A literature review on the state-of-the-art in patent analysis2
Attributes / Database chosen: Patent no, applicant name, filing date, IPC, citation etc.
Mining technique: Subject-action-object based similarity matrix generation

White Spot Analysis
Result of analysis: Extraction and analysis of problems and solutions.
Purpose served to organization: It helps users to understand white spots and hence provides direction in R&D work. Researchers look for problems with a better solution.
Research area where analysis technique is applied: Software-based Patent Analysis9
Attributes / Database chosen: Patent assignee, application date, abstract, publication number, claims, detailed problems etc.
Mining technique: Text mining

Infringement Analysis
Result of analysis: Analyses similarities among patents. A patent having content matching with already filed/published patents falls under infringement.
Purpose served to organization: This identification helps organizations in approaching only novel works, eliminating infringed works in the research domain.
Research area where analysis technique is applied: A literature review on the state-of-the-art in patent analysis2
Attributes / Database chosen: Patent number, search period, assignee etc.
Mining technique: Hierarchical keyword vectors, tree matching algorithm
Research area where analysis technique is applied: Risk analysis of patent infringement68
Attributes / Database chosen: Metrics were proposed which involve the cost involved in litigation, estimated settlement and judgment
Mining technique: Statistic and data mining
Research area where analysis technique is applied: Use semantic analysis to design a product patent map that will help to identify any infringement of existing patents69
Attributes / Database chosen: Patent records to evaluate the semantic similarity for technologies
Mining technique: Similarity calculation using text mining and developing product patent map

Table 7—Technology Distribution Analysis
Columns: Type of analysis; Result of analysis; Purpose served to organization; Research area where analysis technique is applied; Attributes / Database chosen; Mining technique

Cells, in source order:
Technology distribution by country
It helps to know which country has registered more patents in which year with respect to other countries.
Search time reduced by directly finding patents from database of countries registering largest patent documents in the respective technology / domain.
Empirical Research on Technology Share based on Hybrid Approach31
Ordinary least squares (OLS) regression
This kind of analysis helps in predicting patent trends which make research work of organization easier and cost effective.
Patent information analysis for Company10
Discovering competitive intelligence by mining changes in patent trends47
Association Rule Analysis
Analyses patents which are related to each other
Technologies of Low Emission Vehicle34
Fuzzy inference system for technological, strategic planning50
Rival/Competitor analysis
Assignee Analysis58
Patent number, classification, inventor name, assignee name, citation
IPC no, assignee, patent number
Statistical analysis
Patent no, IPC code
Apriori algorithm
Assignee, IPC code
Association rule and four patent mining indicators: citation index, originality, generality, and technology cycle time
Patent Quantity (PQ), Revealed Patent Advantage (RPA), Patent Activity (PA), Be Cited Rate (BCA), and Relative Citation Index (RCI)
Kohonen learning algorithm and first nearest neighbour heuristic algorithm
It helps in knowing the network among countries by taking into account the proportion of international research work of each country
This mechanism is vital for analysing technical intelligence, scope of technology and R&D competence
Research on Technology Selection for Enterprises16
Analysis of China vs US patents in NEDD Race56
IPC no, assignee, patent count etc.
Statistical analysis
Patent no, IPC code, summary, title
Statistical analysis
This helps in knowing which organization possess most patent applications relative to the technology
The technological innovative capacity of a corporation can be evaluated
Green Car Trend Analysis25
Search Period, Item, Country, No. of Patents
Statistical analysis
Silicon Cell Trend Analysis15
Analysis of China vs US patents in NEDD Race56
Summary, Title, Claim
USPTO and Chinese patent database
Bibliography, assignee, inventor, abstract, annotation, patent family, citation, citation assignee, and citation inventor
Statistical analysis
Statistical analysis
Web Mining based Patent Analysis4
Statistical analysis

References
1 Candelin-Palmqvist H, Sandberg B & Mylly U M, Intellectual property rights in innovation management research: A review, Technovation, 32 (9) (2012) 502-512.
2 Abbas A, Zhang L & Khan S U, A literature review on the state-of-the-art in patent analysis, World Patent Information, 37 (2014) 3-13.
3 Hall B, Jaffe A & Trajtenberg M, The NBER patent citations data file: Lessons, insights and methodological tools (NBER working paper no. 8498) 2001.
4 Liu Z & Zhu D, Web Mining Based Patent Analysis and Citation Visualization, In Web Mining and Web-based Application, 2009. WMWA'09. Second Pacific-Asia Conference on IEEE (June, 2009) 19-23.
5 Yoon B & Lee S, Patent analysis for technology forecasting: Sector-specific applications, In Engineering Management Conference, 2008. IEMC Europe 2008. IEEE International, IEEE (June, 2008) 1-5.
6 Kang D S, Nah I W, Chun H Y, Shin Y S, Lee D H & Chung Y C, A case study of using informetric methods in R&D strategic planning and research performance analysis in KIST, In Management of Engineering & Technology, 2009. PICMET 2009, Portland International Conference on IEEE (August, 2009) 158-172.
7 Moehrle M G, What is TRIZ? From conceptual basics to a framework for research, Creativity and Innovation Management, 14 (1) (2005) 3-13.
8 Moehrle M G, Walter L, Bergmann I, Bobe S & Skrzipale S, Patinformatics as a business process: A guideline through patent research tasks and tools, World Patent Information, 32 (4) (2010) 291-299.
9 Siwczyk Y, Warschat J & Spath D, Software-based patent analysis: How to leverage a text-mining tool, In Technology Management for Emerging Technologies (PICMET), 2012 Proceedings of PICMET'12: IEEE (July, 2012) 1006-1013.
10 Lucheng H, Yanhua Y & Zhihua Z, A study on the application of data mining in the patent information analysis for company, In Education Technology and Computer Science (ETCS), 2010 Second International Workshop on IEEE (March, 2010) (1) 618-622.
11 Lee S, Yoon B, Lee C & Park J, Business planning based on technological capabilities: Patent analysis for technology-driven road mapping, Technological Forecasting and Social Change, 76 (6) (2009) 769-786.
12 Mattas N & Mehrotra D, Comparing data mining techniques for mining patents, In Advanced Computing & Communication Technologies (ACCT), 2015 Fifth International Conference on IEEE (February, 2015) 217-221.
13 Widodo A, Fanani M I & Budi I, Enriching time series datasets using nonparametric kernel regression to improve forecasting accuracy, In Advanced Computer Science and Information System (ICACSIS), 2011 International Conference on IEEE (December, 2011) 227-232.
14 Chang S W, Trappey C V, Trappey A J & Wu S C Y, Forecasting dental implant technologies using patent analysis, In Management of Engineering & Technology (PICMET), 2014 Portland International Conference on IEEE, (July, 2014) 1483-1491.
15 Suh M H, Kwon Y I & Lee I H, Using patent data to analyze the technological trends of the silicon solar cell, In Digital Content, Multimedia Technology and its Applications (IDCTA), 2011 7th International Conference on IEEE (August, 2011) 177-180.
16 Yan-ling W, Research on technology selection for enterprises with tools of patent analysis, In Management Science and Engineering (ICMSE), 2012 International Conference on IEEE (September, 2012) 1651-1657.
17 Shen C W & Cheng C C, Assessing the data processing innovations for inventory management with patent analysis, In Information Science and Service Science (NISS), 2011, 5th International Conference on New Trends in IEEE (October, 2011) (2) 324-327.
18 Kim J B & Byun J W, A model for measuring the R&D projects similarity using patent information, In 2014 International Conference on Information Science and Applications (ICISA) IEEE (May, 2014) 1-3.
19 Agrawal R, Imieliński T & Swami A, Mining association rules between sets of items in large databases, ACM SIGMOD Record, 22 (2) (1993) 207-216.
20 Rodrigues M M & Sacks L, A scalable hierarchical fuzzy clustering algorithm for text mining, In Proceedings of the 5th international conference on recent advances in soft computing (December, 2004) 269-274.
21 Jeong Y & Yoon B, Technology road mapping based on patent citation network considering technology life cycle, In Technology Management Conference (ITMC), 2011 IEEE International IEEE (June, 2011) 731-738.
22 Wang W M & Cheung C F, A Semantic-based Intellectual Property Management System (SIPMS) for supporting patent analysis, Engineering Applications of Artificial Intelligence, 24 (8) (2011) 1510-1520.
23 Jing Z, Patent analysis on new energy auto industry, In Computer Science and Information Processing (CSIP), 2012 International Conference on IEEE (August, 2012) 930-932.
24 Wang H & Huang M, Modularity and discontinuous innovation: A patent data analysis in automobile industry, In Technology Management for Global Economic Growth (PICMET), 2010 Proceedings of PICMET'10: IEEE (July, 2010) 1-7.
25 Kwon Y I, Green car trend analysis using patent information, In Information Science and Digital Content Technology (ICIDT), 2012 8th International Conference on IEEE, (June, 2012) (2) 344-347.
26 Chang Y H, Lai K K, Yang W G & Yang M C, Note on a heuristic procedure to identify the most valuable chain of patent priority network, International Journal of Innovation and Technology Management, 12 (03) (2015) 1540002.
27 Lai K K, Yang K O, Weng C S & Yang W G, Patent analysis of technology-performance by integrating patent family and patent citation, In Management of Engineering & Technology, 2009. PICMET 2009, Portland International Conference on IEEE (August, 2009) 1432-1446.
28 Kovács F, Legány C & Babos A, Cluster validity measurement techniques, In Proceedings of the 6th International Symposium of Hungarian Researchers on Computational Intelligence, Budapest (November, 2005) 18-19.
29 Karaolis M, Moutiris J A, Papaconstantinou L & Pattichis C S, Association rule analysis for the assessment of the risk of coronary heart events, In Engineering in Medicine and Biology Society, 2009, EMBC 2009, Annual International Conference of the IEEE (September, 2009) 6238-6241.
30 Chang M, Quantum computation patent mapping-a strategic view for the information technique of tomorrow, In Services Systems and Services Management, 2005. Proceedings of ICSSSM'05. 2005 International Conference on IEEE (June, 2005) (2) 1177-1181.
31 Huang L & Li J, Empirical research on technology share based on hybrid approach for morphology analysis and conjoint analysis of patent information, In Computer Modelling and Simulation, 2009. UKSIM'09, 11th International Conference on IEEE (March, 2009) 293-298.
32 Zhai D, Kang N & Yang Y, Research on USPTO Patent Information Acquisition System Based on Two-Tiered Scheduling Multi-Agent System, In E-Product E-Service and E-Entertainment (ICEEE), 2010 International Conference on IEEE (November, 2010) 1-4.
33 Tsai B H, Analysis of patent and profitability in Taiwan semiconductor firms, In Technology Management for Global Economic Growth (PICMET), 2010 Proceedings of PICMET'10: IEEE (July, 2010) 1-6.
34 Ranaei S, Karvonen M, Suominen A & Kassi T, Forecasting emerging technologies of low emission vehicle, In Management of Engineering & Technology (PICMET), 2014 Portland International Conference on IEEE (July, 2014) 2924-2937.
35 Kaur M & Sapra R, Classification of patents by using the text mining approach based on PCA and logistics, International Journal of Engineering and Advanced Technology, 2 (4) April 2013.
36 Kongthon A, A Text Mining Framework for Discovering Technological Intelligence to Support Science and Technology Management (Doctoral dissertation, Georgia Institute of Technology) 2004.
37 Lee C H, Yang H C, Wu C H & Li Y J, A multilingual patent text-mining approach for computing relatedness evaluation of patent documents, In Intelligent Information Hiding and Multimedia Signal Processing, 2009. IIHMSP'09, Fifth International Conference on IEEE (September, 2009) 612-615.
38 Koch S, Bosch H, Giereth M & Ertl T, Iterative integration of visual insights during scalable patent search and analysis, Visualization and Computer Graphics, IEEE Transactions, 17 (5) (2011) 557-569.
39 Chen Y, Spangler S, Kreulen J, Boyer S, Griffin T D, Alba A, ... & Kieliszewski C, SIMPLE: A strategic information mining platform for licensing and execution, In Data Mining Workshops, 2009, ICDMW'09, IEEE International Conference on IEEE (December, 2009) 270-275.
40 Choi C W, Shin J S, Yoon B G, Lee W Y & Park Y T, On the linkage between industries and technologies: Patent citation analysis, In Engineering Management Conference, 2004, Proceedings, 2004 IEEE International, (2) (October, 2004) 576-578.
41 Meister C & Meister M, Trends and trajectories in MEMS-related technologies: an analysis on the basis of patent application data, In Semiconductor Conference, 2005, CAS 2005 Proceedings, 2005 International, 1 (October, 2005) 187-190.
42 De Nooy W, Mrvar A & Batagelj V, Exploratory social network analysis with Pajek, Cambridge University Press, 27, 2011.
43 Lee C, Song B & Park Y, How to assess patent infringement risks: A semantic patent claim analysis using dependency relationships, Technology Analysis & Strategic Management, 25 (1) (2013) 23-38.
44 Yu W D & Lo S S, Patent analysis-based fuzzy inference system for technological strategy planning, Automation in Construction, 18 (6) (2009) 770-776.
45 Kohonen T, Self-organization and associative memory, Springer Science & Business Media, 8, 2012.
46 Lin C T & Lee C G, Neural-network-based fuzzy logic control and decision system, Computers, IEEE Transactions, 40 (12) (1991) 1320-1336.
47 Shih M J, Liu D R & Hsu M L, Discovering competitive intelligence by mining changes in patent trends, Expert Systems with Applications, 37 (4) (2010) 2882-2890.
48 Shih M J & Liu D R, Patent Classification Using Ontology-Based Patent Network Analysis, In PACIS (July, 2010) 95.
49 Kim Y G, Suh J H & Park S C, Visualization of patent analysis for emerging technology, Expert Systems with Applications, 34 (3) (2008) 1804-1812.
50 Yu W D & Lo S S, Patent analysis-based fuzzy inference system for technological strategy planning, Automation in Construction, 18 (6) (2009) 770-776.
51 Segev A & Kantola J, Identification of trends from patents using self-organizing maps, Expert Systems with Applications, 39 (18) (2012) 13235-13242.
52 Sani E, Frisoli A & Bergamasco M, Patent based analysis of innovative rehabilitation technologies, In Virtual Rehabilitation, 2007 (September, 2007) 96-101.
53 Comanor W S & Scherer F M, Patent statistics as a measure of technical change, The Journal of Political Economy, 1969, 392-398.
54 Narin F, Patents as indicators for the evaluation of industrial research output, Scientometrics, 34 (3) (1995) 489-496.
55 Jun S, Park S S & Jang D S, Patent management for Technology Forecasting: A case study of the Bio-Industry, Journal of Intellectual Property Rights, 17 (2012) 539-546.
56 Guo Y, Porter A L, Zhou X & Robinson D, A comparative analysis of China vs. US: Two important players in the Nano-enhanced Drug Delivery (NEDD) Race, In Technology Management in the IT-Driven Services (PICMET), 2013 Proceedings of PICMET'13 (July, 2013) 2575-2589.
57 Wang M Y, Lo H C, Liao Y Y & Lin P Y, Determinants of patent renewal decisions by patent indicators and social network analysis: The case of the biotech industry in Taiwan and Korea, In Technology Management for Emerging Technologies (PICMET), 2012, Proceedings of PICMET'12: IEEE (July, 2012) 1060-1065.
58 Harhoff D, Scherer F M & Vopel K, Citations, family size, opposition and the value of patent rights, Research Policy, 32 (8) (2003) 1343-1363.
59 Jung S, Importance of using patent information. WIPO— Most intermediate training course on practical intellectual property issues in business, organized by the World Intellectual Property Organization (WIPO), Geneva, 2003.
60 Karvonen M & Kässi T, Patent citation analysis as a tool for analysing industry convergence, In Technology Management in the Energy Smart World (PICMET), 2011, Proceedings of PICMET'11, (July, 2011) 1-13.
61 Suh J H & Park S C, A new visualization method for patent map: Application to ubiquitous computing technology, In Advanced Data Mining and Applications, Springer Berlin Heidelberg, 2006, 566-573.
62 Shrivastava S, Verma H N & Saha R, Strategies for technical assessment via patent analysis − A case study, Journal of Intellectual Property Rights, 20 (2015) 104-111.
63 Eusebi C A & Silberglitt R, Identification and analysis of technology emergence using patent classification, RAND NATIONAL DEFENSE RESEARCH INST SANTA MONICA CA, 2014.
64 Manap A Nazura et al., Protecting R & D inventions through intellectual property rights, Journal of Intellectual Property Rights, 21 (2) (2016) 110-116.
65 Singh V, Chakraborty K & Vincent L, Patent database: Their importance in prior art documentation and patent search, Journal of Intellectual Property Rights, 21 (1) (2016) 42-56.
66 Choi J Y, Jeong S & Kim K, A study on diffusion pattern of technology convergence: Patent analysis for Korea, Sustainability, 7 (9) (2015) 11546-11569.
67 Nanba H & Takezawa T, Classification of research papers into a patent classification system using two translation models, In Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries, Association for Computational Linguistics, (August, 2009) 27-35.
68 Micaelian F, Huey M, Schank R & Prasad S, Patent infringement risk exposure analysis, Nouvelles-Journal of the Licensing Executives Society, 46 (4) (2011) 334.
69 Park I & Yoon B, A semantic analysis approach for identifying patent infringement based on a product–patent map, Technology Analysis & Strategic Management, 26 (8) (2014) 855-874.
70 Mattas N, Kalra P & Mehrotra D, Agglomerative hierarchical Clustering technique for partitioning patent dataset, In Reliability, Infocom Technologies and Optimization (ICRITO)(Trends and Future Directions), 2015 4th International Conference on IEEE (September, 2015) pp. 1-4.