Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
International Journal of Soft Computing and Engineering (IJSCE) ISSN: 2231-2307, Volume-4, Issue-ICCIN-2K14, March 2014 ICCIN-2K14 | January 03-04, 2014 Bhagwan Parshuram Institute of Technology, New Delhi, India Border Security up Gradation Using Data Mining S.S. Prasad, Sonali Mathur, Sonal Singh Secure and efficient border security is best achieved through an integrated and intelligence-led approach with effective interventions pre-arrival, at the border and incountry. The best border security is delivered through integrated technology enabling effective sharing of data, assured identity management and strong inter-agency cooperation and related processes [3]. While monitoring millions of border crossings each year, Border Security personnel are required to balance operational efficiency and security concerns. As vehicles or individuals enter the country, it is necessary to record each license plate and each passport number with a crossing date and time. It is also necessary to search vehicles and individuals for drugs and other contraband. The value of the collected data could be enhanced by combining it with the millions of relationships recorded between people, places, vehicles, and incidents in local law enforcement records management systems, but usefully integrating the data is a difficult task. In spite of an obvious connection between criminal and border crossing activities, such data is only combined in special cases when investigators call each other and ask for help. This kind of sharing is timeconsuming and therefore not very frequent [8]. Data sharing has important implications both for homeland security and local law enforcement. Suspicious activity and other reports from locations near critical infrastructure sites can have national security implications. Local law enforcement officials may have data related to terrorists without knowing that which individuals are terrorists. Border agencies are interested in certain individuals but have no efficient way to check with local authorities. In this research work a software tool has been developed that can be delivered at the border check post. The main objective of this research work is to design a system that can fill the void between local law enforcement departments and border agencies by data sharing, using data mining techniques for collecting information of crimes and criminals from various departments, integrating that information and make it available for all the departments and border check posts in order to check it against every passenger and vehicles crossing the border. This research work develops a methodology that helps in identifying important investigative leads with the help of Criminal Activity Network so as to uncover important and unnoticed patterns and links between people, vehicles, criminal incidences and border crossing activities. This system also helps to outline a graph of criminal activity rate so as to draw more attention at borders in the month having highest percentage of criminal activities Abstract— Homeland security concerns have identified border and transportation security as critical areas. The strategy for homeland security calls for enhancement of the security checks at borders without inordinate delay. This system is designed for creating a Criminal Activity Network (CAN)s, which is a framework of effectively integrated information collected from local, state and central sources, and for outlining a crime rate graph month wise, all using Data Mining as a useful tool for pattern analyzing, tracking, detecting and preventing terrorism. This work develops a methodology for identifying important investigative leads by analyzing relationships between people, vehicles, criminal incidents and border crossing activities and thus sharing as well as updating the information among all departments. The experimental results showed our system performs sufficiently well to be used in real world settings, including as an aid for counter terrorism, which is our overall aim. Keywords— Fuzzy C- Mean algorithm, Fuzzy Clustering, Clustering, Data mining, Pattern analysis. I. INTRODUCTION Data mining has become one of the key features of many homeland security initiatives. Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. In the context of homeland security, data mining can be a potential means to identify terrorist activities, such as money transfers and communications, and to identify and track individual terrorists themselves, such as through travel and immigration records. Data mining is defined as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. Once these patterns are identified they can be used to enhance decision making in a number of areas. “Data mining” is one technique that has significant potential for use in countering terrorism. Data-mining and automated data-analysis techniques are not new; they are already being used effectively for different applications. Recently there has been much interest on exploring the use of data mining for counter terrorism applications. For example, data mining can be used to detect unusual patterns, terrorist activities and fraudulent behavior. Data mining is increasingly becoming a useful tool for tracking, detecting and preventing terrorism. Protecting our borders from the illegal movement of weapons, drugs, contraband, and people, while promoting lawful entry and exit, is essential to homeland security, economic prosperity, and national sovereignty. II. LITERATURE REVIEW Manuscript received March 2014 S.S. Prasad, Department of MCA, Department of Computer Science, Department Of Computer Science, J.S.S Academy of Technical Education, Noida, Mahamaya Technical University C-22, Sector62,Noida , Uttar Pradesh, India. Sonali Mathur, Department of MCA, Department of Computer Science, Department Of Computer Science, J.S.S Academy of Technical Education, Noida, Mahamaya Technical University C-22, Sector62,Noida , Uttar Pradesh, India. Sonal Singh, Department of MCA, Department of Computer Science, Department Of Computer Science, J.S.S Academy of Technical Education, Noida, Mahamaya Technical University C-22, Sector62,Noida , Uttar Pradesh, India. A. Data Collection, Integration and Sharing For strengthening border security the most important things are data collection, data integration and data sharing. To identify whether a person or vehicle is involved in any illegal activities and whether to allow them to cross the border or not, it’s very important to utilize the information from multiple sources. Firstly it is hard to collect this data from several departments since all the data of criminals and vehicles involved in criminal activities, is highly 55 Published By: Blue Eyes Intelligence Engineering & Sciences Publication Pvt. Ltd. Border Security Up Gradation Using Data Mining sensitive and confidential. Secondly combining that collected data from independently- developed sources is a challenging task [10]. Commonly acknowledged problems in integrating data collected from several departments include [10][8]: (1) Name Differences: same name, different entity, (2) Missing Data: incomplete sources or different data available from different sources, and (3) Object Identification: no global ID values and no interdatabase ID tables. After successfully collecting and integrating data of criminals and vehicles, most important task is to make a system which can make integrated data available for all the departments i.e. making a common management system which can be accessed by all departments, problems acknowledged here is that such data is very vast and thus making a system which can efficiently work on large data and keep all data safe is a robust task. By utilizing the information available in the system, it is possible to identify, whether individual or vehicle crossing the border is having any criminal record and thus whether to allow them to cross the border or not. jurisdictions with multiple object and relation types, (2) high volume but relatively simple supplementary data to enhance CAN information content, and (3) case specific or ad-hoc query-specific data expressing important relationships or features. Given these classes of data, integration should proceed in three steps: schema-level transformation of base data, entity-matching to align objects across data sets, and normalization and matching of supplementary data. Base data should be semantically aligned and mapped to support CAN generation. When there is a lot of overlap between datasets, there is a lot of value to be gained. This is a classic data integration task requiring reconciliation of legacy data into a common schema and instance-level entity matching. Police records are the prime example of this kind of data because multiple jurisdictions keep similar types of data about an overlapping set of objects. Standardized data dictionaries may eventually encourage development of interoperable systems, but for now data sharing initiatives generally begin by mapping to a global schema and then move on to entity matching. Base data integration should be a repeatable transformation process so that the combined datasets can be refreshed frequently. Entity matching in this domain will tend to rely on heuristics. Primary objects will include people, location and vehicles. Input from domain experts suggests an initial match for people using Passport number. Other alternatives may be useful but are not consistently available and vehicles can be matched by license plate and/or vehicle identification number (VIN). License plate data has some interesting and useful characteristics. Plate numbers can be recorded in an unobtrusive fashion and, while criminals frequently avoid identification by lying about their names in routine interactions with law enforcement officials, license plate numbers are directly observed. In addition, vehicles used by criminals are often registered in someone else’s name. Even if a criminal uses an alias in incidents involving a particular vehicle, the resulting person-vehicle data implicitly links the incidents. License plate numbers also are occasionally transferred to different cars: illegally when a car or plate is stolen or legally when it is sold. For many applications these characteristics make plate numbers more useful than vehicle identification numbers. In addition to the base data, investigators use many additional supplementary or query-specific information resources to identify criminals’ activities and associations. This additional data may not be readily available for a variety of reasons. • Specialization: Frequently, useful data is not directly accounted for in the global schema. For example, police systems do not usually store border-crossing events. • Availability: Frequently, information like jail visitation histories and motor vehicle registration records are important and could be, but haven’t been, included in an agency’s data system. • Sensitivity: Investigators do not want many bits of information included in widely used sources. In some cases it is feared that information would be leaked to the criminals involved. In some cases data has been demanded legally and can be used only in a single investigation. • Contextual usefulness: Background information and rumors identify some relationships between individual criminals, for example, “Ashok and Karan are brothers” or “Ram and Raghav were friends in high school”. This kind of information is not collected in large quantities, applies only to specific cases, and should not be included in police records because of privacy and security concerns. B. Fuzzy C- Mean Clustering Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar in some sense to each other than to those in other groups (clusters) [8]. It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Clustering can be considered as the most important unsupervised learning problem; so, as every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Out of many algorithms of clustering we choose Fuzzy C-Mean (an overlapping clustering algorithm) algorithm for this research work. Since it gives best result for overlapped dataset and comparatively better than K-means algorithm. Unlike K-means where data point must exclusively belong to one cluster centre, here data point is assigned membership to each cluster centre as a result of which data point may belong to more than one cluster centre. Fuzzy C-Means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters. This method (developed by Dunn in 1973 and improved by Bezdek in 1981) is frequently used in pattern recognition. In fuzzy clustering (also referred to as soft clustering), data elements can belong to more than one cluster, and associated with each element is a set of membership levels. These indicate the strength of the association between that data element and a particular cluster [9][10]. Fuzzy clustering is a process of assigning these membership levels, and then using them to assign data elements to one or more clusters. Using Fuzzy C-Mean algorithm system created Criminal Activity Network (CAN). C. Cross-Jurisdictional Integration Framework The key to the framework is identification of three classes of data: (1) base data with overlapping data from multiple 56 Published By: Blue Eyes Intelligence Engineering & Sciences Publication Pvt. Ltd. International Journal of Soft Computing and Engineering (IJSCE) ISSN: 2231-2307, Volume-4, Issue-ICCIN-2K14, March 2014 ICCIN-2K14 | January 03-04, 2014 Bhagwan Parshuram Institute of Technology, New Delhi, India the authentic person at the border enters the ‘passport number’ for person and ‘vehicle number’ for vehicle in the software and then software verifies it with the available database. If a match is not found in the database, software will display “happy journey” message and new record gets registered in the border security database. If a match is found in the database, software will display “record found” message with providers information i.e. the department and place from where the record was found e.g. “Delhi criminal found call Delhi police” or “Delhi vehicle found call Delhi police” Once the criminal record is found, the border security’s authentic user retrieves criminal’s information including names, age, sex, address, driving license number, passport number, criminal activity they were involved in and their current status or vehicle’s information including vehicle number, chassis number, vehicle type, color, criminal activities they were involved in and their current status with the help of “manage” option provided in the software. After receiving all information regarding criminal or vehicle involved in criminal activities, authentic user updates the particular department from where the record was found using “send update” option which includes “caught at, passport number or vehicle number”. This update sent by Border Security agency will be notified to the Investigative Agency and Police department whenever they will access their respective login ids’. Depending upon all the data collected from both the departments (Police, Investigative Agency) and its own records and findings, this software will help authentic person of border security agencies to draw a Criminal Activity Network(CAN) using Fuzy C-Mean algorithm of Clustering with a “CAN” option provided in the software. CAN will help border security agencies in identifying important investigative leads by analyzing known relationships between people, vehicles, criminal incidents and border crossing activities. This software, with the help of CAN also facilitates authentic person of border security agencies to outline a graph representing percentage of criminal activities in each month using “Bar Chart” option provided in the software. III. RESEARCH DATASET A. Dataset For strengthening border security the most important tasks are data collection, data integration and data sharing. To identify whether a person or vehicle is involved in any illegal activities and whether to allow them to cross the border or not, it’s very important to utilize the information from multiple sources. In this research work information from many data sources has been utilized. These data sources are: Police record of criminals which includes names, age, sex, address, driving license number, passport number, criminal activity they were involved in and their current status. Record of vehicles involved in criminal activities which include vehicle number, chassis number, vehicle type, color, criminal activities they were involved in and their current status. Border crossing data records which includes information of all passengers and vehicles crossing the border. Since all this data which was required in this research work was highly sensitive and such data is confidential, so to carry out this research work we have created dummy databases. All the research work is based on dummy data base. B. Research Design Our framework allows for the inclusion of this kind of data by treating it as supplementary data or as query-specific data. A data source is appropriate for supplementary integration when (1) it is available in quantity and can be appropriately organized, (2) its sensitivity level allows for it to be shared across multiple investigations, and (3) it is contextually appropriate outside of a single investigation. Data can be used as supplementary data if it can be reduced to one or more lists of features or events directly associated with identifiable objects in the base data set. For example, border crossing records, or jail visitations can all be recorded associated with particular individuals already contained in a base data set of criminal incidents. Queryspecific data can be used to guide CAN building. For example if phone records indicate a suspect called 19 different people, a CAN network could query for relationships involving any of the 20 people to arrive at a more context-specific result without storing legally demanded data in the general investigation data set. Both supplementary and query-specific data has to be normalized and matched to the objects and entities from the base data. Fig. 1 shows the design of the software tool proposed in this research work. In this research work a software tool has been implemented which will be delivered to border management agencies which will enhance the security system at borders by discovering and revealing unusual patterns of individuals and vehicles involved in criminal activities. In addition to this, system enables sharing of information among border agencies and local law enforcement departments [1]. Thus it provides border agencies a more efficient way to decide whether to allow people or vehicle to cross the border or not. For security concerns, this software tool provides an authentic login id for border agencies that ensures only authentic users at the border can access the system. This software provides border agencies with the database that includes data collected and integrated from various departments. This database is easily accessible by all departments using their respective authentic login ids’ provided by the software. This database is dynamic in nature since it gets updated every time any changes are made by Police or investigative or Border security departments. Every time any person or vehicle crosses the border land, Figure 1 System Design C. Research Testbed We integrated Police Department (PD) and Investigation Department (ID) datasets with each other and with a dataset 57 Published By: Blue Eyes Intelligence Engineering & Sciences Publication Pvt. Ltd. Border Security Up Gradation Using Data Mining made available for this research by Border Security Department. The Border Security dataset was handled as a supplementary source that identifies border crossing vehicles and criminals. 1) Integrating the Data: Integration of the data sets proceeded in three steps: 1. PD/ID records were mapped to a common schema. 2. Cross-jurisdictional identities were matched. 3. Border Security Agency data was imported as a supplementary source. Because the PD and ID data come from cooperative agencies with closely related activities, it was appropriate to invest in a significant integration effort. People were matched on Passport Number. Each vehicle plate found in the police records was matched to the Border Security crossing data to establish a border crossing history and help identify potentially interesting vehicles. We began analyzing our data by evaluating the overlap between datasets. We expect that future versions of our system will allow inclusion of other supplemental and query-specific data such as family associations, phone records, and jail visitations. These sources could be added as part of an interactive link-chart drawing process. We began analyzing our data by evaluating the overlap between datasets. 2) CAN Evaluation: Next we measured the impact of crossjurisdictional information on activity networks traced in the criminal activity records[2]. We randomly choose people from a combined list of wanted suspects and known drug traffickers. We selected only people appearing in both Police and ID records (the large majority did appear in both data sets). We used the associations or links that occur when individuals or vehicles are listed together in an incident report to trace CANs. For each person we followed all known person-to-person associations and compiled a list of people. Links for each person in the new list were also followed. We then followed person-to-vehicle links to identify plate numbers. The result was a network of all people within two “hops” of the focus individual and all associated vehicles known to have crossborder activity. We created three networks for each person: one with links from the Police dataset, one with links found in the ID dataset, and one using the links in both datasets. This system reports the average number of associated people, associated vehicles, and associational links found for the selected individuals. Combining the data sets allowed us to connect more people and border crossing vehicles for this list of known criminals. Next we created a set of CAN visualizations for review by law enforcement personnel. We limited our networks to 50 nodes at most, because more nodes would overwhelm the viewer (Experimentation helps establish appropriate initial network sizes for display). While the size of a network may “converge” quickly if little information is available, networks frequently become unmanageable in just a few iterations. A variety of visual cues were used in this preliminary implementation. We differentiated entity types by shape, key attributes by node color, degree of activity as node size, connection source by link color, and some details in link text or roll-over tool tips. Figure 2 shows a network connecting narcotics traffickers and border crossing plates. • Associations found in the Police Department data are blue, ID links are green, and associations noted in both sets are red. • Node size indicates the extent of criminal activity. All incidents involving a person are counted. Violent, narcoticsrelated, and gang-related activities are counted twice. The activity scores are normalized to identify the relative activity levels of the individuals in the network. Future work will explore various methods of determining appropriate node size. • Nodes in the display are initially arranged by algorithm which “pushes” and “pulls” nodes using the links in the network. This algorithm needs further development but generally tends to place closely related people near each other in the display area. Choice of these features was: high levels of criminal activity and frequent border crossings signal useful investigative leads, crime types and person roles are important for association evaluation, and longer associative paths are less interesting. We implemented a system to enhance visualizations of Police and ID data with border crossing information and created networks of people and/or vehicles which all included at least one vehicle with recorded border crossings. Figure 2 and 3 were analyzed and comments which are summarized below. These networks can be easily analyzed to reveal links between relatively unknown subjects who are routinely crossing the border and known participants in Police and ID databases. The information generated by networks could subsequently be used to focus and direct law enforcement resources and investigations. Automatically generated activity networks for wanted individuals would save a lot of time and effort. Indications of cross-border activity would be very useful in focusing certain investigations. Correlating stolen vehicle reports with border crossings and targeted individuals could help in many investigations, but finding correlations manually was very time consuming. 58 Published By: Blue Eyes Intelligence Engineering & Sciences Publication Pvt. Ltd. International Journal of Soft Computing and Engineering (IJSCE) ISSN: 2231-2307, Volume-4, Issue-ICCIN-2K14, March 2014 ICCIN-2K14 | January 03-04, 2014 Bhagwan Parshuram Institute of Technology, New Delhi, India Figure 2 Visualization of Criminal Activity Network Figure 3 A complex network connecting border crossing plates and known drug trafficker 59 Published By: Blue Eyes Intelligence Engineering & Sciences Publication Pvt. Ltd. Border Security Up Gradation Using Data Mining make better operational decisions and the flow of information back to local agencies helping them pursuing related investigations. This software helps border check post authorities to get more accurate, and relevant information related to every vehicles and individual crossing the border, thus helping border check post authorities to more efficiently decide whether to allow a vehicle or individual to cross border or not. As we know that border and transportation security is very important for country’s security, in the future work more security features can be implemented. In the system developed, information from law enforcement sources has been utilized; hence in future information from different countries can be collected and utilized to enhance the security at borders. IV. EXPERIMENTAL RESULT A. Result based on information integration and sharing While conducting the research it was found that integrating the information from multiple sources and sharing it with all the departments, helps in identifying many vehicles and individuals crossing the border have some criminal incident associated with them. At the border check posts the system makes it easier to identify people/vehicle crossing with criminal record, people having illegal driving license or having illegal passport, vehicle with illegal registration, stolen vehicles, vehicle carrying any contraband item. It is a positive sign that this work will provide more relevant and accurate information to the border check post authorities. B. Result based on Fuzzy C- Mean algorithm The Fuzzy C-Mean algorithm works by assigning membership to each data point corresponding to each cluster centre on the basis of distance between the cluster centre and the data point. More the data is near to the cluster centre, more is its membership towards the particular cluster centre. The system is able to draw criminal activity network defining unknown relations among criminals and vehicles involved and recorded for different criminal activities but somehow linked with the same source of crime. ACKNOWLEDGMENT I would like to add few heartfelt words for the people who were part of this project in numerous ways, people who gave support in completing this report. Completion of the project would not have been possible without mutual efforts and integrated thoughts. First of all I wish to express my gratitude and hearty appreciation to my Project Supervisors and Mentors Prof.S.S Prasad and Mrs. Sonali Mathur for providing a self learning and congenial environment and for their continuous encouragement and enthusiastic guidance in the successful completion of the project work. They have been instrumental in providing constant encouragement, guidance and a stimulating and challenging environment in which I have been welcomed, inspired and assisted. I also extend my thanks to my internal panel members who also gave their valuable advices on how to improve the feasibility of the project during the viva-voce and presentations. I would also like to thank other faculty members and my colleagues here at JSSATE Noida for providing me the optimal learning environment, which enabled me to undertake this project. C. Pattern analysis The system is also able to evaluate a month wise crime rate graph shown in figure 4. As per records entered till then graph is showing more criminal activities at the border in the months of May, June, July, August, September and October whereas no criminal activities in rest of the months. REFERENCES [1] [2] Figure 4 Chart explaining criminal activities at borders each month [3] V. CONCLUSION AND FUTURE WORK In the proposed system emphasis has been placed on creating Criminal Activity Networks which are analyzed to reveal links between relative obscure subjects who are routinely crossing the border and known participants in criminal activities recorded in Police and ID departments. This system also helps border agents to plot a crime rate graph so as to draw more attention on borders in months having highest number of criminal behaviors. The information obtained with the help of CAN and crime rate graph could subsequently be used to focus and to direct local law enforcements’ and border security agents’ resources and investigations. Automatically generated activity networks for wanted individuals would save a lot of time since finding correlations manually was very time consuming. We conclude that the system worked successfully on the data that is used for the analysis. In this software integration and sharing of information has been encouraged to draw useful and unknown links between criminals and vehicles as compared to previous research work. It increases the information flow to the border agencies, allowing them to [4] [5] [6] [7] [8] [9] [10] 60 S. Kaza, P. J.-H. Hu, H.-F. Hu, H. Chen. "Designing, Implementing, and Evaluating Information Systems for Law Enforcement—A Long-Term Design-Science Research Project," Communications of the AIS, Volume 29, Issue 1, 2011. Chau, M., Schroeder, J., Xu, J., Chen, H., "Automated Criminal Link Analysis Based on Domain Knowledge," Journal of the American Society for Information Science and Technology, Volume 58, Number 6, Pages 842-855, 2007. S. Kaza, Y. Wang, and H. Chen, "Enhancing Border Security: Mutual Information Analysis to Identify Suspect Vehicles," Decision Support Systems, Volume 43, Number 1, Pages 199-210, 2007. Sinchun chen, fei-yue wang, Daniel zeng, “Intelligence and security informatics for homeland security:information, communication, and transportation”, IEEE transactions on intelligent transportation systems vol.5,no.4, 2004. Byron Marshall, Siddharth Kaza, Jennifer Xu, Homa Atabakhsh, Tim Petersen, Chuck Violette, and Hsinchun Chen, “cross jurisdictional crime activity networks to support border and transportation security”, 2004. Siddharth kaza, Tao wag, Hemanth Gowda, Hsinchun chen, “Target vehicle identification for border safety using mutual information” in proc. of 8th international conference on intelligent transportation system Vienna, Austrias, 2005. Jiawei Han and Micheline kamber, “Data Mining: Concepts and Techniques”, Second edition, Morgan Kaufmann, 2006. Anil K. Jain and Richard C. Dubes, “Algorithms for Clustering Data”, Prentice Hall, 1988. A. Baraldi, and P. Blonda, "A survey of fuzzy clustering algorithms for pattern recognition” IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 1998. Frank Hoppner, Frank Klawonn, Rudolf Kruse, and Thomas Runkler, “Fuzzy Cluster Analysis: Methods for Classification, Data Analysis, and Image Recognition”, John Wiley & Sons Ltd., Chinchester New York Weinheim, 1999. Published By: Blue Eyes Intelligence Engineering & Sciences Publication Pvt. Ltd.