
Parametric calibration of speed–density relationships in mesoscopic traffic simulator with data mining

Adviser: Yu-Chiang Li
Speaker: Gung-Shian Lin
Date: 2009/10/20
Information Sciences, vol. 179, no. 12, pp. 2002-2013, 2009
南台科技大學資訊工程系 (Department of Computer Science and Information Engineering, Southern Taiwan University of Science and Technology)

Outline
1 Introduction
2 Literature review
3 Data mining
4 Experiments and results
5 Conclusions

1. Introduction

• The paper calibrates speed–density relationship parameters using data mining techniques and proposes a novel hierarchical clustering algorithm based on K-means clustering.
• Mesoscopic simulators model either a single vehicle or a group of vehicles in order to depict how different vehicles respond to traffic information.

2. Literature review

• In the mesoscopic models used in DTA systems, speed is related to density by:

$$ v_u = \begin{cases} v_0, & k \le k_0, \\ v_0 \left[ 1 - \left( \dfrac{k - k_0}{k_{jam}} \right)^{\beta} \right]^{\alpha}, & \text{otherwise}. \end{cases} $$

[Figure: speed $v_u$ versus density $k$, with $v_0$, $k_0$ and $k_{jam}$ marked on the axes.]

3. Data mining

• LWR (locally weighted regression)

Step 1: Take x (densities, or both densities and flows) as a center to form a space. The width of the space is described by q = fn.
Step 2: Define the weights of all points in the space. The weight of any point is the height of a weight function. The commonly selected weight function is:

$$ W(u) = \begin{cases} (1 - u^3)^3, & 0 \le u < 1, \\ 0, & \text{otherwise}. \end{cases} $$

The weight for the observation $(x_i, y_i)$ is:

$$ w_i = W\!\left( \frac{\rho(x, x_i)}{d(x)} \right) $$

With $u = \rho(x, x_i)/d(x)$, a point with $\rho(x, x_i) < d(x)$ receives the weight $(1 - u^3)^3$, and a point with $\rho(x, x_i) \ge d(x)$ receives the weight 0.
Step 3: Fit a polynomial for each point in the independent variable space by using the weighted least squares algorithm.
Step 4: Acquire the value of $y_i$.

[Figure: a point x with its neighborhood of width q = fn.]

• Agglomerative hierarchical clustering algorithm based on K-means

The proposed algorithm can be summarized as follows:

Step 1: Use K-means to cluster the sensor data, which are taken as training instances; the resulting k clusters are named constraint-clusters.
Densities, flows and speeds contain abundant information about the traffic status, so they are chosen as the clustering variables.

K-means is executed in the following steps:
1. Randomly select k clustering centers from the n training cases.
2. Find the nearest clustering center to each $x_i$ (density, or both density and flow), then assign $x_i$ to it.
3. Compute the objective function E. If the value of E is unchanged, the clustering result is also unchanged and the iteration stops.
4. Otherwise, return to step 2.

[Figure: K-means example with k = 3, panels (a) to (f). Scatter plots of age (x-axis) versus average monthly income in thousands (y-axis).]

Step 2: For each constraint-cluster, use agglomerative hierarchical clustering to build a clustering tree.

The basic steps of the complete-link algorithm are:
1. Place each instance in its own cluster. Then compute the distances between these points.
2. Step through the sorted list of distances, forming for each distinct threshold value $d_k$ a graph of the samples in which pairs of samples closer than $d_k$ are connected into a new cluster by a graph edge. If all the samples are members of a connected graph, stop. Otherwise, repeat this step.
3. The output of the algorithm is a nested hierarchy of graphs, which can be cut at the desired dissimilarity level to form a partition (clusters) identified by simple connected components in the corresponding subgraph.

[Figure: complete-link example over Sample 1 through Sample 7.]

Step 3: These k clustering trees are combined into an integrated clustering tree by using the complete-link algorithm.
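Steps 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `kmeans` and `complete_link`, the toy points, and the choice to cut each constraint-cluster tree at two sub-clusters before combining them are all hypothetical. The center update (moving each center to the mean of its cluster) is the usual K-means step and is assumed here, since the slides do not spell it out.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Step 1: plain K-means; returns k lists of points (the constraint-clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)              # random initial centers
    prev_e = None
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                         # assign each point to its nearest center
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):   # move each center to its cluster mean
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
        # objective E: total squared distance of points to their cluster center
        e = sum(math.dist(p, centers[c]) ** 2
                for c, members in enumerate(clusters) for p in members)
        if prev_e is not None and math.isclose(e, prev_e):
            break                                # E unchanged, so the clustering is stable
        prev_e = e
    return clusters

def complete_link(clusters, target):
    """Merge clusters until `target` remain, always joining the pair whose
    farthest members are closest (the complete-link distance)."""
    clusters = [list(c) for c in clusters]

    def farthest(a, b):
        return max(math.dist(p, q) for p in a for q in b)

    while len(clusters) > target:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: farthest(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)           # j > i, so index i stays valid
    return clusters

# Toy (density, flow) samples; the values are made up for illustration.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

constraint_clusters = kmeans(points, k=2)        # Step 1
sub_clusters = []
for cluster in constraint_clusters:              # Step 2: agglomerate inside each constraint-cluster
    singletons = [[p] for p in cluster]
    sub_clusters += complete_link(singletons, target=min(2, len(cluster)))
final = complete_link(sub_clusters, target=2)    # Step 3: combine across constraint-clusters
print(sorted(sorted(c) for c in final))
```

Running K-means first keeps the quadratic-cost complete-link search confined to small groups, which appears to be the motivation for nesting the two methods.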
After all samples are clustered, a separate local regression is run for the observations in each cluster.

Step 4: New densities and flows are classified into the most appropriate cluster by using k-nearest neighbors. The k-nearest neighbor classifier uses Euclidean distance to search for the k nearest density and flow samples among those already clustered.

4. Experiments and results

• The sensor data are preprocessed to eliminate erroneous data and repair missing ones.

Step 1: Group the data from a number of cycles into a phase, and scan the sending times of the data one by one to find the missing records. Check whether each record is erroneous according to the criteria in the table.
Step 2: Repair the missing and erroneous data, using the average value in the neighboring phase.

[Figure: Estimated speed by the classical speed–density relationship.]

• The table presents the RMSPE obtained by each approach.

5. Conclusions

• The proposed method overcomes the limitations of classic models of speed–density relationships.
• KHCA obtained the highest precision in capturing traffic dynamics compared to other existing clustering algorithms.
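The RMSPE comparison reported in the experiments can be sketched as follows. This is a rough illustration, not the paper's code: the function names `classical_speed` and `rmspe`, all parameter values, and the sample data are hypothetical. The exponents `alpha` and `beta` follow the commonly used form of the classical speed–density relationship, and clamping the density ratio to [0, 1] is an added assumption so that the fractional power stays real.

```python
import math

def classical_speed(k, v0, k0, kjam, alpha, beta):
    """Speed for density k: free-flow speed v0 up to k0, then a decaying curve."""
    if k <= k0:
        return v0
    # Assumption: clamp the ratio so the expression below stays real-valued.
    ratio = min(max((k - k0) / kjam, 0.0), 1.0)
    return v0 * (1.0 - ratio ** beta) ** alpha

def rmspe(observed, estimated):
    """Root mean square percentage error between observed and estimated values."""
    terms = [((e - o) / o) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(terms) / len(terms))

# Made-up densities (veh/km) and observed speeds (km/h), for illustration only.
densities = [10.0, 40.0, 80.0, 120.0]
observed = [98.0, 85.0, 52.0, 21.0]

# Hypothetical parameter values; calibrating them is the subject of the paper.
params = dict(v0=100.0, k0=20.0, kjam=150.0, alpha=2.0, beta=1.5)
estimated = [classical_speed(k, **params) for k in densities]
print(f"RMSPE = {rmspe(observed, estimated):.3f}")
```

Because each error is divided by the observed value, RMSPE weights low-speed (congested) samples more heavily than an absolute error measure would.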
