Fa: A System for Automating Failure
Diagnosis
Songyun Duan, Shivnath Babu, Kamesh Munagala
Department of Computer Science, Duke University
(ICDE09)
Outline
 Motivation
 Introduction
 Anomaly-based clustering
 Diagnose(F,H)
 Diagnose(F,L)
 Fa to generate signature DB
 Conclusion
Introduction
 Fa is a tool that can diagnose the cause of failures quickly and automatically from system-monitoring data.
 Fa uses monitoring data to construct a database of failure signatures against which data from undiagnosed failures can be matched.
 Fa uses a new technique called anomaly-based clustering when the signature database has no high-confidence match for an undiagnosed failure.
Anomaly-based clustering
 The Fa system mines the large volumes of high-dimensional and noisy monitoring data generated by database systems.
 Q = Diagnose(F, H ∪ L ∪ U):
 F is the monitoring data collected from the system during the failure (or just before the failure, in the case of a system crash).
 H ∪ L ∪ U is the historic data collected so far: healthy data H, labeled (previously diagnosed) failure data L, and unlabeled data U.
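As a rough, hypothetical sketch of these inputs (names, shapes, and values are assumptions, not the paper's data model), the monitoring data can be viewed as one vector of metric values per time instance, labeled as healthy, diagnosed failure, or undiagnosed:

```python
import numpy as np

rng = np.random.default_rng(0)
n_metrics = 4                              # number of monitored attributes

# Hypothetical layout: each row is one monitoring instance (the values of
# the n_metrics attributes at one point in time).
H = rng.random((100, n_metrics))           # healthy history
L = rng.random((20, n_metrics))            # diagnosed failures ...
L_types = rng.integers(0, 4, size=20)      # ... with their failure-type annotations
U = rng.random((10, n_metrics))            # undiagnosed failure history
F = rng.random((5, n_metrics))             # data from the current failure

# Diagnose(F, H ∪ L ∪ U) takes F plus all of the historic data as input.
historic = {"H": H, "L": (L, L_types), "U": U}
```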
Anomaly-based clustering
(figure slides: clustering example, discussed on the next slide)
Diagnose(F,H) -- Anomaly-based clustering
 Anomaly-based clustering places two instances into the same cluster iff they have similar deviations from F.
 This strategy gives the right answer for the example in the figure: it generates a single cluster for H and links the failure to attribute x only.
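A minimal sketch of the idea, assuming a simplified notion of deviation (per-attribute distance from F, scaled by F's spread) rather than the paper's margin-based one; the data and the two-attribute setup are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical example: attribute 0 is "x", attribute 1 is "y".
# In the healthy data H, y naturally splits into two modes while x stays near 30.
H = np.column_stack([
    rng.normal(30, 1, 200),
    np.concatenate([rng.normal(20, 3, 100), rng.normal(80, 3, 100)]),
])
# In the failure data F, x has shifted to ~50 while y spans the same range as H.
F = np.column_stack([rng.normal(50, 1, 50), rng.uniform(10, 90, 50)])

# Distance-based clustering looks only at raw values and splits H on y.
raw_clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(H)
print("distance-based cluster sizes:", np.bincount(raw_clusters))

# Anomaly-based view: describe each healthy instance by how far it sits from F,
# per attribute, scaled by F's spread (a simplification of the margin-based
# notion of deviation used by Fa).
deviation = np.abs(H - F.mean(axis=0)) / F.std(axis=0)

# All of H deviates strongly and consistently from F in x and only mildly in y,
# so the deviation vectors form one group and the failure is linked to x only.
print("mean deviation per attribute:", deviation.mean(axis=0).round(1))
print("spread of deviation per attribute:", deviation.std(axis=0).round(1))
```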
Diagnose(F,H) -- Margin Classifiers
 Diagnosis Vectors and Margin Classifiers
-- Computing the Diagnosis Vector
Fa processes a Diagnose(F,H) query by first clustering the healthy data H into a set of clusters C1, C2, ...,
and outputting the deviations as pairs <W1,C1>, <W2,C2>, ...
-- For each cluster Ci, Fa learns a linear separating function ∑_{j=1..n} wj·xj over the n monitoring attributes that produces the maximum separation between Ci and F. This maximum separation is called the margin, and the learned weights form the diagnosis vector for Ci.
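A sketch of this step, assuming a soft-margin linear SVM as a stand-in for the maximum-margin separator (the slides do not name a solver); the toy cluster and failure data are hypothetical:

```python
import numpy as np
from sklearn.svm import LinearSVC

def diagnosis_vector(Ci, F):
    """Fit a (soft-margin) linear separator between one healthy cluster Ci
    and the failure data F. The normalized weights indicate which attributes
    carry the separation, and 2/||w|| approximates the margin."""
    X = np.vstack([Ci, F])
    y = np.concatenate([np.zeros(len(Ci)), np.ones(len(F))])
    clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
    w = clf.coef_[0]
    return w / np.abs(w).sum(), 2.0 / np.linalg.norm(w)

# Toy cluster and failure data: they differ mainly in attribute 0.
rng = np.random.default_rng(1)
Ci = np.column_stack([rng.normal(30, 1, 100), rng.normal(40, 5, 100)])
F  = np.column_stack([rng.normal(50, 1, 30),  rng.normal(40, 5, 30)])
weights, margin = diagnosis_vector(Ci, F)
print("diagnosis vector:", weights.round(2), "margin:", round(margin, 2))
```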
Diagnose(F,H) -- Margin Classifiers
(figure: margin-based separation between a healthy cluster and F)
Diagnose(F,H) -- MAC
 Margin-based Agglomerative Clustering (MAC)
 MAC is an agglomerative hierarchical clustering algorithm: clusters are merged step by step based on their margins with respect to F, and merging stops once it would dilute the "clusteredness".
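Since the slide gives only the name and the stopping intuition, the following is a guessed sketch of MAC: each cluster's deviation from F is summarized by a direction (the real MAC uses margin-based separators), the most similar pair is merged, and merging stops when no pair is similar enough; the cosine-similarity threshold is an assumption:

```python
import numpy as np

def mac(H, F, groups, sim_threshold=0.95):
    """Simplified MAC sketch: merge the pair of clusters whose deviations
    from F are most similar, stopping before a merge would dilute the
    clusteredness."""
    f_center = F.mean(axis=0)
    clusters = [list(g) for g in groups]            # lists of row indices into H

    def direction(c):
        d = H[c].mean(axis=0) - f_center
        return d / (np.linalg.norm(d) + 1e-12)

    while len(clusters) > 1:
        dirs = [direction(c) for c in clusters]
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = float(dirs[i] @ dirs[j])        # cosine similarity of deviations
                if s > best:
                    best, pair = s, (i, j)
        if best < sim_threshold:                    # merging further would dilute clusteredness
            break
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy usage: healthy data shifted from F in attribute 0 only.
rng = np.random.default_rng(2)
H = rng.normal(size=(30, 3)) + np.array([5.0, 0.0, 0.0])
F = rng.normal(size=(10, 3))
print(len(mac(H, F, groups=[[i] for i in range(len(H))])), "clusters")
```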
Diagnose(F,H) -- PCM
 Partition-Check-Merge (PCM) combines Margin-based Agglomerative Clustering (accurate but not efficient, O(|H|^2)) with distance-based partitioning (efficient but less accurate).
 PCM: first run distance-based partition clustering (DPC) on H, then run MAC within each partition.
 If the result is good enough, possibly consolidate several small clusters into a minimal set of clusters.
 If it is not good enough, increase the input parameter k of the DPC algorithm, which specifies the number of clusters to generate.
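A rough sketch of this Partition-Check-Merge control flow, with assumed simplifications: k-means stands in for the distance-based partitioning (DPC), each partition is treated as a single candidate cluster instead of running MAC inside it, the "check" tests whether each partition deviates from F in a consistent direction, and consolidation merges partitions with nearly identical deviations. All thresholds are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

def deviation_dirs(H, F, clusters):
    """Unit direction from F's mean to each cluster's mean (a stand-in for
    the margin-based deviation that MAC would compute)."""
    f = F.mean(axis=0)
    d = np.array([H[c].mean(axis=0) - f for c in clusters])
    return d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-12)

def pcm(H, F, k=2, k_max=16, sim=0.95):
    """Partition-Check-Merge sketch: partition H cheaply with k-means (the
    DPC step), check the partitions, retry with a larger k if the check
    fails, then consolidate partitions that deviate from F the same way."""
    f = F.mean(axis=0)
    while True:
        parts = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(H)
        clusters = [np.flatnonzero(parts == p) for p in range(k)]
        clusters = [c for c in clusters if len(c)]        # drop empty partitions
        dirs = deviation_dirs(H, F, clusters)
        # Check: within each partition, instances should deviate from F in a
        # direction consistent with the partition as a whole.
        ok = all(
            np.mean((H[c] - f) @ dirs[i] /
                    (np.linalg.norm(H[c] - f, axis=1) + 1e-12)) > sim
            for i, c in enumerate(clusters)
        )
        if ok or k >= k_max:
            break
        k *= 2                        # not good enough: ask DPC for more partitions
    # Merge step: consolidate partitions whose deviations are nearly identical.
    merged = []
    for i, c in enumerate(clusters):
        for m in merged:
            if float(dirs[i] @ deviation_dirs(H, F, [m])[0]) > sim:
                m.extend(c.tolist())
                break
        else:
            merged.append(c.tolist())
    return merged

# Toy usage: healthy data shifted from F in attribute 0 only.
rng = np.random.default_rng(3)
H = rng.normal(size=(60, 3)) + np.array([5.0, 0.0, 0.0])
F = rng.normal(size=(15, 3))
print(len(pcm(H, F)), "clusters")
```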
Diagnose(F,L)
 Four distinct annotations (failure types).
 Clustering
-- In the figure, the blue points are the cluster centroids, which form Signature Database SD1; suppose the undiagnosed failure is f1 = <32,41>.
 f1 can be matched against SD1 to find the centroid (signature) nearest to f1.
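A minimal sketch of matching against a centroid-style signature database such as SD1; the centroid values below are illustrative, not taken from the slide's figure:

```python
import numpy as np

# Hypothetical SD1: one centroid (signature) per annotated failure type.
SD1 = {
    "annotation 1": np.array([10.0, 40.0]),
    "annotation 2": np.array([30.0, 40.0]),
    "annotation 3": np.array([50.0, 20.0]),
    "annotation 4": np.array([50.0, 60.0]),
}

def match_sd1(f, sd1):
    """Return the annotation whose centroid is nearest to f (Euclidean)."""
    return min(sd1, key=lambda a: np.linalg.norm(f - sd1[a]))

f1 = np.array([32.0, 41.0])
print(match_sd1(f1, SD1))          # nearest centroid here is annotation 2
```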
Diagnose(F,L)
 Separating functions
-- Signature Database SD2 is a binary matrix, with each row representing the signature of one failure type.
 Matching uses the Hamming distance. For example, with signatures 1000, 0100, 0010, 0001 (one row per annotation) and f1 mapped by the separating functions to the bit vector 0100, the Hamming distances are 2, 0, 2, 2, so f1 matches the second annotation.
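The worked example above, as a short sketch using the signatures and bit vector shown on the slide:

```python
import numpy as np

# SD2: one binary signature per annotation (the rows shown on the slide).
SD2 = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1]])

f1_bits = np.array([0, 1, 0, 0])           # output of the separating functions for f1

hamming = (SD2 != f1_bits).sum(axis=1)     # -> [2, 0, 2, 2]
print("distances:", hamming, "-> matched annotation:", int(np.argmin(hamming)) + 1)
```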
Diagnose(F,L)
 If f2 = <39,41>, then using SD2 f2 maps to <0000>, and the Hamming distance to every signature is 1 -- the match is ambiguous.
 Handling errors
-- SD3 adds extra separating functions S5 and S6; under SD3, f2 maps to <000010> and is diagnosed correctly.
 Why did SD3 diagnose f2 correctly, while SD2 did not? The reason is the same as in error-correcting codes: transmit some selected extra bits along with the regular data so that the receiver can reconstruct the original data in the presence of errors caused by noise or other impairments during transmission.
Fa to generate signature DB
 Generating the Binary Matrix
-- M is generated randomly; given a radius threshold Rt, M is rejected and regenerated if r < Rt. M must satisfy the following properties (a sketch follows the list):
(I) Each row must be distinct, since no two failures can have the same signature.
(II) Columns containing all 0s or all 1s are removed (they provide no differentiation among failures).
(III) No two columns may be identical or complementary.
(IV) The radius r of M is defined as half the minimum Hamming distance between any two rows: the higher the radius, the higher the error-correction ability of M.
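A sketch of generating M by rejection sampling under these constraints; the uniform 0/1 sampling, the retry loop, and the example sizes are assumptions (the slide only says M is generated randomly against a radius threshold Rt):

```python
import numpy as np
from itertools import combinations

def generate_signature_matrix(n_failures, n_cols, r_threshold, seed=0):
    """Randomly generate a binary matrix M (one row per failure type) and
    keep retrying until properties (I)-(IV) hold and the radius r >= Rt."""
    rng = np.random.default_rng(seed)
    while True:
        M = rng.integers(0, 2, size=(n_failures, n_cols))
        # (II) drop columns that are all 0s or all 1s (no differentiation).
        sums = M.sum(axis=0)
        M = M[:, (sums > 0) & (sums < n_failures)]
        if M.shape[1] == 0:
            continue
        # (I) all rows distinct.
        if len({tuple(r) for r in M}) < n_failures:
            continue
        # (III) no two columns identical or complementary.
        cols = [tuple(c) for c in M.T]
        if any(a == b or a == tuple(1 - x for x in b)
               for a, b in combinations(cols, 2)):
            continue
        # (IV) radius = half the minimum Hamming distance between any two rows.
        r = min((r1 != r2).sum() for r1, r2 in combinations(M, 2)) / 2
        if r >= r_threshold:
            return M

# 4 failure types, up to 6 separating functions, radius threshold Rt = 1.
print(generate_signature_matrix(n_failures=4, n_cols=6, r_threshold=1.0))
```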
 Generating the Separating Functions
-- Fa learns each separating function as a binary classification tree (CART), which was found to work best.
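A sketch, assuming scikit-learn's decision tree as the CART learner: column j of M relabels the diagnosed failures (types whose bit j is 1 vs. 0), and one tree is trained per column to serve as separating function Sj; all data below is hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def learn_separating_functions(L, L_types, M):
    """Train one CART tree per column of M. L holds the labeled failure
    instances, L_types the failure-type index of each instance, and M is
    the binary matrix (row = failure type, column = separating function)."""
    functions = []
    for j in range(M.shape[1]):
        bits = M[L_types, j]                         # 0/1 target for function Sj
        functions.append(DecisionTreeClassifier(max_depth=3, random_state=0).fit(L, bits))
    return functions

def signature(functions, f):
    """Map a failure instance f to its bit vector using the learned trees."""
    return np.array([fn.predict(f.reshape(1, -1))[0] for fn in functions])

# Hypothetical labeled data: 4 failure types around 4 centers in 2 metrics.
rng = np.random.default_rng(4)
centers = np.array([[10, 40], [30, 40], [50, 20], [50, 60]], dtype=float)
L_types = rng.integers(0, 4, size=200)
L = centers[L_types] + rng.normal(0, 1, size=(200, 2))
M = np.array([[1, 0, 0, 1, 1, 0],
              [0, 1, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0]])
S = learn_separating_functions(L, L_types, M)
print(signature(S, np.array([32.0, 41.0])))          # should be close to row 1 of M
```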
Fa to generate signature DB
 Weighting the Separating Functions
-- The weights are learned with machine learning: Fa uses Support Vector Machines (SVMs) to learn the weight of each separating function.
 Confidence estimate:
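A sketch of how such weights might be used at match time: the distance between an undiagnosed failure's bit vector and each signature becomes a weighted Hamming distance; the confidence shown is an illustrative placeholder based on the gap to the runner-up, not the paper's formula:

```python
import numpy as np

def weighted_match(sig_db, f_bits, weights):
    """Match f_bits against every signature using a weighted Hamming distance
    (the weights would come from the SVM-based weighting step)."""
    dist = ((sig_db != f_bits) * weights).sum(axis=1)
    order = np.argsort(dist)
    best, second = dist[order[0]], dist[order[1]]
    # Illustrative confidence only: how much better the best match is than
    # the runner-up. This is NOT the paper's confidence formula.
    confidence = (second - best) / (second + 1e-12)
    return int(order[0]), float(confidence)

SD = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]])
weights = np.array([1.0, 2.0, 0.5, 1.0])      # one weight per separating function
match, conf = weighted_match(SD, np.array([0, 1, 0, 0]), weights)
print("matched annotation:", match + 1, "confidence:", round(conf, 2))
```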
Fa to generate signature DB
 Setting the confidence threshold (Ct)
-- A low Ct can lead to incorrect diagnoses, while a high Ct can invoke the more expensive Diagnose(F,H) more often than needed.
 The main idea is to generate an accuracy-confidence curve (AC-Curve) for the signature database.
 A point (x, y) on the AC-Curve means: if the confidence threshold is Ct = x, the signature database has an expected accuracy of y% for matches having confidence >= x.
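A sketch of building the AC-Curve from held-out diagnosed failures, assuming each match carries a confidence value and a correct/incorrect flag; the data and the target accuracy are made up:

```python
import numpy as np

def ac_curve(confidences, correct, thresholds):
    """For each threshold x: accuracy over matches with confidence >= x."""
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=bool)
    curve = []
    for x in thresholds:
        mask = confidences >= x
        curve.append((x, correct[mask].mean() if mask.any() else np.nan))
    return curve

def pick_ct(curve, target_accuracy):
    """Smallest threshold whose expected accuracy reaches the target."""
    for x, acc in curve:
        if not np.isnan(acc) and acc >= target_accuracy:
            return x
    return None

# Hypothetical held-out matches: (confidence of the match, was it correct?).
conf = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2]
ok   = [1,   1,   1,    1,   0,    0,   1,   0]
curve = ac_curve(conf, ok, thresholds=np.round(np.linspace(0, 1, 11), 2))
print("Ct for 90% expected accuracy:", pick_ct(curve, 0.9))
```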
Conclusion
 In Diagnose(F,L), is a larger signature database always better?