Overview
  - A Quantum Computation Simulation Language
  - Anomaly Detection in the Windows Registry
  - Detecting Splice Sites in Genes
  - Rotationally Invariant Face Detection


Q-HSK: A Quantum Programming Language and Compiler
Katherine Heller, Krysta Svore, Maryam Kamvar (Al Aho)

What is Q-HSK?
  - Quantum Computation Simulation Language
  - Quantum Compiler
  - Q-HSK enables simplified programming of quantum algorithms with built-in graphics

Many Worlds Interpretation
  - One formulation of quantum theory
  - Each universe has a corresponding amplitude (i.e. a complex number)
  - $|\text{amplitude}|^2$ = probability of existence
  [Figure: diagram with labels x, u1, u2, u3, u4]

Qubits
  - Quantum analogue of a classical bit
  - Takes on values 0, 1, or a superposition of states:
      $|\omega\rangle = \alpha|0\rangle + \beta|1\rangle$, where $|\alpha|^2 + |\beta|^2 = 1$
      $|\omega\rangle = \cos(\theta/2)\,|0\rangle + e^{i\varphi}\sin(\theta/2)\,|1\rangle$

Quantum Gates
  - Reversible: all are unitary operators ($U^{\dagger}U = I$)
  - Universal quantum gates: {U2, XOR}, Toffoli
  - Some common gates: Hadamard, QFT, CNOT
  [Figure: Hadamard gate H applied to $|0\rangle$ and to $|1\rangle$; $H|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$]

Key Features of the Q-HSK Compiler
  - Familiar C-style syntax
  - Matrix operations via CBLAS
  - Complex and real data types
  - A quantum type qreg
  - A graphical view of quantum algorithms
      - Lucid representation of qubits, registers, and gates
      - Interactive user options (start, stop, pause, change animation rate)
      - Detailed text output to trace the algorithm

A Simple Example

    int main( ) {
        int a, i;
        qreg *q;
        q = create(5);
        i = 0;
        while (i < 5) {
            q[i] = (0.0, 0.0);
            i = i + 1;
        }
        q = computeHadamard(q);
        a = Measure(q);
        printf("This is the measure: %d", a);
        return 0;
    }

  [Figure: register q of five qubits, each initialized to 0, passed through H and then measured (M)]

Shor's Algorithm
  - Factors large numbers
  - n: number to factorize
  - x: random number
  - a: ranges from 0 to q-1, where $n^2 \le q \le 2n^2$
  - r: period of $x^a \pmod{n}$ (exponential classically)
  - One factor of n is $\gcd(x^{r/2} - 1,\, n)$ (fast classically)

Graphical Interface

Architecture of Q-HSK Compiler
  [Figure: compiler pipeline diagram. Labels: Program.q, Lexical Analyzer, lex.yy.c, Syntax Analyzer, y.tab.c, Semantic Analyzer, translate.c, Translator, Program.cpp, g++, Executable, Java, javac, Graphics]


One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses
Collaborators: Krysta Svore, Angelos Keromytis, Sal Stolfo

Host Based Intrusion Detection Systems
  - Microsoft Windows is the most often attacked
  - Current method to combat attacks: virus scanners and security patches
  - Problem: these do not combat unknown attacks, so frequent updates are needed
  - Host based IDS
      - Monitor system accesses to detect intrusions
      - Application of data mining techniques

The Windows Registry and RAD
  - Windows Registry
      - Stores configuration settings for system parameters: security information, programs, etc.
      - Programs query the registry for information
  - Example registry access:
      Process:     EXPLORER.EXE
      Query:       OpenKey
      Key:         HKCR\CKSUD\{B41DB860-8EE4-11D2-9906-EA9FADC173CA}\shellex\MayChangeDefaultMenu
      Response:    SUCCESS
      ResultValue: NOTFOUND
  - Registry Anomaly Detection (RAD): audit sensor, model generator, anomaly detector

Probabilistic Anomaly Detection Algorithm
  - Computes 25 consistency checks: $P(X_i)$ and $P(X_i \mid X_j)$
  - Multinomial with Hierarchical Prior
  - For observed elements i:
      $P(X = i) = C \cdot \dfrac{N_i + \alpha}{k_0\alpha + N}$
    where
      N: total number of observations
      $N_i$: number of observations of symbol i
      $\alpha$: "pseudo count" for each observed symbol
      $k_0$: number of observed symbols
      L: number of possible symbols
  - For unobserved elements i:
      $P(X = i) = (1 - C) \cdot \dfrac{1}{L - k_0}$
  - $C = \dfrac{N}{N + L - k_0}$

One Class SVMs
  - Analogous to a two class SVM where all data lies in the first class and the origin is the sole member of the second class
  - Solve an optimization problem to find the rule f with maximal margin:
      $f(x) = \langle w, x \rangle + b$
  - Equivalent to solving the dual quadratic programming problem:
      $\min_{\alpha}\ \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$  s.t.  $0 \le \alpha_i \le \frac{1}{\nu l}$,  $\sum_i \alpha_i = 1$
  - The kernel function projects input vectors into a feature space, allowing for non-linear decision boundaries:
      $\Phi: X \to \mathbb{R}^N$,  $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$

Experiments
  - Kernels:
      Linear:     $K(x, y) = (x \cdot y)$
      Polynomial: $K(x, y) = (x \cdot y + 1)^d$
      Gaussian:   $K(x, y) = e^{-\|x - y\|^2 / (2\sigma^2)}$
  - Feature vectors:
      Binary
      Frequency-based

Results


Sequence Information for the Splicing of Human Pre-mRNA Identified by Support Vector Machine Classification
Collaborators: Xiang Zhang, Ilana Hefter, Christina Leslie, Larry Chasin

What Is Splicing?
  [Figure: DNA with Exon1, Intron, and Exon2, marking the donor, branch, and acceptor sites; mRNA with Exon1 joined to Exon2]

Pseudo Exons

Consensus Sequences
  - Donor site: MAG|gtragt (M = A/C, r = a/g)
  - Acceptor site: (y)10ncag|G (y = c/t, n = a/c/g/t)
  - Donor and acceptor sites are scored based on closeness to the consensus

Identifying Pseudo Exons
  - Intronic segments
  - Have high scoring "donor" and "acceptor" sites
  - We look for discriminative signals in intronic regions near real and pseudo exons

String Kernels
  - Feature map: number of times each k-length (contiguous) string occurs in the sequence
  - Dimension of the feature space is $N^k$
  - Example: k = 2, Sequence = ACCTGGTG

      AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
       0  1  0  0  0  1  0  1  0  0  1  1  0  0  2  0

Splice Kernels
  - Hypothesis: false splice sites are intrinsically defective due to bad internal nt combinations
  - All possible size k internal nt combinations are features
  - Example (k = 2): if the internal combination (3g, 5a) occurs, that feature value is 1, otherwise it is 0

Recursive Feature Selection
  - Normal vector to the hyperplane: $w = \sum_{i=1}^{m} y_i \alpha_i x_i$
  - If $|w_j|$ is large, the jth feature is important for SVM discrimination
  - Approximation due to the degree 2 polynomial kernel: calculate $w_{up}$ and $w_{down}$ separately, then eliminate the bottom 50% of features for each
  - Stop when the ROC score drops below 90% of its original value on an untouched test set

Results

      Flanks    Splice Sites   Exon
      US  DS    3'   5'        Body    ROC CV(b)   Specificity(a)
      +   -     -    -         -       0.791       0.638
      -   +     -    -         -       0.784       0.618
      +   +     -    -         -       0.855       0.695
      -   -     +    -         -       0.823       0.672
      -   -     -    +         -       0.837       0.698
      -   -     +    +         -       0.907       0.777
      +   +     +    +         -       0.932       0.825
      -   -     -    -         +       0.946       0.841
      +   +     -    -         +       0.984       0.956
      -   -     +    +         +       0.987       0.964
      +   +     +    +         +       0.991       0.976

  Additional values without a recoverable row label: 0.609, 0.484

      Splice   Flanks   Exon      True positives detected
      Sites             Bodies    32/37    35/37    37/37
      -        -        -         1225     1225     1225
      -        +        -          164      259      668
      -        -        +          108      232      383
      +        -        +           58      111      180
      +        +        +           19       53       90


Rotationally Invariant Face Detection Using Multi-Resolution Histograms
Collaborators: Shikher Bisaria, Tony Jebara

Face Detection
  - Given a picture with faces, how do we determine where the faces are in the image? Which pixels are face pixels?
  - We would like to determine this with a system that:
      - Runs in real time
      - Recognizes rotations of faces (e.g. when someone tilts their head to one side)

Gaussian Blurring
  - Face images are greyscale (.pgm files)
  - Successive levels of blur are obtained by reconvolving the previous level of blurred image with a 2 dimensional Gaussian function
  - Mathematically equivalent to two passes of a one dimensional Gaussian function:

      $$g(i,j) = \frac{1}{2\pi\sigma^2}\sum_m \sum_n e^{-(m^2+n^2)/(2\sigma^2)}\, f(i-m,\, j-n)
               = \frac{1}{2\pi\sigma^2}\sum_m e^{-m^2/(2\sigma^2)} \sum_n e^{-n^2/(2\sigma^2)}\, f(i-m,\, j-n)$$

Multi-Resolution Histograms
  - Histogram equalize the image
  - Concatenate the histograms of the image after successive levels of Gaussian blurring

Average Histograms
  - Compute average face and non-face multi-resolution histograms from the training set
  [Figures: Average Non-Face Histogram; Average Face Histogram]

Optimization Problem

      $$C(\alpha) = \min_{\alpha}\ \|H_{FAVG} - h_F\|^2 + \|H_{NFAVG} - h_{NF}\|^2$$

  where

      $$h_F = \frac{1}{\sum_i \alpha_i}\sum_i \alpha_i h_i, \qquad h_{NF} = \frac{1}{\sum_i (1-\alpha_i)}\sum_i (1-\alpha_i)\, h_i$$

  such that $0 \le \alpha_i \le 1$ and $\sum_i \alpha_i = 1$.

  Let $\beta_i = 1 - \alpha_i$, $Q_{ij} = \langle h_i, h_j \rangle$, $c_{\alpha i} = \langle h_i, H_{FAVG} \rangle \cdot \text{constant}$, $c_{\beta i} = \langle h_i, H_{NFAVG} \rangle \cdot \text{constant}$. Then

      $$C(\alpha) = \min_{\alpha,\beta}\ \alpha^T Q \alpha + \frac{1}{(N-1)^2}\,\beta^T Q \beta - 2\,c_\alpha^T \alpha - \frac{2}{N-1}\,c_\beta^T \beta$$

Solve Using SMO

      $$\begin{aligned}
      \alpha_i^{NEW} = \Big[\, & \tfrac{1}{(N-1)^2} Q_{ii}
          - \tfrac{1}{(N-1)^2}\Big(\sum_{k \ne i,j}\alpha_k\Big) Q_{jj}
          + \Big(1 - \sum_{k \ne i,j}\alpha_k\Big) Q_{jj}
          - \Big(1 - \sum_{k \ne i,j}\alpha_k\Big) Q_{ij} \\
        & + \tfrac{1}{(N-1)^2}\Big(\sum_{k \ne i,j}\alpha_k\Big) Q_{ij}
          - \tfrac{1}{(N-1)^2} Q_{ij}
          - c_{\alpha i} + c_{\beta i} + c_{\alpha j} - c_{\beta j} \\
        & + \sum_{k \ne i,j}\alpha_k Q_{ik}
          - \sum_{k \ne i,j}\alpha_k Q_{jk}
          - \tfrac{1}{(N-1)^2}\sum_{k \ne i,j}\alpha_k Q_{ik}
          + \tfrac{1}{(N-1)^2}\sum_{k \ne i,j}\alpha_k Q_{jk} \,\Big] \\
      \Big/\ \Big[\, & Q_{ii} + Q_{jj} - 2Q_{ij}
          + \tfrac{1}{(N-1)^2} Q_{ii} + \tfrac{1}{(N-1)^2} Q_{jj} - \tfrac{2}{(N-1)^2} Q_{ij} \,\Big]
      \end{aligned}$$

  Bounds for $\alpha_i^{NEW}$: $L = 0$, $H = 1 - \sum_{k \ne i,j}\alpha_k$

      $$\alpha_j^{NEW} = \Big(1 - \sum_{k \ne i,j}\alpha_k\Big) - \alpha_i^{NEW}$$

Results
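
The "A Simple Example" slide prepares five qubits in state 0, applies Hadamard gates, and measures. As a rough illustration of what such a simulation computes, here is a minimal state-vector sketch in Python. It is not the Q-HSK implementation; the function names `apply_hadamard` and `hadamard_all` are hypothetical, and it assumes the Hadamard is applied to every qubit of the register.

```python
import math


def apply_hadamard(state, qubit):
    """Apply a Hadamard gate to one qubit of a state vector (list of amplitudes)."""
    h = 1.0 / math.sqrt(2.0)
    bit = 1 << qubit
    new_state = list(state)
    for idx in range(len(state)):
        if (idx & bit) == 0:
            a0 = state[idx]        # amplitude with this qubit = 0
            a1 = state[idx | bit]  # amplitude with this qubit = 1
            new_state[idx] = h * (a0 + a1)
            new_state[idx | bit] = h * (a0 - a1)
    return new_state


def hadamard_all(n_qubits):
    """Start in |00...0> and apply a Hadamard gate to every qubit."""
    state = [0j] * (2 ** n_qubits)
    state[0] = 1 + 0j
    for q in range(n_qubits):
        state = apply_hadamard(state, q)
    return state


state = hadamard_all(5)
# |amplitude|^2 gives the probability of each measurement outcome.
probabilities = [abs(a) ** 2 for a in state]
```

Under these assumptions every one of the 32 basis states ends up with the same probability, so a measurement returns each 5-bit value with equal likelihood.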
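
The multinomial estimator with hierarchical prior from the "Probabilistic Anomaly Detection Algorithm" slide can be sketched directly from its formulas. The function name `multinomial_hier_prior` and the dictionary-based interface are assumptions made for illustration; only the formulas come from the slide.

```python
def multinomial_hier_prior(counts, alpha, num_possible):
    """Estimate symbol probabilities from the slide's formulas.

    counts       -- dict mapping each observed symbol to its count N_i (> 0)
    alpha        -- pseudo count added to every observed symbol
    num_possible -- L, the number of possible symbols
    Returns (probabilities of observed symbols, probability of one unseen symbol).
    """
    n_total = sum(counts.values())  # N
    k0 = len(counts)                # number of observed symbols
    c = n_total / (n_total + num_possible - k0)
    observed = {
        symbol: c * (n_i + alpha) / (k0 * alpha + n_total)
        for symbol, n_i in counts.items()
    }
    # Each unobserved symbol shares the remaining mass (1 - C) equally.
    unseen = (1.0 - c) / (num_possible - k0) if num_possible > k0 else 0.0
    return observed, unseen


observed, unseen = multinomial_hier_prior({"a": 3, "b": 1}, alpha=1.0, num_possible=10)
total = sum(observed.values()) + unseen * (10 - len(observed))
```

The observed probabilities sum to C and the unobserved ones to 1 - C, so the estimate is a proper distribution over all L symbols.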
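
The "Gaussian Blurring" slide states that a 2 dimensional Gaussian convolution equals two passes of a 1 dimensional Gaussian. The sketch below checks this numerically on a tiny image. It assumes a kernel truncated at a fixed radius and zero padding outside the image; the function names are hypothetical.

```python
import math


def gaussian_weights(sigma, radius):
    """1-D Gaussian weights for offsets -radius..radius."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return {m: norm * math.exp(-(m * m) / (2.0 * sigma * sigma))
            for m in range(-radius, radius + 1)}


def blur_2d(image, sigma, radius):
    """Direct 2-D convolution; pixels outside the image count as zero."""
    w = gaussian_weights(sigma, radius)
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for m, wm in w.items():
                for n, wn in w.items():
                    if 0 <= i - m < rows and 0 <= j - n < cols:
                        out[i][j] += wm * wn * image[i - m][j - n]
    return out


def blur_separable(image, sigma, radius):
    """Two 1-D passes: along each row first, then along each column."""
    w = gaussian_weights(sigma, radius)
    rows, cols = len(image), len(image[0])
    tmp = [[sum(wn * image[i][j - n] for n, wn in w.items() if 0 <= j - n < cols)
            for j in range(cols)] for i in range(rows)]
    return [[sum(wm * tmp[i - m][j] for m, wm in w.items() if 0 <= i - m < rows)
             for j in range(cols)] for i in range(rows)]


image = [[0, 10, 20, 30, 40],
         [5, 15, 25, 35, 45],
         [90, 80, 70, 60, 50],
         [1, 2, 3, 4, 5]]
direct = blur_2d(image, sigma=1.0, radius=2)
two_pass = blur_separable(image, sigma=1.0, radius=2)
max_diff = max(abs(a - b) for ra, rb in zip(direct, two_pass) for a, b in zip(ra, rb))
```

The two results agree up to floating-point rounding, while the separable version needs on the order of 2(2r+1) multiplications per pixel instead of (2r+1)^2, which matters for the real-time goal stated on the "Face Detection" slide.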
