Progetto S.Co.P.E. – WP4
The Virtual Observatory and the PON-SCOPE

The VO-Neural Team
G. Longo (Principal Investigator)
S. Cavuoti (applications)
R. D’Abrusco (applications)
N. Deniskina (GRID – VO interface)
O. Laurino (system, applications)
M. Brescia (Project Manager)
A. Corazza (models and algorithms)
G. d’Angelo (documentation, GRID)
M. Garofalo (applications)
A. Nocella (UML software engineering)
G. Riccio (applications)
S. Pardi
External members: C. Donalek (Caltech), G. Djorgovski (Caltech)

Summary
1. What is the Virtual Observatory & its international background
2. Why the V.Obs. is so important for the future of cosmology
3. Applications already ported under SCOPE

Astronomy has become an immensely data rich field
• Detector evolution (plates to digital to mosaics)
• Telescope evolution
• Space instruments
• From 1 MB/night to 1 TB/night
• Heterogeneous data + metadata
[Figure: logarithmic plot, 1970 to 2000; series: CCDs, Glass]

The VLT Survey Telescope
• 2.6 meter
• 0.021"/pxl
• 16 k x 16 k
• 100 GB/night

The Virtual Observatory
[Diagram labels: Secondary data providers; Follow-up telescopes and missions; Data services; Data mining and analysis, target selection; Results; Digital libraries; V.O.]
• Users: >> 1000
• Total data: ca. 1 PByte

• Data Gathering (e.g., from sensor networks, telescopes…)
• Data Farming (database technologies):
  • Storage/archiving
  • Indexing, searchability
  • Data fusion, interoperability
• Data Mining (or Knowledge Discovery in Databases) (key mathematical issues):
  • Pattern or correlation search
  • Clustering analysis, automated classification
  • Outlier / anomaly searches
  • Hyperdimensional visualization
• Data understanding (ongoing research):
  • Computer aided understanding
  • KDD
  • Etc.
• New Knowledge

Data Mining algorithms scale very badly with the number of objects N and the number of dimensions D:
– Clustering ~ $N \log N \rightarrow N^2$, ~ $D^2$
– Correlations ~ $N \log N \rightarrow N^2$, ~ $D^k$ ($k \geq 1$)
– Likelihood, Bayesian ~ $N^m$ ($m \geq 3$), ~ $D^k$ ($k \geq 1$)

V.S.T.
Each source is described by its position (RA, δ), the epoch t and, for each of the n bands (Band 1, Band 2, Band 3, ...), the band parameters λ and Δλ together with m flux measurements f and their errors Δf (cf. isophotal, Petrosian, aperture magnitudes, concentration indexes, shape parameters, etc.):

$$
\begin{aligned}
p^{1} &= \{RA^{1}, \delta^{1}, t, \{\lambda_{1}, \Delta\lambda_{1}, f_{1}^{1,1}, \Delta f_{1}^{1,1}, \ldots, f_{1}^{1,m}, \Delta f_{1}^{1,m}\}, \ldots, \{\lambda_{n}, \Delta\lambda_{n}, f_{n}^{1,1}, \Delta f_{n}^{1,1}, \ldots, f_{n}^{1,m}, \Delta f_{n}^{1,m}\}\} \\
p^{2} &= \{RA^{2}, \delta^{2}, t, \{\lambda_{1}, \Delta\lambda_{1}, f_{1}^{2,1}, \Delta f_{1}^{2,1}, \ldots, f_{1}^{2,m}, \Delta f_{1}^{2,m}\}, \ldots, \{\lambda_{n}, \Delta\lambda_{n}, f_{n}^{2,1}, \Delta f_{n}^{2,1}, \ldots, f_{n}^{2,m}, \Delta f_{n}^{2,m}\}\} \\
&\;\;\vdots \\
p^{N} &= \{RA^{N}, \delta^{N}, t, \{\lambda_{1}, \Delta\lambda_{1}, f_{1}^{N,1}, \Delta f_{1}^{N,1}, \ldots, f_{1}^{N,m}, \Delta f_{1}^{N,m}\}, \ldots\}
\end{aligned}
$$

$$D \geq 3 + m \times n$$

The scientific exploitation of a multi-band, multi-epoch (K epochs) survey requires searching for patterns, trends, etc. among N points in a D × K dimensional parameter space, with $N > 10^9$, $D \gg 100$, $K > 10$.

Tools in the VONeural
Middleware
• Astrogrid Model (Nocella)
• Interface between Virtual Observatory and GRID computing (GRID-launcher; Deniskina, D’Angelo)
Models
• Multi Layer Perceptron (VONeural_MLP; Donalek, Cavuoti, Skordovski)
• Support Vector Machines (VONeural_SVM; Cavuoti, Russo)
• Probabilistic Principal Surfaces (VONeural_PPS; Garofalo)
Tools
• Segmentation of astronomical images (VONeural_Ext; Laurino)

Scientific Applications
• Data mining in multiparametric spaces (supervised and unsupervised)
• Photometric redshifts (MLP, SVM)
• Search for candidate quasars and AGN (PPS, NEC)
• Galaxy groups and clusters
• CMB simulations of cosmic string signatures
  • In collaboration with Moscow University
• Extraction of catalogues from astronomical images
  • INAF + Caltech
• VST pipeline for distant clusters
  • INAF + Caltech

Application 1 – VONeural_MLP photometric redshifts
Photometric redshifts (phot-z) are an alternative way of deriving the redshifts (i.e. distances) of extragalactic objects: less accurate than spectroscopic redshifts, but much more convenient in terms of computing power and observing time.

SDSS-DR4/5 – GG: phot-z for the SDSS General Galaxy sample
• At least 30 experiments (10-12 h each)
• Training on 350,000 objects, 12 features
• Training, validation and test sets: 60%, 20%, 20%
• Results for 32,000,000 objects
[Workflow diagram labels: MLP, 1(5), 1(18); 0.01 < z < 0.25 and 0.25 < z < 0.50; MLP, 1(5), 1(23) and MLP, 1(5), 1(24); interpolation of systematic errors (both branches); σ_rob = 0.206 and σ_rob = 0.234; 99.6% accuracy]

Photometric redshifts for 30 million SDSS galaxies
σ_z = 0.02

Two types of compact groups
• Spatial clustering in phot_z space reveals two types of groups:
  • Compact and isolated
  • Loose and not embedded in larger structures
• 95% of SKG have a large fraction of E-type galaxies, $f_{150}(E) \geq 0.5$.

Looking for AGN candidates
• Different orientations
• Different parameters become significant
• Different clusters in parameter space
• BUT, STILL THE SAME OBJECT!

Dimensionality reduction (classification of correlated non-linear data)
[Figure panels: 3-D PCA; PPS]

Negative entropy clustering
NEC: a matter of Gaussians
NEC is a clustering method based on the “neg-entropy” NegE, a measure of the non-Gaussianity of a variable. If A is Gaussian, then NegE(A) = 0.
Given a threshold d: if NegE(A ∪ B) < d, then clusters A and B are replaced by the cluster A ∪ B.
[Figure panels: “Not replaced!” NegE = 750; “Replaced!” NegE = 4]

UKIDSS + SDSS
Workflow: PPS preprocessing; NEC clustering; dendrogram; labeling; cluster optimization; results
• 1 experiment: ca. 11 days
• BoK: 0 | 1 | 2 | 3 | 4 | 5 | 6
• PPS: we select clusters by associating latent variables on the sphere with sources.
• NEC: the number of clusters after the aggregation is determined by “cluster optimization”.
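The NEC merging rule stated above (replace A and B by A ∪ B when NegE(A ∪ B) < d) can be illustrated with a minimal one-dimensional sketch. The slides do not say how NegE is estimated, so the estimator below is an assumption: it uses the common log-cosh approximation, NegE(y) ≈ (E[log cosh y] − E[log cosh ν])², with y the standardized data and ν a standard Gaussian. The function names `neg_entropy` and `should_merge` and the threshold value are hypothetical, and the numerical scale differs from the NegE values quoted on the slides.

```python
import numpy as np

def _gaussian_reference():
    # E[log cosh(v)] for v ~ N(0, 1), computed by simple numerical quadrature.
    z = np.linspace(-10.0, 10.0, 200001)
    dz = z[1] - z[0]
    pdf = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return float(np.sum(np.log(np.cosh(z)) * pdf) * dz)

_G_REF = _gaussian_reference()

def neg_entropy(x):
    """Approximate neg-entropy of a 1-D sample (close to 0 for Gaussian data)."""
    x = np.asarray(x, dtype=float)
    y = (x - x.mean()) / x.std()  # standardize to zero mean, unit variance
    return float((np.mean(np.log(np.cosh(y))) - _G_REF) ** 2)

def should_merge(a, b, d):
    """NEC rule: merge clusters a and b if their union is Gaussian enough."""
    return neg_entropy(np.concatenate([a, b])) < d

# Synthetic clusters (illustrative data only, not from the survey).
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 5000)
b = rng.normal(0.0, 1.0, 5000)   # same Gaussian as a
c = rng.normal(10.0, 1.0, 5000)  # well-separated cluster

d = 1e-3  # hypothetical threshold for this estimator's scale
print("merge a, b:", should_merge(a, b, d))
print("merge a, c:", should_merge(a, c, d))
```

With these synthetic samples, the union of `a` and `b` is still close to a single Gaussian and is merged, while the union of `a` and `c` is bimodal and is kept as two clusters.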
SpecClass: leads to proper binning of parameter space

Application 2 with SVM
Best result: 81.5%
• PON-SCOPE GRID infrastructure (110 nodes, PON NA-CA-CT)
• SDSS spectroscopic subsample of confirmed QSOs (specclass = 4 and 6)
• UKIDSS HO-QSO's
• Colours used for all these experiments were calculated using adjacent bands: u−g, g−r, r−i, i−z for the optical bands, and Y−J, J−H, H−K for the near-infrared ones.
[Figure axes: lg2(gamma), lg2(C)]

Application 2 with MLP
The experiments were carried out by selecting only the objects in the catalogue of G. Sorrentino et al. (2006) (z between 0.05 and 0.095) that were flagged as Type 1 or Type 2. Only objects that are certainly AGN were selected. The dataset consisted of 1570 objects; Type 1 objects were labelled 1 and Type 2 objects were labelled 0. The best result obtained was:
• Total efficiency: e = 99.4%
• Type 1 efficiency: e_type1 = 98.4%
• Type 2 efficiency: e_type2 = 100%
• Type 1 completeness: c_type1 = 100%
• Type 2 completeness: c_type2 = 98.9%

             1 (net)   0 (net)
1 (known)      126         0
0 (known)        2       186

THE END

SCoPE Workshop: Status of the Project and of the Work Packages
Sala Azzurra, Complesso Universitario Monte Sant’Angelo, 21-2-2008
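As a check on the MLP result for Type 1 / Type 2 AGN, the efficiency and completeness values can be recomputed from the confusion matrix. The slides do not define the two terms; the sketch below assumes that efficiency is the fraction of objects assigned to a class by the network that truly belong to it, and completeness is the fraction of known objects of a class that the network recovers. Under that assumption the quoted percentages are reproduced. The function name and dictionary keys are hypothetical.

```python
def efficiency_and_completeness(cm, labels):
    """cm[i][j] = number of objects of known class i assigned by the net to class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    result = {"total_efficiency": correct / total}
    for k, name in enumerate(labels):
        assigned = sum(cm[i][k] for i in range(n))  # column sum: assigned to class k
        known = sum(cm[k])                          # row sum: truly in class k
        result[f"efficiency_{name}"] = cm[k][k] / assigned
        result[f"completeness_{name}"] = cm[k][k] / known
    return result

# Rows: known Type 1, known Type 2; columns: net says Type 1, net says Type 2.
cm = [[126, 0],
      [2, 186]]
metrics = efficiency_and_completeness(cm, ["type1", "type2"])
for key, value in metrics.items():
    print(f"{key}: {100 * value:.1f}%")
```

The matrix contains 314 objects, which is 20% of the 1570 objects in the dataset.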