VO-Neural Group

G. Longo – P.I.
M. Brescia – P.M.

Team
A. Corazza (models)
O. Laurino (system and models for image segmentation)
S. Cavuoti & E. Russo (models – SVM)
N. Deniskina (Grid manager and interfacing with V.O.)
G. d’Angelo (Grid developments, JAVA clients and documentation)
M. Garofalo & A. Nocella (UML and models: PPS+NEC)
B. Skordovski & C. Donalek (models: MLP)
S. Cavuoti, E. de Filippis, R. D’Abrusco (Test & Validation)

The problem

[Figure: the same field observed in Band 1, Band 2, Band 3, ..., Band n]

Cf. isophotal, Petrosian and aperture magnitudes, concentration indexes, shape parameters, etc.

$p^{i} = (RA^{i}, \delta^{i}, t, \lambda_{1}, \Delta\lambda_{1}, f^{i}_{1,1}, \Delta f^{i}_{1,1}, \ldots, f^{i}_{1,m}, \Delta f^{i}_{1,m}, \ldots, \lambda_{n}, \Delta\lambda_{n}, f^{i}_{n,1}, \Delta f^{i}_{n,1}, \ldots, f^{i}_{n,m}, \Delta f^{i}_{n,m}), \quad i = 1, \ldots, N$

$D \geq 3 + m \times n$

The scientific exploitation of a multi-band, multi-epoch (K epochs) survey requires searching for patterns, trends, etc. among N points in a D×K-dimensional parameter space, with $N > 10^{9}$, $D \gg 100$, $K > 10$.

The mixed blessing of data richness

Data Mining algorithms scale very badly:
– Clustering ~ $N \log N \to N^{2}$, ~ $D^{2}$
– Correlations ~ $N \log N \to N^{2}$, ~ $D^{k}$ ($k \geq 1$)
– Likelihood, Bayesian ~ $N^{m}$ ($m \geq 3$), ~ $D^{k}$ ($k \geq 1$)

Dimensionality reduction (without a significant loss of information) is a critical need!
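To give a feeling for the scaling figures above, here is a minimal Python sketch that compares operation counts for N log N, N^2 and N^3 at the survey size N = 10^9 quoted on the slide. The throughput `OPS_PER_SECOND` and the use of a base-2 logarithm are assumptions chosen purely for illustration; they are not from the slides, and only the orders of magnitude matter.

```python
import math

# Assumed throughput (operations per second); purely illustrative.
OPS_PER_SECOND = 1e9


def scaling_costs(n):
    """Return rough operation counts for the three scaling regimes."""
    return {
        "N log N": n * math.log2(n),  # best case for clustering/correlations
        "N^2": n ** 2,                # worst case for clustering/correlations
        "N^3": n ** 3,                # lower bound (m = 3) for likelihood/Bayesian
    }


def seconds(operations):
    """Convert an operation count into seconds at the assumed throughput."""
    return operations / OPS_PER_SECOND


if __name__ == "__main__":
    for name, ops in scaling_costs(10 ** 9).items():
        print(f"{name:8s} {ops:.3e} operations, {seconds(ops):.3e} s")
```

Even at this optimistic throughput, the gap between the N log N and N^2 regimes is many orders of magnitude, which is the point the slide makes about dimensionality and data volume.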
International Virtual Observatory Alliance

Started in 2000
• User-friendly access to distributed computing
• Transparent homogenization of multiwavelength, multi-epoch standards
• Similar standards for real and simulated data
• Common learning framework (no need to adapt know-how to specific data: experimental work focused on science and not on technicalities)

Tasks in Progress
• Data Mining Models
  – MLP (Multilayer Perceptron): FANN library completed by including SOFT-MAX and Cross-Entropy
  – SVM (Support Vector Machines)
  – PPS (Probabilistic Principal Surfaces)
  – NEC (Negative Entropy Clustering & Dendrogram)
• Additional problems
  – Star/Galaxy Classification (in collaboration with Caltech)
  – Next (Neural Extractor) for image segmentation and object parameter extraction
  – Simulation of cosmic string signatures on the Cosmic Microwave Background
  – N-body simulations (mesh code)

Implementation of interface between ASTROGRID and GRID-SCOPE with different CA

ASTROGRID – GRID Launcher (N. Deniskina)
1. Forms a directory on Lupalberto (i.e. executable file, input data) and wraps it
2. Connects to the SCOPE U.I. (checking the certificate)
3. Sends the wrapped directory from Lupalberto to the SCOPE U.I.
4. Unzips the wrapped job directory on the SCOPE U.I. and forms the JDL job
5. Sends the job to the GRID and waits for the results
6. Wraps the output and sends it to Lupalberto

[Figure: flow chart]

Scientific cases in progress
• Physical classification of galaxies
• Search for QSOs at intermediate to high redshifts
• Search for cosmic strings in the CMB
• Characterization of cosmic large-scale structure

http://people.na.infn.it/~astroneural/

First results: AGN classification (Cavuoti, D’Abrusco & D’Angelo)
Different orientations
Different parameters become significant
Different clusters in parameter space
BUT, STILL THE SAME OBJECT!
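The SOFT-MAX and Cross-Entropy additions to the MLP, listed under Tasks in Progress, can be illustrated with a short Python sketch. This is the generic textbook definition of a softmax output layer with a cross-entropy loss, not the group's FANN extension; the function names are hypothetical and chosen only for this example.

```python
import math


def softmax(activations):
    """Map raw output-layer activations to class probabilities."""
    shift = max(activations)  # subtract the maximum for numerical stability
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, target_index):
    """Cross-entropy loss for a single example with a one-hot target."""
    return -math.log(probabilities[target_index])


def output_delta(probabilities, target_index):
    """Gradient of the loss with respect to the raw activations.

    For softmax combined with cross-entropy this reduces to
    (probability - one-hot target) for each output unit.
    """
    return [p - (1.0 if i == target_index else 0.0)
            for i, p in enumerate(probabilities)]


if __name__ == "__main__":
    probs = softmax([2.0, 1.0, 0.1])
    print(probs, cross_entropy(probs, 0), output_delta(probs, 0))
```

The usual reason for pairing the two is visible in `output_delta`: the error signal fed back into the network is simply the difference between the predicted probabilities and the target, which suits classification tasks such as star/galaxy separation.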
First Scientific Experiments on SCOPE GRID

SVM on an AGN dataset extracted from SDSS for automatic classification of galaxies
• BoK from a spectroscopically confirmed sample
• SVM code implemented from LIB-SVM
• 13 parameters for 89,000 objects
• SVM – RBF needs optimization against 2 parameters (C and g)
• The maximum of the classification rate must be found in a given range
• 110 grid points in parameter space (each at least 1 h)
• 110 computers in GRID-SCOPE (Na-CT-CA)

RESULTS:
First Experiment: Seyfert 1 vs Seyfert 2
Second Experiment: AGN vs non-AGN

Thanks to all the other WPs and special thanks to S. Pardi

String_simulation
[Flow chart]
INPUT: string velocity ß, string direction k
Observer–string distance ?
Create dT/T map
MAP.FITS: HEALPix dT/T map
INPUT: smoothing radius r
Create smoothed dT/T map
SMOOTHED.FITS: HEALPix dT/T map
END PROGRAM

The problem: a huge parameter space

Applications: high dimensionality, massive data sets (from astronomical surveys but also any other high-dimensionality data space)

[Figure: a datum p in parameter space; axes R.A., t, ...; label "N 100"]

Any observed (simulated) datum p is defined by a set of parameters. Ex.:
• RA and dec
• time
• experimental setup (spatial and spectral resolution, limiting mag, limiting surface brightness, etc.)
• Polarization
• Etc.

The parameter space concept is crucial to:
1. Guide the quest for new discoveries
2. Find new physical laws (patterns)

[Figure: execution of a job from ASTROGRID on GRID SCOPE. Labels: USER, ASTROGRID, ACR, CEC, resource registry, myspace, Lupalberto, User interface, Resource Broker, Astrogrid Middleware, Computational element, Working Node, Execution of job, Output of results]
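The SVM – RBF optimization described under First Scientific Experiments (find the maximum classification rate over a grid of C and g values, one grid point per machine) can be sketched in Python as follows. The grid values are an assumption: the exponentially spaced sequences suggested in the LIBSVM practical guide happen to give 11 × 10 = 110 points, but the slides do not state which range was used. `toy_score` is a hypothetical stand-in for the real train-and-validate step, which took at least an hour per point.

```python
import math
from itertools import product


def make_grid():
    """Build an exponentially spaced (C, g) grid with 110 points.

    Assumed ranges: C = 2^-5, 2^-3, ..., 2^15 and g = 2^-15, 2^-13, ..., 2^3.
    """
    c_values = [2.0 ** e for e in range(-5, 16, 2)]   # 11 values
    g_values = [2.0 ** e for e in range(-15, 4, 2)]   # 10 values
    return list(product(c_values, g_values))


def best_point(grid, score):
    """Return the (C, g) pair with the highest score."""
    return max(grid, key=score)


def toy_score(point):
    """Hypothetical classification rate with a single peak.

    In the real experiment this would train an RBF SVM with the given
    (C, g) and return its classification rate on a validation set.
    """
    c, g = point
    return -((math.log2(c) - 3) ** 2 + (math.log2(g) + 5) ** 2)


if __name__ == "__main__":
    grid = make_grid()
    print(len(grid), "grid points; best:", best_point(grid, toy_score))
```

Because each grid point is evaluated independently of the others, the search parallelizes trivially, which is why assigning one point to each of 110 machines is a natural fit for the GRID.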