S.Co.P.E. Project – WP4
The Virtual Observatory and the PON-SCOPE
The VO-Neural Team
G. Longo (Principal Investigator)
S. Cavuoti (applications)
R. D’Abrusco (applications)
N. Deniskina (GRID – VO interface)
O. Laurino (System, Applications)
M. Brescia (Project Manager)
A. Corazza (models and algorithms)
G. d’Angelo (documentation, GRID)
M. Garofalo (applications)
A. Nocella (UML software engineering)
G. Riccio (Applications)
S. Pardi
External Members
C. Donalek (Caltech)
G. Djorgovski (Caltech)
Summary
1. What is the Virtual Observatory & its international background
2. Why the V.Obs. is so important for the future of cosmology
3. Applications already ported under SCOPE
Astronomy has become an immensely data rich field
• Detector evolution
(plates to digital to mosaics)
• Telescope evolution
• Space instruments
From 1 MB/night to 1 TB/night
Heterogeneous Data + Metadata
[Figure: logarithmic plot, 1970 to 2000, vertical axis from 0.1 to 1000; glass plates compared with CCDs]
The VLT Survey Telescope
2.6 meter
0.21”/pxl
16 k x 16 k
100 GB/night
[Diagram: the V.O. links the following components]
Secondary Data Providers
Follow-Up Telescopes and Missions
Data Services: Data Mining and Analysis, Target Selection
Results
Digital libraries
The Virtual Observatory
Users: >>1000
Total data ca. 1 PByte
Data Gathering (e.g., from sensor networks, telescopes…)
Data Farming (database technologies):
• Storage/Archiving
• Indexing, Searchability
• Data Fusion, Interoperability
Data Mining (or Knowledge Discovery in Databases; key mathematical issues):
• Pattern or correlation search
• Clustering analysis, automated classification
• Outlier / anomaly searches
• Hyperdimensional visualization
Data understanding (ongoing research):
• Computer aided understanding
• KDD
• Etc.
New Knowledge
Data Mining algorithms scale very badly:
– Clustering ~ $N \log N \to N^2$, ~ $D^2$
– Correlations ~ $N \log N \to N^2$, ~ $D^k$ ($k \ge 1$)
– Likelihood, Bayesian ~ $N^m$ ($m \ge 3$), ~ $D^k$ ($k \ge 1$)
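To give a feeling for what these scalings mean in practice, the minimal sketch below compares the operation counts of an N log N and an N^2 algorithm. The catalogue sizes are illustrative assumptions, constant factors are ignored, and the function names are hypothetical.

```python
import math

def cost_n_log_n(n: int) -> float:
    """Operation count for an ~N log N algorithm (constants ignored)."""
    return n * math.log2(n)

def cost_n_squared(n: int) -> float:
    """Operation count for an ~N^2 algorithm (constants ignored)."""
    return float(n) ** 2

# Illustrative catalogue sizes (assumed values, not taken from the slides).
for n in (10**3, 10**6, 10**9):
    ratio = cost_n_squared(n) / cost_n_log_n(n)
    print(f"N = {n:.0e}: N^2 costs ~{ratio:.1e} times more than N log N")
```

The ratio grows as N / log2(N), so the gap between the best and worst case widens steadily as surveys get larger.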
V.S.T.
Each of the N sources is described by a parameter vector (n bands, m measured quantities per band, each with its error):
$$p^{i} = \{ RA^{i}, \delta^{i}, t, \lambda_1, \Delta\lambda_1, f_1^{i,1}, \Delta f_1^{i,1}, \ldots, f_1^{i,m}, \Delta f_1^{i,m}, \ldots, \lambda_n, \Delta\lambda_n, f_n^{i,1}, \Delta f_n^{i,1}, \ldots, f_n^{i,m}, \Delta f_n^{i,m} \}, \quad i = 1, \ldots, N$$
The f terms: cf. isophotal, Petrosian, aperture magnitudes, concentration indexes, shape parameters, etc.
[Figure labels: Band 1, Band 2, Band 3]
$D \ge 3 + m \times n$
The scientific exploitation of a multi-band, multi-epoch (K epochs) survey implies searching for patterns, trends, etc. among N points in a D×K dimensional parameter space:
$N > 10^9$, $D \gg 100$, $K > 10$
Tools in the VONeural
Middleware
• Astrogrid Model (Nocella)
• Interface between Virtual Observatory and GRID computing
(GRID-launcher; Deniskina, D’Angelo)
Models
• Multi Layer Perceptron (VONeural_MLP; Donalek, Cavuoti, Skordovski)
• Support Vector Machines (VONeural_SVM; Cavuoti, Russo)
• Probabilistic Principal Surfaces (VONeural_PPS; Garofalo)
Tools
• Segmentation of Astronomical images (VONeural_Ext; Laurino)
Scientific Applications
• Data mining in multiparametric spaces (supervised and unsupervised)
• Photometric redshifts (MLP, SVM)
• Search for candidate quasars and AGN (PPS, NEC)
• Galaxy groups and clusters
• CMB simulations of cosmic string signatures
  – In collaboration with Moscow University
• Extraction of catalogues from astronomical images
  – INAF + Caltech
• VST pipeline for distant clusters
  – INAF + Caltech
Application 1 – VONeural_MLP photometric redshifts

Photometric redshifts (phot-z) are an alternative way to derive the redshifts (i.e. distances) of extragalactic objects: less accurate than spectroscopic redshifts, but much more convenient in terms of computing power and observing time.
SDSS-DR4/5 – GG
Phot-z for the SDSS General Galaxy sample
• at least 30 experiments (10-12 h each)
• training on 350,000 objects, 12 features
• results for 32,000,000 objects
• training, validation and test sets: 60%, 20%, 20%
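A 60%, 20%, 20% division into training, validation and test sets can be sketched as follows. This is only a minimal illustration, not the VONeural code: the function name, the random seed and the sample size of 1000 objects are assumptions made for the example.

```python
import random

def split_indices(n_objects, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle object indices and cut them into training/validation/test."""
    indices = list(range(n_objects))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = round(fractions[0] * n_objects)
    n_valid = round(fractions[1] * n_objects)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]    # the remainder goes to the test set
    return train, valid, test

# Hypothetical sample of 1000 objects.
train, valid, test = split_indices(1000)
print(len(train), len(valid), len(test))  # 600 200 200
```

Shuffling before cutting keeps the three subsets statistically similar, so the validation and test errors are measured on objects drawn from the same distribution as the training set.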
[Figure panels: MLP, 1(5), 1(18); MLP, 1(5), 1(23); MLP, 1(5), 1(24); redshift ranges 0.01<z<0.25 and 0.25<z<0.50; interpolation of systematic errors]
σ_rob = 0.206
σ_rob = 0.234
99.6% accuracy
Photometric redshifts for 30 million SDSS galaxies
σz = 0.02
Two types of compact groups
• Spatial clustering in phot_z space: two types of groups:
  – Compact and isolated
  – Loose and not embedded into larger structures
• 95% of SKG have a large fraction of E-type galaxies, f150(E) ≥ 0.5.
Looking for AGN candidates
Different orientations
Different parameters become significant
Different clusters in parameter space
BUT, STILL THE SAME OBJECT!
Dimensionality reduction
(classification of correlated non linear data)
3-D PCA
PPS
Negative entropy clustering
NEC: a matter of Gaussians
A clustering method based on the “neg-entropy” NegE, a measure of the non-Gaussianity of a variable. If A is Gaussian, then NegE(A) = 0. Given a threshold d:
if NegE(A ∪ B) < d, then clusters A and B are replaced by the cluster A ∪ B.
Not replaced! NegE = 750
Replaced! NegE = 4
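The merging rule can be illustrated with a simplified one-dimensional sketch. It is not the VONeural implementation: negentropy is estimated here with the classical moment-based approximation (skewness squared over 12 plus excess kurtosis squared over 48), which is on a different scale from the NegE values quoted above, and the threshold of 0.01, the function names and the synthetic clusters are all assumptions made for the example.

```python
import random
import statistics

def negentropy(sample):
    """Moment-based approximation: skew^2/12 + excess_kurtosis^2/48 (1-D)."""
    mean = statistics.fmean(sample)
    std = statistics.pstdev(sample)
    z = [(x - mean) / std for x in sample]          # standardised values
    skew = statistics.fmean([v ** 3 for v in z])
    excess_kurt = statistics.fmean([v ** 4 for v in z]) - 3.0
    return skew ** 2 / 12.0 + excess_kurt ** 2 / 48.0

def merge_if_gaussian(cluster_a, cluster_b, threshold):
    """Return A U B if NegE(A U B) < threshold, otherwise None."""
    union = cluster_a + cluster_b
    return union if negentropy(union) < threshold else None

rng = random.Random(42)
a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
b = [rng.gauss(0.5, 1.0) for _ in range(2000)]    # overlaps with a
c = [rng.gauss(10.0, 1.0) for _ in range(2000)]   # well separated from a

THRESHOLD = 0.01  # hypothetical value of d for this approximation
print("a+b merged:", merge_if_gaussian(a, b, THRESHOLD) is not None)
print("a+c merged:", merge_if_gaussian(a, c, THRESHOLD) is not None)
```

Two overlapping clusters form a union that is still close to Gaussian, so they are merged; two well-separated clusters form a bimodal union with a large negentropy, so they are kept apart.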
[Workflow diagram: UKIDSS + SDSS → PPS preprocessing → NEC clustering → dendrogram → labeling → cluster optimization → results]
1 experiment ca. 11 days
BoK
0 | 1 | 2 | 3 | 4 | 5 | 6
PPS: we select clusters by associating latent variables on the sphere with sources.
NEC: the number of clusters after the aggregation is determined by “cluster optimization”.
SpecClass
Leads to proper binning of parameter space
Application 2 with SVM
Best result: 81.5%
PON-SCOPE GRID Infrastructure (110 nodes PON NA-CA-CT)
[Figure axes: lg2(gamma), lg2(C)]
SDSS spectroscopic subsample of confirmed QSO (specclass=4 & 6)
UKIDSS HO-QSO’s
Colours used for all these experiments were calculated using adjacent bands: u−g, g−r, r−i, i−z for the optical bands, and Y−J, J−H, H−K for the near infrared ones.
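Computing colours from adjacent bands amounts to differencing consecutive magnitudes. A minimal sketch follows; the function name and the magnitudes of the example source are hypothetical and serve only to show the bookkeeping.

```python
OPTICAL_BANDS = ["u", "g", "r", "i", "z"]
NEAR_IR_BANDS = ["Y", "J", "H", "K"]

def adjacent_colours(magnitudes, bands):
    """Return {"a-b": mag_a - mag_b} for each pair of adjacent bands."""
    return {
        f"{first}-{second}": magnitudes[first] - magnitudes[second]
        for first, second in zip(bands, bands[1:])
    }

# Hypothetical magnitudes for one source (illustrative values only).
source = {
    "u": 19.5, "g": 19.0, "r": 18.75, "i": 18.5, "z": 18.25,
    "Y": 17.75, "J": 17.5, "H": 17.0, "K": 16.5,
}

features = {}
features.update(adjacent_colours(source, OPTICAL_BANDS))
features.update(adjacent_colours(source, NEAR_IR_BANDS))
print(list(features))  # ['u-g', 'g-r', 'r-i', 'i-z', 'Y-J', 'J-H', 'H-K']
```

Nine magnitudes thus yield seven colours, four optical and three near infrared, with no colour bridging the optical and infrared sets.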
Application 2 with MLP
The experiments were carried out by selecting only the objects in the catalogue of G. Sorrentino et al. (2006) (z between 0.05 and 0.095) that were flagged as Type 1 or Type 2. Only those that are certainly AGN were selected.
The dataset comprised 1570 objects: Type 1 objects were labelled 1 and Type 2 objects were labelled 0.
The best result obtained was:
Total efficiency e = 99.4%
Type 1 efficiency e_type1 = 98.4%
Type 2 efficiency e_type2 = 100%
Type 1 completeness c_type1 = 100%
Type 2 completeness c_type2 = 98.9%

            1(net)   0(net)
1(known)    126      0
0(known)    2        186
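The quoted percentages can be recovered from the confusion matrix (126, 0, 2, 186). The sketch below assumes that "efficiency" of a class means the fraction of objects assigned to that class by the network that truly belong to it, and that "completeness" means the fraction of known members of the class that the network recovers; these definitions are an assumption, although they do reproduce the figures above. The function names are hypothetical.

```python
# Keys are (known class, network output); counts are taken from the slide.
confusion = {
    (1, 1): 126,  # known Type 1, classified as Type 1
    (1, 0): 0,    # known Type 1, classified as Type 2
    (0, 1): 2,    # known Type 2, classified as Type 1
    (0, 0): 186,  # known Type 2, classified as Type 2
}

def total_efficiency(cm):
    """Fraction of all objects that are classified correctly."""
    correct = sum(n for (known, net), n in cm.items() if known == net)
    return correct / sum(cm.values())

def efficiency(cm, label):
    """Fraction of objects assigned to `label` that truly belong to it."""
    assigned = sum(n for (known, net), n in cm.items() if net == label)
    return cm[(label, label)] / assigned

def completeness(cm, label):
    """Fraction of known members of `label` recovered by the network."""
    members = sum(n for (known, net), n in cm.items() if known == label)
    return cm[(label, label)] / members

print(round(100 * total_efficiency(confusion), 1))  # 99.4
print(round(100 * efficiency(confusion, 1), 1))     # 98.4
print(round(100 * efficiency(confusion, 0), 1))     # 100.0
print(round(100 * completeness(confusion, 1), 1))   # 100.0
print(round(100 * completeness(confusion, 0), 1))   # 98.9
```

The matrix contains 314 objects, i.e. 20% of the 1570 objects in the dataset.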
THE END
SCoPE Workshop - Status of the project and of the Work Packages
Sala Azzurra - Complesso universitario Monte Sant’Angelo
21-2-2008