The Chinese VIRTUAL OBSERVATORY
Mining data using MATLAB through AstroBox
Chao LIU, Chenzhou CUI
Presented by: Chenzhou CUI
National Astronomical Observatory, China
IVOA Interoperability Meeting, Trieste 2008-5-20

China-VO
• Chinese Virtual Observatory (China-VO) is the national VO project in China, initiated in 2002 by the Chinese astronomical community and led by the National Astronomical Observatories, Chinese Academy of Sciences.
• It focuses its research and development on VO science and applications.
• R&D focuses:
  – China-VO Platform
  – Unified Access to On-line Astronomical Resources and Services
  – VO-ready Projects and Facilities
  – VO-based Astronomical Research Activities
  – VO-based Public Education

An active IVOA member
• IVOA 2007, Beijing
• 1st Small projects meeting, 2003

Our products
http://services.china-vo.org
• VOFilter – an XML filter for OpenOffice.org Calc to open VOTable files
• SkyMouse – a smart on-line astronomical information collector
• FitHAS – FITS Header Archiving System
• VO-DAS – an OGSA-DAI based data access service system providing unified access to astronomy data, including catalogs, images and spectra
• AstroBox – Coming soon – ...

First Science Paper from China-VO
• SDSS DR5 photometric data were searched for new Milky Way companions or substructures in the Galactic halo.
• Data analysis procedures were based on VO-DAS.
• Five candidates were identified as over-densities of faint stellar sources whose color-magnitude diagrams resemble those of known globular clusters or dwarf spheroidal galaxies.
  – (Liu et al., 2008, A&A)

AstroBox: Goals
• To provide an astronomical data mining application service supporting VO protocols and tools
• To provide a network environment for time-consuming astronomical data mining computations
• A high-level data analysis environment, NOT a raw data analysis tool such as IRAF

General procedures of data mining
• Data Accessing
  – query database
  – high volume of data
• Data Pre-processing
  – select qualified data
  – eliminate BAD data
• Data Mining
  – try multiple times and find a way to extract unknown knowledge from a specific data set
• Data Analysis and Interpretation
  – visualization
  – comparisons with different data sources
  – associate results with physical meaning

An introduction to MATLAB
• MATLAB is a popular numerical computing environment used in many fields.
• It provides dozens of toolboxes for different purposes, e.g. statistics, pattern recognition, optimization, and neural networks, as well as a number of ways to access data from local or remote sites.
• It offers visualization through flexible 2D and 3D graphics routines.
• It supports Java, C, and Fortran as well as its own M-language.
• It can access URL resources and parse XML, which is necessary for embedding web services.
• Its latest release offers refined support for parallel computation.
• We conclude that MATLAB is one of the best platforms on which astronomical data mining tools can be developed.

AstroBox
• AstroBox is a plug-in package for MATLAB to be used for astronomical computing and data mining.
• It comprises:
[Figure: AstroBox architecture diagram. Labels: VO tools (Aladin, TOPCAT), PLASTIC, VOTable, local DB, VO-DAS client, VO-DAS, astronomical algorithms, MATLAB Database Toolbox, Java libraries, AstroBox, MATLAB]

VO utilities in AstroBox
• VOTable access and conversion
  – integrates the STIL package
• PLASTIC availability
  – embeds a Java subroutine that connects to a PLASTIC hub, through which data and messages are exchanged with third-party applications, e.g. Aladin and TOPCAT
  – SAMP support next...
• VO-DAS client interface
  – embeds a VO-DAS command-line client that sends an ADQL query to the VO-DAS server and waits for the query result
  – also supports asynchronous queries, which can access millions of rows of data (ongoing)

Data mining support
• Regressions
  – linear regression
    • inherited from MATLAB
  – nonlinear regression
    • provides regression functions common in astronomy, e.g. the King model for the density profile of a dwarf galaxy
  – kernel regression
• Fitting
  – provides specific algorithms for non-analytic expressions, such as complicated observational datasets or user-defined functions
  – several times faster than existing MATLAB functions
• Spherical surface projection functions
  – Equatorial projection & Galactic projection
  – equal-area Lambert projection, in particular for density measurement on a spherical surface
  – Aitoff projection for overall viewing
• Visualization functions
  – 2-D plotting
  – 3-D plotting
  – modified from existing MATLAB functions

Other functions
• High-level functions aimed at specific research topics, most of which are currently Milky Way related
  – Kurucz stellar model
  – Girardi stellar population model
  – isochrone fitting of the stellar population
  – Galactic star count model with disk and halo components
  – chemical evolution model for stellar population (ongoing)
• Most commonly used utilities
  – Monte Carlo methods
  – coordinate transformations
  – magnitude system transformations

Demo 1
• PLASTIC implementation

Demo 2
• Special regression
  – using a hyperbolic relationship between independent and dependent variables
• Model fitting
  – density profiles of candidate dwarf galaxy

Demo 3
• Isochrone fitting
  – observed data are accessed from either a local database or the VO-DAS server
  – query reference data from the Girardi database to fit theoretical isochrones

Demo 4
• Visualization

Demo 5
• Parallel computation
  – fitting a 9-parameter star count model on an 8-core server
  – about 8 times faster than on a single-core computer
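
A factor of ~8 on 8 cores, as quoted in Demo 5, is close to linear scaling. One rough way to read such a number is through Amdahl's law, sketched below in Python. This is an interpretive aid and not part of AstroBox or its MATLAB code; the function names are hypothetical, and the 7.5 in the last line is an illustrative value, not a measurement from the talk.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Ideal speedup when `parallel_fraction` of the run time
    parallelises perfectly over `n_cores` (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def parallel_fraction_from_speedup(speedup, n_cores):
    """Invert Amdahl's law: the parallel fraction implied by an
    observed speedup on `n_cores` (requires n_cores > 1)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cores)


# Demo 5 quotes a factor of ~8 on an 8-core server.
print(parallel_fraction_from_speedup(8.0, 8))   # 1.0

# Illustrative value only (assumption, not from the talk):
print(parallel_fraction_from_speedup(7.5, 8))   # about 0.990
```

Under this simple model, a speedup that close to the core count implies that almost all of the fitting time is spent in work that can be split across cores.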

Future work
• Release as a tool to the community
• Extend cosmology methods
• Establish a distributed parallel computation environment
• Deploy an on-line data mining service

Q&A
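
Supplementary note on the equal-area Lambert projection listed under "Data mining support": the slides do not say which Lambert variant AstroBox uses, so the Python sketch below assumes the azimuthal equal-area form centred on the north pole of a unit sphere. The function names are hypothetical and this is not AstroBox code. It also checks the property that makes the projection useful for density measurement: a polar cap on the sphere and its image disc on the plane have the same area.

```python
import math


def lambert_azimuthal_equal_area(lon_deg, lat_deg):
    """Project (lon, lat) in degrees onto a plane tangent at the
    north pole of a unit sphere (assumed variant, see lead-in)."""
    lon = math.radians(lon_deg)
    colat = math.radians(90.0 - lat_deg)   # angular distance from the pole
    r = 2.0 * math.sin(colat / 2.0)        # radial distance on the plane
    return r * math.cos(lon), r * math.sin(lon)


def cap_area_on_sphere(lat_deg):
    """Area of the polar cap north of latitude `lat_deg` (unit sphere)."""
    colat = math.radians(90.0 - lat_deg)
    return 2.0 * math.pi * (1.0 - math.cos(colat))


def disc_area_on_plane(lat_deg):
    """Area of the disc that the same cap maps to on the plane."""
    x, y = lambert_azimuthal_equal_area(0.0, lat_deg)
    return math.pi * (x * x + y * y)


for lat in (60.0, 30.0, 0.0):
    # The two areas agree up to floating-point rounding.
    print(lat, cap_area_on_sphere(lat), disc_area_on_plane(lat))
```

Because areas are preserved, star counts binned on a regular grid in the projected plane can be compared directly as surface densities, without a per-cell area correction.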