Execution time and speed scaling estimates for stellar parametrization algorithms
Coryn A.L. Bailer-Jones
Max-Planck-Institut für Astronomie, Heidelberg, Germany
18 June 2004
Report no.: GAIA-CBJ-017 version 3

1 Overview

Execution time estimates are provided for the parametrization of stars based on their 11-band (something like MBP) photometry. Parametrization means the determination of their four principal APs (astrophysical parameters): A_V, [Fe/H], log g and Teff. The parametrization is done with two different codes: a minimum distance method (MDM) and a feedforward neural network (ANN). Their execution times for parametrizing new objects scale differently with the number of objects, the number of APs, etc. It is not expected that either MDM or ANN will be used on the Gaia data in the form presented here. It is nevertheless useful to investigate their time scaling properties and to have execution time estimates for these benchmark algorithms.

For the sake of this exercise, classification/parametrization codes can be split into two types: trained and untrained. ANN is trained because it must undergo a one-off training session to set its internal parameters (weights). These are optimized using the template stars in the grid, but the grid is not then (explicitly) used in the application phase to parametrize new objects. MDM is untrained because each program star (to be parametrized) is, in principle at least, compared to every template star in the grid. (A program star is an unclassified Gaia observation. A template star is one with existing classifications, on the basis of which program stars are classified.)

For application to the full complexity of the Gaia data and stars observed, I suspect that some kind of hybrid of trained and untrained methods will be required. Examples are some form of continuous or updated training, or an iterative classification scheme.
One example is real-time multidimensional interpolation of the template grid at the point set by each program star. This would increase the total time to parametrize objects. In the absence of an algorithm for this, however, a corresponding execution time estimate cannot be provided at this time.

2 System details

The tests are run on a Toshiba Satellite A15-S129 with a Mobile Intel Celeron 2.40 GHz processor running at 2.40 GHz and 512 MB of memory (swap not used). This corresponds to about 2 GFLOPS. The operating system is Linux/RH9. The ‘time’ command is used to time the code and gives four outputs:
1. the user CPU time
2. the system CPU time
3. the elapsed real time between invocation and termination
4. the CPU utilization, (user CPU time + system CPU time) / elapsed real time × 100%
The user CPU time is the relevant one.

3 Data

The data used for timing both codes are those supplied with SSP (v01) for GDAAS. They consist of:
X = no. of filters (inputs) = 11
Y = no. of APs (outputs) = 4
T = no. of templates = 3450
P = no. of program stars = 3594
The data are (area-normalized) photon counts stored as real numbers.

4 Minimum Distance Method (MDM)

4.1 Execution time estimates

I use my code mdm (v1.03), which was supplied as the SSP algorithm (v01) for GDAAS. It is a very simple minimum distance method. APs are assigned to each program star based on the APs of a set of template stars. Each assigned AP (for each of the four APs to be determined) is the mean of the APs of all templates found within a unit search radius in the data (photometry) space. Each data dimension is scaled by the 1-sigma photometric error for that program star. A minimum and a maximum number of nearest neighbours that can be used in the mean are defined, currently set at 1 and 10 respectively. The code has not been optimized for speed.
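The MDM assignment rule above can be illustrated with a short brute-force sketch in Python. This is not the mdm (v1.03) code. The function name `mdm_parametrize` and its arguments are hypothetical. The handling of the neighbour limits is an assumed reading of the minimum/maximum rule: keep at most the 10 nearest templates inside the unit radius, and fall back to the single nearest template if none lies inside. The sketch also keeps the nearest candidates with a heap (`heapq.nsmallest`) rather than inserting templates one by one into a neighbour set.

```python
import heapq
import math

def mdm_parametrize(program_flux, program_sigma, template_flux, template_aps,
                    radius=1.0, n_min=1, n_max=10):
    """Hypothetical brute-force MDM for a single program star.

    program_flux  -- X photometric values of the program star
    program_sigma -- X 1-sigma photometric errors of the program star
    template_flux -- T lists of X photometric values (the template grid)
    template_aps  -- T lists of Y APs, aligned with template_flux
    Returns the Y APs averaged over the selected neighbours.
    """
    # Error-scaled Euclidean distance to every template: O[T X] work.
    distances = []
    for flux, aps in zip(template_flux, template_aps):
        d2 = sum(((p - t) / s) ** 2
                 for p, t, s in zip(program_flux, flux, program_sigma))
        distances.append((math.sqrt(d2), aps))

    # Keep only the n_max nearest templates (assumes n_min <= n_max).
    nearest = heapq.nsmallest(n_max, distances, key=lambda pair: pair[0])

    # Use those inside the unit search radius; if fewer than n_min lie
    # inside, fall back to the n_min nearest (assumed interpretation).
    selected = [aps for d, aps in nearest if d <= radius]
    if len(selected) < n_min:
        selected = [aps for _, aps in nearest[:n_min]]

    # Each AP is the mean over the selected neighbours: O[Y N] work.
    n_aps = len(selected[0])
    return [sum(aps[j] for aps in selected) / len(selected) for j in range(n_aps)]

# Toy grid with X = 2 "filters" and Y = 2 APs (illustrative numbers only).
template_flux = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
template_aps = [[1.0, 10.0], [3.0, 20.0], [100.0, 100.0]]
print(mdm_parametrize([0.5, 0.0], [1.0, 1.0], template_flux, template_aps))
# [2.0, 15.0]: the two templates at scaled distance 0.5 are averaged
```

Every program star still visits all T templates. That linear scan is the part a tree search would replace.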
Five timing runs are carried out:

> time ./mdm SSP_v01.spec
5.700u 0.020s 0:05.85 97.7%
5.640u 0.010s 0:05.72 98.7%
5.650u 0.020s 0:05.71 99.2%
5.910u 0.080s 0:06.45 92.8%
5.640u 0.010s 0:05.76 98.0%

The user CPU time is about 5.7 s. The system time and elapsed time (columns 2 and 3) vary with the amount of screen output, but the user CPU time is independent of this. Re-running the code with P = 0 shows that only a negligible fraction of this time is spent on internal set-ups.

4.2 Scaling

For this simple MDM code, the execution time scales as follows (where O[] indicates an order-of-magnitude estimate):

loop over program stars                          O[P]
neighbour identification                         O[T]
distance calculation                             O[X]
insert template into nearest neighbour set       O[N]
⇒ neighbour assembly time (per program star)     O[TXN]
determine APs from nearest neighbours            O[YN]
Total execution time                             O[P(TXN + YN)] ∼ O[PTXN], as TX ≫ Y

Here N is the size of the nearest neighbour set (N = 1 for single nearest neighbours, N = 10 in the runs above). A single neighbour search is performed for all Y APs. This may not be appropriate, in which case there could be an additional factor of O[Y].

Significantly, the code currently uses a brute-force neighbour search. We know that we can speed this up using search trees, which require O[log₂ T]. However, search trees have an extra multiplicative dependence of O[X^n], where n ≥ 1. As far as I am aware, this dependence has not been properly determined analytically.

The size of T is set by the need to properly sample the data space for each AP and is roughly exponential in Y. If we naively assume that we must sample each AP at k different points, then the number of templates required is k^Y. In practice such a ‘complete’ grid is not required. Not all combinations correspond to real stars (or so we currently believe), and we do not need the same density of templates throughout the AP space.
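The scaling relations above can be turned into a rough operation-count model. The sketch below is illustrative only. The function names are hypothetical, it counts abstract "operations" rather than floating-point instructions, and it ignores all constant factors. It evaluates O[P(TXN + YN)] for the section 3 data with N = 10, and shows how the complete grid size k^Y grows with the number of APs.

```python
def mdm_brute_force_ops(P, T, X, Y, N):
    """Order-of-magnitude operation count O[P(TXN + YN)] for brute-force MDM."""
    return P * (T * X * N + Y * N)

def complete_grid_size(k, Y):
    """Templates needed to sample each of Y APs at k points on a complete grid."""
    return k ** Y

# Section 3 data (X = 11, Y = 4, T = 3450, P = 3594) with N = 10 neighbours.
total = mdm_brute_force_ops(P=3594, T=3450, X=11, Y=4, N=10)
search_part = 3594 * 3450 * 11 * 10  # the P*T*X*N term alone
print(f"{total:.2e} operations, {search_part / total:.4%} from the neighbour search")

# The 'curse of dimensionality': k = 10 samples per AP.
for Y in (4, 5, 6):
    print(Y, complete_grid_size(10, Y))

# Grid used later for the Gaia estimate: 50 points each in Teff and A_V,
# 15 each in [Fe/H] and log g, i.e. of order 10**6 templates.
gaia_grid = complete_grid_size(50, 2) * complete_grid_size(15, 2)
```

The YN term contributes only about 0.01% of the count. This is why the total is quoted as O[PTXN].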
It is precisely because of this ‘curse of dimensionality’ problem that minimum distance methods are not considered practicable for large Y. This leads us to consider regression solutions such as ANNs.

5 Feedforward neural networks (ANN)

5.1 Execution time estimates

I use my ANN code statnet (v2.02). This is a simple feedforward neural network. An 11:5:4 architecture is used (i.e. a single hidden layer of H = 5 nodes). The code has been optimized to some degree for rapid execution. For this timing test the network is applied with randomly initialized weights, so there is no training. The execution time is as follows:

> time ./statnet flopstest.spec
0.110u 0.000s 0:00.16 68.7%
0.120u 0.000s 0:00.10 120.0%
0.110u 0.010s 0:00.12 100.0%
0.110u 0.010s 0:00.16 75.0%
0.110u 0.010s 0:00.10 120.0%

The user CPU time is about 0.11 s.

5.2 Scaling

The time for applying a trained network to new data scales as follows:

loop over program stars                          O[P]
pass of single star through network              O[XH + HY]
Total (application) execution time               O[P(XH + HY)]

Here H is the number of nodes in the hidden layer.

The scaling for the training depends on which training algorithm is used. There is a large variety of methods: back propagation with steepest descent or with conjugate gradients, simulated annealing, genetic algorithms, Bayesian marginalization over weights, etc. The training time does not scale with P. However, the number of templates in the training grid should scale with the variance of objects in the application set. It should also scale with Y, ideally not exponentially fast as was the case with MDM. If a network is to parametrize the 4 APs over their full variance, then experience implies that we require T ∼ 10 000. The training time on the processor mentioned will then be of order a day. But this is currently still an open issue.

6 Application to Gaia

Both the MDM and ANN algorithms used here are very simple.
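The ANN application step shows this simplicity well. For one star, an X:H:Y network performs two small matrix-vector products, and this is where the O[XH + HY] cost of section 5.2 comes from. The sketch below is a hypothetical Python illustration, not statnet. It assumes tanh hidden nodes, linear output nodes and one bias per node. As in the timing test of section 5.1, it uses randomly initialized weights.

```python
import math
import random

def init_weights(n_in, n_out, rng):
    """Random weights for one layer: n_out rows of n_in weights plus a bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def dense(inputs, weights, activation):
    """Fully connected layer: about n_in * n_out multiply-adds."""
    # zip stops at len(inputs), so w[-1] is used only as the bias term.
    return [activation(w[-1] + sum(wi * xi for wi, xi in zip(w, inputs)))
            for w in weights]

def forward(x, w_hidden, w_output):
    """Pass one star through an X:H:Y network, O[XH + HY] operations."""
    hidden = dense(x, w_hidden, math.tanh)        # X -> H (assumed tanh)
    return dense(hidden, w_output, lambda a: a)   # H -> Y (assumed linear)

X, H, Y = 11, 5, 4
rng = random.Random(0)
w_hidden = init_weights(X, H, rng)
w_output = init_weights(H, Y, rng)

star = [rng.random() for _ in range(X)]   # stand-in for 11-band photometry
aps = forward(star, w_hidden, w_output)   # 4 outputs, one per AP
ops_per_star = X * H + H * Y              # 55 + 20 = 75 multiply-adds
print(len(aps), ops_per_star)             # 4 75
```

Multiplying by P gives the application cost O[P(XH + HY)]. Training is a separate, one-off cost and is not shown here.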
I expect that more sophisticated algorithms, requiring more processing steps, will be needed for parametrization with Gaia. The parametrization algorithms need to be applied at least once to each of the 10^9 program stars which Gaia observes. In practice, multiple applications will be desirable. There will be one run on the mean end-of-mission data. There will also be a few runs during the mission to construct intermediate catalogues, both for astrophysical purposes and for selecting GIS stars etc. These intermediate runs will not need to cover all stars or determine all APs. Indeed, it may not be necessary to determine APs at all. Instead, we might obtain some kind of empirical classification (e.g. hot/cool star, giant/dwarf) using an algorithm similar to those for astrophysical parametrization. Furthermore, if parametrizations are tied to specific stellar models (i.e. the templates are based on synthetic spectra), then we might want multiple runs against different models.

I consider my present estimates to be very uncertain. The ANN estimates are almost certainly an underestimate of the number of processing steps required. At the very least we will probably require a hierarchical approach involving multiple layers of networks. This would increase the number of processing steps by a factor of a few. My current feeling, however, is that it will not be sufficient for parametrization to consist of just a series of applications of this type of algorithm. I expect that some kind of online learning or real-time selection of templates will be required, and this procedure would probably dominate the total execution time. Still, as a scaling exercise, the network in section 5.1 took around 0.1 s to parametrize 3594 stars. Thus for 10^9 Gaia stars it would take (10^9/3594) · 0.1 s ≈ 2.8 × 10^4 s ∼ 0.3 days.
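The ANN extrapolation is a simple linear scaling with P. Below is a minimal Python sketch of the arithmetic. It uses the 0.1 s measured for 3594 stars in section 5.1 and the factor of about five for a hierarchical approach discussed in the text. The helper name `extrapolate_linear` is hypothetical.

```python
SECONDS_PER_DAY = 86_400

def extrapolate_linear(time_measured_s, n_measured, n_target):
    """Scale a measured run time linearly with the number of stars (O[P])."""
    return time_measured_s * n_target / n_measured

ann_s = extrapolate_linear(0.1, 3594, 1e9)
print(f"single network: {ann_s:.2e} s = {ann_s / SECONDS_PER_DAY:.2f} days")

# A factor of about five for a hierarchy of networks.
print(f"hierarchy: {5 * ann_s / SECONDS_PER_DAY:.1f} days")
```

The factor of five gives about 1.6 days, which is rounded up to the "around 2 days" minimum.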
Increasing this estimate by a factor of five allows for a hierarchical approach, and by even more if a committee of networks is used at each point in the hierarchy. On this basis, I would estimate that around 2 days is the very minimum execution time for parametrizing all Gaia stars based on MBP photometry. Based on the current code, the training time is likely to be around a day or two per network. This is of the same order as the application time (for all P = 10^9 stars) and is independent of P.

Turning to MDM, a simple extrapolation of this inefficient algorithm to P = 10^9 and perhaps T = 10^6 is almost certainly a wild overestimate of the time required for this parametrization problem. We know that much faster methods can and should be used.¹

¹ With P = 10^9 and T = 10^6, the MDM algorithm would require PT/(3594 × 3450) ∼ 10^8 times longer to run than the times given in section 4.1. This value of T is obtained by assuming that we must sample the Teff and A_V parameters each with 50 points, and [Fe/H] and log g each with 15 points, and that we sample this 4D grid completely.

However, we can implement MDM much more quickly using a tree-searching algorithm for neighbour identification. This has a time dependence of O[X log₂ T] (see section 4.2), as opposed to O[T] for the brute-force search currently implemented. For large T, the tree search is much faster, because log₂ T increases much more slowly than T. I will also assume that in the new algorithm the neighbour calculation is performed independently for each AP, increasing the time by a factor of Y. Thus the time dependence of this new algorithm is O[P X^2 Y N log₂ T], compared to O[PTXN] for the naive algorithm used in section 4. The ratio is

$$\frac{O[P X^2 Y N \log_2 T]}{O[P T X N]} = O[XY] \cdot O\left[\frac{\log_2 T}{T}\right]$$

Factors of P, X and N have been assumed to be the same in the two algorithms and so cancel. Using the figures in section 3 (XY = 44 and log₂ 3450 / 3450 ≈ 0.0034), the execution time changes by a factor of 44 · 0.0034 ≈ 0.15, i.e.
the new algorithm would run almost 7 times faster. For the full Gaia problem we would require many more templates, something like T = 10^6. The time required (per program star) by this new algorithm compared to the naive one (with the smaller T) is then 44 · log₂(10^6)/3450 ≈ 44 · 0.006 ≈ 0.25. However, I assume that the new code is more complex and will require an additional factor of at least 10 in execution time, e.g. to allow for a higher dependence of the tree algorithm on X. This gives an execution time 2.5 times that of the naive algorithm. This is the scaling factor to use when scaling the times in section 4 to Gaia (assuming we need T = 10^6 templates). From section 4, the execution time per program star is about 5 s/3594 ≈ 0.0014 s. The new algorithm therefore requires 2.5 · 0.0014 s = 0.0035 s per program star. For P = 10^9 this gives a total execution time of around 40 days.

This calculation has assumed Y = 4 APs, yet we may need to consider Y = 5 or 6 (variable extinction law and alpha-element abundances in addition). The new algorithm is presumed to have an explicit linear dependence on Y. More importantly, T shows a significant (up to exponential) implicit dependence on Y (see section 4.2).

The algorithms tested deal only with determining APs from MBP data for single stars. There are numerous other classification and parametrization tasks, including discrete object classification and unresolved binary star identification and parametrization. Many of these are additional (not alternative) algorithms which must be run, so their execution time adds to the total time required. Bailer-Jones (2003, ASP vol. 298, Monte Rosa proceedings; also see ICAP-CBJ-007 and -011) describes a ‘parallel’ classification scheme. In that scheme we may want to apply a suite of order 10 independent parametrizers to each object. Alternatively, a hierarchical system could involve a few layers of classification/parametrization, in which parameters are successively improved upon.
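The tree-based MDM extrapolation earlier in this section chains several factors, and a few lines of Python can check them. This is a sketch of the arithmetic only, not a tree-search implementation. The function name `tree_to_brute_ratio` is hypothetical. The factor of 10 for extra code complexity and the 5 s brute-force baseline are the assumptions stated in the text.

```python
import math

SECONDS_PER_DAY = 86_400
X, Y = 11, 4          # filters and APs (section 3)
T_NAIVE = 3450        # templates in the timing runs (section 3)

def tree_to_brute_ratio(T_tree, T_brute=T_NAIVE):
    """O[XY log2(T_tree) / T_brute]: tree-based MDM relative to brute force."""
    return X * Y * math.log2(T_tree) / T_brute

print(f"same grid: {tree_to_brute_ratio(T_NAIVE):.2f}")  # about 0.15
print(f"T = 10**6: {tree_to_brute_ratio(10**6):.2f}")    # about 0.25

# Extra factor of 10 for the more complex code, applied to the brute-force
# time per program star (about 5 s for 3594 stars in section 4.1).
scale = 10 * tree_to_brute_ratio(10**6)
per_star_s = scale * 5.0 / 3594
total_days = per_star_s * 1e9 / SECONDS_PER_DAY
print(f"{per_star_s:.4f} s per star, {total_days:.0f} days for 10**9 stars")

# Footnote 1: naive brute-force MDM with P = 10**9 and T = 10**6.
brute_force_slowdown = 1e9 * 1e6 / (3594 * 3450)  # of order 10**8
```

Without the intermediate rounding to 0.0014 s, the total comes out slightly above 40 days. This is consistent with the "around 40 days" quoted.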
Furthermore, we will want to consider more than MBP data. BBP data will add a small increment, but RVS data represent significantly more. They have X of several hundred for millions of bright stars, and several tens (i.e. compressed data) for perhaps 100 million stars. Parametrization based on these data is likely to outweigh the MBP processing.

7 Conclusions

Determining four APs for 10^9 Gaia stars from their 11-band photometry, using a current modest CPU (2 GFLOPS), is predicted to take between 2 days and 1–2 months. The lower figure applies to a globally pre-trained ANN-like approach. The upper figure applies to an MDM-like approach using an efficient tree-searching algorithm and assuming 10^6 templates. The training time per ANN is likewise of order 1 day. However, several ANNs are assumed in this model (e.g. in a hierarchy), so the training times would dominate the total time. Applying three cycles of Moore’s law² brings these times down by a factor of 8. The ANN estimate is a fairly robust lower limit to the total execution time, whereas the MDM estimate is a rather vague upper limit.

In practice a hybrid algorithm may well be used which makes use of online learning or real-time selection of templates. The execution time for such an algorithm is virtually impossible to extrapolate from the above estimates, but by design it should lie between these lower (ANN) and upper (MDM) limits. (In principle a hybrid algorithm could take much longer than either, e.g. if it involved a lot of real-time training.)

It must be stressed that there is significant uncertainty in these figures. The main source of uncertainty is the type of algorithm used, or rather, the complexity of algorithm needed to get the best parameter estimates from the data. Moreover, the time required for numerous additional classification and parametrization tasks must be added to this. These include discrete classification, binary stars, multiple applications during the mission and parametrization with RVS data.
The time for RVS classification in particular is likely to exceed that for MBP by some way. Finally, it is worth remembering that the input/output of large amounts of data could have a significant impact on the total execution time of the code. When scaling this up to the full Gaia problem, attention must therefore be paid to the way in which the input/output is performed (buffering, batch processing, flat files vs. database, etc.).

² Moore’s law: the doubling of transistors per unit area on silicon every 18 months, indicating a doubling of processing capacity for a given cost in the same time. Three cycles from 2004.5 take us to 2009, around the time that the processing hardware needs to be purchased.