Execution time and speed scaling estimates
for stellar parametrization algorithms
Coryn A.L. Bailer-Jones
Max-Planck-Institut für Astronomie, Heidelberg, Germany
[email protected]
18 June 2004
Report no.: GAIA-CBJ-017 version 3

1 Overview
Execution time estimates are provided for the parametrization of stars based on their 11-band
(something like MBP) photometry. Parametrization means the determination of their four
principal APs (astrophysical parameters), AV , [Fe/H], log g, Teff . The parametrization is done
with two different codes: a minimum distance method (MDM) and a feedforward neural network
(ANN). Their execution times for parametrizing new objects scale differently with the number
of objects, number of APs etc. It is not expected that either MDM or ANN will be used on the
Gaia data in the form presented here. But it is useful to investigate the time scaling properties
and have execution time estimates for these benchmark algorithms.
For the sake of this exercise, classification/parametrization codes can be split into two types:
trained and untrained. ANN is trained because it must undergo a one-off training session to
set its internal parameters (weights). These are optimized using the template stars in the
grid, but the grid is not then (explicitly) used in the application phase to parametrize new
objects. MDM is untrained because each program star (to be parametrized) is – in principle
at least – compared to every template star in the grid. (A program star is an unclassified Gaia
observation; a template star is one with existing classifications on the basis of which program
stars are classified.)
For application to the full complexity of the Gaia data and stars observed, I suspect that
some kind of hybrid of trained and untrained methods will be required, e.g. with some form of
continuous or updated training or an iterative classification scheme. One example is a real-time
multidimensional interpolation of the grid at every point set by a program star. This would
increase the total time to parametrize objects. But in the absence of an algorithm for this, a
corresponding execution time estimate cannot be provided at this time.
2 System details
The computer on which the tests are run is a Toshiba Satellite A15-S129 with a Mobile Intel Celeron
2.40 GHz running at 2.40 GHz and with 512 MB of memory (swap not used). This corresponds to
about 2 GFLOPS. The operating system is Linux/RH9. The ‘time’ command is used to time
the code and gives four outputs:
1. the user CPU time
2. the system CPU time
3. the elapsed real time between invocation and termination
4. the percentage of the elapsed time spent on the CPU, i.e. (user CPU time + system CPU time) / elapsed real time × 100%
The user CPU time is the relevant one here.
3 Data
The data used for timing both codes is that supplied with SSP (v01) for GDAAS. It consists of
    X = no. filters (inputs)    =   11
    Y = no. APs (outputs)       =    4
    T = no. templates           = 3450
    P = no. program stars       = 3594
The data are (area normalized) photon counts stored as real numbers.
4 Minimum Distance Method (MDM)

4.1 Execution time estimates
I use my code mdm (v1.03) which was supplied as the SSP algorithm (v01) for GDAAS. It is
a very simple minimum distance method. APs are assigned to each program star based on the
APs of a set of template stars. The assigned AP (for each of the four APs to be determined) is
the mean of the APs of all templates found within unit search radius in the data (photometry)
space, where each data dimension is scaled by the 1-sigma photometric error for that program
star. Note that a minimum and a maximum number of nearest neighbours which can be used
in the mean are defined, currently set at 1 and 10 respectively. The code has not been optimized
for speed.
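As a purely illustrative sketch of this assignment step (not the mdm code itself; the function name, array layout and variable names are placeholders), the procedure for a single program star might look like the following in Python:

import numpy as np

def assign_aps(program_flux, program_sigma, template_flux, template_aps,
               n_min=1, n_max=10):
    """Sketch of the MDM assignment for one program star.

    program_flux  : (X,)   photon counts of the program star
    program_sigma : (X,)   1-sigma photometric errors of the program star
    template_flux : (T, X) photon counts of the template grid
    template_aps  : (T, Y) APs (AV, [Fe/H], log g, Teff) of the templates
    """
    # Distance in data space, each dimension scaled by the program star's error
    scaled_diff = (template_flux - program_flux) / program_sigma   # (T, X)
    dist = np.sqrt((scaled_diff ** 2).sum(axis=1))                 # (T,)

    # Templates inside the unit search radius, nearest first
    order = np.argsort(dist)
    inside = order[dist[order] < 1.0]

    # Enforce the minimum and maximum neighbour counts (1 and 10 here)
    if len(inside) < n_min:
        inside = order[:n_min]      # fall back to the nearest template(s)
    neighbours = inside[:n_max]

    # Assigned APs = mean of the neighbours' APs
    return template_aps[neighbours].mean(axis=0)

Applied in a loop over all P program stars, this brute-force comparison against every template is what produces the O[PTXN] behaviour derived in section 4.2.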
Five timing runs are carried out:
> time ./mdm SSP_v01.spec
5.700u 0.020s 0:05.85 97.7%
5.640u 0.010s 0:05.72 98.7%
5.650u 0.020s 0:05.71 99.2%
5.910u 0.080s 0:06.45 92.8%
5.640u 0.010s 0:05.76 98.0%
The user CPU time is about 5.7s. Note that the system time and elapsed time (columns 2 and
3) vary depending on the amount of screen output, but the user CPU time is independent of
this. (Note that only a negligible fraction of this time is spent on internal set-up, as found by
re-running the code with P = 0.)
4.2 Scaling
For this simple MDM code, the execution time scales as follows (where O[] indicates an order
of magnitude estimate):

    loop over program stars                            O[P]
    neighbour identification                           O[T]
    distance calculation                               O[X]
    insert template into nearest neighbour set         O[N]
    ⇒ neighbour assembly time (per program star)       O[TXN]
    determine APs from nearest neighbours              O[YN]

    Total execution time    O[P(TXN + YN)] ∼ O[PTXN]    as TX ≫ Y
where N is the size of the nearest neighbour set (= 1 for single nearest neighbours, = 10 in
runs above). Note that a single neighbour search is performed for all Y APs. This may not be
appropriate, in which case there could be an additional factor of O[Y ]. Significantly, the code
currently uses a brute force neighbour search. We know that we can speed this up using search
trees, which require O[log2 T]. However, search trees have an extra multiplicative dependence
of O[X^n], where n ≥ 1. As far as I am aware this dependence has not been properly determined
analytically. Note that the size of T is set by the need to properly sample the data space for
each AP and is roughly exponential in Y: if we naively think that we must sample each AP at
k different points, then the number of templates required is k^Y. In practice such a ‘complete’
grid is not required, as not all combinations refer to real stars (or so we currently believe), or
because we do not need the same density of templates throughout the AP space. But it
is precisely because of this ‘curse of dimensionality’ problem that minimum distance methods
are not considered practicable for large Y , leading us to consider regression solutions such as
ANNs.
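To make the brute-force versus tree-search comparison concrete, the sketch below contrasts the two neighbour lookups for a single star on a random grid of the size given in section 3; scipy's cKDTree is used here purely as a convenient stand-in, not as a statement about how a Gaia implementation would be built:

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X, T = 11, 3450                               # filters and templates, as in section 3
templates = rng.random((T, X))
star = rng.random(X)

# Brute force: a distance to every template, O[TX] per program star
dist = np.sqrt(((templates - star) ** 2).sum(axis=1))
brute_idx = np.argsort(dist)[:10]

# Tree search: the tree is built once; each query then costs roughly O[X log2 T]
tree = cKDTree(templates)
_, tree_idx = tree.query(star, k=10)

assert set(brute_idx.tolist()) == set(tree_idx.tolist())

For T = 3450 the saving is modest, but for T ∼ 10^6 the log2 T factor is only ∼20, which is the basis of the extrapolation in section 6.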
5 Feedforward neural networks (ANN)

5.1 Execution time estimates
I use my ANN code statnet (v2.02). This is a simple feedforward neural network. An 11:5:4
architecture is used (i.e. a single hidden layer of H = 5 nodes). The code has been optimized
to some degree for rapid execution. The application is done with randomly initialized weights;
there is no training. The execution time is as follows:
> time ./statnet flopstest.spec
0.110u 0.000s 0:00.16 68.7%
0.120u 0.000s 0:00.10 120.0%
0.110u 0.010s 0:00.12 100.0%
0.110u 0.010s 0:00.16 75.0%
0.110u 0.010s 0:00.10 120.0%

5.2 Scaling
The time for applying a trained network to new data scales as follows:
    loop over program stars                  O[P]
    pass of single star through network      O[XH + HY]

    Total (application) execution time       O[P(XH + HY)]
where H is the number of nodes in the hidden layer.
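The per-star cost can be made explicit with a minimal sketch of such a forward pass (random weights, as in the timing test of section 5.1; this is not the statnet code and omits, e.g., bias terms):

import numpy as np

X, H, Y = 11, 5, 4                       # inputs, hidden nodes, outputs (11:5:4)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(H, X))             # input-to-hidden weights
W2 = rng.normal(size=(Y, H))             # hidden-to-output weights

def forward(photometry):
    """One pass of a single star through the network: O[XH + HY] operations."""
    hidden = np.tanh(W1 @ photometry)    # XH multiply-adds (plus H activations)
    return W2 @ hidden                   # HY multiply-adds

aps = forward(rng.random(X))             # 11-band input -> 4 AP estimates

The two matrix-vector products account for the O[XH + HY] cost per star; looping over program stars gives the O[P(XH + HY)] total quoted above.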
The scaling for the training depends on what training algorithm is used, and there is a large
variety of methods (back propagation with steepest descent or with conjugate gradients, simulated
annealing, genetic algorithms, Bayesian marginalization over weights, etc.). The training
time does not scale with P , although the number of templates in the training grid should scale
with the variance of objects in the application set and should scale with Y (and ideally not
exponentially fast as was the case with MDM). If a network is to parametrize the 4 APs
covering their full variance, then experience implies that we require T ∼ 10 000, and the training
time on the processor mentioned will be of the order of a day. But this is currently still an open
issue.
6 Application to Gaia
Both the MDM and ANN algorithms used here are very simple. I expect that more sophisticated
algorithms requiring more processing steps will be required for parametrization with Gaia.
The parametrization algorithms need to be applied at least once to each of the 10^9 program
stars which Gaia observes. In practice, multiple applications will be desirable: once on the mean
end-of-mission data; a few times at different times during the mission to construct intermediate
catalogues, both for astrophysical purposes and for selecting GIS stars etc. In this latter case,
this will not be necessary for all stars, nor will it be necessary to determine all APs. Indeed,
it may not be necessary to determine APs at all, but rather get some kind of empirical classification (e.g. hot/cool star, giant/dwarf) using an algorithm similar to those for astrophysical
parametrization. Furthermore, if parametrizations are tied to specific stellar models (i.e. the
templates are based on synthetic spectra), then we might want multiple runs against different
models.
I consider my present estimates as very uncertain. The ANN estimates are almost certainly an
underestimate of the number of processing steps required. At the very least we will probably
require a hierarchical approach involving multiple layers of networks. This would increase the
number of processing steps by a factor of a few. My current feeling, however, is that it will
not be sufficient for parametrization to consist of just a series of applications of this type of
algorithm. I expect that some kind of online learning or real-time selection of templates will be
required: this procedure would probably dominate the total execution time. But as a scaling
exercise, we see that the network in section 5.1 took around 0.1s to parametrize 3594 stars.
Thus for 10^9 Gaia stars it would take (10^9/3594) · 0.1s ∼ 0.3 days. Increasing this by a factor
of five to allow for a hierarchical approach (and by even more if a committee of networks is
used at each point in the hierarchy), I would estimate that around 2 days is the very minimum
execution time for parametrizing all Gaia stars based on MBP photometry. Based on the
current code, the training time is likely to be around a day or two per network. This is of the
order of the application time (for all P = 10^9 stars) and is independent of P.
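For reference, the arithmetic behind these numbers is simply the following (figures as quoted above; the factor of five for a hierarchical approach is the assumption stated in the text):

t_test = 0.1                                 # s: statnet application time for 3594 stars (section 5.1)
P_test, P_gaia = 3594, 1e9

t_single = (P_gaia / P_test) * t_test        # ~2.8e4 s for one network over all Gaia stars
print(t_single / 86400)                      # ~0.3 days
print(5 * t_single / 86400)                  # ~1.6 days, i.e. of order 2 days for a hierarchy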
Turning to MDM, a simple extrapolation of this inefficient algorithm to P = 10^9 and perhaps
T = 10^6 is almost certainly a wild overestimate of the time required for this parametrization
problem, as we know that much faster methods can and should be used.[1] However, we can
implement MDM in a much quicker way using a tree searching algorithm for neighbour
identification. This has a time dependence of O[X log2 T] (see section 4.2), as opposed to O[T]
for the brute force search currently implemented. For large T, the tree search is much faster
(log2 T
increases much more slowly than T ). I will also assume that in the new algorithm the neighbour
calculation is performed independently for each AP, increasing the time by a factor of Y . Thus
the time dependence of this new algorithm is O[PX^2 YN log2 T], compared to O[PTXN] for the
naive algorithm used in section 4. The ratio is

    O[PX^2 YN log2 T] / O[PTXN] = O[XY · (log2 T)/T]

where factors of P , X and N have been assumed to be the same in the two algorithms and
so are cancelled. Using the figures in section 3, this changes the execution time by a factor of
44 · 0.003 = 0.15, i.e. the new algorithm would run almost 7 times faster. For the full Gaia
problem we would require many more templates, something like T = 10^6. The time required
(per program star) by this new algorithm compared to the naive one (with smaller T) is then
44 · log2(10^6)/3450 = 44 · 0.006 = 0.25. However, I assume that the new code is more complex
and will require an additional factor of at least 10 in execution time (e.g. to allow for a higher
dependence of the tree algorithm on X), giving an execution time 2.5 times that of the naive
algorithm. This is the scaling factor to use when scaling the times in section 4 to Gaia
(assuming we need T = 10^6 templates). From section 4, the execution time per program star is
about 5s/3594 = 0.0014s, so 2.5 · 0.0014s = 0.0035s is required per program star with the new
algorithm. For P = 10^9 this gives a total execution time of around 40 days.
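The chain of estimates in this paragraph can be reproduced explicitly (a rough check only; the factor of 10 for the more complex code is the assumption made above):

import numpy as np

X, Y = 11, 4
T_test, T_gaia = 3450, 1e6
P_gaia = 1e9
t_star_naive = 5.0 / 3594                    # s per program star, from section 4.1

# Per-star time of the tree-based algorithm relative to the naive one;
# N and P cancel in the ratio O[PX^2 YN log2 T] / O[PTXN]
ratio = X * Y * np.log2(T_gaia) / T_test     # ~0.25
ratio *= 10                                  # allowance for extra code complexity -> ~2.5

t_star_new = ratio * t_star_naive            # ~0.0035 s
print(P_gaia * t_star_new / 86400)           # ~40 days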
This calculation has assumed Y = 4 APs yet we may need to consider Y = 5 or 6 (variable
extinction law and alpha element abundances in addition). The new algorithm is presumed to
have an explicit linear dependence on Y , but more importantly, T shows a significant (up to
exponential) implicit dependence on Y (see section 4.2).
The algorithms tested deal only with determining APs from MBP for single stars. There are
numerous other classification and parametrization tasks including discrete object classification,
unresolved binary star identification and parametrization. Many of these are additional –
not alternative – algorithms which must be run, so their execution time adds to the total
time required. In the ‘parallel’ classification scheme described by Bailer-Jones (2003, ASP
vol. 298, Monte Rosa proceedings; also see ICAP-CBJ-007 and -011) we may want to apply a
suite of order 10 independent parametrizers to each object. Alternatively, a hierarchical system
could involve a few layers of classification/parametrization, in which parameters are successively
improved upon. Furthermore, we will not only want to consider MBP data. BBP data will add
a small increment, but RVS data represent significantly more, with X of several hundred for
millions of bright stars and several tens (i.e. compressed data) for perhaps 100 million stars.
Parametrization based on these data is likely to outweigh the MBP processing.

[1] With P = 10^9 and T = 10^6, the MDM algorithm would require PT/(3594 × 3450) ∼ 10^8 times
longer to run than the times given in section 4.1. The value of T is obtained by assuming that we
must sample the Teff and AV parameters each with 50 points and [Fe/H] and log g each with 15
points and that we sample this 4D grid completely.
7 Conclusions
The determination of four APs for 10^9 Gaia stars based on their 11-band photometry using
a current modest CPU (2 GFLOPS) is predicted to take between 2 days (for a globally pre-trained
ANN-like approach) and 1–2 months (for an MDM-like approach using an efficient tree
searching algorithm assuming 10^6 templates). The training time per ANN is likewise of order
1 day, but as several ANNs are assumed in this model (e.g. in a hierarchy), the training times
would dominate the total time. Applying three cycles of Moore’s law [2] brings this time down by a
factor of 8. The ANN estimate is a fairly robust lower limit to the total execution time, whereas
the MDM estimate is a rather vague upper limit. In practice a hybrid algorithm may well be
used which makes use of online learning or real-time selection of templates. The execution
time for such an algorithm is virtually impossible to extrapolate from the above estimates, but
by design it should lie between these lower (ANN) and upper (MDM) limits. (In principle
a hybrid algorithm could take much longer than either, e.g. if it involved a lot of real-time
training.) It must be stressed that there is significant uncertainty in these figures. The main
factor comes from the type of algorithm used, or rather, the complexity of algorithm needed
to get the best parameter estimates from the data. Moreover, the time required for numerous
other additional classification and parametrization tasks must be added to this, such as discrete
classification, binary stars, multiple applications during the mission and parametrization with
RVS data. The time for RVS classification in particular is likely to exceed MBP by some way.
Finally, it is worth remembering that the input/output of significant amounts of data could
have a significant impact on the total execution time of the code. When scaling this up to the
full Gaia problem, attention must therefore be paid to the way in which the input/output is
performed (buffering, batch processing, flat files vs. database etc.).
[2] The doubling of transistors per unit area on silicon every 18 months, indicating a doubling of
processing capacity for a given cost in the same time; three cycles from 2004.5 takes us to 2009,
around the time that the processing hardware needs to be purchased.