J Optim Theory Appl (2007) 133: 1–18
DOI 10.1007/s10957-007-9178-0

Extensions of Stochastic Optimization Results to Problems with System Failure Probability Functions

J.O. Royset · E. Polak

Published online: 20 April 2007
© Springer Science+Business Media, LLC 2007

Abstract  We derive an implementable algorithm for solving nonlinear stochastic optimization problems with failure probability constraints using sample average approximations. The paper extends prior results dealing with a failure probability expressed in terms of a single performance measure to the case of a failure probability expressed in terms of multiple performance measures. We also present a new formula for the failure probability gradient. A numerical example addressing the optimal design of a reinforced concrete highway bridge illustrates the algorithm.

Keywords  Stochastic optimization · Sample average approximations · Monte Carlo simulation · Reliability-based optimal design

Communicated by P. Tseng. This work was sponsored by the Research Associateship Program, National Research Council. The authors are grateful for the valuable insight from Professors Alexander Shapiro, Evarist Gine, and Jon A. Wellner. The authors also thank Professor Tito Homem-de-Mello for commenting on an early draft of this paper.

J.O. Royset, Operations Research Department, Naval Postgraduate School, Monterey, CA, USA; e-mail: [email protected]
E. Polak, Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, CA, USA

1 Introduction

This paper focuses on a class of decision-making problems frequently arising in design and maintenance optimization of mechanical structures such as bridges, building frames and aircraft wings. Let x ∈ R^n be a vector of design variables, e.g., related to the size and form of the structure, let c : R^n → R be a deterministic, continuously differentiable cost function, and let p : R^n → [0, 1] be a failure probability (to be defined below). Then, the optimal design problem is defined as a chance-constrained stochastic program of the form

(P)    min_{x ∈ R^n} { c(x) | p(x) ≤ q, x ∈ X },    (1)

where q is a bound on the failure probability, X = {x ∈ R^n | f_j(x) ≤ 0, j ∈ J}, and f_j : R^n → R, j ∈ J = {1, 2, ..., J}, are deterministic, continuously differentiable functions.

Mechanical structures are assessed using one or more performance measures, e.g., displacement and stress levels at various locations in the structure. In [1], we focused on the case where the failure probability is defined as the probability of one performance measure being unsatisfactory. In this paper, we consider the general case of "system failure probability", where the failure probability is defined by a collection of performance measures. Here, "failure" occurs when specific combinations of the performance measures are unsatisfactory. Let g_k : R^n × R^m → R, k ∈ K = {1, 2, ..., K}, be a collection of performance functions describing the relevant performance measures. The functions g_k(·, ·) depend on the design x ∈ R^n and a standard normal random m-vector u. This random vector incorporates the uncertainty in the structure and its environment. Note that random vectors governed by distributions such as the multivariate normal (possibly with correlation) and lognormal distributions can be transformed into a standard normal vector using a smooth bijective mapping.
Hence, the limitation to a multivariate standard normal distribution is in many applications not restrictive (see e.g. Chap. 7 of [2] and [3–5]). By convention, g_k(x, u) ≤ 0 represents unsatisfactory performance of the kth measure. Formally, let the probability space (R^m, ℛ^m, P) be defined in terms of the sample space R^m, the Borel sets on R^m, denoted ℛ^m, and the multivariate standard normal distribution P of u. Assuming that g_k(x, ·) is measurable for all x ∈ R^n and k ∈ K, we define the failure probability of the structure by p(x) = P[F(x)], where the failure domain

F(x) = ⋃_{i ∈ I} ⋂_{k ∈ C_i} {u ∈ R^m | g_k(x, u) ≤ 0},    (2)

with C_i ⊂ K and I = {1, 2, ..., I} defining the combinations of performance measures that lead to structural failure. In [1], we focused on the case K = 1, C_1 = {1}, and I = 1.

Solution approaches for stochastic programs tend to be based on either interior or exterior sampling. Interior sampling approaches aim to solve stochastic programs directly and resort to sampling techniques whenever the algorithm requires the evaluation of probability functions or, more generally, expectations. Usually, different samples are generated each time an evaluation is necessary. In this group of approaches we find stochastic quasigradient methods [6–8]. These methods are difficult to apply to problems involving failure probability constraints. In principle, such constraints can be removed by penalty or barrier terms in the objective function. However, the details of an implementable algorithm for nonlinear problems based on this principle do not appear to have been worked out. Exterior sampling techniques construct and solve a sample average approximation without further sampling during the optimization.
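As a concrete illustration of the definition p(x) = P[F(x)] over the cut sets C_i, the sketch below estimates the failure probability of a toy system by crude Monte Carlo. The two affine performance functions, the single cut set, and the design point are hypothetical placeholders chosen for illustration; only the union-of-intersections structure of the failure domain in (2) is taken from the text.

```python
import random

# Toy system: two performance functions (K = 2) and one cut set C_1 = {1, 2},
# so failure requires both measures to be unsatisfactory (g_k <= 0).
# The affine forms below are hypothetical placeholders, not from the paper.
def g(k, x, u):
    if k == 1:
        return x[0] + 2.0 * x[1] - u[0]          # g_1(x, u)
    return 3.0 * x[0] - x[1] - u[1]              # g_2(x, u)

CUT_SETS = [[1, 2]]                              # the collection {C_i}, i in I

def in_failure_domain(x, u):
    # u lies in F(x) iff, for some i, g_k(x, u) <= 0 for all k in C_i.
    return any(all(g(k, x, u) <= 0.0 for k in ci) for ci in CUT_SETS)

def crude_mc_estimate(x, n_samples, seed=0):
    # Standard Monte Carlo: fraction of i.i.d. standard normal samples in F(x).
    rng = random.Random(seed)
    hits = sum(in_failure_domain(x, [rng.gauss(0, 1), rng.gauss(0, 1)])
               for _ in range(n_samples))
    return hits / n_samples

p = crude_mc_estimate([0.1, 0.1], 100_000)
```

Changing the system configuration, e.g., adding a second cut set, only requires editing CUT_SETS; the indicator logic mirrors (2) directly.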
The results available for exterior sampling techniques include the fact that minimizers and minimum values of sample average approximations converge with probability one to the minimizers and the minimum value of the original problem, respectively, as the number of samples goes to infinity (see Chap. 6 of [9] and references therein). For techniques for checking whether a given point is close to stationarity, see e.g. Sect. 6.4 of [9]. These results provide guidance for the selection of approximating problems to be solved using some deterministic optimization algorithm. The authors of [10] present a framework based on sample average approximations for proving convergence of algorithms for nonlinear programs involving probabilities or expectations. Only a rather weak sense of convergence is established in [10]: every accumulation point of the sequence of function values generated by the algorithm is bounded from above by the largest function value at any stationary point. In addition, the details of an implementable algorithm for problems with failure probability constraints are not provided.

Recently, elements of interior and exterior sampling techniques have been combined. In [11], sample average approximations are used to derive a gradient-type search direction for nonlinear stochastic programs. For each iteration, resampling is performed to generate a new search direction. Under the assumption of convex, twice-differentiable functions, it is shown that the expected distance to a Karush–Kuhn–Tucker point vanishes with increasing iterations when the search direction is combined with sufficiently small stepsizes. However, it is not clear how to implement a satisfactory stepsize rule.

In addition, the literature contains a large number of approximate or heuristic approaches for solving (P) and similar, more specialized problems. A review of such results is found in [12].
The failure probability is continuously differentiable under broad conditions when F(x) is bounded and given by a union of events [13]. However, the derivative formula in [13] is difficult to use in estimation because it may involve surface integrals. In [14] (see also [8]), an integral transformation is presented which, when it exists, leads to a simple formula for ∇p(x). However, it is not clear under what conditions the transformation exists. As in [13], Ref. [15] assumes that F(x) is bounded and given by a union of events. With this restriction, a formula for ∇p(x) involving integration over a simplex is derived. In principle, this integral can be evaluated by Monte Carlo methods. However, to the authors' knowledge, there is no computational experience with estimation of failure probabilities for highly reliable mechanical structures using this formula. In Sect. 9.2 of [2], a formula for ∇p(x) is suggested, without a complete proof, for the case K = 1. This formula is based on an expression for p(x) that has been found computationally efficient in applications.

In the next section, we generalize the special-case formula for ∇p(x) found in [2] and provide a proof. We also present estimators for p(x) and ∇p(x) and discuss their properties. For completeness, Sect. 3 presents Algorithm Model 3.3.27 from [16], which we use to develop our algorithm for solving (P). Section 4 derives an implementation of the algorithm model by generalizing the results of [1]. Section 5 presents a numerical example.

2 Failure Probability

As indicated in Sect. 1, a significant difficulty is associated with deriving a tractable formula for the gradient of p(x).¹ To overcome this difficulty, we decompose the vector u into a direction w and a positive length r, as described in further detail below (see [17] for the first application of such a decomposition).
We need the component failure domain F_k(x) defined by F_k(x) = {u ∈ R^m | g_k(x, u) ≤ 0}, k ∈ K, and the surface of the unit hypersphere, denoted by B = {w ∈ R^m | ‖w‖ = 1}. The following assumption is sufficient to ensure equivalence between p(x) and its alternative expression in Proposition 2.1 below.

Assumption 2.1 We assume that:
(i) The distribution P of the random m-vector u is standard normal.
(ii) For a given S ⊂ R^n, F_k(x)^c (= R^m − F_k(x)) is star-shaped with respect to the origin for all k ∈ K and x ∈ S, i.e., for all x ∈ S, 0 ∈ F_k(x)^c and, for every w ∈ B, there either exists a unique r > 0 such that g_k(x, rw) = 0 or g_k(x, rw) ≠ 0 for all r > 0.

In view of Assumption 2.1(i) and the fact that mechanical structures have small failure probabilities, we conclude that F_k(x) tends to be located far from the "high probability region" close to 0 ∈ R^m. Hence, the condition that 0 ∈ F_k(x)^c is almost always satisfied for mechanical structures. The second part of Assumption 2.1(ii) is satisfied when g_k(x, ·) is affine for all x ∈ S and k ∈ K, which is approximately true for many mechanical structures (Sects. 4.1 and 5.2 of [2]). However, in general, it can be difficult to verify the second part of Assumption 2.1(ii) analytically. This is especially the case when g_k(·, ·), k ∈ K, are given by the solutions of some (differential) equations. Nevertheless, it is possible to obtain numerical indications of the validity of Assumption 2.1(ii) by estimating the failure probability using an alternative estimator, as explained below. We also note that equivalent assumptions were adopted by [15, 18, 19] and Sect. 9.2 of [2].

Let P and E be the uniform distribution on B and the corresponding expectation, respectively. Furthermore, we define r_k : R^n × B → [0, ∞], k ∈ K, to be the smallest nonnegative solution of g_k(x, rw) = 0 for a given x ∈ R^n and w ∈ B, i.e.,

r_k(x, w) = min{r ≥ 0 | g_k(x, rw) ≤ 0},  if {r ≥ 0 | g_k(x, rw) ≤ 0} ≠ ∅,
r_k(x, w) = ∞,  otherwise.    (3)

Note that, under Assumption 2.1(ii), 0 ∈ F_k(x)^c and hence r_k(x, w) > 0.

Proposition 2.1 If Assumption 2.1 holds at x ∈ R^n, then

p(x) = E[φ(x, w)],    (4)

where

φ(x, w) = max_{i∈I} min_{k∈C_i} [1 − χ²_m(r_k²(x, w))]    (5)

and χ²_m(·) is the chi-square cumulative distribution function with m degrees of freedom.

Proof As in [17] (see alternatively [18, 19] and Sect. 9.2 of [2]), we observe that, if the standard normal random vector u is written as u = rw, then r² is chi-square distributed with m degrees of freedom and w is a random vector, independent of r, uniformly distributed over the surface of the m-dimensional unit hypersphere. Hence, using an equivalent minimax expression of (2) and the total probability rule, we obtain that

p(x) = E[ Pr{ min_{i∈I} max_{k∈C_i} g_k(x, rw) ≤ 0 | w } ],    (6)

where Pr is taken with respect to the chi-square distributed r². In view of Assumption 2.1, the expression inside the expectation in (6) equals 1 − χ²_m(r²(x, w)), where r(x, w) = min_{i∈I} max_{k∈C_i} r_k(x, w). Geometrically, r(x, w) is the minimum distance in direction w from 0 ∈ R^m to F(x). Since χ²_m(·) and the square function (on the positive domain) are strictly increasing, the result follows. □

Informally, we note that φ(x, w) is the probability of a failure event in the direction of w for a given x. We also observe that, when Assumption 2.1(ii) is not satisfied, (4) may overestimate the failure probability. Hence, it is conservative to assume that Assumption 2.1(ii) is satisfied. For a given x ∈ R^n, it is possible to get an indication of whether Assumption 2.1(ii) holds by computing an estimate Σ_{j=1}^N I_{F(x)}(u_j)/N of p(x), where u_1, u_2, ..., u_N are independent, identically distributed standard normal vectors and I_{F(x)}(u_j) = 1 if u_j ∈ F(x), and zero otherwise.

¹ We will not make use of this fact, but note that p(x) can be estimated using standard Monte Carlo techniques independently of the results in this section.
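As a rough illustration of the directional estimator (4)–(5), the sketch below implements φ(x, w) for a toy case with m = 2, where 1 − χ²₂(t) = e^(−t/2) in closed form, and with hypothetical affine performance functions and a single cut set; none of these particulars are from the paper. Each r_k is found by bisection, which is valid under Assumption 2.1(ii) since a ray then crosses the component limit surface at most once; the search radius r_max is an assumed truncation.

```python
import math
import random

# Directional estimator of p(x) per (4)-(5), sketched for m = 2 so that
# 1 - chi2_cdf(t; 2) = exp(-t/2). The affine performance functions and the
# single cut set C_1 = {1, 2} are hypothetical placeholders.
def g(k, x, u):
    if k == 1:
        return x[0] + 2.0 * x[1] - u[0]
    return 3.0 * x[0] - x[1] - u[1]

CUT_SETS = [[1, 2]]

def r_k(k, x, w, r_max=50.0, tol=1e-8):
    # Smallest r >= 0 with g_k(x, r w) <= 0, found by bisection; under
    # Assumption 2.1(ii) the sign of g_k changes at most once along the ray.
    if g(k, x, [r_max * wi for wi in w]) > 0.0:
        return math.inf                      # ray never reaches F_k(x)
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(k, x, [mid * wi for wi in w]) <= 0.0:
            hi = mid
        else:
            lo = mid
    return hi

def phi(x, w):
    # phi(x, w) = max_i min_k [1 - chi2_2(r_k^2)] as in (5).
    def tail(r):
        return 0.0 if math.isinf(r) else math.exp(-r * r / 2.0)
    return max(min(tail(r_k(k, x, w)) for k in ci) for ci in CUT_SETS)

def directional_estimate(x, n_samples, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a = rng.uniform(0.0, 2.0 * math.pi)  # uniform direction on the circle B
        total += phi(x, [math.cos(a), math.sin(a)])
    return total / n_samples
```

Comparing this estimate with the crude indicator-function estimate described in the text gives the numerical check of Assumption 2.1(ii): a markedly smaller indicator estimate signals that the star-shape condition fails.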
If this estimate is significantly smaller than that of (4), then Assumption 2.1(ii) is not satisfied.

For a given x ∈ R^n and w ∈ B, let the set of active performance functions be

K̂(x, w) = {k ∈ K | k ∈ Ĉ_i(x, w), i ∈ Î(x, w)},    (7)

where

Î(x, w) = {i ∈ I | max_{k∈C_i} r_k(x, w) = min_{i′∈I} max_{k∈C_{i′}} r_k(x, w)},    (8)

Ĉ_i(x, w) = {k ∈ C_i | r_k(x, w) = max_{k′∈C_i} r_{k′}(x, w)}.    (9)

Furthermore, let A_k(x) be the set of w-directions where k is active, i.e.,

A_k(x) = {w ∈ B | k ∈ K̂(x, w)}.    (10)

We now show that p(·) is continuously differentiable under the following assumptions.

Assumption 2.2 We assume that, for a given set S ⊂ R^n, the following hold:
(i) There exists a constant C_1 < ∞ such that min_{i∈I} max_{k∈C_i} r_k(x, w) ≤ C_1 for all x ∈ S and w ∈ B.
(ii) The performance functions g_k(x, u), k ∈ K, are continuously differentiable in both arguments for all x ∈ S, u ∈ R^m.
(iii) There exists a constant C_2 > 0 such that |∇_u g_k(x, r_k(x, w)w)^T w| ≥ C_2 for all x ∈ S, w ∈ A_k(x), and k ∈ K.
(iv) P[A_k(x) ∩ A_l(x)] = 0 for all x ∈ S and k, l ∈ K, k ≠ l.

The particular set of performance functions g_k(·, ·), k ∈ K, arising in an application may not satisfy Assumption 2.2(i). However, it is always possible to define an artificial performance function g_{K+1}(x, u) = ρ − ‖u‖, with a sufficiently large ρ > 0, replace I by I + 1, and set C_{I+1} = {K + 1}. Then, F(x) satisfies Assumption 2.2(i). This is equivalent to enlarging the failure domain. The probability associated with the enlarged failure domain is slightly larger than the one associated with the original failure domain. The difference, however, is no greater than 1 − χ²_m(ρ²) and therefore is negligible for sufficiently large ρ. Consequently, Assumption 2.2(i) is not restrictive in practice. Assumption 2.2(iii) can be difficult to verify.
However, it is our experience that the values of performance functions arising in practice tend to change as one moves from F(x)^c into the interior of the failure domain for a fixed x. If this were not the case, a perturbation of a scenario would result in no change in the performance measure, which is unlikely in mechanical structures. Assumption 2.2(iv) states that, almost surely, only one performance function is active at each point on the boundary of the failure domain. Since the performance functions represent different performance measures in a structure, they tend to have quite different forms. If two performance functions are identical on significant subsets, then one of them is redundant and remodelling is appropriate.

Lemma 2.1 Suppose that Assumptions 2.1 and 2.2 hold on a sufficiently large subset of R^n containing a compact set X_0. Then:
(i) There exists a constant C < ∞ such that |φ(x, w) − φ(x′, w)| ≤ C‖x − x′‖ for all x, x′ ∈ X_0 and w ∈ B.
(ii) For each fixed x ∈ X_0, φ(·, w) is continuously differentiable at x for P-almost all w ∈ B.
(iii) The collections {φ(x, w)}_{x∈X_0} and {∇_x φ(x, w)}_{x∈X_0} are uniformly integrable, i.e.,

lim_{γ→∞} sup_{x∈X_0} ∫_{{w∈B | |φ(x,w)| ≥ γ}} |φ(x, w)| P(dw) = 0    (11)

and similarly with |φ(x, w)| replaced by ‖∇_x φ(x, w)‖.

Proof First, consider (i). By Assumption 2.2(i), r_k(x, w) < ∞ for all x ∈ X_0, w ∈ B, and k ∈ K̂(x, w). Hence, in view of Assumptions 2.2(ii, iii) and 2.1(ii), the implicit function theorem gives that r_k(·, w) is continuously differentiable for all x ∈ X_0, w ∈ B, and k ∈ K̂(x, w), with

∇_x r_k(x, w) = −∇_x g_k(x, r_k(x, w)w) / [∇_u g_k(x, r_k(x, w)w)^T w],    (12)

when defined.
Consequently, it can be deduced from Theorem 5.4.5 in [16] that, for all w ∈ B, φ(·, w) has directional derivatives at every x ∈ X_0 in all directions with respect to its first argument, and the directional derivative in the direction h ∈ R^n is given by

dφ(x, w; h) = max_{i∈Î(x,w)} min_{k∈Ĉ_i(x,w)} −2 f_{χ²_m}(r_k²(x, w)) r_k(x, w) ∇_x r_k(x, w)^T h,    (13)

where f_{χ²_m}(·) is the probability density function of the chi-square distribution. Using Assumption 2.2(iii) and the fact that r_k(x, w), k ∈ K̂(x, w), are bounded on X_0 for all w ∈ B, (12) is bounded for all x ∈ X_0, w ∈ B, and k ∈ K̂(x, w). Since the max-min in (13) is only over i ∈ Î(x, w) and k ∈ Ĉ_i(x, w), it follows from the definition of K̂(x, w) and the boundedness of (12) that (13), for given h ∈ R^n, is bounded for all x ∈ X_0 and w ∈ B. Hence, |φ(x, w) − φ(x′, w)| ≤ C‖x − x′‖ for all x, x′ ∈ X_0 and w ∈ B.

Now, consider (ii). Let x′ ∈ X_0 be arbitrary. For P-almost all w ∈ B, it follows from Assumption 2.2(iv) that w does not belong to A_k(x′) ∩ A_l(x′) for any k, l ∈ K, k ≠ l. Let w′ ∈ B be such a direction. Consequently, the set K̂(x′, w′) has cardinality one, say K̂(x′, w′) = {k}. Then, g_k(x′, r(x′, w′)w′) = 0, where r(x, w) = min_{i∈I} max_{k∈C_i} r_k(x, w). By Assumption 2.1(ii) and continuity, there exists a neighborhood X_0′ ⊂ X_0 of x′ such that K̂(x, w′) = {k} and g_k(x, r(x, w′)w′) = 0 for all x ∈ X_0′. Hence, φ(x, w′) = 1 − χ²_m(r_k²(x, w′)) for all x ∈ X_0′ and, due to the smoothness of r_k(·, w′), φ(·, w′) is continuously differentiable at x′ with

∇_x φ(x′, w′) = 2 f_{χ²_m}(r_k²(x′, w′)) r_k(x′, w′) ∇_x g_k(x′, r_k(x′, w′)w′) / [∇_u g_k(x′, r_k(x′, w′)w′)^T w′].    (14)

Finally, consider (iii). Clearly, |φ(x, w)| ≤ 1 for all x ∈ X_0 and w ∈ B, so (11) holds. By Assumption 2.2(i–iii) and the fact that X_0 is compact, it follows that, for all w ∈ B, the right-hand side of (14) is uniformly bounded on X_0. Hence, {∇_x φ(x, w)}_{x∈X_0} is uniformly integrable. □
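The gradient formula (14) can be checked numerically on a toy case. In the sketch below, n = 1, m = 2, K = I = 1, and the hypothetical affine performance function g(x, u) = x − u_1 gives r(x, w) = x/w_1 for w_1 > 0 and φ(x, w) = 1 − χ²₂(r²) = e^(−r²/2); with ∇_x g = 1, ∇_u g·w = −w_1, and f_{χ²₂}(t) = e^(−t/2)/2, formula (14) is compared against a central finite difference.

```python
import math

def phi(x, w1):
    # phi(x, w) for g(x, u) = x - u_1, m = 2: r = x / w_1, phi = exp(-r^2 / 2).
    r = x / w1
    return math.exp(-r * r / 2.0)

def grad_phi_formula(x, w1):
    # Formula (14): 2 f_chi2(r^2) r grad_x g / (grad_u g . w).
    r = x / w1
    f_chi2 = math.exp(-r * r / 2.0) / 2.0     # chi-square(2) density at r^2
    grad_x_g = 1.0
    grad_u_g_dot_w = -w1                      # grad_u g = (-1, 0)
    return 2.0 * f_chi2 * r * grad_x_g / grad_u_g_dot_w

def grad_phi_fd(x, w1, h=1e-6):
    # Central finite-difference approximation of d phi / d x.
    return (phi(x + h, w1) - phi(x - h, w1)) / (2.0 * h)

discrepancy = abs(grad_phi_formula(1.0, 0.8) - grad_phi_fd(1.0, 0.8))
```

For this affine case the two values agree to finite-difference accuracy, which is a useful sanity check before applying (14) inside the estimators of the next results.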
In view of Lemma 2.1, the next result follows directly from Proposition 2.1 in [20].

Proposition 2.2 If Assumptions 2.1 and 2.2 hold on a sufficiently large subset of R^n containing a convex and compact set X_0, then p(·) is continuously differentiable on X_0 and its gradient is given by

∇p(x) = E[∇_x φ(x, w)],    (15)

where ∇_x φ(x, w) is given by (14) with k ∈ K̂(x, w).

We estimate the expectations in (4) and (15) by Monte Carlo sampling. Consider an infinite sequence of sample points, each generated by independent sampling from P. Let B = B × B × ··· and let P be the probability distribution on B generated by P. Let the subelements of w ∈ B be denoted w_j ∈ B, j = 1, 2, ..., i.e., w = (w_1, w_2, ...). For every w ∈ B, we define the estimator of (4),

p_N(x, w) = Σ_{j=1}^N φ(x, w_j)/N.    (16)

The asymptotic property of this estimator is given by the next well-known result (see e.g. [21] for a proof, which holds under weaker assumptions than those stated here).

Proposition 2.3 If Assumptions 2.1 and 2.2 hold on a sufficiently large subset of R^n containing a compact set X_0, then, for P-almost all w ∈ B,

lim_{N→∞} sup_{x∈X_0} |p_N(x, w) − p(x)| = 0.    (17)

Since φ(·, w) is only Lipschitz continuous (Lemma 2.1(i)), we observe that p_N(·, w) is generally nonsmooth. However, as the next proposition shows, p_N(·, w) has directional derivatives and a nonempty subgradient² for all w ∈ B.

Proposition 2.4 Suppose that Assumptions 2.1 and 2.2 hold on a sufficiently large subset of R^n containing a compact set X_0. Then, for all w ∈ B, p_N(·, w) is Lipschitz continuous on X_0 and has a nonempty subgradient ∂p_N(x, w) defined by

∂p_N(x, w) = Σ_{j=1}^N ∂φ(x, w_j)/N,    (18)

where

∂φ(x, w_j) = conv_{k∈K̂(x,w_j)} { 2 f_{χ²_m}(r_k²(x, w_j)) r_k(x, w_j) ∇_x g_k(x, r_k(x, w_j)w_j) / [∇_u g_k(x, r_k(x, w_j)w_j)^T w_j] },    (19)

with f_{χ²_m}(·) being the chi-square probability density function.

Proof It follows from Lemma 2.1(i) that, for all w ∈ B, p_N(·, w) is Lipschitz continuous on X_0.
It can be deduced from Theorem 5.4.5 in [16] that, for all w ∈ B, φ(·, w) has directional derivatives with respect to its first argument in all directions. Hence, it follows from Lemma 1 in [23] that, for all w ∈ B, p_N(·, w) has directional derivatives in all directions at all x ∈ X_0. Furthermore, a slight generalization of Corollary 5.4.6 in [16] yields that the directional derivative of p_N(·, w) is identical to its (Clarke) generalized directional derivative. Hence, ∂p_N(x, w) is identical to the (Clarke) generalized gradient of p_N(·, w), which is nonempty. A slight extension of Corollary 5.4.6 in [16] yields the formula above. □

By Lemma 2.1, the next result follows directly from Proposition 2.2 in [20].

² See e.g. Definition 5.1.31 in [16]. This type of subgradient is sometimes referred to as a regular subgradient (Definition 8.3 in [22]).

Proposition 2.5 Suppose that Assumptions 2.1 and 2.2 hold on a sufficiently large subset of R^n containing a convex and compact set X_0. Then, for P-almost all w ∈ B, ∂p_N(x, w) converges uniformly to ∇p(x), i.e.,

lim_{N→∞} sup_{x∈X_0} sup_{d∈∂p_N(x,w)} ‖d − ∇p(x)‖ = 0.    (20)

Using (16), we define a sequence of approximating problems. For any w ∈ B and N ∈ N = {1, 2, ...}, let the sample average approximating problem (P_N(w)) be defined by

(P_N(w))    min_{x∈R^n} { c(x) | p_N(x, w) ≤ q, x ∈ X }.    (21)

Intuitively, (P_N(w)) becomes a better "approximation" to (P) as N increases. In fact, epi-convergence characterizes this effect more precisely, as we see in the next proposition (see e.g. Theorems 3.3.2–3.3.3 in [16] for a proof), which requires a constraint qualification:

Assumption 2.3 Given w ∈ B, we assume that for every x ∈ X satisfying p(x) ≤ q there exists a sequence {x_N}_{N=1}^∞ ⊂ X, with p_N(x_N, w) ≤ q, such that x_N → x as N → ∞.

Proposition 2.6 Consider the sequence of approximate problems {P_N(w)}_{N=1}^∞.
Suppose that Assumptions 2.1 and 2.2 hold on a sufficiently large convex and compact subset of R^n. Then, for P-almost all w ∈ B, the following hold:
(i) If Assumption 2.3 is satisfied at w ∈ B, then {P_N(w)}_{N=1}^∞ epi-converges to (P).
(ii) If Assumption 2.3 is satisfied at w ∈ B and {x̂_N}_{N=1}^∞ is a sequence of global minimizers of {P_N(w)}_{N=1}^∞, then every accumulation point of {x̂_N}_{N=1}^∞ is a global minimizer of (P).

Before presenting an algorithm, we need to strengthen the result in Proposition 2.3.

Proposition 2.7 Suppose that Assumptions 2.1 and 2.2 hold on a sufficiently large subset of R^n containing a compact set X_0. Then, for P-almost all w ∈ B, there exists a constant C < ∞ such that, for all x ∈ X_0 and N ∈ N,

|p_N(x, w) − p(x)| ≤ C √((log log N)/N).    (22)

Proof Let G be defined by G(x, w) = φ(x, w) − p(x) for all x ∈ X_0, w ∈ B. By Lemma 2.1 and (4), G is centered and G(·, w) ∈ C(X_0), where C(X_0) is the space of continuous functions on X_0. Furthermore, by Assumptions 2.1 and 2.2(ii, iii) and by the implicit function theorem, r_k(·, ·) is continuous for all x ∈ X_0, w ∈ B, and k ∈ K̂(x, w). Hence, it can be deduced from Corollary 5.4.4 in [16] that r(·, ·) = min_{i∈I} max_{k∈C_i} r_k(·, ·) is continuous on X_0 × B. Since X_0 and B are compact, it follows that r(·, ·) is uniformly continuous on X_0 × B. Let r̃ : B → C(X_0) be defined by r̃(w) = r(·, w). Then, it follows that r̃ is continuous on B and hence measurable with respect to the Borel sets in C(X_0). Consequently, G is also measurable. Hence, G is a random variable with values in a separable Banach space. Define ‖G‖_∞ = sup_{x∈X_0} |G(x, w)|. A corollary of Theorem 8.11 in [24, p. 217] is that, if G is a centered Banach space valued random variable such that E(‖G‖²_∞ / log log ‖G‖_∞) < ∞ and such that the central limit theorem holds for G, then the law of the iterated logarithm also holds for G. Since G is bounded, E(‖G‖²_∞ / log log ‖G‖_∞) < ∞.
Hence, it only remains to show that the central limit theorem holds for G. Let N(ε, X_0, ‖·‖) be the covering number, i.e., the minimal number of open balls B°(x, ε) = {x′ ∈ R^n | ‖x′ − x‖ < ε} needed to cover X_0. Since X_0 is compact, there exists a constant η < ∞ such that N(ε, X_0, ‖·‖) ≤ (η/ε)^n for all ε > 0. Hence, the entropy integral

∫_0^∞ √(log N(ε, X_0, ‖·‖)) dε ≤ ∫_0^η √(n(log η − log ε)) dε < ∞.    (23)

By Lemma 2.1(i), G has Lipschitz continuous sample paths with a square-integrable Lipschitz constant. Hence, by the Jain–Marcus theorem (see [25], Example 2.11.13), it follows that the central limit theorem holds. Consequently, the law of the iterated logarithm holds for G. Hence, for P-almost all w ∈ B and all x ∈ X_0, (22) holds for some C and sufficiently large N. By increasing C, the result can be made to hold for all N. □

3 Algorithm Model

Before presenting an algorithm model, we present the optimality conditions for (P). Under Assumptions 2.1 and 2.2, (P) is a nonlinear program involving continuously differentiable functions, with stationary points defined by the F. John conditions. We find it convenient to express the F. John conditions (see Theorems 2.2.4 and 2.2.8 in [16]) by means of a nonpositive, continuous optimality function θ : R^n → R, defined by

θ(x) = − min_{z∈Z} { z^T b(x) + z^T B(x)^T B(x) z/(2δ) },    (24)

with Z = {z ∈ R^{J+2} | Σ_{l=1}^{J+2} z^(l) = 1, z^(l) ≥ 0, l = 1, ..., J + 2},

b(x) = (γψ(x)₊, ψ(x)₊ − p(x) + q, ψ(x)₊ − f_1(x), ..., ψ(x)₊ − f_J(x))^T,    (25)

and B(x) = (∇c(x), ∇p(x), ∇f_1(x), ..., ∇f_J(x)), where ψ(x) = max{p(x) − q, max_{j∈J} f_j(x)}, ψ(x)₊ = max{0, ψ(x)}, and γ, δ > 0. For any x̂ ∈ X such that p(x̂) ≤ q, the F. John conditions hold if and only if θ(x̂) = 0 (Theorem 2.2.8 in [16]).

Since neither p(x) nor ∇p(x) can be evaluated exactly in finite computing time, an algorithm for (P) involving evaluations of p(x) and ∇p(x) is conceptual. We construct an implementable algorithm by using Algorithm Model 3.3.27 in [16].
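The minimization in (24) is a convex quadratic program over the simplex Z. As an illustration only, the sketch below evaluates θ(x) with a plain Frank–Wolfe iteration rather than the QP solver one would normally use; the vectors b and matrix B are passed in directly, and the tiny instance in the test is hypothetical.

```python
# Evaluate theta(x) = -min_{z in Z} { z^T b + z^T B^T B z / (2 delta) } over
# the simplex Z = { z >= 0, sum z = 1 }. Frank-Wolfe is used here purely for
# illustration; a standard QP solver would be the practical choice.
def theta(b, B, delta=1.0, iters=5000):
    dim = len(b)
    z = [1.0 / dim] * dim                        # start at the simplex center

    def Bz(zv):
        return [sum(Bi[l] * zv[l] for l in range(dim)) for Bi in B]

    for t in range(iters):
        v = Bz(z)
        # Gradient of the objective: b + B^T B z / delta.
        grad = [b[l] + sum(B[i][l] * v[i] for i in range(len(B))) / delta
                for l in range(dim)]
        j = min(range(dim), key=lambda l: grad[l])   # best simplex vertex
        step = 2.0 / (t + 2.0)                        # standard FW stepsize
        z = [(1.0 - step) * zl for zl in z]
        z[j] += step
    v = Bz(z)
    value = (sum(b[l] * z[l] for l in range(dim))
             + sum(vi * vi for vi in v) / (2.0 * delta))
    return -value
```

Since the feasible set is the probability simplex, the Frank–Wolfe linear subproblem reduces to picking the vertex with the smallest gradient component, which keeps the sketch free of any projection step.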
For completeness, the algorithm model is presented below. The algorithm model makes use of an approximate algorithm map A_{N,w} : R^n → 2^{R^n} involving p_N(x, w) and ∂p_N(x, w). Note that the algorithm map can be set-valued. The algorithm model also uses the function F_N : R^n × R^n → R in a precision-adjustment rule, where

F_N(x, x′) = max{ c(x′) − c(x) − γψ_N(x)₊, ψ_N(x′) − ψ_N(x)₊ },    (26)

ψ_N(x) = max{ p_N(x, w) − q, max_{j∈J} f_j(x) },    (27)

ψ_N(x)₊ = max{0, ψ_N(x)}, and γ > 0.

Algorithm Model for Solving (P) (Adapted from Algorithm Model 3.3.27, [16])

Parameters. τ ∈ (0, 1), η > 0.
Data. x_0 ∈ R^n, an unbounded set N of positive integers, and a collection w = (w_1, w_2, ...) ∈ B of independent sample points from P.
Step 0. Set i = 0 and N = min N.
Step 1. Compute y ∈ A_{N,w}(x_i).
Step 2. If

F_N(x_i, y) ≤ −η((log log N)/N)^τ,    (28)

then set x_{i+1} = y, N_i = N, replace i by i + 1, and go to Step 1. Else, replace N by min{N′ ∈ N | N′ > N} and go to Step 1.

Note that the algorithm model uses the precision-adjustment rule (28) to ensure that the error in function evaluations is sufficiently small in comparison to the algorithmic progress. Given τ and η as well as a set of sample sizes N, the rule determines how fast the sample size is increased. Empirical evidence from the areas of optimal control and semi-infinite optimization indicates that such a feedback rule is computationally more efficient than using a predetermined schedule for increasing the sample size.

To ensure convergence of the algorithm model, we adopt the following assumption regarding the algorithm map in Step 1. For brevity, for any x ∈ R^n and ρ > 0, let B(x, ρ) = {x′ ∈ R^n | ‖x′ − x‖ ≤ ρ}.

Assumption 3.1 Given S ⊂ R^n, we assume that the algorithm map A_{N,w} : R^n → 2^{R^n} satisfies the following property for P-almost all w ∈ B: for every x ∈ S with θ(x) < 0, there exist δ_x > 0, N_x ∈ N, and ρ_x > 0 such that F_N(x′, y) ≤ −δ_x for all N ≥ N_x, x′ ∈ B(x, ρ_x), and y ∈ A_{N,w}(x′).
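The control flow of the algorithm model, accept the candidate when the merit decrease F_N beats the precision threshold of rule (28), otherwise grow the sample size, can be sketched as follows. Here `step` and `merit_decrease` are hypothetical stand-ins for the map A_{N,w} and the function F_N, and the quadratic toy run in the test only exercises the sample-size logic.

```python
import math

# Skeleton of the algorithm model with precision-adjustment rule (28):
# F_N(x_i, y) <= -eta * ((log log N) / N)^tau. `step` and `merit_decrease`
# are hypothetical placeholders for A_{N,w} and F_N, respectively.
def solve(x0, step, merit_decrease, sizes, tau=0.5, eta=1.0, max_iters=100):
    x, idx = x0, 0
    history = []                                  # accepted (N, x) pairs
    for _ in range(max_iters):
        N = sizes[idx]                            # sizes must all be >= 3
        y = step(x, N)
        threshold = -eta * (math.log(math.log(N)) / N) ** tau
        if merit_decrease(x, y, N) <= threshold:
            x = y                                 # enough progress: accept
            history.append((N, x))
        elif idx + 1 < len(sizes):
            idx += 1                              # too noisy: increase N
        else:
            break                                 # sample sizes exhausted
    return x, history
```

The feedback character of the rule is visible in the sketch: the sample size is raised only when the observed merit decrease no longer dominates the sampling error allowance, rather than on a fixed schedule.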
In the next section, we present one particular algorithm map that satisfies Assumption 3.1. The convergence of the algorithm model is given by the next theorem, which can be proven using the same arguments as in the proof of Theorem 3.3.29 in [16].

Theorem 3.1 Suppose that Assumptions 2.1, 2.2, and 3.1 hold on a sufficiently large subset of R^n and that the algorithm model for solving (P) has constructed a bounded sequence {x_i}_{i=0}^∞. If x̂ is an accumulation point of {x_i}_{i=0}^∞, then x̂ is a stationary point for (P), i.e., θ(x̂) = 0, P-almost surely.

4 Implementation

Consider the following set-valued algorithm map, which is a generalization of the Polak–He algorithm (see Sect. 2.6 in [16]). For any x_i ∈ R^n, N ∈ N, and w ∈ B, we define

A_{N,w}(x_i) = { x_i + λ_N(x_i, d) h_N(x_i, d) | d ∈ ∂p_N(x_i, w) },    (29)

where the Armijo stepsize is given by

λ_N(x_i, d) = max_{k∈{0,1,2,...}} { β^k | F_N(x_i, x_i + β^k h_N(x_i, d)) ≤ αβ^k θ_N(x_i, d) },    (30)

with F_N(·, ·) as in (26),

θ_N(x, d) = − min_{z∈Z} { z^T b_N(x) + z^T B_N(x, d)^T B_N(x, d) z/(2δ) },    (31)

where δ > 0, Z is defined as in (24),

b_N(x) = (γψ_N(x)₊, ψ_N(x)₊ − p_N(x, w) + q, ψ_N(x)₊ − f_1(x), ..., ψ_N(x)₊ − f_J(x))^T,    (32)

B_N(x, d) = (∇c(x), d, ∇f_1(x), ..., ∇f_J(x)),    (33)

α ∈ (0, 1], and β ∈ (0, 1). Finally, the search direction is

h_N(x_i, d) = −B_N(x_i, d)ẑ/δ,    (34)

where ẑ is any solution of (31). The parameter γ in (32) should be set equal to the γ in F_N(·, ·) in the algorithm model. The problem in (31) is quadratic and can be solved in a finite number of iterations by a standard QP-solver (e.g. Quadprog [26]). Our next result shows that (29) satisfies Assumption 3.1. Hence, this algorithm map combined with the algorithm model results in a convergent implementable algorithm.

Proposition 4.1 Suppose that Assumptions 2.1 and 2.2 hold on an open set S ⊂ R^n.
For any N ∈ N and w ∈ B, let the algorithm map A_{N,w}(·) be defined by (29), with the same values of the parameters α, β, δ, and γ for all N ∈ N. Then, A_{N,w}(·) satisfies Assumption 3.1 on any convex and compact subset of S for P-almost all w ∈ B.

Proof Let X_0 ⊂ S be convex and compact. For P-almost all w ∈ B, the search direction h_N(x, d) is bounded for all x ∈ X_0 and d ∈ ∂p_N(x, w), because it is defined as a linear combination of bounded vector-valued functions (see (34), (33), Proposition 2.4, and Assumption 2.2). For P-almost all w ∈ B, the bound is independent of N due to Proposition 2.5. Since S is open, there exists a λ_1 ∈ (0, 1] such that x + λh_N(x, d) ∈ S for P-almost all w ∈ B and for all x ∈ X_0, λ ∈ (0, λ_1], N ∈ N, and d ∈ ∂p_N(x, w).

It is seen from Proposition 2.4 and its proof that, for all w ∈ B and x ∈ S, p_N(x, w) is Lipschitz continuous, with a directional derivative equal to the (Clarke) generalized directional derivative. Hence, the Lebourg mean-value theorem (see e.g. Theorem 5.4.13b in [16]) is applicable. By the Lebourg mean-value theorem and the fact that λ_1 ≤ 1, we obtain that, for P-almost all w ∈ B and for all x ∈ X_0, λ ∈ (0, λ_1], d ∈ ∂p_N(x, w), and N ∈ N, there exist some s, s_j ∈ [0, 1], j = 0, 1, 2, ..., J, and d′ ∈ ∂p_N(x + sλh_N(x, d), w) such that

F_N(x, x + λh_N(x, d)) ≤ λ max{ −γψ_N(x)₊ + ∇c(x)^T h_N(x, d) + [∇c(x + s_0λh_N(x, d)) − ∇c(x)]^T h_N(x, d),
    p_N(x, w) − q − ψ_N(x)₊ + d^T h_N(x, d) + (d′ − d)^T h_N(x, d),
    max_{j∈J} { f_j(x) − ψ_N(x)₊ + ∇f_j(x)^T h_N(x, d) + [∇f_j(x + s_jλh_N(x, d)) − ∇f_j(x)]^T h_N(x, d) } }.    (35)

For any ε > 0, it follows from Proposition 2.5 that for P-almost all w ∈ B there exists an N_ε ∈ N such that, for all N ≥ N_ε,

sup_{x∈S} sup_{d∈∂p_N(x,w)} ‖d − ∇p(x)‖ ≤ ε/3.    (36)

Furthermore, ∇c(·), ∇f_j(·), j ∈ J, and ∇p(·) are uniformly continuous on compact sets and h_N(x, d) is bounded almost surely.
Hence, for P-almost all w ∈ B there exists a λ_ε ∈ (0, λ_1] such that for all λ ∈ (0, λ_ε], x ∈ X_0, d ∈ ∂p_N(x, w), and N ∈ N we have

  ||∇p(x + sλh_N(x, d)) − ∇p(x)|| ≤ ε/3,  (37)
  ||∇c(x + s_0 λh_N(x, d)) − ∇c(x)|| ≤ ε,  (38)
  ||∇f_j(x + s_j λh_N(x, d)) − ∇f_j(x)|| ≤ ε, j ∈ J.  (39)

Hence, using (36) and (37), we obtain that ||d̃ − d|| ≤ ε. From (35), we then obtain that for P-almost all w ∈ B and for all x ∈ X_0, λ ∈ (0, λ_ε], d ∈ ∂p_N(x, w), and N ≥ N_ε,

  F_N(x, x + λh_N(x, d)) ≤ λ max{ −γ ψ_N(x)_+ + ∇c(x)^T h_N(x, d), p_N(x, w) − q − ψ_N(x)_+ + d^T h_N(x, d), max_{j ∈ J} { f_j(x) − ψ_N(x)_+ + ∇f_j(x)^T h_N(x, d) } } + λε ||h_N(x, d)||.  (40)

We can deduce from Theorem 2.2.8 in [16] that

  θ_N(x, d) = max{ −γ ψ_N(x)_+ + ∇c(x)^T h_N(x, d), p_N(x, w) − q − ψ_N(x)_+ + d^T h_N(x, d), max_{j ∈ J} { f_j(x) − ψ_N(x)_+ + ∇f_j(x)^T h_N(x, d) } } + δ ||h_N(x, d)||^2 / 2.  (41)

Hence, by adding and subtracting λδ||h_N(x, d)||^2/2 in (40), we obtain

  F_N(x, x + λh_N(x, d)) ≤ λθ_N(x, d) + λ(ε − δ||h_N(x, d)||/2) ||h_N(x, d)||  (42)

for P-almost all w ∈ B and for all x ∈ X_0, λ ∈ (0, λ_ε], d ∈ ∂p_N(x, w), and N ≥ N_ε. Now, suppose that x* ∈ X_0 is such that θ(x*) < 0. Without loss of generality, we assume that x* is in the interior of X_0. Let h(x) be given by

  h(x) = −B(x) ẑ/δ,  (43)

where B(x) and δ > 0 are as in (24), and ẑ is any solution of (24). Since h(x*) = 0 would imply θ(x*) = 0, we have h(x*) ≠ 0. Then, by continuity of θ(·) and h(·) (see Theorem 2.2.8 in [16]), there exist δ_1 > 0 and ρ_{x*} > 0 such that θ(x) ≤ −δ_1 and ||h(x)|| ≥ δ_1 for all x ∈ B(x*, ρ_{x*}). Set ε* = δδ_1/4. By Proposition 7.1 (see Appendix), for P-almost all w ∈ B there exists an N_{x*} ≥ N_{ε*} such that θ_N(x, d) ≤ −δ_1/2 and ||h_N(x, d)|| ≥ δ_1/2 for all x ∈ B(x*, ρ_{x*}), d ∈ ∂p_N(x, w), and N ≥ N_{x*}.
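For the record, the inequality (42) is elementary: substituting the max-expression from (41) into (40) and writing h = h_N(x, d),

```latex
F_N(x, x + \lambda h)
  \le \lambda\Bigl(\theta_N(x,d) - \tfrac{\delta}{2}\|h\|^{2}\Bigr) + \lambda\varepsilon\|h\|
  = \lambda\,\theta_N(x,d) + \lambda\Bigl(\varepsilon - \tfrac{\delta}{2}\|h\|\Bigr)\|h\|.
```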
Since ε* − δ||h_N(x, d)||/2 ≤ 0 for P-almost all w ∈ B and for all x ∈ B(x*, ρ_{x*}) and d ∈ ∂p_N(x, w), it now follows from (42) that F_N(x, x + λh_N(x, d)) ≤ λθ_N(x, d) ≤ 0 for P-almost all w ∈ B and for all x ∈ B(x*, ρ_{x*}), λ ∈ (0, λ_{ε*}], d ∈ ∂p_N(x, w), and N ≥ N_{x*}. Hence, for the algorithm parameter α ∈ (0, 1],

  F_N(x, x + λh_N(x, d)) − λαθ_N(x, d) ≤ λ(1 − α)θ_N(x, d) ≤ 0  (44)

for P-almost all w ∈ B and for all x ∈ B(x*, ρ_{x*}), λ ∈ (0, λ_{ε*}], d ∈ ∂p_N(x, w), and N ≥ N_{x*}. Consequently, for any x ∈ B(x*, ρ_{x*}) and N ≥ N_{x*}, the algorithm map A_{N,w}(·) has stepsize λ_N(x, d) ≥ βλ_{ε*} for any d ∈ ∂p_N(x, w) and for P-almost all w ∈ B. Hence, for any x ∈ B(x*, ρ_{x*}), d ∈ ∂p_N(x, w), and N ≥ N_{x*},

  F_N(x, y) ≤ αλ_N(x, d)θ_N(x, d) ≤ αβλ_{ε*}θ_N(x, d) ≤ −αβλ_{ε*}δ_1/2  (45)

for all y ∈ A_{N,w}(x) and P-almost all w ∈ B. This completes the proof.

Usually, the one-dimensional root-finding problems in the evaluation of r_k(x, w), k ∈ K̂(x, w), cannot be solved exactly in finite computing time. One possibility is to introduce a precision parameter that ensures gradually better accuracy in the root finding as the algorithm progresses. Alternatively, we can prescribe a rule saying that the root-finding algorithm should terminate after CN_i iterations, with C some constant. For simplicity, we have not discussed the issue of root finding; in practice, it is not problematic. The root-finding problems can be solved in a few iterations with high accuracy using standard algorithms. Hence, the root-finding problems are solved with a fixed precision for all iterations of the algorithm, giving a negligible error.

5 Numerical Example

We consider the same highway bridge as in [12]. The objective is to design a reinforced concrete girder for minimum cost using nine design variables (n = 9). Uncertainty is modeled using eight random variables (m = 8). We assume that the girder can fail in four different modes.
Failure occurs if any of the four failure modes occurs. This gives rise to four performance functions. To ensure that F(x)^c is bounded (see Assumption 2.2(i)), we define an artificial performance function g_5(x, u) = 8 − ||u||. Note that this implies an enlargement of the failure domain, but the increase in the failure probability is less than 10^{-10}. This leads to five performance functions and C_i = {i}, i ∈ I = {1, 2, ..., 5} (see (2)). We impose the constraint p(x) ≤ q = 0.001350, as well as 23 other deterministic, nonlinear constraints. For details about the example, we refer to [12]. It should be noted that the performance functions are nonlinear and sufficiently differentiable. We are unable to verify analytically that the performance functions satisfy Assumption 2.1(ii). However, we estimate the failure probability for the first and last iterations using the estimator Σ_{j=1}^N I_{F(x)}(u_j)/N of p(x), where u_1, u_2, ..., u_N are independent, identically distributed standard normal vectors and I_{F(x)}(u_j) = 1 if u_j ∈ F(x), and zero otherwise, and find it not significantly different from the estimates obtained using (16). Furthermore, we do not experience numerical difficulties, which could have been expected in an example not satisfying Assumption 2.1(ii). Hence, it is reasonable to believe that Assumption 2.1(ii) is satisfied over a sufficiently large subset. The resulting instance of (P) is implemented in Matlab 6.5 [26] and solved using our algorithm model with the algorithm map defined in (29). The evaluation of r_k(x, w) is performed using the Matlab root-finder Fzero, with tolerance 1 × 10^{-5}, and (31) is solved using Matlab's Quadprog. The parameters are τ = 0.9999, η = 0.002, γ = 2, α = 0.5, β = 0.8, and δ = 1. Furthermore, N = {200, 1600, 5400, 12800, 25000, ...}. The computations are terminated when the algorithm model reaches a sample size greater than 25000.
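As a rough illustration of these building blocks, the direction-finding subproblem (31), (34) and the Armijo rule (30) can be sketched in Python, with scipy's SLSQP solver standing in for Quadprog. This is a minimal sketch under stated assumptions, not the authors' implementation: we take Z to be the unit simplex (its actual definition is in (24), which is not reproduced in this excerpt), and b, B, F are user-supplied problem data.

```python
import numpy as np
from scipy.optimize import minimize

def search_direction(b, B, delta=1.0):
    """Solve the QP in (31) over Z (assumed here: the unit simplex) and
    return the direction h = -B z_hat / delta of (34) together with
    theta = -(minimal QP value), cf. (31)."""
    m = len(b)
    def obj(z):
        Bz = B @ z
        return z @ b + Bz @ Bz / (2.0 * delta)
    res = minimize(obj, np.full(m, 1.0 / m), method='SLSQP',
                   bounds=[(0.0, None)] * m,
                   constraints=({'type': 'eq',
                                 'fun': lambda z: z.sum() - 1.0},))
    z_hat = res.x
    return -B @ z_hat / delta, -obj(z_hat)

def armijo_stepsize(F, x, h, theta, alpha=0.5, beta=0.8, kmax=60):
    """Largest beta**k satisfying the test in (30):
    F(x, x + beta**k h) <= beta**k * alpha * theta (theta < 0)."""
    for k in range(kmax):
        lam = beta ** k
        if F(x, x + lam * h) <= lam * alpha * theta:
            return lam
    return beta ** kmax  # fallback; the test succeeds for small enough steps
```

For instance, with b = (1, 2)^T, B the 2×2 identity, and δ = 1, the QP minimizer over the simplex is ẑ ≈ (1, 0), giving h ≈ (−1, 0) and θ ≈ −1.5.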
After an application of our new algorithm, we obtain an optimized structure with cost 13.288. In comparison, the design obtained in [12] has a somewhat larger cost of 13.664. In this example, a less reliable structure is also cheaper. As expected, when our algorithm was terminated the constraint p(x) ≤ 0.001350 was (approximately) active: the estimated failure probability is 0.001350, with coefficient of variation 0.02. An examination of the design from [12] shows that its failure probability is 0.001310, with coefficient of variation 0.01. Hence, a 95% confidence interval for its failure probability is (0.001284, 0.001336), which lies entirely below the constraint limit 0.001350. From this analysis we conclude that the algorithm in [12] may give excessively safe designs. The algorithm in [12] is based on heuristic updating of first-order approximations of the failure probability and is not expected to reach the same accuracy level as our new algorithm. However, the algorithm in [12] appears to involve fewer evaluations of the performance functions and their gradients. Note also that the algorithm in [12] is limited to the special case of C_i having only one element for all i ∈ I.

6 Conclusions

We construct an implementable algorithm for nonlinear stochastic programming problems with system failure probability constraints. First, we generalize an expression for the failure probability and show that it is continuously differentiable. Second, we prove a uniform strong law of large numbers for the estimators of the failure probability and its gradient. We also establish a uniform law of the iterated logarithm for the estimator of the failure probability. Third, we construct an algorithm map that satisfies the assumptions of an algorithm model. Preliminary numerical testing on a realistic design problem demonstrates the potential of sampling-based optimization algorithms in structural engineering.
In particular, the high accuracy of such algorithms compared to frequently used heuristics is promising.

Appendix

Proposition 7.1 Suppose that Assumptions 2.1 and 2.2 hold on a convex and compact set X_0 ⊂ R^n. Then, for P-almost all w ∈ B,

  lim_{N→∞} sup_{x ∈ X_0} sup_{d ∈ ∂p_N(x, w)} |θ_N(x, d) − θ(x)| = 0,  (46)

  lim_{N→∞} sup_{x ∈ X_0} sup_{d ∈ ∂p_N(x, w)} ||h_N(x, d) − h(x)|| = 0,  (47)

where θ(·), θ_N(·, ·), h(·), and h_N(·, ·) are defined in (24), (31), (43), and (34), respectively.

Proof Let ε > 0 be arbitrary. We deduce from Propositions 2.3 and 2.5 that for P-almost all w ∈ B there exists an N_ε ∈ N such that, for all N ≥ N_ε and h ∈ R^n,

  sup_{x ∈ X_0} |ψ(x)_+ − ψ_N(x)_+| ≤ ε,  (48)

  sup_{x ∈ X_0} sup_{d ∈ ∂p_N(x, w)} |d^T h − ∇p(x)^T h| ≤ ε ||h||.  (49)

Consequently, for P-almost all w ∈ B and all N ≥ N_ε and h ∈ R^n,

  sup_{x ∈ X_0} sup_{d ∈ ∂p_N(x, w)} |p_N(x, w) − q − ψ_N(x)_+ + d^T h − (p(x) − q − ψ(x)_+ + ∇p(x)^T h)| ≤ 2ε + ε ||h||.  (50)

Let

  ψ̃_N(x, x + h, d) = max{ −γ ψ_N(x)_+ + ∇c(x)^T h, p_N(x, w) − q − ψ_N(x)_+ + d^T h, max_{j ∈ J} { f_j(x) − ψ_N(x)_+ + ∇f_j(x)^T h } } + δ ||h||^2 / 2,  (51)

  ψ̃(x, x + h) = max{ −γ ψ(x)_+ + ∇c(x)^T h, p(x) − q − ψ(x)_+ + ∇p(x)^T h, max_{j ∈ J} { f_j(x) − ψ(x)_+ + ∇f_j(x)^T h } } + δ ||h||^2 / 2.  (52)

Using (48), (50), and the fact that c(·) and ∇c(·) are bounded functions on X_0, we find that for P-almost all w ∈ B and all N ≥ N_ε and h ∈ R^n,

  sup_{x ∈ X_0} sup_{d ∈ ∂p_N(x, w)} |ψ̃_N(x, x + h, d) − ψ̃(x, x + h)| ≤ max{ γε, 2ε + ε ||h|| }.  (53)

Next, h(x) is bounded for all x ∈ X_0 because it is defined as a linear combination of bounded vector-valued functions. Using the same argument and Proposition 2.5, we have that for P-almost all w ∈ B, h_N(x, d) is bounded for all x ∈ X_0, d ∈ ∂p_N(x, w), and N ∈ N. Hence, for P-almost all w ∈ B there exists a C* < ∞ such that ||h(x)|| ≤ C* and ||h_N(x, d)|| ≤ C* for all x ∈ X_0, d ∈ ∂p_N(x, w), and N ∈ N. From Theorem 2.2.8 of [16] we deduce that

  θ_N(x, d) = ψ̃_N(x, x + h_N(x, d), d) = min_{h ∈ R^n} ψ̃_N(x, x + h, d),  (54)

  θ(x) = ψ̃(x, x + h(x)) = min_{h ∈ R^n} ψ̃(x, x + h).  (55)

Let ε* = ε max{γ, 2 + C*}.
We now have that for P-almost all w ∈ B and all x ∈ X_0, d ∈ ∂p_N(x, w), and N ≥ N_ε,

  θ(x) ≤ ψ̃(x, x + h_N(x, d)) ≤ ψ̃_N(x, x + h_N(x, d), d) + ε* ≤ θ_N(x, d) + ε*,  (56)

  θ(x) = ψ̃(x, x + h(x)) ≥ ψ̃_N(x, x + h(x), d) − ε* ≥ θ_N(x, d) − ε*.  (57)

Hence, (46) holds. We now address (47). For the sake of a contradiction, suppose that (47) is not valid. Then, there exists a subset B_0 ⊂ B with P[B_0] > 0 such that for every w ∈ B_0 there exist ε > 0, {N_i}_{i=1}^∞ with N_i → ∞ as i → ∞, {x_i}_{i=1}^∞ ⊂ X_0, and {d_i}_{i=1}^∞ with d_i ∈ ∂p_{N_i}(x_i, w), such that

  ||h_{N_i}(x_i, d_i) − h(x_i)|| ≥ ε  (58)

for all i ∈ N. As stated above, for P-almost all w ∈ B, h_N(x, d) is bounded for all x ∈ X_0, d ∈ ∂p_N(x, w), and N ∈ N. Consider a w ∈ B_0 with this boundedness property. Then, there exist an infinite subset I_0 ⊂ N, an x* ∈ R^n, and an h* ∈ R^n such that x_i → x* and h_{N_i}(x_i, d_i) → h* as i → ∞, i ∈ I_0. The continuity of ψ̃(·, ·) and (53) imply that

  lim_{i→∞, i∈I_0} |ψ̃_{N_i}(x_i, x_i + h_{N_i}(x_i, d_i), d_i) − ψ̃(x*, x* + h*)| = 0.  (59)

Hence, it follows from Theorem 3.3.2 in [16] that the problems

  min_{h ∈ R^n} ψ̃_{N_i}(x_i, x_i + h, d_i)  (60)

epi-converge to the problem

  min_{h ∈ R^n} ψ̃(x*, x* + h)  (61)

as i → ∞, i ∈ I_0. Since {h_{N_i}(x_i, d_i)}_{i=1}^∞ is a sequence of global minimizers of (60), it follows from Theorem 3.3.3 in [16] that h* must be a global minimizer of (61). Since the problem in (61) is strictly convex, it has a unique global minimizer. Hence, h* = h(x*), which contradicts (58). This completes the proof.

References

1. Royset, J.O., Polak, E.: Implementable algorithm for stochastic programs using sample average approximations. J. Optim. Theory Appl. 122, 157–184 (2004)
2. Ditlevsen, O., Madsen, H.O.: Structural Reliability Methods. Wiley, New York (1996)
3. Liu, P.-L., Kuo, C.-Y.: Safety evaluation of the upper structure of bridge based on concrete nondestructive tests. In: Der Kiureghian, A., Madanat, S., Pestana, J.M. (eds.)
Applications of Statistics and Probability in Civil Engineering, pp. 1683–1688. Millpress, Rotterdam (2003)
4. Akgul, F., Frangopol, D.M.: Probabilistic analysis of bridge networks based on system reliability and Monte Carlo simulation. In: Der Kiureghian, A., Madanat, S., Pestana, J.M. (eds.) Applications of Statistics and Probability in Civil Engineering, pp. 1633–1637. Millpress, Rotterdam (2003)
5. Holicky, M., Markova, J.: Reliability analysis of impacts due to road vehicles. In: Der Kiureghian, A., Madanat, S., Pestana, J.M. (eds.) Applications of Statistics and Probability in Civil Engineering, pp. 1645–1650. Millpress, Rotterdam (2003)
6. Ermoliev, Y.: Stochastic quasigradient methods. In: Ermoliev, Y., Wets, R.J.-B. (eds.) Numerical Techniques for Stochastic Optimization. Springer, New York (1988)
7. Benveniste, A., Metivier, M., Priouret, P.: Adaptive Algorithms and Stochastic Approximations. Springer, New York (1990)
8. Marti, K.: Stochastic Optimization Methods. Springer, Berlin (2005)
9. Ruszczynski, A., Shapiro, A.: Stochastic Programming. Elsevier, New York (2003)
10. Shapiro, A., Wardi, Y.: Convergence analysis of stochastic algorithms. Math. Oper. Res. 21, 615–628 (1996)
11. Sakalauskas, L.L.: Nonlinear stochastic programming by Monte-Carlo estimators. Eur. J. Oper. Res. 137, 558–573 (2002)
12. Royset, J.O., Der Kiureghian, A., Polak, E.: Optimal design with probabilistic objective and constraints. J. Eng. Mech. 132, 107–118 (2006)
13. Uryasev, S.: Derivatives of probability functions and some applications. Ann. Oper. Res. 56, 287–311 (1995)
14. Marti, K.: Differentiation formulas for probability functions: the transformation method. Math. Program. 75, 201–220 (1996)
15. Tretiakov, G.: Stochastic quasi-gradient algorithms for maximization of the probability function. A new formula for the gradient of the probability function. In: Marti, K. (ed.) Stochastic Optimization Techniques, pp. 117–142. Springer, New York (2002)
16.
Polak, E.: Optimization Algorithms and Consistent Approximations. Springer, New York (1997)
17. Deak, I.: Three digit accurate multiple normal probabilities. Numer. Math. 35, 369–380 (1980)
18. Ditlevsen, O., Oleson, R., Mohr, G.: Solution of a class of load combination problems by directional simulation. Struct. Saf. 4, 95–109 (1987)
19. Bjerager, P.: Probability integration by directional simulation. J. Eng. Mech. 114, 1288–1302 (1988)
20. Shapiro, A.: Asymptotic properties of statistical estimators in stochastic programming. Ann. Stat. 17, 841–858 (1989)
21. Rubinstein, R.Y., Shapiro, A.: Discrete Event Systems: Sensitivity Analysis and Stochastic Optimization by the Score Function Method. Wiley, New York (1993)
22. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, New York (1998)
23. Polak, E., Qi, L., Sun, D.: First-order algorithms for generalized semi-infinite min-max problems. Comput. Optim. Appl. 13, 137–161 (1999)
24. Ledoux, M., Talagrand, M.: Probability in Banach Spaces. Springer, New York (1991)
25. van der Vaart, A., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer, New York (1996)
26. Matlab, Version 6.5. Mathworks Inc., Natick, MA (2002)