Engr/Math/Physics 25 Chp7 Statistics-1
Bruce Mayer, PE
Licensed Electrical & Mechanical Engineer
[email protected]
Engineering/Math/Physics 25: Computational Methods • Bruce Mayer, PE • [email protected] • ENGR-25_Lec-18_Statistics-1.pptx

Learning Goals
Use MATLAB to solve problems in
• Statistics
• Probability
Use Monte Carlo (random) methods to simulate random processes
Properly apply interpolation or extrapolation to estimate values between or outside of known data points

Histogram
Histograms are COLUMN plots that show the distribution of data
• Height represents data frequency
Some general characteristics
• Used to represent continuous grouped, or BINNED, data
  – BIN: a subrange within the data
• Usually does not have any gaps between bars
• Areas represent %-of-total data

HistoGram ≡ Frequency Chart
A histogram shows how OFTEN some event occurs
• Histograms are often constructed using frequency tables

Histograms in MATLAB
MATLAB has 6 forms of the histogram command; the simplest is hist(y)
• Generates a histogram with 10 bins
Example: HI temps at Oakland AirPort in Jul-Aug08
    TmaxOAK = [70, 75, 63, 64, 65, 65, 67, 78, 75, 71, 72, 67, 69, 69, 71, 72, 71, 74, 77, 90, 90, 70, 71, 66, 68, 73, 72, 82, 91, 75, 72, 72, 69, 70, 67, 65, 63, 64, 72, 71, 77, 65, 63, 69, 65, 66, 73, 79, 70, 74, 77, 86, 66, 72, 82, 76, 68, 65, 70, 68, 69, 67]
The plot statement
    hist(TmaxOAK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland Airport - Jul-Aug08')

hist Result for Oakland
[Figure: "Oakland Airport - Jul-Aug08" histogram; y-axis No. Days, x-axis Max. Temp (°F)]
• It was COLD in Summer 08
• Bin width = (91 − 63)/10 = 2.8 °F

Histograms in MATLAB
Next example: max temp at Stockton AirPort in Jul-Aug08, again using hist(y) with its default 10 bins
    TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, 100, 99, 103, 103, 103, 97, 91, 83, 84, 90, 89, 95, 94, 99, 97, 94, 102, 103, 107, 98, 86, 89, 95, 91, 84, 93, 98, 104, 105, 107, 103, 91, 90, 96, 93, 86, 92, 93, 95, 95, 86, 81, 93, 97, 96, 97, 101, 92, 89, 92, 93, 94]
The plot statement
    hist(TmaxSTK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton Airport - Jul-Aug08')

hist Result for Stockton
[Figure: "Stockton Airport - Jul-Aug08" histogram; y-axis No. Days, x-axis Max. Temp (°F)]
• It was HOT in Summer 08
• Bin width = (107 − 81)/10 = 2.6 °F

hist Command Refinements
Adjust the number and width of the bins using
• hist(y,N)
• hist(y,x)
where
  – N is an integer specifying the NUMBER of bins
  – x is a vector that specifies the bin CENTERs
Consider the Summer 08 HI-temp data from Oakland and Stockton, and make 2 histograms of each:
• 17 bins
• 60 °F → 110 °F by 2.5's

hist Plots: 17 Bins
    >> hist(TmaxSTK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')
    >> hist(TmaxOAK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
[Figure: "Stockton, CA - Jul-Aug08" and "Oakland, CA - Jul-Aug08" histograms, 17 bins each; y-axis No. Days, x-axis Max. Temp (°F)]

hist Plots: Same Scale
    >> x = [60:2.5:110];
    >> hist(TmaxSTK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')
    >> x = [60:2.5:110]; hist(TmaxOAK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
[Figure: both histograms on a common 60 to 110 °F axis; y-axis No. Days, x-axis Max. Temp (°F)]

hist Numerical Output
hist can also provide numerical data about the histogram
    n = hist(y)
• Gives the number of values in each of the (default) 10 bins
For the Stockton data, k = hist(TmaxSTK) gives
    k = 2 5 1 10 16 7 9 2 7 3
We can also specify the number and/or width of the bins
    >> k13 = hist(TmaxSTK,13)
    k13 = 2 2 4 4 6 10 10 7 5 2 6 2 2
    >> k2_5s = hist(TmaxOAK,x)

hist Numerical Output
Bin counts and bin locations (a frequency table) for the Oakland data
    >> [u, v] = hist(TmaxOAK,x)
    u = 0 3 11 7 15 9 6 4 1 2 1 0 3 0 0 0 0 0 0 0 0
    v = 60.0000 62.5000 65.0000 67.5000 70.0000 72.5000 75.0000 77.5000 80.0000 82.5000 85.0000 87.5000 90.0000 92.5000 95.0000 97.5000 100.0000 102.5000 105.0000 107.5000 110.0000

Histogram Commands - 1
Command | Description
bar(x,y) | Creates a bar chart of y versus x.
hist(y) | Aggregates the data in the vector y into 10 bins evenly spaced between the minimum and maximum values in y.
hist(y,n) | Aggregates the data in the vector y into n bins evenly spaced between the minimum and maximum values in y.
hist(y,x) | Aggregates the data in the vector y into bins whose center locations are specified by the vector x. The bin widths are the distances between the centers.

Histogram Commands - 2
Command | Description
[z,x] = hist(y) | Same as hist(y) but returns two vectors z and x that contain the frequency count and the 10 bin locations.
[z,x] = hist(y,n) | Same as hist(y,n) but returns two vectors z and x that contain the frequency count and the n bin locations.
[z,x] = hist(y,x) | Same as hist(y,x) but returns two vectors z and x that contain the frequency count and the bin locations. The returned vector x is the same as the user-supplied vector x.

Bar vs. Hist
bar is SEQUENTIAL (one bar per data point, in order) while hist is GROUPED (one bar per bin)
[Figure, BAR: "Tmax in Stockton, CA • Jul-Aug08"; y-axis MaxTemp (°F), x-axis day]
[Figure, HIST: "Stockton Airport - Jul-Aug08"; y-axis No. Days, x-axis Max. Temp (°F)]

BAR construction file
    % Bruce Mayer, PE • 06Apr16
    % ENGR25
    clear, close, clc
    % The data
    TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, 100, 99, 103, 103, 103, 97, 91, 83, 84, 90, 89, 95, 94, 99, 97, 94, 102, 103, 107, 98, 86, 89, 95, 91, 84, 93, 98, 104, 105, 107, 103, 91, 90, 96, 93, 86, 92, 93, 95, 95, 86, 81, 93, 97, 96, 97, 101, 92, 89, 92, 93, 94]
    %
    % the BAR graph
    bar(TmaxSTK), axis([1 62 80 110]), grid
    xlabel('day'); ylabel('MaxTemp (°F)')
    title('Tmax in Stockton, CA • Jul-Aug08')
    whitebg([0.8 1 1])

Check Default Bin Widths: Oakland
• The previous hand calc of 2.8 °F is CONFIRMED by MATLAB
  – Note the use of the diff command
    >> Tlo = min(TmaxOAK)
    Tlo = 63
    >> Thi = max(TmaxOAK)
    Thi = 91
    >> [n,BinCtr] = hist(TmaxOAK)
    n = 11 10 15 11 7 2 2 0 1 3
    BinCtr = 64.4000 67.2000 70.0000 72.8000 75.6000 78.4000 81.2000 84.0000 86.8000 89.6000
    >> DelBC = diff(BinCtr)
    DelBC = 2.8000 2.8000 2.8000 2.8000 2.8000 2.8000 2.8000 2.8000 2.8000

Check Default Bin Widths: Stockton
• The previous hand calc of 2.6 °F is CONFIRMED by MATLAB
  – Note the use of the diff command
    >> Tlo = min(TmaxSTK)
    Tlo = 81
    >> Thi = max(TmaxSTK)
    Thi = 107
    >> [n,BinCtr] = hist(TmaxSTK)
    n = 2 5 1 10 16 7 9 2 7 3
    BinCtr = 82.3000 84.9000 87.5000 90.1000 92.7000 95.3000 97.9000 100.5000 103.1000 105.7000
    >> DelBC = diff(BinCtr)
    DelBC = 2.6000 2.6000 2.6000 2.6000 2.6000 2.6000 2.6000 2.6000 2.6000

Data Statistics Tool - 1
Make a line plot of the temp data for Stockton, CA
Use the Tools menu to find the Data Statistics tool

Data Statistics Tool - 2
Use the tool to add plot lines for the temp data
• The mean
• ±StdDev

Data Statistics Tool - 3
Quite a nice tool, actually
The result: the avg max temp was 96.97 °F

Probability
Probability: the LIKELIHOOD that a specified outcome will be realized
• The "odds" run from 0% to 100%
Class question: What are the odds of winning the California MEGA-MILLIONS lottery?

258,890,850 ... EXACTLY???!!!
To win the MegaMillions lottery
• Pick five numbers from 1 to 75
• Pick a MEGA number from 1 to 15
The odds for the 1st ping-pong ball = 5 out of 75
The odds for the 2nd ping-pong ball = 4 out of 74, and so on
The odds for the MEGA are 1 out of 15

258,890,850 ... Calculated
Calc the overall odds as the PRODUCT of each of the individual outcomes
$$\text{Odds} = \frac{5}{75}\cdot\frac{4}{74}\cdot\frac{3}{73}\cdot\frac{2}{72}\cdot\frac{1}{71}\cdot\frac{1}{15} = \frac{5!\,70!}{75!}\cdot\frac{1}{15} = \frac{120}{31{,}066{,}902{,}000} = \frac{1}{258{,}890{,}850}$$
• This is technically a COMBINATION: $\binom{75}{5} \times 15 = 17{,}259{,}390 \times 15 = 258{,}890{,}850$

258,890,850 ... is a DEAL!
The ORDER in which the ping-pong balls are drawn does NOT affect the winning odds
If we had to match the pull order:
$$\text{Odds} = \frac{1}{75}\cdot\frac{1}{74}\cdot\frac{1}{73}\cdot\frac{1}{72}\cdot\frac{1}{71}\cdot\frac{1}{15} = \frac{70!}{75!}\cdot\frac{1}{15} = \frac{1}{31{,}066{,}902{,}000}$$
• That is 120× (= 5!) worse than the current odds
• This is a PERMUTATION

Normal Distribution - 1
Consider data (a frequency table) on the height of a sample group of 20-year-old men
    Ht (in): 64 64.5 65 65.5 66 66.5 67 67.5 68 68.5 69 69.5 70 70.5 71 71.5 72 72.5 73 73.5 74 74.5 75
    No.:      1    0  0    0  2    4  5    4  8   11 12   10  9    8  7    5  4    4  3    1  1    0  1
Plot this frequency data using bar
    >> y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
    >> xbins = [64:0.5:75];
    >> bar(xbins, y_abs), ylabel('No.'), xlabel('Height (Inches)'), title('Height of 20 Yr-Old Men')
[Figure: "Height of 20 Yr-Old Men" bar chart; y-axis No., x-axis Height (Inches)]

Normal Distribution - 2
We can also SCALE the bar/hist plot such that the AREA UNDER the CURVE equals 1.00, exactly
The game plan for scaling
• Calc the total area = Σ([bin width] × [individual counts])
$$A = \sum_k \Delta A_k = \sum_k BW_k \times IC_k$$
  – For the height data, A = 0.5 in × 100 men = 50
• The individual bar area = [bin width] × [individual count]
• %-area of any one bar → [bar area]/[total area]
• The scaled height of each bar = [individual count]/[total area]

Normal Distribution - 3
Use bar to construct the scaled histogram with an area of 1.0000
• See file Scaled_Histogram_1206.m
We would need to enter all 100 raw data points to use hist
  – Again, use bar to construct the histogram
[Figure: "Height of 20 Yr-Old Men" scaled histogram; y-axis Frequency, x-axis Height (Inches)]

Probability Distribution Fcn (PDF)
Because the area under the scaled plot is exactly 1.00, the FRACTIONAL area under any bar, or set of bars, gives the probability that a randomly selected 20-yr-old man will be that height
e.g., from the plot we find
• 67.5 in → 4%
• 68 in → 8%
• 68.5 in → 11%
Summing → 23%
Thus, by this dataset, 23% of 20-yr-old men are 67.25 to 68.75 inches tall

Random Variable
A random variable x takes on a defined set of values with different probabilities; e.g.,
• If you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with equal probability of one-sixth.
• If you poll people about their voting preferences, the percentage of the sample that responds "Yes on Proposition 101" is also a random variable – the %-age will be slightly different every time you poll.
Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over ("frequentist" view)

Random Variables can be Discrete or Continuous
Discrete random variables have a countable number of outcomes
• Examples: Dead/Alive, Red/Black, Heads/Tails, dice, deck of cards, etc.
Continuous random variables have an infinite continuum of possible values
• Examples: battery current, human weight, air temperature, the speed of a car, the real numbers from 7 to 11

Probability Distribution Functions
A Probability Distribution Function (PDF) maps the possible values of x against their respective probabilities of occurrence, p(x)
p(x) is a number from 0 to 1.0, or alternatively, from 0% to 100%
The area under a probability distribution function curve is always 1 (or 100%)
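The area-equals-1 property can be checked numerically on the height data from the Normal Distribution slides. Below is a minimal MATLAB sketch (not the Scaled_Histogram_1206.m file the slides refer to, which is not reproduced here); the names ht, cnt, BW, pdfHt and pRange are illustrative choices. It divides the counts by the total bar area and then adds up the 67.5, 68 and 68.5 in bars to recover the 23% figure.

```matlab
% Scale the 20-yr-old-men height frequency table so the bar areas sum to 1.
ht  = 64:0.5:75;                                   % bin centers (in), 23 bins
cnt = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];  % men per bin
BW  = 0.5;                                         % bin width (in)

totalArea = sum(BW * cnt);      % area of the unscaled bars (0.5 * 100 = 50)
pdfHt     = cnt / totalArea;    % scaled bar heights (PDF estimate)
areaCheck = sum(BW * pdfHt);    % area under the scaled bars, should be 1

% Probability of a height between 67.25 and 68.75 in:
% add the areas of the 67.5, 68 and 68.5 in bars.
inRange = (ht >= 67.5) & (ht <= 68.5);
pRange  = sum(BW * pdfHt(inRange));

fprintf('Total area = %.1f, scaled area = %.4f\n', totalArea, areaCheck);
fprintf('P(67.25 <= height <= 68.75) = %.2f\n', pRange);
% bar(ht, pdfHt) draws the scaled histogram itself.
```

Dividing by the total area, rather than by the number of men, is what makes the bar *areas* (height × 0.5 in) read directly as probabilities.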
Discrete Example: Roll the Die
x | p(x)
1 | p(x=1) = 1/6
2 | p(x=2) = 1/6
3 | p(x=3) = 1/6
4 | p(x=4) = 1/6
5 | p(x=5) = 1/6
6 | p(x=6) = 1/6
[Figure: p(x) = 1/6 plotted for x = 1 to 6]
$$\sum_{\text{all } x} p(x) = \frac16+\frac16+\frac16+\frac16+\frac16+\frac16 = 6\cdot\frac16 = 1$$

Continuous Case
The probability function that accompanies a continuous random variable is a continuous mathematical function that integrates to 1
The probabilities associated with continuous functions are just areas under a region of the curve (→ definite integrals)
Probabilities are given for a range of values, rather than for a particular value
• e.g., the probability of getting a math SAT score between 700 and 800 is 2%

Continuous Case PDF Example
Recall the negative exponential function (in probability, this is called an "exponential distribution"):
$$f(x) = e^{-x}$$
This function integrates to 1 (as required for all PDFs) for limits of 0 → ∞:
$$\int_0^{\infty} e^{-x}\,dx = -e^{-x}\Big|_0^{\infty} = 0 - (-1) = 1$$

Continuous Case PDF Example
The probability that x is any exact value (e.g., 1.9476) is 0, because there is NO area under a LINE
• We can ONLY assign probabilities to possible RANGES of x
For example, the probability of x falling within 1 to 2:
[Figure: p(x) = e^(−x), with the area between x = 1 and x = 2 shaded]
$$p(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = -e^{-x}\Big|_1^2 = e^{-1} - e^{-2} = .368 - .135 \approx .23 = 23\%$$

Gaussian Curve
The man-height histogram had limited, and thus DISCRETE, data
If we were to measure 10,000 (or more) young men, we would obtain a histogram like this:
[Figure: histogram of heights for a large (10,000+) sample]
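We cannot measure 10,000 men here, but a Monte Carlo simulation (one of this lecture's learning goals) can stand in for that large sample. The sketch below assumes heights are normally distributed, with the mean and standard deviation estimated from the 100-man frequency table; the names mu, sigma, Nsim and frac1 are illustrative. randn is not seeded, so the printed numbers change slightly from run to run.

```matlab
% Monte Carlo sketch: 10,000 simulated heights drawn from a normal
% distribution fitted to the 100-man frequency table (assumed normal).
ht  = 64:0.5:75;                                   % bin centers (in)
cnt = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
Nt  = sum(cnt);                                    % 100 men in the table

mu    = sum(ht .* cnt) / Nt;                       % sample mean (in)
sigma = sqrt(sum(cnt .* (ht - mu).^2) / (Nt - 1)); % sample std dev, N-1 divisor

Nsim = 10000;                                      % simulated sample size
h = mu + sigma * randn(1, Nsim);                   % normal random heights

% Bin the simulated heights in finer 0.25-in bins; with output
% arguments hist only counts (it does not draw a plot).
centers = 60:0.25:79;
[nSim, xSim] = hist(h, centers);

% Fraction within one std dev of the mean (about 0.683 for a normal dist)
frac1 = mean(abs(h - mu) <= sigma);
fprintf('mu = %.3f in, sigma = %.3f in, within 1 sigma = %.3f\n', mu, sigma, frac1);
% bar(xSim, nSim / (Nsim * 0.25)) draws the scaled (area = 1) histogram.
```

With 100 times as many points and half-width bins, the scaled histogram from this sketch should look much closer to a smooth bell curve than the 100-man version.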
As we increase the number and fineness of the measurements, the PDF approaches a CONTINUOUS curve

Gaussian Distribution
A distribution that describes many physical processes is called the GAUSSIAN or NORMAL distribution
Gaussian (Normal) distribution
• Gaussian → the famous "bell-shaped curve"
  – Describes IQ scores, how fast horses can run, the no. of bees in a hive, the wear profile on old stone stairs...
• All these are cases where:
  – deviation from the mean is equally probable in either direction
  – the variable is continuous (or large enough integers to look continuous)

Normal Distribution
Real-valued PDF f(x), defined for −∞ < x < +∞
2 independent fitting parameters: µ, σ (central location and width)
Properties:
• Symmetrical about the mode at µ
• Median = Mean = Mode
• Inflection points (IP) at µ ± σ
Area (probability of observing the event) within:
• ±1σ = 0.683
• ±2σ = 0.955
For larger σ, the bell-shaped curve becomes wider and lower (since area = 1 for any σ)

Normal Distribution Mathematically
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
• where
  – σ² = variance
  – µ = mean (& median, mode)
The area under the curve
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1$$

68-95-99.7 Rule for Normal Dist
[Figure: normal curve with the ±σ, ±2σ and ±3σ bands shaded]
• 68% of the data lies within ±σ of the mean
• 95% of the data lies within ±2σ
• 99.7% of the data lies within ±3σ

68-95-99.7 Rule in Math Terms
Using definite-integral numerical calculus (standard normal: µ = 0, σ = 1):
$$\int_{-1}^{1} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx \approx .68 = 68\%$$
$$\int_{-2}^{2} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx \approx .95 = 95\%$$
$$\int_{-3}^{3} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx \approx .997 = 99.7\%$$

Error Function (erf) & Probability
Gauss's defining equation for the erf:
$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-y^2}\,dy$$
This looks a lot like the normal distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Consider the Gaussian integral
$$I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$
Now let
$$y^2 = \frac{(x-\mu)^2}{2\sigma^2} \;\Rightarrow\; y = \frac{x-\mu}{\sigma\sqrt{2}}, \qquad dy = \frac{dx}{\sigma\sqrt{2}} \;\text{ or }\; dx = \sigma\sqrt{2}\,dy$$

Error Function (erf) & Probability
Subbing for x & dx:
$$I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-y^2}\,\sigma\sqrt{2}\,dy = \frac{1}{\sqrt{\pi}} \int e^{-y^2}\,dy$$
As
$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-y^2}\,dy$$
rearranging:
$$I_G = \frac{1}{2}\cdot\frac{2}{\sqrt{\pi}} \int e^{-y^2}\,dy$$

Error Function (erf) & Probability
Now the limits: this function is symmetrical about y = 0
Plotting
$$f(y) = e^{-y^2}$$
[Figure: f(y) = exp(−y²) for −3 ≤ y ≤ 3]
Recall
$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-y^2}\,dy$$
and the erf properties
• erf(0) = 0
• erf(∞) = 1

Error Function (erf) & Probability
By symmetry about y = 0 for e^(−y²), the AUCs (areas under the curve) are
$$\frac{2}{\sqrt{\pi}} \int_{-\infty}^{0} e^{-y^2}\,dy = \frac{2}{\sqrt{\pi}} \int_{0}^{\infty} e^{-y^2}\,dy = 1$$
Thus for positive B
$$\frac{2}{\sqrt{\pi}} \int_{-\infty}^{B} e^{-y^2}\,dy = \frac{2}{\sqrt{\pi}} \int_{-\infty}^{0} e^{-y^2}\,dy + \frac{2}{\sqrt{\pi}} \int_{0}^{B} e^{-y^2}\,dy$$
So finally, integrating −∞ → B:
$$\frac{2}{\sqrt{\pi}} \int_{-\infty}^{B} e^{-y^2}\,dy = 1 + \operatorname{erf}(B)$$

Error Function (erf) & Probability
Note that for a continuous PDF
• The probability that x is less than or equal to b
$$P(x \le b) = \int_{-\infty}^{b} f(x)\,dx$$
• The probability that x is between a & b
$$P(a \le x \le b) = \int_{a}^{b} f(x)\,dx$$
The probability for the normal dist
$$P(x \le b) = \int_{-\infty}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx, \qquad P(a \le x \le b) = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$
But each integrand is the I_G integrand, so with y = (x − µ)/(σ√2)
$$I_G = \frac{1}{2}\cdot\frac{2}{\sqrt{\pi}} \int e^{-y^2}\,dy \;\to\; \frac{1}{2}\operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right) \text{ evaluated between the limits}$$

Error Function (erf) & Probability
If we scale this properly, we can cast these equations into the ½·erf form
$$P(x \le b) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{b-\mu}{\sigma\sqrt{2}}\right)\right]$$
$$P(a \le x \le b) = \frac{1}{2}\left[\operatorname{erf}\!\left(\frac{b-\mu}{\sigma\sqrt{2}}\right) - \operatorname{erf}\!\left(\frac{a-\mu}{\sigma\sqrt{2}}\right)\right]$$
MATLAB has erf built in, so if we have the POPULATION mean & StdDev we can calc probabilities for normally distributed quantities

MATLAB and Gaussian Prob
Thus MATLAB has the tools needed to calculate any Gaussian probability, using the two ½·erf formulas above, for
• −∞ < b < +∞
• a < b

erf(z) can be NEGATIVE
For the previous erf calcs to work, erf must be NEGATIVE when its argument (b − µ) is negative; e.g.:
$$P(x \le -0.73) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{-0.73-\mu}{\sigma\sqrt{2}}\right)\right] \quad \text{MUST be} \ge 0$$
A quick check
    >> erfM73 = erf(-0.73)
    erfM73 = -0.6981
    >> erfP73 = erf(+0.73)
    erfP73 = 0.6981

Estimating µ & σ (1)
The location & width parameters, µ & σ, are calculated from the ENTIRE POPULATION
• Mean, µ
$$\mu = \frac{1}{n}\sum_{k=1}^{n} x_k$$
• Variance, σ²
$$\sigma^2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \mu)^2$$
• Standard deviation, σ
$$\sigma = \sqrt{\sigma^2}$$
For LARGE populations it is usually impractical to measure all the x_k
In this case we take a finite SAMPLE to ESTIMATE µ & σ

Estimating µ & σ (2)
Say we want to characterize the miles/yr driven by every licensed driver in the USA
We assume that this quantity is normally distributed, so we take a sample of N = 1013 drivers
We take the mean of the SAMPLE
$$\bar{x} = \frac{1}{N}\sum_{k=1}^{N} x_k$$
Use the SAMPLE mean to estimate the POPULATION mean
$$\mu \approx \bar{x} = \frac{1}{N}\sum_{k=1}^{N} x_k$$

Estimating µ & σ (3)
Now calc the estimated SAMPLE variance & StdDev
$$S^2 = \frac{\sum_{k=1}^{N} (x_k - \bar{x})^2}{N-1}$$
• The divisor is decreased from N to (N − 1) because x̄ is computed from the same sample; dividing by N would tend to underestimate σ²
  – For N = 1, x̄ = x_1 and S² = 0/0, so the result is meaningless: a single point says nothing about spread
• Standard deviation S: the positive square root of the variance
  – small std dev: observations are clustered tightly around a central value
  – large std dev: observations are scattered widely about the mean

All Done for Today

Gaussian? Or Normal?
Recall De Moivre's theorem
$$z = R(\cos\theta + j\sin\theta) \;\Rightarrow\; z^k = R^k(\cos k\theta + j\sin k\theta)$$
The normal distribution was introduced by the French mathematician A. De Moivre in 1733
• Used to approximate probabilities of coin tossing
• Called it the exponential bell-shaped curve
In 1809, K.F. Gauss, a German mathematician, applied it to predict astronomical entities… it became known as the Gaussian distribution
By the late 1800s, most believed that the majority of physical data would follow this distribution, which came to be called the normal distribution

Engr/Math/Physics 25 Appendix: Some Normal Dist Examples

How Good is the Rule for Real?
Check some example data:
• The mean, µ, of the weight of a large group (120) of women cross-country runners = 127.8 lbs
• The standard deviation, σ, for this group = 15.5 lbs

68% of 120 = .68 × 120 ≈ 82 runners
In fact, 79 runners fall within 1σ (15.5 lbs) of the mean, i.e., 112.3 to 143.3 lbs
[Figure: histogram of runner weights; y-axis Percent, x-axis POUNDS (80 to 160); markers at 112.3, 127.8, 143.3]

95% of 120 = .95 × 120 ≈ 114 runners
In fact, 115 runners fall within 2σ of the mean, i.e., 96.8 to 158.8 lbs
[Figure: same histogram; markers at 96.8, 127.8, 158.8]

99.7% of 120 = .997 × 120 = 119.6 runners
In fact, all 120 runners fall within 3σ of the mean, i.e., 81.3 to 174.3 lbs
[Figure: same histogram; markers at 81.3, 127.8, 174.3]

Engr/Math/Physics 25 Appendix

Basic Fitting Demo File
    % Bruce Mayer, PE
    % ENGR25 * 29Jun12 * Rev 27Oct14
    % file = Demo_Basic_Fitting_Stockton_Temps_1410.m
    %
    % Data for Stockton AirPort from
    % http://www7.ncdc.noaa.gov/IPS/cd/cd.html;jsessionid=1926FA20901D9A52D64FC06A0A449C00
    TmaxSTK1107 = [93 99 100 100 102 101 98 97 90 88 82 82 79 78 80 81 81 86 89 96 96 93 91 88 89 91 95 98 93 87 92]
    N07 = length(TmaxSTK1107)
    TmaxSTK1108 = [89 93 93 86 92 91 88 91 94 95 91 92 95 95 92 94 94 95 88 86 86 90 97 97 94 96 95 96 94 89 89]
    N08 = length(TmaxSTK1108)
    %
    TmaxSTK11 = [TmaxSTK1107,TmaxSTK1108]
    Ntot = length(TmaxSTK11)
    nday = [1:Ntot];
    plot(nday, TmaxSTK11, '-dk', 'LineWidth', 2), xlabel('No. Days after 30Jun11'),...
        ylabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug11'), grid

Carl Friedrich Gauss
[Figure: portrait of Carl Friedrich Gauss]

Gaussian/Normal Distribution Eqn
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Calculate in MATLAB using the error function, erf(z)
    >> TestPerf = erf(0.41)
    TestPerf = 0.4380
    >> TestNerf = erf(-0.41)
    TestNerf = -0.4380

Normal Dist Data
Ht (in) | No. | Area (BW*No.) | No./TotArea | BW*(No./TotArea)
64 | 1 | 0.5 | 0.0200 | 1.00%
64.5 | 0 | 0 | 0.0000 | 0.00%
65 | 0 | 0 | 0.0000 | 0.00%
65.5 | 0 | 0 | 0.0000 | 0.00%
66 | 2 | 1 | 0.0400 | 2.00%
66.5 | 4 | 2 | 0.0800 | 4.00%
67 | 5 | 2.5 | 0.1000 | 5.00%
67.5 | 4 | 2 | 0.0800 | 4.00%
68 | 8 | 4 | 0.1600 | 8.00%
68.5 | 11 | 5.5 | 0.2200 | 11.00%
69 | 12 | 6 | 0.2400 | 12.00%
69.5 | 10 | 5 | 0.2000 | 10.00%
70 | 9 | 4.5 | 0.1800 | 9.00%
70.5 | 8 | 4 | 0.1600 | 8.00%
71 | 7 | 3.5 | 0.1400 | 7.00%
71.5 | 5 | 2.5 | 0.1000 | 5.00%
72 | 4 | 2 | 0.0800 | 4.00%
72.5 | 4 | 2 | 0.0800 | 4.00%
73 | 3 | 1.5 | 0.0600 | 3.00%
73.5 | 1 | 0.5 | 0.0200 | 1.00%
74 | 1 | 0.5 | 0.0200 | 1.00%
74.5 | 0 | 0 | 0.0000 | 0.00%
75 | 1 | 0.5 | 0.0200 | 1.00%
Σ | | 50.0 | | 100.00%

SPICE Circuit
[Figure: SPICE circuit schematic]
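The runner examples in this appendix compare the rounded 68-95-99.7 percentages with actual head counts. As a cross-check, the ½·erf form of the normal probability, P(a ≤ x ≤ b) = ½[erf((b − µ)/(σ√2)) − erf((a − µ)/(σ√2))], gives the exact normal-distribution fractions. This is a minimal MATLAB sketch using µ = 127.8 lb, σ = 15.5 lb and N = 120 from the runner example; Pab is an illustrative anonymous-function name, not a built-in.

```matlab
% Expected runner counts within +/-1, 2, 3 sigma, from MATLAB's built-in erf.
mu    = 127.8;    % mean runner weight (lb), from the appendix example
sigma = 15.5;     % standard deviation (lb)
N     = 120;      % number of runners in the group

% P(a <= x <= b) for a normal(mu, sigma) variable, in the 1/2*erf form
Pab = @(a, b) 0.5 * (erf((b - mu) / (sqrt(2) * sigma)) ...
                   - erf((a - mu) / (sqrt(2) * sigma)));

k = [1 2 3];                        % band half-widths in units of sigma
p = zeros(size(k));
for i = 1:numel(k)
    p(i) = Pab(mu - k(i)*sigma, mu + k(i)*sigma);
end
expectedRunners = N * p;            % expected head count in each band

for i = 1:numel(k)
    fprintf('within +/-%d sigma: P = %.4f, expected %.1f of %d runners\n', ...
            k(i), p(i), expectedRunners(i), N);
end
```

It should report about 81.9, 114.5 and 119.7 expected runners, which the observed counts of 79, 115 and 120 track closely. Because µ cancels inside each band, the fractions depend only on k, which is why the same 0.683/0.954/0.997 values apply to any normally distributed quantity.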