Download Lect6_numpy_statistics

Numpy-statistics

1. ‘numpy.mean’

Examples,

>>> x=np.arange(6).reshape(3,2)
>>> x
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> np.mean(x)
2.5
>>> np.mean(x,axis=0)
array([ 2., 3.])
>>> np.mean(x,axis=1)
array([ 0.5, 2.5, 4.5])
>>> x=np.arange(24).reshape(4,3,2)
>>> x
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15],
        [16, 17]],

       [[18, 19],
        [20, 21],
        [22, 23]]])
>>> np.mean(x)
11.5
>>> np.mean(x,axis=0)
array([[  9.,  10.],
       [ 11.,  12.],
       [ 13.,  14.]])
>>> np.mean(x,axis=1)
array([[  2.,   3.],
       [  8.,   9.],
       [ 14.,  15.],
       [ 20.,  21.]])
>>> np.mean(x,axis=2)
array([[  0.5,   2.5,   4.5],
       [  6.5,   8.5,  10.5],
       [ 12.5,  14.5,  16.5],
       [ 18.5,  20.5,  22.5]])

2. ‘numpy.nanmean’

Examples,

>>> a = np.array([[1, np.nan], [3, 4],[np.nan, 5]])
>>> a
array([[  1.,  nan],
       [  3.,   4.],
       [ nan,   5.]])
>>> np.nanmean(a)
3.25
>>> (1+3+4+5)/4
3.25
>>> np.nanmean(a,axis=0)
array([ 2. , 4.5])
>>> np.nanmean(a,axis=1)
array([ 1. , 3.5, 5. ])

3. ‘numpy.average’

This command can perform weighted average, for example,

>>> data = np.arange(5)
>>> data
array([0, 1, 2, 3, 4])
>>> np.average(data)
2.0
>>> np.average(data,weights=[1./5, 2./5, 0.5/5, 0.7/5, 0.8/5])
1.6600000000000001
>>> 0*1/5+1*2/5+2*0.5/5+3*0.7/5+4*0.8/5
1.6600000000000001
>>> np.average(data,weights=[1, 2, 0.5, 0.7, 0.8])
1.6600000000000001
>>> data = np.arange(6).reshape((3,2))
>>> data
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> np.average(data, axis=0, weights=[1./4, 3./4])
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    np.average(data, axis=0, weights=[1./4, 3./4])
  File "C:\Anaconda3\Lib\site-packages\numpy\lib\function_base.py", line 951, in average
    "Length of weights not compatible with specified axis.")
ValueError: Length of weights not compatible with specified axis.
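The ValueError above is raised because a one-dimensional list of weights must have the same length as the axis being averaged: axis 0 of the (3, 2) array has length 3, but only two weights were given. The sketch below illustrates that rule and writes the weighted average out by hand as sum(w*x)/sum(w), which also shows why weights need not sum to 1. The helper name `weights_fit` is hypothetical and exists only for this illustration.

```python
import numpy as np

data = np.arange(6).reshape((3, 2))   # shape (3, 2), same array as in the session


def weights_fit(arr, axis, weights):
    """Hypothetical helper: True if 1-D weights match arr's length along axis."""
    return len(weights) == arr.shape[axis]


w = [1.0 / 4, 3.0 / 4]
fits_axis0 = weights_fit(data, 0, w)   # axis 0 has length 3, so two weights do not fit
fits_axis1 = weights_fit(data, 1, w)   # axis 1 has length 2, so two weights fit

# Weighted average along axis 1 written out by hand: sum(w * x) / sum(w).
w_arr = np.asarray(w)
manual = (data * w_arr).sum(axis=1) / w_arr.sum()
result = np.average(data, axis=1, weights=w)

print(fits_axis0, fits_axis1)
print(manual, result)
```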
>>> np.average(data, axis=1, weights=[1./4, 3./4])
array([ 0.75, 2.75, 4.75])
>>> np.average(data, axis=1, weights=[1, 3])
array([ 0.75, 2.75, 4.75])
>>> np.average(data, axis=0, weights=[1./4, 2.5/4, 0.5/4])
array([ 1.75, 2.75])
>>> np.average(data, axis=0, weights=[1, 2.5, 0.5])
array([ 1.75, 2.75])

4. ‘numpy.median’

To find the median, place the numbers in value order and find the middle. For example,

3, 5, 12: the middle is 5, so the median is 5.
3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 39, 40, 56: the median is 23.
3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, 56: there is an even number of values, so the median is the mean of the two middle values, (21+23)/2=22.

‘numpy.median’ computes the median along the specified axis, for example,

>>> a = np.array([[10, 7, 4], [3, 2, 1]])
>>> a
array([[10,  7,  4],
       [ 3,  2,  1]])
>>> np.median(a)
3.5
>>> np.median(a, axis=0)
array([ 6.5, 4.5, 2.5])
>>> np.median(a, axis=1)
array([ 7., 2.])

5. ‘numpy.nanmedian’

Compute the median along the specified axis, while ignoring NaNs, for example,

>>> a = np.array([[10.0, 7, 4], [3, 2, 1]])
>>> a[0, 1] = np.nan
>>> a
array([[ 10.,  nan,   4.],
       [  3.,   2.,   1.]])
>>> np.median(a)
Warning (from warnings module):
  File "C:\Anaconda3\Lib\site-packages\numpy\lib\function_base.py", line 3569
    RuntimeWarning)
RuntimeWarning: Invalid value encountered in median
nan
>>> np.nanmedian(a)
3.0
>>> np.nanmedian(a, axis=0)
array([ 6.5, 2. , 2.5])
>>> np.nanmedian(a, axis=1)
array([ 7., 2.])

6.
‘numpy.amin’

Return the minimum of an array or minimum along an axis, for example,

>>> a = np.arange(10).reshape((2,5))
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> np.amin(a)
0
>>> np.amin(a,axis=0)
array([0, 1, 2, 3, 4])
>>> np.amin(a,axis=1)
array([0, 5])
>>> a = np.random.random((3,4))
>>> a
array([[ 0.1827562 ,  0.45899318,  0.56772974,  0.28598367],
       [ 0.57311363,  0.88326133,  0.56218878,  0.00292109],
       [ 0.02695291,  0.07112275,  0.15821451,  0.27109731]])
>>> np.amin(a)
0.0029210862268211901
>>> np.amin(a,axis=0)
array([ 0.02695291, 0.07112275, 0.15821451, 0.00292109])
>>> np.amin(a,axis=1)
array([ 0.1827562 , 0.00292109, 0.02695291])

6. ‘numpy.nanmin’

>>> a = np.array([[10.0, 7, 4], [3, 2, 1]])
>>> a[0, 1] = np.nan
>>> a
array([[ 10.,  nan,   4.],
       [  3.,   2.,   1.]])
>>> np.amin(a)
nan
>>> np.nanmin(a)
1.0
>>> np.nanmin(a,axis=0)
array([ 3., 2., 1.])
>>> np.nanmin(a,axis=1)
array([ 4., 1.])
>>> a = np.random.random((3,4))
>>> a
array([[ 0.76436355,  0.77478066,  0.89387709,  0.68377504],
       [ 0.33755742,  0.11248383,  0.3887192 ,  0.53828905],
       [ 0.21264265,  0.56991184,  0.05503039,  0.02870428]])
>>> a[0, 1]=np.nan
>>> a[1, 3]=np.nan
>>> a
array([[ 0.76436355,         nan,  0.89387709,  0.68377504],
       [ 0.33755742,  0.11248383,  0.3887192 ,         nan],
       [ 0.21264265,  0.56991184,  0.05503039,  0.02870428]])
>>> np.amin(a)
nan
>>> np.nanmin(a)
0.028704284770746491
>>> np.nanmin(a,axis=0)
array([ 0.21264265, 0.11248383, 0.05503039, 0.02870428])
>>> np.nanmin(a,axis=1)
array([ 0.68377504, 0.11248383, 0.02870428])

7.
‘numpy.amax’ and ‘numpy.nanmax’

>>> a = np.random.random((3,4))
>>> a
array([[ 0.57914937,  0.57259201,  0.43204429,  0.37910263],
       [ 0.18200516,  0.6547772 ,  0.52763529,  0.74640798],
       [ 0.38893868,  0.49642736,  0.030878  ,  0.74288876]])
>>> np.amax(a)
0.74640797754123944
>>> np.amax(a,axis=0)
array([ 0.57914937, 0.6547772 , 0.52763529, 0.74640798])
>>> np.amax(a,axis=1)
array([ 0.57914937, 0.74640798, 0.74288876])
>>> a[0, 1] = np.nan
>>> a[1, 3] = np.nan
>>> a
array([[ 0.57914937,         nan,  0.43204429,  0.37910263],
       [ 0.18200516,  0.6547772 ,  0.52763529,         nan],
       [ 0.38893868,  0.49642736,  0.030878  ,  0.74288876]])
>>> np.amax(a)
nan
>>> np.nanmax(a)
0.74288875585610092
>>> np.nanmax(a,axis=0)
array([ 0.57914937, 0.6547772 , 0.52763529, 0.74288876])
>>> np.nanmax(a,axis=1)
array([ 0.57914937, 0.6547772 , 0.74288876])

8. ‘numpy.ptp’

Calculate the range of values (maximum - minimum) along an axis,

>>> a = np.random.random((3,4))
>>> a
array([[ 0.93267931,  0.615258  ,  0.41370574,  0.76046136],
       [ 0.40763334,  0.87773845,  0.82017794,  0.5262714 ],
       [ 0.84591863,  0.3031923 ,  0.10473461,  0.38237018]])
>>> np.ptp(a)
0.82794469494376344
>>> np.amax(a)-np.amin(a)
0.82794469494376344

9. ‘numpy.percentile’ and ‘numpy.nanpercentile’

Both compute the qth percentile of the data along the specified axis; the latter ignores NaN values.

>>> a = np.array([[10, 7, 4], [3, 2, 1]])
>>> a
array([[10,  7,  4],
       [ 3,  2,  1]])
>>> np.percentile(a, 50)
3.5
>>> np.percentile(a, 25)
2.25
>>> np.percentile(a, 90)
8.5
>>> np.percentile(a, 50,axis=0)
array([ 6.5, 4.5, 2.5])
>>> np.percentile(a, 90,axis=0)
array([ 9.3, 6.5, 3.7])
>>> np.percentile(a, 50,axis=1)
array([ 7., 2.])
>>> np.percentile(a, 90,axis=1)
array([ 9.4, 2.8])

10.
Numpy.argmin and numpy.argmax: find array index of minimum and maximum

>>> import numpy as np
>>> a=np.array([[2,4,5,8],[1,10,7,3],[4,6,9,-1]])
>>> a
array([[ 2,  4,  5,  8],
       [ 1, 10,  7,  3],
       [ 4,  6,  9, -1]])
>>> i_min=np.argmin(a)
>>> i_min
11
>>> i_min=np.argmin(a,axis=0)
>>> i_min
array([1, 0, 0, 2], dtype=int64)
>>> i_min=np.argmin(a,axis=1)
>>> i_min
array([0, 0, 3], dtype=int64)
>>> i_max=np.argmax(a)
>>> i_max
5
>>> i_max=np.argmax(a,axis=0)
>>> i_max
array([2, 1, 2, 0], dtype=int64)
>>> i_max=np.argmax(a,axis=1)
>>> i_max
array([3, 1, 2], dtype=int64)

11. Variance, standard deviation, covariance, and correlation coefficient

Let $x(t) = \bar{x} + x'(t)$ represent a dataset, where $\bar{x}$ is the mean of $x$ and $x'$ is the deviation of $x$ from the mean. Then take the average,

$$\overline{x(t)} = \overline{\bar{x}} + \overline{x'(t)}$$

But $\overline{\bar{x}} = \bar{x}$. Thus

$$\overline{x(t)} = \bar{x} + \overline{x'(t)}$$

and, since $\overline{x(t)} = \bar{x}$ by definition,

$$\overline{x'(t)} = 0$$

Therefore, $\overline{x'(t)}$ is not a good quantity to measure the deviations of observations from the true value. Classically, we use the quantity $\overline{x'(t)^2}$ as a measure of the variability of observations.

$$x^2 = (\bar{x} + x')^2 = \bar{x}^2 + x'^2 + 2 x' \bar{x}$$

and

$$\overline{x^2} = \frac{1}{n}\left[x(1)^2 + \dots + x(n)^2\right] = \frac{1}{n}\sum_{i=1}^{n} x(i)^2$$

Take the average on both sides,

$$\overline{x^2} = \bar{x}^2 + \overline{x'^2} + 2\,\overline{x'}\,\bar{x}$$

$$\overline{x^2} = \bar{x}^2 + \overline{x'^2}, \qquad \text{since } 2\,\overline{x'}\,\bar{x} = 0$$

Clearly,

$$\overline{x'^2} = \frac{1}{n}\left[x'(1)^2 + \dots + x'(n)^2\right] = \frac{1}{n}\sum_{i=1}^{n} x'(i)^2$$

$\sigma_x^2 = \overline{x'^2}$ is called the variance.

$\sigma_x = \sqrt{\overline{x'^2}}$ is the standard deviation (sometimes loosely called the standard error).

For normally distributed data, $x \sim N(\bar{x}, \sigma)$, the fraction of values lying within a given number of standard deviations of the mean is:

Within      σ          2σ        3σ        4σ        5σ
Fraction    0.6826895  0.954499  0.997300  0.999936  0.999999

Now consider two variables that vary with time.

$$x(t) = \bar{x} + x'(t)$$

$$y(t) = \bar{y} + y'(t)$$

Then, considering the product $xy$,

$$xy = (\bar{x} + x')(\bar{y} + y')$$

$$xy = \bar{x}\,\bar{y} + x'\bar{y} + \bar{x}\,y' + x'y'$$

Taking the average, the two middle terms vanish because $\overline{x'} = \overline{y'} = 0$:

$$\overline{xy} = \bar{x}\,\bar{y} + \overline{x'y'}$$

$$\overline{x'y'} = \frac{1}{n}\left[x'(1)\,y'(1) + \dots + x'(n)\,y'(n)\right] = \frac{1}{n}\sum_{i=1}^{n} x'(i)\,y'(i)$$

This is called the covariance, which can be positive, negative, or even 0.

Correlation coefficient:

$$r = \frac{\overline{x'y'}}{\sigma_x \sigma_y}$$

It measures how closely x is correlated with y. The range of the correlation coefficient is [-1,1].
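The derivation above can be checked numerically. The following is a minimal sketch, using a small illustrative array that is not taken from the notes: it confirms that the mean deviation is zero, that the mean square equals the squared mean plus the variance, and that the 1/n variance matches `np.var`. It also recomputes the σ table under the assumption that the tabulated fractions refer to a normal distribution, for which the fraction within kσ of the mean is erf(k/√2).

```python
import math
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data, not from the notes
x_mean = x.mean()
x_dev = x - x_mean                         # x' = x - mean(x)

mean_dev = x_dev.mean()                    # average deviation, expected to be 0
mean_sq = (x ** 2).mean()                  # average of x squared
variance = (x_dev ** 2).mean()             # average of x' squared (1/n definition)
identity_gap = mean_sq - (x_mean ** 2 + variance)   # expected to be 0

# Fraction of a normal distribution within k standard deviations of the mean.
within = [math.erf(k / math.sqrt(2.0)) for k in range(1, 6)]

print(mean_dev, identity_gap, variance, np.var(x))
for k, frac in enumerate(within, start=1):
    print(k, round(frac, 7))
```

Note that `np.var` and `np.std` divide by n by default (`ddof=0`), which is the same convention as the 1/n sums in the derivation.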
Numpy commands: ‘numpy.var’, ‘numpy.nanvar’, ‘numpy.std’, ‘numpy.nanstd’, ‘numpy.cov’,…

Examples,

>>> a = np.random.random((3,4))
>>> a
array([[ 0.249677  ,  0.56903753,  0.84385114,  0.38506039],
       [ 0.4294672 ,  0.10921041,  0.03420204,  0.41487398],
       [ 0.11549864,  0.29484404,  0.25647614,  0.2902445 ]])
>>> np.var(a)
0.044846039780698808
>>> np.std(a)
0.21176883571644534
>>> np.var(a,axis=0)
array([ 0.01654496, 0.03567588, 0.11666076, 0.00282349])
>>> np.var(a,axis=1)
array([ 0.04957636, 0.03143622, 0.00532557])
>>> np.std(a,axis=0)
array([ 0.1286272 , 0.18888059, 0.34155638, 0.05313653])
>>> np.std(a,axis=1)
array([ 0.2226575 , 0.17730262, 0.07297649])
>>> a[0,2]=np.nan
>>> a[1,1]=np.nan
>>> a
array([[ 0.249677  ,  0.56903753,         nan,  0.38506039],
       [ 0.4294672 ,         nan,  0.03420204,  0.41487398],
       [ 0.11549864,  0.29484404,  0.25647614,  0.2902445 ]])
>>> np.nanvar(a)
0.021865694859288017
>>> np.nanstd(a)
0.1478705341144341
>>> np.nanvar(a,axis=0)
array([ 0.01654496, 0.01879552, 0.01235144, 0.00282349])
>>> np.nanvar(a,axis=1)
array([ 0.01712971, 0.03348429, 0.00532557])
>>> np.nanstd(a,axis=0)
array([ 0.1286272 , 0.13709674, 0.11113705, 0.05313653])
>>> np.nanstd(a,axis=1)
array([ 0.13088052, 0.18298714, 0.07297649])

Example,

>>> x=np.array([1,2,3,4,5])
>>> y=np.array([0,1,0.5,0.3,0.2])
>>> x
array([1, 2, 3, 4, 5])
>>> y
array([ 0. , 1. , 0.5, 0.3, 0.2])
>>> np.corrcoef(x,y)
array([[ 1.        , -0.12456822],
       [-0.12456822,  1.        ]])
>>> np.std(x)
1.4142135623730951
>>> np.std(y)
0.34058772731852804
>>> x1=x-np.mean(x)
>>> y1=y-np.mean(y)
>>> np.mean(x1*y1)/np.std(y)/np.std(x)
-0.12456821978060993
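‘numpy.cov’ is listed above but not demonstrated. The sketch below reuses the same x and y and compares it with the 1/n covariance used in these notes. One point worth knowing: `np.cov` divides by n-1 by default, so `bias=True` is needed to reproduce the 1/n value. The variable names are chosen only for this illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([0, 1, 0.5, 0.3, 0.2])

# Covariance from the 1/n definition: mean of x' * y'.
cov_manual = np.mean((x - x.mean()) * (y - y.mean()))

# np.cov returns a 2x2 matrix; the off-diagonal entry is cov(x, y).
cov_biased = np.cov(x, y, bias=True)[0, 1]   # normalised by n
cov_default = np.cov(x, y)[0, 1]             # normalised by n - 1 (default)

# Correlation coefficient from the 1/n covariance and the 1/n standard deviations.
r = cov_biased / (np.std(x) * np.std(y))

print(cov_manual, cov_biased, cov_default)
print(r, np.corrcoef(x, y)[0, 1])
```

The correlation coefficient is unaffected by the choice of normalisation, as long as the covariance and both standard deviations use the same one.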
