Statistical Techniques in LIS Research
Dr S K Savanur
Senior Faculty, Department of LIS
Joshi-Bedekar College, Thane-400601
25 May 2017

Decisions
• Technical, Managerial and Life-Related
• Routine and Special
• Decisions imply the Unknown and the Future
• Decisions: Resources and Estimations
• Decisions = Observations + Processing
• Better Observation = Better Data
• Core Theme is Decision Making

Decision Making
• Decisions are about entities
• Entities have attributes/properties
• Attributes are variables
• Variables take different values
• Variables come from objectives/hypotheses
• Measurement of a variable is Data
Note: Variables, Measurement and Data.

Types of Variables
• Quantitative Vs Qualitative
• Continuous Vs Discrete
• Dependent Vs Independent
• [Tells what is to be measured, i.e. what data need to be collected.]

Types of Variables Contd.
• Continuous Variables: Eg: change in tumor volume or diameter, age, height, BP, obsolescence rate, age of manuscripts/books, etc.
  Commonly used point estimates: mean, median
• Binary Variables: Observations (i.e., dependent variables) that occur in one of two possible states (0 or 1). Eg: improved/not improved, completed/failed task, yes/no, male/female, response, progression, >50% reduction in tumor size, reference/lending, borrowed/on the shelf.
  Commonly used point estimates: proportion, relative risk, odds ratio
• Time-to-Event (Survival) Variables: Eg: time to progression, time to death, time to relapse, cut-off temperature, voltage, time to discard, time to binding, etc.
  Commonly used point estimates: median survival, k-year survival, hazard ratio

Measurement
• Nominal Scale
• Ordinal Scale
• Interval Scale
• Ratio Scale

Some Definitions
• Population: The complete set of individuals or objects that the investigator is interested in studying
• Sample: A subset of the population that is actually being studied

Essence of Statistics
• Plural Vs Singular
• Statistic: Summary measure of a sample
• Parameter: Summary measure of a population
• Summary Measures: Mean, Median, Mode, SD, Coefficient of Correlation, Regression Coefficient, etc.

When to use which summary measure
• Single Variable: Mean, Median, Mode, SD
  – Mean: Interval and Ratio Scales
  – Median: Ordinal, Interval and Ratio Scales
  – Mode: Nominal, Ordinal, Interval and Ratio Scales
  – SD: Interval and Ratio Scales
• Two Variables: Association, Coefficient of Correlation
  – Interval and Ratio Scales: Pearson
  – Ordinal Scale: Spearman
  – Nominal and Ordinal Scales: Chi-Square
• Regression Coefficient

Two Aspects of Statistics
• Descriptive Statistics
• Inferential Statistics

Descriptive Statistics
• Concerned with describing or characterizing the obtained sample data
• Uses summary measures, typically measures of central tendency and spread
• Measures of central tendency include the mean, median, and mode.
• Measures of spread include the range, variance and standard deviation.
• These summary measures obtained from sample data are called statistics.

Inferential Statistics
• Involves using obtained sample statistics to estimate the corresponding population parameters.
• The most common inference is using a sample mean to estimate a population mean.
• In short, it leads to an estimation.
• Sample Statistic
• Population Parameter

Point Estimation
• A "Point Estimate" is a one-number summary of the data.
• Examples:
  – Dose Finding Trials: MTD (Maximum Tolerable Dose)
  – Safety and Efficacy Trials: Response, Median Survival
  – Comparative Trials: Odds Ratio, Hazard Ratio, etc.

Odds Ratio and Hazard Ratio
• The odds ratio is a measure of effect size, describing the strength of association or non-independence between two binary data values. It is used as a descriptive statistic, and plays an important role in logistic regression. Unlike other measures of association for paired binary data such as the relative risk, the odds ratio treats the two variables being compared symmetrically, and can be estimated using some types of nonrandom samples.
• The hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions described by two levels of a treatment/variable. For example, if in a drug study the treated population dies at twice the rate per unit time of the control population, the hazard ratio is 2.

Range Estimate

Reliability, Validity and Normality
• A test is reliable if it gives the same reading every time
• Eg: Reliable friend, reliable data, inputs, etc.
• A test is valid when it tests what it is supposed to test
• Eg: Valid ticket, valid instrument [spring balance/weighing machine]
• For a normal frequency distribution, Mean = Median = Mode and there is no skewness
• Kurtosis measures the flat-toppedness of a curve

Parametric and Non-Parametric Tests
• A parametric statistical test makes assumptions about the parameters (defining properties) of the population distribution(s) from which one's data are drawn.
• A non-parametric test makes no such assumptions about population parameters.
• Strictly speaking, "Non-parametric Test" is a null category, as all statistical tests assume one thing or another about the properties of the source population(s).
• The following are non-parametric tests:
  – Chi-square Tests
  – Fisher Exact Probability Test
  – The Mann-Whitney Test
  – The Wilcoxon Signed-Rank Test
  – The Kruskal-Wallis Test
  – The Friedman Test
• Non-parametric tests are sometimes spoken of as "distribution-free" tests.

Hypothesis Testing
• Null Hypothesis (H0)
• Alternative Hypothesis (Ha)

Analysis
1. Processing of Data
  – 1.1 Editing
  – 1.2 Coding
  – 1.3 Classification
  – 1.4 Tabulation
  – 1.5 Percentages
  – 1.6 Graphic Presentation
2. Analysis of Data
  2.1 Descriptive and Causal Analysis
    2.1.1 Uni-dimensional Analysis
      – 2.1.1.1 Central Tendency
      – 2.1.1.2 Dispersion
      – 2.1.1.3 Skewness
      – 2.1.1.4 One-way ANOVA
      – 2.1.1.5 Index Number
      – 2.1.1.6 Time Series Analysis
    2.1.2 Bi-variate Analysis
      – 2.1.2.1 Correlation
      – 2.1.2.2 Regression
      – 2.1.2.3 Association
      – 2.1.2.4 Two-way ANOVA
    2.1.3 Multi-variate Analysis
      – 2.1.3.1 Multiple Regression
      – 2.1.3.2 Multiple Discriminant Analysis
      – 2.1.3.3 Multi-ANOVA
      – 2.1.3.4 Canonical Analysis
      – 2.1.3.5 Factor Analysis, Cluster Analysis, etc.
  2.2 Inferential or Statistical Analysis
    2.2.1 Estimation of Parameter Values
      – 2.2.1.1 Point Estimation
      – 2.2.1.2 Range/Interval Estimation
    2.2.2 Testing of Hypothesis
      2.2.2.1 Parametric Tests
        – 2.2.2.1.1 z-test
        – 2.2.2.1.2 t-test
        – 2.2.2.1.3 ANOVA
      2.2.2.2 Non-Parametric Tests
        – 2.2.2.2.1 Chi-square Tests
        – 2.2.2.2.2 Fisher Exact Probability Test
        – 2.2.2.2.3 The Mann-Whitney Test
        – 2.2.2.2.4 The Wilcoxon Signed-Rank Test
        – 2.2.2.2.5 The Kruskal-Wallis Test
        – 2.2.2.2.6 The Friedman Test

Functional Statistics
• Statistics is the judgment of choosing and sequencing the tests
• It is more a matter of understanding when to do what
• MS-Excel or SPSS then takes care of the data
• First test normality, reliability and validity
• Then central tendency, if required
• Choose the level of significance
• Multi-criteria decision making
• Thank You
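
Worked example (illustrative only). The sequence suggested under Functional Statistics (look at normality first, then central tendency, then estimate at a chosen level of significance) can be sketched in a few lines of Python using only the standard library. The data values, the function names `describe` and `mean_confidence_interval`, and the use of a normal-approximation interval are assumptions made for illustration; they do not come from the slides, and for small samples a t-based interval would usually be preferred.

```python
import statistics

def describe(values):
    """Return central tendency, spread and shape measures for a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    pop_sd = statistics.pstdev(values)  # population SD, used for the moment ratios
    # Third and fourth central moments, used for skewness and kurtosis.
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": statistics.stdev(values),          # sample SD (n - 1 denominator)
        "skewness": m3 / pop_sd ** 3,            # 0 for a symmetric sample
        "excess_kurtosis": m4 / pop_sd ** 4 - 3, # 0 for a normal curve
    }

def mean_confidence_interval(values, confidence=0.95):
    """Range (interval) estimate of the population mean, normal approximation."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / n ** 0.5
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical data: books issued per day at a library counter.
loans = [10, 12, 14, 16, 18]

summary = describe(loans)                    # point estimates (sample statistics)
low, high = mean_confidence_interval(loans)  # range estimate for the parameter

print(summary)
print(f"95% interval for the population mean: {low:.2f} to {high:.2f}")
```

Here the mean and median coincide and the skewness is zero, which matches the symmetry condition listed under Normality; a clearly skewed sample would instead point towards the median and the non-parametric tests listed above.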