Glossary
Analytics: The extensive use of data and qualitative and quantitative analysis to help make decisions and add value across a unit or an entire organization. Analytics are tools that help find, interpret and leverage data to make better decisions.
Binary Variables: Variables that can take only two values, such as true or false, or yes or no.
Causality: The relationship between a cause (an event) and an effect (a second event), where the cause precedes the effect in time and space, and the cause is present when the effect responds.
Central Limit Theorem: When a large, properly drawn sample is used (a common rule of thumb is a sample size of 30 or greater), the sampling distribution of the mean will be normal or approximately normal, regardless of the shape of the population from which the sample was drawn.
Central Tendency: The tendency of observations to center around a particular value. The three measures of central tendency are the mean, the median and the mode.
Coefficient of Correlation: A number between -1 and 1 that shows the extent to which two variables are related. A coefficient of 0 means there is no (linear) relationship. A positive number shows that the two variables change in the same direction; a negative number shows that they change in opposite directions. For example, hitting 20 three-pointers a game and winning games might have a positive correlation coefficient. But if hitting 20 three-pointers leads to more losses than wins, then there would be a negative correlation.
Correlation: A measure of interdependence between two or more variables. A scatter diagram would show a positive correlation when an increase in the quantity of one variable is accompanied by an increase in the quantity of another variable.
Correlation Analysis: Measures the strength and type of relationship between variables.
Craft Analytics: A one-time customized search that, once done, is normally discarded and never used again.
Creativity: The ability to develop and articulate an original and useful concept.
Data: Facts, quantities, values, etc. stored for analysis. The singular of data is datum.
Data Mining: The process of examining large quantities of data in the hope of finding patterns and commonalities.
Deductive Reasoning: The drawing of inferences about an unknown part from a known whole.
Descriptive Analytics: Gathering, organizing, tabulating and depicting data, and then describing what the data show.
© Taylor & Francis 2017
Descriptive Statistics: Analysis that summarizes the data and what happened, such as a bowling score or batting average.
Dynamic Ticket Pricing: Pricing tickets in real time based on changes in supply and demand in the market.
Ensemble Forecast: A forecast that leverages projections from across an entire organization rather than from just one department.
Experimental Design: Using treatment and control groups to examine cause-and-effect relationships.
Forecasting: Using past and present data to predict future results.
Frequency Distribution: Shows how often each observed value occurs in a set of results.
Frequency Table: A table that shows numerically how often each observed value occurs.
Histogram: A diagram that displays the frequency of values using rectangles (blocks) whose sizes are proportional to the values’ frequencies.
Hypothesis Testing: Using a systematic approach to examine and test tentative beliefs about reality and to see whether the evidence supports those beliefs. A hypothesis is tested by comparing it with a null hypothesis and drawing conclusions about a population from a sample.
Independent Events: Two events (for example, x and y) are independent if the probability of one occurring is not affected by the occurrence of the other. The opposite is dependent events, where the probability of one event occurring is affected by the occurrence of the other. For example, water sales and who the starting pitcher is for a game are independent events, while the team hitting .300 in a game and scoring more than five runs are dependent events.
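To make the Coefficient of Correlation concrete, here is a minimal Python sketch of Pearson's correlation coefficient, assuming that the usual Pearson (linear) coefficient is the one meant. The function name `pearson_r` and the three-pointer and win figures are hypothetical, made up for illustration rather than taken from any real season.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations: positive when x and y move in the same direction.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations of each variable.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_dev / (spread_x * spread_y)

# Hypothetical data: three-pointers made per game vs. wins per 10-game stretch.
threes = [8, 10, 12, 14, 16]
wins = [3, 4, 6, 7, 9]
print(round(pearson_r(threes, wins), 3))
```

The result is always between -1 and 1; here it is close to 1, a strong positive relationship. Swapping in data where wins fall as three-pointers rise would give a negative value.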
Independent Variables: A variable whose value is used to determine the value of another variable.
Inductive Reasoning: The drawing of an inference about an unknown whole from a known part.
Industrial Analytics: Requires significant upfront resources to develop a solid model, but once completed, the model can be used repeatedly and seamlessly for future decisions.
Inference: Drawing conclusions based on evidence or reasoning.
Inferential Statistics: Making inferences about a population by using data from a sample drawn from that population.
Interval Variable: A numerical variable for which the differences between values are equal and meaningful.
Mean: A measure of central tendency calculated as the arithmetic average of the scores. The mean has a major weakness when “outliers” are present in a sample, because they inflate or deflate the average. For example, the average pro athlete salary is skewed by those earning the league minimum and by superstar athletes who might be earning $20 million a year.
Median: The middle value, which divides a sample or population into two equal halves. To identify the median, arrange the observations from smallest to largest. If there is an odd number of observations, the middle observation is the median. If there is an even number of observations, the average of the two middle numbers is the median.
Mode: Another measure of central tendency, the mode is the most frequently occurring value in a data set. If there are 30 members of a team and the most frequent age is 24, then 24 would be the mode (it could also be the mean and/or median, depending on the other values).
Modeling: A selective approach to examining a chosen variable. If you are interested in examining attendance (the dependent variable), you might build a model examining the opponent, standings, weather and other independent variables.
Nominal Variable: A variable with two or more categories that have no natural order.
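The Mean, Median and Mode entries can be checked with a short Python sketch. The `median` helper follows the steps in the Median entry (sort, then take the middle value or average the two middle values), while the mean and mode come from Python's standard `statistics` module. The salary list is hypothetical and only illustrates how one superstar contract pulls the mean well above the median.

```python
import statistics

def median(values):
    """Median following the glossary steps: sort the observations, then take
    the middle value (odd count) or average the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical salaries in millions of dollars: mostly near a league
# minimum, plus one superstar contract acting as an outlier.
salaries = [0.7, 0.7, 0.9, 1.2, 1.5, 2.0, 20.0]

print("mean:  ", round(statistics.mean(salaries), 2))  # pulled up by the outlier
print("median:", median(salaries))                      # middle of the sorted list
print("mode:  ", statistics.mode(salaries))             # most frequent value
```

With these numbers the mean is about 3.86 while the median is 1.2 and the mode is 0.7, which is why the median is often preferred for skewed data such as salaries.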
Normal Distribution: A bell-shaped distribution, symmetric around the mean, in which approximately 68% of the observations fall within one standard deviation of the mean and approximately 95% fall within two standard deviations. In a normal distribution the mode, median and mean coincide at the exact center.
Optimization: Leverages data to find optimal solutions, such as reducing costs or raising prices, that lead to the highest profitability.
Ordinal Variable: A variable that is similar to a categorical (nominal) variable except that its values have an order, yet the arithmetic difference between the values is not meaningful.
Poisson Distribution: The probability of a given number of events occurring in a specific interval of time, place or volume when the events occur independently at a known average rate. For example, if a team averages two home runs per game, the Poisson distribution gives the probability that it hits exactly zero, one, two or more home runs in a given game.
Predictive Analytics: Going beyond just indicating what happened, predictive analytics looks toward the future and forecasts what might happen based on past and present data.
Prescriptive Analytics: Beyond just predicting what might happen in the future, prescriptive analytics prescribes what to do by manipulating independent variables and controlling extraneous variables.
Primary Data: Data observed, collected or developed first-hand. For example, an organization can use its own data sources, such as customer surveys completed after buying a product.
Probability: The likelihood of events or outcomes occurring.
Qualitative Analytics: Leverages unstructured data from a smaller group to dig deep into the data; this analysis is often non-statistical.
Qualitative Variable: A variable that is normally described in words rather than numbers.
Quantitative Analytics: Examines phenomena through statistical, mathematical and computational techniques.
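The Poisson Distribution entry can be illustrated with the standard probability mass function P(k) = λ^k e^(-λ) / k!, which gives the probability of exactly k events when events occur independently at an average rate λ per interval. The sketch below assumes a hypothetical team that averages 2 home runs per game; the function name `poisson_pmf` is also illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when events occur independently
    at a known average rate `lam` per interval (e.g. per game)."""
    if k < 0 or lam < 0:
        raise ValueError("k and lam must be non-negative")
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical rate: a team that averages 2 home runs per game.
avg_home_runs = 2.0
for k in range(5):
    print(f"P({k} home runs) = {poisson_pmf(k, avg_home_runs):.3f}")
```

For this rate, games with one home run and games with two are equally likely (about 0.271 each), and a shutout of zero home runs has probability about 0.135.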
Random Sample: A subset of a population chosen in such a way that, each time a member is selected for the subset, every member of the population has an equal chance of being selected. If 10% of the population is Hispanic, then a random sample of the larger population should be around 10% Hispanic.
Range: The difference between the lowest and the highest values in a set.
Ratio Variable: An interval variable that also has a true zero.
Regression Analysis: A statistical tool for investigating relationships between variables (such as smoking and cancer) while holding other important variables (such as age, weight, genetics, etc.) constant. While correlation examines the strength of a relationship, regression examines the nature of that relationship to help make predictions.
Response Bias: The tendency for survey answers to be wrong in some systematic way.
Sampling Bias: A systematic error due to sample selection, resulting in over- or underrepresentation of certain members of the population under investigation. Sampling bias may produce data that is not representative of the population and therefore may lead to inaccurate results.
Secondary Data: Readily available data collected by someone other than the user. Secondary data may be available from sources such as suppliers, researchers, government and e-trade associations. This data is often available online or in print.
Standard Deviation: A measure of dispersion in data. It shows how spread out the values in a data set are around the mean. If the standard deviation is low, the data fall near the mean and there is less variation among the scores. For example, if the mean is 90 and the standard deviation is 0, then everyone scored a 90. If the standard deviation is high, then some might have scored in the 70s and others 100. Thus, with a mean of 90 and a standard deviation of 6, roughly two-thirds of the scores would fall between 84 and 96 if the scores are approximately normally distributed.
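The Range and Standard Deviation entries can be computed directly with Python's `statistics` module. This is a minimal sketch: the helper names `describe_spread` and `share_within_one_sd` and both score lists are hypothetical. It uses the population standard deviation (`pstdev`); for a sample used to estimate a population, `statistics.stdev` would be the usual choice.

```python
import statistics

def describe_spread(scores):
    """Return (mean, population standard deviation, range) for a list of scores."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)        # population SD; statistics.stdev() gives the sample estimate
    spread = max(scores) - min(scores)    # range: highest value minus lowest value
    return mean, sd, spread

def share_within_one_sd(scores):
    """Fraction of scores lying within one standard deviation of the mean."""
    mean, sd, _ = describe_spread(scores)
    inside = [s for s in scores if mean - sd <= s <= mean + sd]
    return len(inside) / len(scores)

identical = [90, 90, 90, 90]              # SD of 0: everyone scored 90
varied = [80, 84, 88, 90, 92, 96, 100]    # hypothetical scores with mean 90

print(describe_spread(identical))
print(describe_spread(varied))
print(round(share_within_one_sd(varied), 2))
```

The varied scores have a mean of 90, a standard deviation of about 6.3 and a range of 20; five of the seven scores fall within one standard deviation of the mean, close to the two-thirds rule of thumb for roughly normal data.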
Statistics: The science of collecting, organizing, analyzing, interpreting and presenting data.
Test of Significance: Analyzing sample data to support or reject a claim about a population (e.g., a relationship between variables or a difference between groups).
Text Mining: Analysis of data in text format to find patterns and derive information.
Time Series Analysis: A statistical technique that examines data over time to spot trends.
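One simple way to spot a trend in a time series is a moving average, which smooths out season-to-season noise. This is only one of many time series techniques and is not necessarily the one the glossary has in mind; the function name `moving_average` and the attendance figures below are hypothetical.

```python
def moving_average(series, window):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    running = sum(series[:window])          # total of the first window
    averages = [running / window]
    for i in range(window, len(series)):
        # Slide the window one step: add the newest value, drop the oldest.
        running += series[i] - series[i - window]
        averages.append(running / window)
    return averages

# Hypothetical average home attendance (in thousands) over eight seasons.
attendance = [30, 34, 31, 36, 35, 39, 38, 42]
print([round(x, 1) for x in moving_average(attendance, 3)])
```

The smoothed values rise in every window even though raw attendance dips in some seasons, which makes the upward trend easier to see.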