Glossary
Analytics: The extensive use of data and qualitative and quantitative analysis to help make
decisions and add value across a unit or an entire organization. Analytics are tools that help
find, interpret and leverage data to make better decisions.
Binary Variables: Variables that can only take two values such as true or false, yes or no.
Causality: The relationship between the cause (an event) and the effect (a second event)
where the cause precedes the effect in time and space, and the cause is present when the
effect occurs.
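Central Limit Theorem: When a large enough random sample is used, the sampling distribution
of the mean will be normal or approximately normal, whatever the shape of the population's
distribution (provided its variance is finite). A sample size of 30 or greater is a common
rule of thumb for "large enough." A large, properly drawn sample will also tend to resemble
the population from which it was drawn, but that is a separate property of sampling rather
than the theorem itself.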
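The behavior of the sampling distribution can be illustrated with a small simulation. This is a minimal sketch, not part of the original glossary: the exponential population, the seed and the number of repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 30     # the rule-of-thumb "large" sample
NUM_SAMPLES = 5000   # number of samples drawn (an arbitrary choice)

# Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
# It is strongly skewed, so it looks nothing like a bell curve.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / SAMPLE_SIZE ** 0.5  # population sd divided by sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {expected_sd:.3f})")
```

Even though the population is skewed, the sample means cluster around the population mean with a spread close to the theoretical value.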
Central Tendency: Tendency of the observations to center around a particular value. Three
measures of central tendency are mean, median and mode.
Coefficient of Correlation: A number between -1 and 1 which shows the extent to which two
variables are linearly related. A coefficient of 0 means there is no linear relationship. A
positive number shows that the two variables change in the same direction, and a negative
number shows that they change in opposite directions. For example, hitting 20 three-pointers
a game and winning games might have a positive correlation coefficient. But if hitting 20
three-pointers were associated with more losses than wins, then there would be a negative
correlation.
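As a rough illustration, the Pearson correlation coefficient can be computed directly from its definition. The function name `pearson_r` and the game data below are hypothetical, invented only for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: three-pointers made per game and final point margin.
three_pointers = [8, 10, 12, 15, 20]
point_margin = [-6, -2, 1, 5, 12]

r_positive = pearson_r(three_pointers, point_margin)
# Flipping the sign of one variable flips the sign of the coefficient.
r_negative = pearson_r(three_pointers, [-m for m in point_margin])

print(round(r_positive, 3), round(r_negative, 3))
```

A value near 1 or -1 indicates a strong linear relationship; it does not by itself show that one variable causes the other.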
Correlation: A measure of interdependence between two or more variables. A scatter
diagram would show a positive correlation when an increase in the quantity of one variable is
accompanied by an increase in the quantity of another variable.
Correlation Analysis: Measures the strength and type of relationship between variables.
Craft Analytics: A one-time customized search that, once done, is normally discarded
and never used again.
Creativity: The ability to develop and articulate an original and useful concept.
Data: Facts, quantities, values etc. stored for analyses. The singular of data is datum.
Data Mining: The process of examining large quantities of data in the hope of finding
patterns and commonalities.
Deductive Reasoning: The drawing of inferences about an unknown part from a known
whole.
Descriptive Analytics: Gathering, organizing, tabulating and depicting data and then
describing what the data show.
© Taylor & Francis 2017
Descriptive Statistics: Analysis of data that summarizes the data and what happened, such as
a bowling score or batting average.
Dynamic Ticket Pricing: Pricing of tickets in real time based on the changes in supply and
demand in the market.
Ensemble Forecast: Leverages forecasting projections from across an entire organization
rather than just one department.
Experimental Design: Using treatment and control groups to examine cause and effect
relationships.
Forecasting: Using past and present data to predict future results.
Frequency Distribution: Shows the frequency of each value observed in the result
distribution.
Frequency Table: Shows the frequency of each value observed numerically in a table.
Histogram: A diagram displaying frequency of values with rectangles/blocks that are
proportional to the values’ frequency.
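The three entries above (frequency distribution, frequency table and histogram) can be sketched together. The run totals below are hypothetical, and the text bars are only a stand-in for a drawn histogram.

```python
from collections import Counter

# Hypothetical data: runs scored by a team in 12 games.
runs = [3, 5, 2, 3, 4, 3, 5, 1, 2, 3, 4, 5]

# Frequency table: how many times each value was observed.
frequency_table = Counter(runs)

# Print the table in value order, with a bar whose length equals the
# frequency, which gives a simple text histogram.
for value in sorted(frequency_table):
    count = frequency_table[value]
    print(f"{value:>2} | {count:>2} | {'#' * count}")
```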
Hypothesis Testing: Using a systematic approach to examine and test tentative beliefs about
reality and see if the evidence supports that belief. Testing of a hypothesis by comparing it
with a null hypothesis and drawing conclusions about a population through a sample.
Independent Events: Two events (e.g. two random events x and y) are independent if the
probability of one event occurring is not affected by the occurrence of the other. The
opposite is a pair of dependent events, where the probability of one event occurring is
affected by the occurrence of the other. For example, the sale of water and who the starting
pitcher is for a game are independent events, while the team hitting .300 in a game and
scoring more than five runs are dependent events.
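One way to check independence is to test whether P(A and B) equals P(A) times P(B). The sketch below uses two fair dice rather than the baseball examples, because the full sample space can be enumerated exactly; the events and helper names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate over (first, second)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_even(o):
    return o[0] % 2 == 0

def second_high(o):
    return o[1] > 4

def sum_at_least_10(o):
    return o[0] + o[1] >= 10

def independent(a, b):
    """Events are independent when P(A and B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

print(independent(first_even, second_high))      # events on different dice
print(independent(first_even, sum_at_least_10))  # the sum depends on the first die
```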
Independent Variables: A variable, the value of which is used to determine the value of
another variable.
Inductive Reasoning: The drawing of an inference about an unknown whole from a known
part.
Industrial Analytics: Requires significant upfront resources to develop a solid model, but
once completed the model can be used repeatedly for future decisions in a seamless manner.
Inference: Drawing conclusions based on evidence or reasoning.
Inferential Statistics: Making inferences about a population by using data of a sample drawn
from the population.
Interval Variable: A numerical variable for which the difference between the values is equal
and meaningful.
Mean: One of the measures of central tendency, calculated as the arithmetic average of
scores. The mean has a major weakness when outliers are present in a sample, because extreme
values inflate or deflate the average. For example, the average pro athlete salary is skewed
by those earning the league minimum and by the superstar athletes who might be earning $20
million a year.
Median: The middle value, which divides a sample/population into two equal halves. In
order to identify the median, arrange observations from smallest to largest. If there are an odd
number of observations, then the middle observation is the median. If there are an even
number of observations, then the average of the two middle numbers is the median.
Mode: Another measure of central tendency, the mode is the most frequently occurring value
in a data set. If there are 30 members of a team and the most frequent age is 24 then that
would be the mode (it could also possibly be the mean and/or median depending on the other
values).
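The three measures of central tendency can be compared on a small data set. The salary figures below are hypothetical and chosen so that a single large value pulls the mean away from the median and mode.

```python
import statistics

# Hypothetical salaries in millions of dollars; one superstar earns 20.
salaries_millions = [0.5, 0.5, 0.6, 0.7, 0.8, 1.0, 20.0]

mean_salary = statistics.mean(salaries_millions)      # pulled up by the outlier
median_salary = statistics.median(salaries_millions)  # middle of the sorted values
mode_salary = statistics.mode(salaries_millions)      # most frequent value

print(f"mean:   {mean_salary:.2f}")
print(f"median: {median_salary:.2f}")
print(f"mode:   {mode_salary:.2f}")
```

Here the median and mode stay near the typical salary, while the mean is several times larger because of the one extreme value.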
Modeling: A selective approach to examine a chosen variable. If you are interested in
examining attendance (the dependent variable), you might build a model examining
opponents, standings, weather and other independent variables.
Nominal Variable: A variable with two or more categories without a natural order.
Normal Distribution: A bell-shaped distribution, symmetric around the mean, where about 68%
of the observations fall within one standard deviation of the mean and about 95% within two
standard deviations. In a normal distribution the mode, median and mean are equal and sit at
the exact center.
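The 68% and 95% figures are rounded values that can be checked numerically. This short sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with a standard normal distribution as an assumed example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass within one and two standard deviations of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(f"within 1 sd: {within_one_sd:.4f}")
print(f"within 2 sd: {within_two_sd:.4f}")
```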
Optimization: Leverages data to find optimal solutions, such as reducing costs or increasing
prices, to help lead to the highest profitability.
Ordinal Variable: A variable that is similar to a categorical variable except values show an
order, yet the arithmetic difference between the values is not meaningful.
Poisson Distribution: The probability of a given number of events occurring in a specific
interval of time, place or volume when the events occur independently at a known average
rate. For example, if a team is known to average two home runs per game, the Poisson
distribution gives the probability that it hits exactly zero, one, two or more home runs in a
given game.
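The Poisson probability of observing exactly k events, given an average rate, is exp(-rate) * rate^k / k!. The sketch below applies that formula; the rate of 2 events per interval and the function name `poisson_pmf` are assumptions made for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k events when the average count per interval is `rate`."""
    return exp(-rate) * rate ** k / factorial(k)

AVERAGE_RATE = 2.0  # hypothetical average number of events per interval

# Probabilities of 0 through 10 events in one interval.
probabilities = [poisson_pmf(k, AVERAGE_RATE) for k in range(11)]

for k, p in enumerate(probabilities[:5]):
    print(f"P({k} events) = {p:.4f}")
```

The probabilities over all possible counts sum to 1, and the counts nearest the average rate are the most likely.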
Predictive Analytics: Going beyond just indicating what happened, predictive analytics
looks towards the future and forecasts what might happen based on past and present data.
Prescriptive Analytics: Beyond just predicting what might happen in the future, prescriptive
analytics prescribes what to do in the future through manipulating independent variables and
controlling extraneous variables.
Primary Data: Observed, collected or developed first-hand. For example, an organization
can use its own data sources such as customer surveys completed after buying a product.
Probability: The likelihood of events/outcomes occurring.
Qualitative Analytics: Leverages unstructured data from a smaller group to dig deep into the
data, which is often non-statistical.
Qualitative Variable: A variable that is normally described in terms of words rather than
numbers.
Quantitative Analytics: Examines phenomena through statistical, mathematical and
computational techniques.
Random Sample: A subset of a population that is chosen in such a way that, each time a
member is selected for the subset, every member of the population has an equal chance of
being selected. If 10% of the population is Hispanic then a random sample of the larger
population should be around 10% Hispanic.
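The idea that a random sample mirrors the population's proportions can be sketched with `random.sample`. The population size, the group labels and the seed below are assumptions for illustration only.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 people, 10% of whom belong to group "A".
population = ["A"] * 1000 + ["B"] * 9000

# Draw 1,000 members without replacement; every member is equally likely.
sample = random.sample(population, k=1000)
share_a = sample.count("A") / len(sample)

print(f"share of group A in the sample: {share_a:.1%}")
```

The sample share will rarely be exactly 10%, but it should land close to it, and larger samples tend to land closer.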
Range: The difference between the lowest and the highest values in a set.
Ratio Variable: An interval variable with an addition of a true zero.
Regression Analysis: A statistical tool for the investigation of relationships between
variables (such as smoking and cancer), while holding other important variables (such as age,
weight, genetics etc.) constant. While correlation examines the strength of relationships,
regression examines the nature of that relationship to help make predictions for the future.
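A minimal sketch of the simplest case, a least-squares line with one independent variable, is shown below. It does not hold other variables constant as a multiple regression would. The function name `fit_line` and the winning-percentage and attendance figures are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: winning percentage and average attendance (thousands).
win_pct = [40, 45, 50, 55, 60]
attendance = [18, 21, 22, 26, 28]

slope, intercept = fit_line(win_pct, attendance)

# Use the fitted line to predict attendance at a 70% winning percentage.
predicted = intercept + slope * 70
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.1f}")
```

The slope describes the nature of the relationship (how much attendance changes per percentage point), which is what separates regression from a correlation coefficient alone.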
Response Bias: The tendency for survey answers to be wrong in some systematic way.
Sampling Bias: A systematic error due to sample selection, resulting in over- or
underrepresentation of certain members of a population under investigation. Sampling bias
may result in data that are not representative of the population and may therefore lead to
inaccurate results.
Secondary Data: Readily available data that is collected by someone other than the user.
Secondary data could be available from sources such as suppliers, researchers, government
and e-trade associations. This data is often available online or in print form.
Standard Deviation: A measure of dispersion in data. It shows how spread out the values are
around the mean in a data set. If the standard deviation is low then the data fall near the
mean and there is less variation among the scores. For example, if the mean is a 90 and the
standard deviation is 0 then everyone scored a 90. If the standard deviation is high then
some might have 70s and others 100s. With a mean of 90 and a standard deviation of 6, and
assuming the scores are roughly normally distributed, about 68% of the scores would be
between 84 and 96.
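Range and standard deviation can be compared on a few small sets of scores. The score lists below are hypothetical, and `statistics.pstdev` treats each list as a whole population rather than a sample.

```python
import statistics

# Hypothetical test scores, all with a mean of 90.
identical = [90, 90, 90, 90, 90]   # no variation at all
tight = [88, 89, 90, 91, 92]       # values close to the mean
spread = [70, 80, 90, 100, 110]    # values far from the mean

for name, scores in [("identical", identical), ("tight", tight), ("spread", spread)]:
    score_range = max(scores) - min(scores)   # highest minus lowest value
    sd = statistics.pstdev(scores)            # population standard deviation
    print(f"{name:>9}: range={score_range:>2}, sd={sd:.2f}")

spread_range = max(spread) - min(spread)
```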
Statistics: The science of collecting, organizing, analyzing, interpreting and presenting data.
Test of Significance: Analyzing sample data to support or reject a claim about a population
(i.e. relationship between variables, difference between groups etc.).
Text Mining: Analysis of data in text format to find patterns and derive information.
Time Series Analysis: A statistical technique that examines data over time to spot trends.