Oracle's In-Database Statistical Functions
[Diagram: Oracle 11g DB with Data Warehousing, ETL, OLAP, Statistics, and Data Mining]
Charlie Berger
Sr. Director Product Management, Data Mining Technologies
Oracle Corporation
Copyright © 2007 Oracle Corporation
Synopsis
• Oracle has delivered on a multi-year strategy to transform the
database from a data repository into an analytical database by
bringing the "analytics" to the data (data mining, text mining, and
statistical functions).
• This new "analytical database", integrated with Oracle Business
Intelligence EE, opens new doors for better BI:
• Why did something happen?
• What corrective actions should be taken?
• Which factors are influencing your business's key performance indicators?
• Which things should I target?
• What will happen in the future, and where should you focus limited resources?
• Overview of SQL statistical capabilities embedded in Oracle
Database
• "Repeat what I was shown" hands-on session
Agenda
• Introduction
• Oracle’s In-Database Statistical Functions
• Several Simple Demonstrations
• Opportunities for Use Cases
• Hands-on Exercises
• User Stories
Market Trends
Analytics Provide Competitive Value
• Competing on Analytics, by Tom Davenport
• “Some companies have built their very businesses
on their ability to collect, analyze, and act on data.”
• “Although numerous organizations are embracing analytics, only a
handful have achieved this level of proficiency. But analytics
competitors are the leaders in their varied fields—consumer products,
finance, retail, and travel and entertainment among them.”
• “Organizations are moving beyond query and reporting”
- IDC 2006
• Super Crunchers, by Ian Ayres
• “In the past, one could get by on intuition and experience.
Times have changed. Today, the name of the game is data.”
—Steven D. Levitt, author of Freakonomics
• “Data-mining and statistical analysis have suddenly become
cool.... Dissecting marketing, politics, and even sports, stuff this
complex and important shouldn't be this much fun
to read.” —Wired
Market Trends
Analytics Save Lives
• Super Crunchers, by Ian Ayres
• In December 2004, [Berwick] brazenly announced a plan to save 100,000
lives over the next year and a half. The “100,000 Lives Campaign” challenged
hospitals to implement six changes in care to prevent avoidable deaths.
• … He noticed that thousands of ICU patients die each year from infections
after a central line catheter is placed in their chests. About half of all intensive
care patients have central line catheters, and ICU infections are deadly
(carrying mortality rates of up to 20 percent). He then looked to see if there
was any statistical evidence of ways to reduce the chance of infection. He
found a 2004 article in Critical Care Medicine that showed that systematic
hand-washing (combined with a bundle of improved hygienic procedures such
as cleaning the patient’s skin with an antiseptic called chlorhexidine) could
reduce the risk of infection from central-line catheters by more than 90
percent. Berwick estimated that if all hospitals just implemented this one
bundle of procedures, they might be able to save as many as 25,000 lives per
year.
• —New York Times, August 23, 2007, “Attack of the Super Crunchers:
Adventures in Data Mining”, By Melissa Lafsky
Competitive Advantage of BI & Analytics
[Chart: Competitive Advantage (vertical axis) vs. Degree of Intelligence (horizontal axis); the upper four tiers are grouped as "Analytics" and the lower four as "Access & Reporting", with an overlay label "Oracle Data Mining & Statistical Functions"]
• Optimization: What’s the best that can happen?
• Predictive Modeling: What will happen next?
• Forecasting/Extrapolation: What if these trends continue?
• Statistical Analysis: Why is this happening?
• Alerts: What actions are needed?
• Query/drill down: Where exactly is the problem?
• Ad hoc reports: How many, how often, where?
• Standard Reports: What happened?
Source: Competing on Analytics, by T. Davenport & J. Harris
Definition: Statistics
“There are three kinds of lies: lies, damned lies, and statistics.”¹
¹ This well-known saying is part of a phrase attributed to Benjamin
Disraeli and popularized in the U.S. by Mark Twain.
http://en.wikipedia.org/wiki/Statistics
Definition: Statistics
Statistics is a mathematical science pertaining to the
collection, analysis, interpretation or explanation, and
presentation of data. It is applicable to a wide variety
of academic disciplines, from the physical and social
sciences to the humanities. Statistics are also used for
making informed decisions – and misused for other
reasons – in all areas of business and government.
http://en.wikipedia.org/wiki/Statistics
Definitions: Statistics
Statistical methods can be used to summarize or
describe a collection of data; this is called descriptive
statistics. In addition, patterns in the data may be
modeled in a way that accounts for randomness and
uncertainty in the observations, and then used to draw
inferences about the process or population being
studied; this is called inferential statistics. Both
descriptive and inferential statistics comprise applied
statistics.
http://en.wikipedia.org/wiki/Statistics
Statistical Concepts
Statistics & SQL Analytics
• Ranking functions: rank, dense_rank, cume_dist, percent_rank, ntile
• Window aggregate functions (moving and cumulative): avg, sum, min,
max, count, variance, stddev, first_value, last_value
• LAG/LEAD functions: direct inter-row reference using offsets
• Reporting aggregate functions: sum, avg, min, max, variance, stddev,
count, ratio_to_report
• Statistical aggregates: correlation, linear regression family, covariance
• Linear regression: fitting of an ordinary-least-squares regression line
to a set of number pairs; frequently combined with the COVAR_POP,
COVAR_SAMP, and CORR functions
• Descriptive statistics: average, standard deviation, variance, min, max,
median (via percentile_cont), mode, group-by & roll-up
• DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns
count, min, max, range, mean, stats_mode, variance, standard deviation,
median, quantile values, +/- n sigma values, top/bottom 5 values
• Correlations: Pearson's correlation coefficient, plus Spearman's and
Kendall's (both nonparametric)
• Cross tabs (enhanced with % statistics): chi-squared, phi coefficient,
Cramer's V, contingency coefficient, Cohen's kappa
• Hypothesis testing: Student's t-test, F-test, binomial test, Wilcoxon
signed ranks test, chi-square, Mann-Whitney test, Kolmogorov-Smirnov
test, one-way ANOVA
• Distribution fitting: Kolmogorov-Smirnov test, Anderson-Darling test,
chi-squared test; normal, uniform, Weibull, exponential
Note: Statistics and SQL Analytics are included in Oracle
Database Standard Edition.
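The linear regression and correlation entries above reduce to a few covariance identities. Oracle's SQL Reference describes REGR_SLOPE(y, x) as COVAR_POP(y, x) / VAR_POP(x), REGR_INTERCEPT as AVG(y) minus the slope times AVG(x), and CORR as COVAR_POP divided by the product of the two STDDEV_POP values, each computed over the pairs in which neither value is NULL. The following is a minimal Python sketch of that arithmetic, not Oracle's implementation; the function names and number pairs are hypothetical, and NULL filtering is assumed to have happened already.

```python
import math
from statistics import fmean


def covar_pop(xs, ys):
    """Population covariance: divide by n (COVAR_POP semantics)."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def var_pop(xs):
    """Population variance: the covariance of a column with itself."""
    return covar_pop(xs, xs)


def regr_slope_intercept(ys, xs):
    """Ordinary-least-squares line y = slope * x + intercept.

    Argument order mirrors REGR_SLOPE(y, x): dependent variable first.
    """
    slope = covar_pop(ys, xs) / var_pop(xs)
    intercept = fmean(ys) - slope * fmean(xs)
    return slope, intercept


def corr(xs, ys):
    """Pearson correlation: COVAR_POP / (STDDEV_POP(x) * STDDEV_POP(y))."""
    return covar_pop(xs, ys) / math.sqrt(var_pop(xs) * var_pop(ys))


if __name__ == "__main__":
    # Hypothetical number pairs that lie exactly on y = 2x + 1
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 5.0, 7.0, 9.0]
    print(regr_slope_intercept(ys, xs))
    print(corr(xs, ys))
```

Because the sketch divides by n throughout, it corresponds to the *_POP variants; COVAR_SAMP divides by n - 1 instead.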
Descriptive Statistics
• MEDIAN & MODE
• Median: takes numeric or datetime values and returns the middle
value (or an interpolated value when the count is even)
• Mode: returns the most common value (if several values tie, one of
them is returned)
> SQL
A. SELECT STATS_MODE(EDUCATION) FROM CD_BUYERS;
B. SELECT MEDIAN(ANNUAL_INCOME) FROM CD_BUYERS;
C. SELECT EDUCATION, MEDIAN(ANNUAL_INCOME) FROM
CD_BUYERS GROUP BY EDUCATION;
D. SELECT EDUCATION, MEDIAN(ANNUAL_INCOME) FROM
CD_BUYERS GROUP BY EDUCATION ORDER BY
MEDIAN(ANNUAL_INCOME) ASC;
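As a cross-check on what queries A to D compute, here is a minimal Python sketch built on the standard library's statistics.median and collections.Counter. The CD_BUYERS rows are hypothetical, and breaking ties for the mode by first occurrence is this sketch's own choice, since Oracle only states that it returns one of the tied modes.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical rows standing in for CD_BUYERS(EDUCATION, ANNUAL_INCOME)
ROWS = [
    ("Bach.", 52000), ("Bach.", 61000), ("Masters", 70000),
    ("HS-grad", 31000), ("HS-grad", 35000), ("HS-grad", 40000),
    ("Masters", 90000),
]


def stats_mode(values):
    """Most frequent value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]


def median_by_group(rows):
    """Like query D: median income per education level, ascending by median."""
    groups = defaultdict(list)
    for education, income in rows:
        groups[education].append(income)
    # statistics.median averages the two middle values for an even count
    result = [(edu, median(incomes)) for edu, incomes in groups.items()]
    return sorted(result, key=lambda pair: pair[1])


if __name__ == "__main__":
    print(stats_mode(edu for edu, _ in ROWS))        # query A
    print(median(income for _, income in ROWS))      # query B
    for edu, med in median_by_group(ROWS):          # queries C and D
        print(edu, med)
```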
DBMS_STAT_FUNCS Package
SUMMARY procedure
• The SUMMARY procedure summarizes a numerical column
(ADM_PULSE); the summary is returned as a record of type SummaryType
> SQL
DECLARE
  v_ownername   VARCHAR2(30);
  v_tablename   VARCHAR2(30);
  v_columnname  VARCHAR2(30);
  v_sigma_value NUMBER;
  s1            DBMS_STAT_FUNCS.SummaryType;  -- record that receives the summary
BEGIN
  v_ownername   := 'CBERGER';
  v_tablename   := 'LYMPHOMA';
  v_columnname  := 'ADM_PULSE';
  v_sigma_value := 3;  -- n for the +/- n sigma values in the summary
  DBMS_STAT_FUNCS.SUMMARY(p_ownername   => v_ownername,
                          p_tablename   => v_tablename,
                          p_columnname  => v_columnname,
                          p_sigma_value => v_sigma_value,
                          s             => s1);
  -- s1 now holds count, min, max, mean, mode, quantiles, extremes, etc.
END;
/
DBMS_STAT_FUNCS Package
SUMMARY procedure
> SQL
set echo off
connect CBERGER/CBERGER@ora10gr2
set serveroutput on
set echo on
DECLARE
  s DBMS_STAT_FUNCS.SummaryType;
BEGIN
  -- arguments: owner, table, column, sigma value, OUT summary record
  DBMS_STAT_FUNCS.SUMMARY('CBERGER', 'LYMPHOMA', 'ADM_PULSE', 3, s);
  DBMS_OUTPUT.PUT_LINE('SUMMARY STATISTICS');
  DBMS_OUTPUT.PUT_LINE('Count: ' || s.count);
  DBMS_OUTPUT.PUT_LINE('Min: ' || s.min);
  DBMS_OUTPUT.PUT_LINE('Max: ' || s.max);
  DBMS_OUTPUT.PUT_LINE('Range: ' || s.range);
  DBMS_OUTPUT.PUT_LINE('Mean: ' || ROUND(s.mean));
  DBMS_OUTPUT.PUT_LINE('Mode Count: ' || s.cmode.COUNT);
  DBMS_OUTPUT.PUT_LINE('Mode: ' || s.cmode(1));
  DBMS_OUTPUT.PUT_LINE('Variance: ' || ROUND(s.variance));
  DBMS_OUTPUT.PUT_LINE('Stddev: ' || ROUND(s.stddev));
  DBMS_OUTPUT.PUT_LINE('Quantile 5: ' || s.quantile_5);
  DBMS_OUTPUT.PUT_LINE('Quantile 25: ' || s.quantile_25);
  DBMS_OUTPUT.PUT_LINE('Median: ' || s.median);
  DBMS_OUTPUT.PUT_LINE('Quantile 75: ' || s.quantile_75);
  DBMS_OUTPUT.PUT_LINE('Quantile 95: ' || s.quantile_95);
  DBMS_OUTPUT.PUT_LINE('Extreme Count: ' || s.extreme_values.COUNT);
  -- guard: indexing an empty collection raises SUBSCRIPT_BEYOND_COUNT
  IF s.extreme_values.COUNT > 0 THEN
    DBMS_OUTPUT.PUT_LINE('Extremes: ' || s.extreme_values(1));
  END IF;
  DBMS_OUTPUT.PUT_LINE('Top 3: ' || s.top_5_values(1) || ',' ||
                       s.top_5_values(2) || ',' || s.top_5_values(3));
  DBMS_OUTPUT.PUT_LINE('Bottom 3: ' || s.bottom_5_values(5) || ',' ||
                       s.bottom_5_values(4) || ',' || s.bottom_5_values(3));
END;
/
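To show roughly what the SummaryType fields hold, here is a hypothetical Python analogue of the SUMMARY procedure. It is a sketch, not the package's algorithm: quantiles are assumed to use PERCENTILE_CONT-style linear interpolation, variance and standard deviation are the sample (n - 1) versions, extreme values are taken to be those outside mean +/- sigma times the standard deviation, the dictionary keys are this sketch's own (borrowing field names from the PL/SQL example where they match), and the pulse readings are invented.

```python
import math
from collections import Counter
from statistics import fmean, stdev, variance


def percentile_cont(sorted_vals, p):
    """Continuous percentile: linear interpolation between neighboring ranks."""
    pos = p * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(values, sigma=3):
    """Hypothetical analogue of the SummaryType fields printed above.

    Needs at least two values (sample standard deviation).
    """
    vals = sorted(values)
    mean, sd = fmean(vals), stdev(vals)
    counts = Counter(vals)
    top_count = max(counts.values())
    return {
        "count": len(vals),
        "min": vals[0],
        "max": vals[-1],
        "range": vals[-1] - vals[0],
        "mean": mean,
        "cmode": [v for v in counts if counts[v] == top_count][:5],
        "variance": variance(vals),  # sample variance (n - 1)
        "stddev": sd,
        "quantile_5": percentile_cont(vals, 0.05),
        "quantile_25": percentile_cont(vals, 0.25),
        "median": percentile_cont(vals, 0.50),
        "quantile_75": percentile_cont(vals, 0.75),
        "quantile_95": percentile_cont(vals, 0.95),
        "plus_n_sigma": mean + sigma * sd,
        "minus_n_sigma": mean - sigma * sd,
        # values lying outside mean +/- sigma * stddev
        "extreme_values": [v for v in vals if abs(v - mean) > sigma * sd],
        "top_5_values": vals[::-1][:5],  # largest first
        "bottom_5_values": vals[:5],     # smallest first
    }


if __name__ == "__main__":
    # Hypothetical pulse readings standing in for LYMPHOMA.ADM_PULSE
    pulses = [60, 62, 64, 64, 66, 68, 70, 72, 74, 120]
    for name, value in summarize(pulses, sigma=2).items():
        print(f"{name}: {value}")
```

With only ten readings, no value can lie more than 3 sample standard deviations from the mean, so the usage lines lower the sigma value to 2 to make the outlier show up.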
DBMS_STAT_FUNCS Package
SUMMARY procedure
• A subset of the output returned after executing the PL/SQL block
"summarizes" the results of the SUMMARY procedure
[Screenshot: SUMMARY procedure output]
Summary Statistics and Histograms
• Oracle Data Miner (the GUI for the Oracle Data Mining Option)
provides graphical histograms with summary statistics
Hypothesis Testing
• Parametric Tests
• Parametric tests make some assumptions about the data, typically
that the data is normally distributed, among other assumptions
• Oracle 10g parametric hypothesis tests include:
• T-test
• F-test
• One-Way ANOVA
T-Test
• T-tests are used to measure the significance of a difference of means.
• T-tests include the following:
• One-sample T-test
• Paired-samples T-test
• Independent-samples T-test (pooled variances)
• Independent-samples T-test (unpooled variances)
Basic Example
• Compare the difference in blood pressure between people who eat
meat frequently and people who don't
T-Test Functions
STATS_T_TEST_*
The t-test functions are:
STATS_T_TEST_ONE: A one-sample t-test
STATS_T_TEST_PAIRED: A two-sample, paired t-test (also known as a crossed t-test)
STATS_T_TEST_INDEP: A t-test of two independent groups with the same variance (pooled variances)
STATS_T_TEST_INDEPU: A t-test of two independent groups with unequal variance (unpooled variances)
http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14200/functions157.htm
One-Sample T-Test
• Query compares the mean of SURVIVAL_TIME_MO
to the hypothesized value of 35:
SELECT avg(SURVIVAL_TIME_MO) group_mean,
stats_t_test_one(SURVIVAL_TIME_MO, 35, 'STATISTIC') t_observed,
stats_t_test_one(SURVIVAL_TIME_MO, 35) two_sided_p_value
FROM LYMPHOMA;
• Returns the observed t value and its related
two-sided significance (the default return value)
[Screenshot: results in SQL Worksheet]
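The value returned with 'STATISTIC' corresponds to the textbook one-sample t: the difference between the sample mean and the hypothesized value, divided by the standard error s / sqrt(n), with n - 1 degrees of freedom. A minimal Python sketch with hypothetical survival times follows; it computes only the statistic, because the two-sided p-value requires the t distribution's CDF, which the Python standard library does not provide.

```python
import math
from statistics import fmean, stdev


def t_test_one(sample, mu0):
    """One-sample t statistic and degrees of freedom.

    t = (mean - mu0) / (s / sqrt(n)), with s the sample standard deviation.
    """
    n = len(sample)
    t = (fmean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical survival times in months; hypothesized mean of 35
    survival_months = [30, 32, 34, 36, 38]
    print(t_test_one(survival_months, 35))
```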
Paired Samples T-Test
• Query compares the mean log weight (LOGWT) of the pigs in week 3
(LOGWT3) with week 8 (LOGWT8), grouped by diet:
SELECT substr(diet,1,1) AS diet,
avg(LOGWT3) logwt3_mean,
avg(LOGWT8) logwt8_mean,
stats_t_test_paired(LOGWT3, LOGWT8, 'STATISTIC') t_observed,
stats_t_test_paired(LOGWT3, LOGWT8) two_sided_p_value
FROM CBERGER.PIGLETS3
GROUP BY ROLLUP(DIET)
ORDER BY 5 ASC;
• Returns the observed t value and its related two-sided
significance for each diet, plus a ROLLUP row for all diets
[Screenshot: results in SQL Worksheet]
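A paired t-test is a one-sample t-test on the within-pair differences, with a hypothesized mean difference of 0. A minimal Python sketch with hypothetical log weights; the difference is taken as first argument minus second, which is an assumption about the sign convention (swapping the arguments flips the sign of t), and only the statistic is computed.

```python
import math
from statistics import fmean, stdev


def t_test_paired(first, second):
    """Paired t statistic: a one-sample t-test on the differences first - second."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical log weights of the same four animals in week 3 and week 8
    logwt3 = [1.0, 2.0, 3.0, 4.0]
    logwt8 = [2.0, 4.0, 5.0, 7.0]
    print(t_test_paired(logwt3, logwt8))
```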
Independent Samples T-Test
(Pooled Variances)
• Query compares the mean of AMOUNT_SOLD between
MEN and WOMEN within CUST_INCOME_LEVEL ranges
SELECT substr(cust_income_level,1,22) income_level,
avg(decode(cust_gender,'M',amount_sold,null)) sold_to_men,
avg(decode(cust_gender,'F',amount_sold,null)) sold_to_women,
stats_t_test_indep(cust_gender, amount_sold, 'STATISTIC','F')
t_observed,
stats_t_test_indep(cust_gender, amount_sold) two_sided_p_value
FROM sh.customers c, sh.sales s
WHERE c.cust_id=s.cust_id
GROUP BY rollup(cust_income_level)
ORDER BY 1;
[Screenshot: results in SQL Worksheet]
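The pooled (STATS_T_TEST_INDEP) and unpooled (STATS_T_TEST_INDEPU) variants differ only in how the standard error of the difference in means is estimated. The sketch below uses the textbook pooled-variance formula and Welch's formula with Welch-Satterthwaite degrees of freedom. Oracle's functions take a two-valued grouping column plus a value column, whereas these hypothetical helpers take the two groups as separate lists; the amounts are invented.

```python
import math
from statistics import fmean, variance


def t_test_indep_pooled(a, b):
    """Two-sample t statistic assuming equal variances (pooled)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (fmean(a) - fmean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def t_test_indep_unpooled(a, b):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (fmean(a) - fmean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical AMOUNT_SOLD values for two customer groups
    men = [1.0, 2.0, 3.0, 4.0]
    women = [2.0, 4.0, 6.0]
    print(t_test_indep_pooled(men, women))
    print(t_test_indep_unpooled(men, women))
```

When the two groups have equal sizes and equal sample variances, both versions give the same t and degrees of freedom; they diverge as the variances diverge.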
F-Test
• Query compares the variance of SIZE_TUMOR_MM
between males (GENDER '0') and females (GENDER '1'):
SELECT variance(decode(GENDER,'0', SIZE_TUMOR_MM, null)) var_tumor_men,
variance(decode(GENDER,'1', SIZE_TUMOR_MM, null)) var_tumor_women,
stats_f_test(GENDER, SIZE_TUMOR_MM, 'STATISTIC', '1') f_statistic,
stats_f_test(GENDER, SIZE_TUMOR_MM) two_sided_p_value
FROM CBERGER.LYMPHOMA;
• Returns the observed F value and its two-sided significance
[Screenshot: results in SQL Worksheet]
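The F statistic is a ratio of two sample variances, with n1 - 1 and n2 - 1 degrees of freedom. Which group goes in the numerator is a convention (the fourth argument in the query above appears to select it); this hypothetical helper simply puts the first list on top, and the tumor sizes are invented.

```python
from statistics import variance


def f_test_statistic(numerator_group, denominator_group):
    """Variance-ratio F statistic with (numerator df, denominator df)."""
    f = variance(numerator_group) / variance(denominator_group)
    return f, len(numerator_group) - 1, len(denominator_group) - 1


if __name__ == "__main__":
    # Hypothetical tumor sizes in mm for two groups
    group_0 = [1.0, 2.0, 3.0]
    group_1 = [2.0, 4.0, 6.0]
    print(f_test_statistic(group_0, group_1))
```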
F-Test
• Query runs a one-way ANOVA of SIZE_REDUCTION across
TREATMENT_PLANs, separately for each GENDER:
SELECT GENDER,
stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'F_RATIO') f_ratio,
stats_one_way_anova(TREATMENT_PLAN, SIZE_REDUCTION, 'SIG') p_value,
AVG(SIZE_REDUCTION)
FROM CBERGER.LYMPHOMA
GROUP BY GENDER ORDER BY GENDER;
• Returns the observed F ratio and its significance for each gender
[Screenshot: results in SQL Worksheet]
One-Way ANOVA
• In statistics, analysis of variance (ANOVA) is a collection of
statistical models, and their associated procedures, in which the
observed variance is partitioned into components due to different
explanatory variables.
• Example
• Group A is given vodka, Group B is given gin, and Group C
is given a placebo. All groups are then tested with a memory
task. A one-way ANOVA can be used to assess the effect of
the various treatments (that is, the vodka, gin, and placebo).
http://en.wikipedia.org/wiki/Statistics
One-Way ANOVA
• Query compares the average SIZE_REDUCTION within different
TREATMENT_PLANS Grouped By LYMPH_TYPE:
SELECT LYMPH_TYPE,
stats_one_way_anova(TREATMENT_PLAN,
SIZE_REDUCTION,'F_RATIO') f_ratio,
stats_one_way_anova(TREATMENT_PLAN,
SIZE_REDUCTION,'SIG') p_value
FROM CBERGER.LYMPHOMA
GROUP BY LYMPH_TYPE ORDER BY 1;
• Returns the one-way ANOVA F ratio and its significance for each LYMPH_TYPE
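One-way ANOVA partitions the total sum of squares into between-group and within-group parts; the F ratio is the between-group mean square divided by the within-group mean square. A minimal Python sketch over hypothetical (TREATMENT_PLAN, SIZE_REDUCTION) pairs; the dictionary keys are this sketch's own, and the significance value is omitted because it needs the F distribution's CDF.

```python
from collections import defaultdict
from statistics import fmean


def one_way_anova(rows):
    """One-way ANOVA from (group, value) pairs, e.g. (TREATMENT_PLAN, SIZE_REDUCTION)."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    all_values = [v for values in groups.values() for v in values]
    grand_mean = fmean(all_values)
    means = {g: fmean(vs) for g, vs in groups.items()}
    k, n = len(groups), len(all_values)
    # between: group means vs. grand mean; within: values vs. their group mean
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2 for g, vs in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "sum_squares_between": ss_between,
        "sum_squares_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "mean_squares_between": ms_between,
        "mean_squares_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


if __name__ == "__main__":
    # Hypothetical size reductions under three treatment plans
    rows = [("A", 1), ("A", 2), ("A", 3),
            ("B", 4), ("B", 5), ("B", 6),
            ("C", 7), ("C", 8), ("C", 9)]
    print(one_way_anova(rows))
```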
Hypothesis Testing
(Nonparametric)
• Nonparametric tests are used when certain assumptions
about the data are questionable.
• This includes comparing samples that are not normally distributed.
• All tests involving ordinal scales (in which data is ranked)
are nonparametric.
• Nonparametric tests supported in Oracle Database 10g:
• Binomial test
• Wilcoxon Signed Ranks test
• Mann-Whitney test
• Kolmogorov-Smirnov test
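Of the nonparametric tests listed, the binomial test is the simplest to write down: with n trials and a hypothesized success probability p, the exact tail probabilities come straight from the binomial distribution. A minimal Python sketch follows; the helper names are hypothetical, and the two-sided rule used here (double the smaller tail, capped at 1) is one common convention that may not match Oracle's.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_test(successes, n, p=0.5):
    """Exact binomial test tail probabilities for an observed success count."""
    lower = sum(binomial_pmf(k, n, p) for k in range(0, successes + 1))
    upper = sum(binomial_pmf(k, n, p) for k in range(successes, n + 1))
    return {
        "one_sided_lower": lower,  # P(X <= successes)
        "one_sided_upper": upper,  # P(X >= successes)
        "two_sided": min(1.0, 2 * min(lower, upper)),
    }


if __name__ == "__main__":
    # Hypothetical: 9 of 10 patients improved; is that consistent with p = 0.5?
    print(binomial_test(9, 10))
```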
Customer Example
"..Our experience suggests that Oracle 10g Statistics and Data Mining
features can reduce development effort of analytical systems by an
order of magnitude."
Sumeet Muju
Senior Member of Professional Staff, SRA International (SRA supports NIH bioinformatics
development projects)
Correlation Functions
• The CORR_S and CORR_K functions support nonparametric or
rank correlation (finding correlations between expressions
that are ordinal scaled).
• Correlation coefficients take on a value ranging from -1 to 1, where:
• 1 indicates a perfect (positive) relationship
• -1 indicates a perfect inverse relationship
• 0 indicates no (monotonic) relationship
• The following query determines whether there is a correlation
between the AGE and WEIGHT of people, using Spearman's
correlation, separately for each TREATMENT_PLAN:
select CORR_S(AGE, WEIGHT) coefficient,
CORR_S(AGE, WEIGHT, 'TWO_SIDED_SIG') p_value,
substr(TREATMENT_PLAN, 1, 15) as TREATMENT_PLAN
from CBERGER.LYMPHOMA
GROUP BY TREATMENT_PLAN;
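Spearman's coefficient (CORR_S) is Pearson's correlation applied to ranks, with tied values sharing the average of the ranks they span. A minimal Python sketch of that definition follows; the function names are hypothetical and no significance value is computed. Kendall's coefficient (CORR_K) counts concordant and discordant pairs instead and is not shown.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson on average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical ages and weights
    ages = [20, 30, 40, 50]
    weights = [55, 60, 72, 80]
    print(spearman(ages, weights))
```

Because only ranks matter, any strictly increasing relationship (for example y = x cubed) yields a coefficient of 1.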
Cross Tabulations
• This query analyzes the strength of the association between
TREATMENT_PLAN and GENDER Grouped By LYMPH_TYPE
using a cross tabulation:
SELECT LYMPH_TYPE,
stats_crosstab(GENDER, TREATMENT_PLAN,
'CHISQ_OBS') chi_squared,
stats_crosstab(GENDER, TREATMENT_PLAN,
'CHISQ_SIG') p_value,
stats_crosstab(GENDER, TREATMENT_PLAN,
'PHI_COEFFICIENT') phi_coefficient
FROM CBERGER.LYMPHOMA
GROUP BY LYMPH_TYPE ORDER BY 1;
• Returns the observed chi-squared value, its significance (p_value),
and the phi coefficient for each LYMPH_TYPE
Cross Tabulations
• The STATS_CROSSTAB function takes as arguments two expressions
(the two variables being analyzed) and a value that determines which
test to perform. These values include the following:
• CHISQ_OBS (observed value of chi-squared)
• CHISQ_SIG (significance of observed chi-squared)
• CHISQ_DF (degrees of freedom for chi-squared)
• PHI_COEFFICIENT (phi coefficient)
• CRAMERS_V (Cramer’s V statistic)
• CONT_COEFFICIENT (contingency coefficient)
• COHENS_K (Cohen’s kappa)
• The function returns the value specified by the third argument
(default is CHISQ_SIG)
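Most of the association measures listed above derive from the chi-squared statistic of the contingency table. The sketch below uses the textbook definitions (phi = sqrt(chi2 / n); Cramer's V = sqrt(chi2 / (n (k - 1))), with k the smaller table dimension; contingency coefficient = sqrt(chi2 / (chi2 + n))) without any continuity correction. Whether STATS_CROSSTAB uses exactly these forms is an assumption, the function name is hypothetical, and CHISQ_SIG and Cohen's kappa are omitted.

```python
import math
from collections import Counter


def crosstab_stats(pairs):
    """Chi-squared test of independence plus derived association measures
    for (row_category, column_category) observations; each variable needs
    at least two categories."""
    n = len(pairs)
    cell = Counter(pairs)                 # observed counts; missing cells count as 0
    rows = Counter(r for r, _ in pairs)   # row totals
    cols = Counter(c for _, c in pairs)   # column totals
    chisq = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chisq += (cell[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols))
    return {
        "chisq_obs": chisq,
        "chisq_df": (len(rows) - 1) * (len(cols) - 1),
        "phi_coefficient": math.sqrt(chisq / n),   # unsigned here
        "cramers_v": math.sqrt(chisq / (n * (k - 1))),
        "cont_coefficient": math.sqrt(chisq / (chisq + n)),
    }


if __name__ == "__main__":
    # Hypothetical (GENDER, TREATMENT_PLAN) observations
    data = [("M", "A")] * 8 + [("M", "B")] * 2 + [("F", "A")] * 3 + [("F", "B")] * 7
    print(crosstab_stats(data))
```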
Distribution-Fitting Functions
• Distribution-fitting functions in Oracle Database 10g
(part of the DBMS_STAT_FUNCS package) include the following:
• NORMAL_DIST_FIT function
• UNIFORM_DIST_FIT function
• POISSON_DIST_FIT function
• WEIBULL_DIST_FIT function
• EXPONENTIAL_DIST_FIT function
• These functions test how well a sample of values "fits" a
particular distribution
• An IN parameter of each function specifies which
goodness-of-fit test is used to measure the fit
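To illustrate one of the goodness-of-fit tests these functions can apply, here is a minimal Python sketch of the one-sample Kolmogorov-Smirnov statistic against a normal distribution with a given mean and standard deviation. It computes only the D statistic, not a significance level, and is not how NORMAL_DIST_FIT is implemented; the helper names and sample are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of the normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic_normal(sample, mean, sd):
    """One-sample Kolmogorov-Smirnov D: the largest vertical gap between the
    empirical CDF of the sample and the hypothesized normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mean, sd)
        # the empirical CDF jumps from i/n to (i+1)/n at x; check both sides
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    # Hypothetical measurements tested against N(mean=0, sd=1)
    sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
    print(ks_statistic_normal(sample, 0.0, 1.0))
```

Small values of D indicate a close fit; turning D into a significance level requires the Kolmogorov distribution, which is left out here.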
Opportunities for Use Cases
• Control charts
• Set flags on your data, e.g., when a value is above 3 sigma
Opportunities for Use Cases
• Construction of a control chart:
1. Calculate means and ranges for each "sample"
2. Chart them
3. Apply out-of-control rules, e.g., outside of 3 sigma
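The three construction steps can be sketched in Python for an X-bar chart: compute each subgroup's mean and range, estimate the process standard deviation as the average range divided by the bias-correction constant d2, and flag subgroup means outside the center line +/- 3 standard errors. The d2 values are the standard SPC table entries for subgroups of 2 to 5; the measurements and function name are hypothetical, and step 2 (drawing the chart) is left out.

```python
from statistics import fmean

# d2 bias-correction constants for subgroup sizes 2..5 (standard SPC tables)
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def xbar_chart(subgroups, k_sigma=3.0):
    """Return center line, control limits, and indices of out-of-control subgroups."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups) or n not in D2:
        raise ValueError("subgroups must all have the same size, between 2 and 5")
    means = [fmean(g) for g in subgroups]           # step 1: subgroup means
    ranges = [max(g) - min(g) for g in subgroups]   # step 1: subgroup ranges
    center = fmean(means)
    sigma_hat = fmean(ranges) / D2[n]               # process sigma from average range
    half_width = k_sigma * sigma_hat / n ** 0.5     # k standard errors of a subgroup mean
    lcl, ucl = center - half_width, center + half_width
    flagged = [i for i, m in enumerate(means) if m < lcl or m > ucl]  # step 3
    return center, lcl, ucl, flagged


if __name__ == "__main__":
    # Hypothetical measurements: five subgroups of four readings each
    groups = [[10, 11, 9, 10], [10, 10, 11, 9], [9, 10, 10, 11],
              [10, 9, 11, 10], [13, 14, 12, 13]]
    print(xbar_chart(groups))
```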
http://www.oracle.com/technology/products/bi/stats_fns/index.html
In-Database Statistics
Advantages
• Data remains in the database at all times, with appropriate access
security control mechanisms: fewer moving parts
• Straightforward inclusion within interesting and arbitrarily complex queries
• Real-world scalability: available for mission-critical applications
[Diagram: Oracle 10g DB with Data Warehousing, ETL, OLAP, Statistics, and Data Mining]
Industry Analysts
PREDICTIVE ANALYTICS: Extending the Value of Your
Data Warehousing Investment, By Wayne W. Eckerson
“…According to our survey, most organizations plan to significantly
increase the analytic processing within a data warehouse database in
the next three years, particularly for model building and scoring, which
show 88% climbs. The amount of data preparation done in databases
will only climb 36% in that time, but it will be done by almost two-thirds
of all organizations (60%)—double the rate of companies planning to
use the database to create or score analytical models.”
“…it’s surprising that about one-third of organizations plan to build
analytical models in databases within three years.”
“‘We leverage the data warehouse database when possible,’ says one
analytics manager. He says most analysts download a data sample to
their desktop and then upload it to the data warehouse once it’s
completed. ‘Ultimately, however, everything will run in the data
warehouse,’ the manager says.”
http://download.101com.com/pub/tdwi/Files/PA_Report_Q107_F.pdf
Analytics vs.
In-Database Analytics Engine
1. Basic Statistics (Free); Data Mining; Text Mining
2. Costs (ODM: $20K per CPU): simplified environment; single server; security
3. IT Platform: SQL (standard); Java (standard)
External Analytical Engine
1. Basic Statistics; Data Mining; Text Mining (separate: SAS EM for Text); Advanced Statistics
2. Costs (SAS EM: $150K/5 users): duplicates data; Annual Renewal Fee (AUF) (~45% each year)
3. IT Platform: SAS Code (proprietary)
[Diagram: Oracle 11g DB with Data Warehousing, ETL, OLAP, Statistics, and Data Mining]
SAS In-Database Processing
3-Year Road Map
• “The goal of the SAS In-Database initiative is … to
achieve deeper technical integration with database
providers, but … also … blends the best SAS data
integration and analytics with the core strengths
of databases.”
• …Like all DBMS client applications, the SAS engine often must
load and extract data over a network to and from the DBMS. This
presents a series of challenges:
• …Network bottlenecks between SAS and the DBMS constrain
access to large volumes of data
The best practice today is to read data into the SAS environment for processing.
For highly repeatable processes, this might not be efficient because it takes time
to transfer the data and resources are used to temporarily store in the SAS
environment. In some cases, the results of the SAS processing must be
transferred back to the DBMS for final storage, which further increases the cost.
Addressing this challenge can result in improved resource utilization and enable
companies to answer business questions more quickly.
• Oracle Data Mining is available today
Source: SAS In-Database Processing White Paper—October 2007
SAS In-Database Processing
3-Year Road Map…
“It boils down to this simple equation:
Less data movement = faster analytics,
and faster analytics = faster delivery of
real-time BI throughout an enterprise.”
Source:
http://www.teradata.com/t/pdf.aspx?a=83673&b=178909
Use SAS® to get more power out of your database
Move key components of BI, analytics and data integration processes from the server
or desktop to inside the database and help shorten your time to intelligence
IDC Worldwide Business Analytics Software
Oracle
http://www.oracle.com/corporate/analyst/reports/infrastructure/bi_dw/208699e.pdf
References
1. “Back to Basics”: Understanding and Visualising Variation in Data. Pete Ceuppens, Robert Shaw, Zhiping You. AstraZeneca R&D.
2. QuickStart: Oracle Statistics Release 10gR2. Charlie Berger, Oracle Corporation. April 2007.
3. Oracle® Database SQL Reference 10g Release 2 (10.2), Part Number B14200-02. December 2005.
4. Applied Linear Statistical Models. John Neter, William Wasserman, Michael H. Kutner. Irwin, 1985.
5. Mathematical Statistics with Applications. Mendenhall, Scheaffer, Wackerly. Duxbury Press, Boston, MA, 1981.
6. Oracle Database Data Warehousing Guide 10g Release 2 (10.2), Part Number B14223-02. December 2005.
7. Oracle Technology Network: http://www.oracle.com/technology/products/bi/stats_fns/index.html
Source: Oracle 10gR2 Statistics Functions, OLSUG08 Workshop, Henri B. Tuthill, AstraZeneca & Charlie Berger, Oracle
Hands-on Exercises
• Quick Start Statistics
More Information:
Oracle Data Mining 10g
• oracle.com/technology/products/bi/odm/index.html
Oracle Statistical Functions
• http://www.oracle.com/technology/products/bi/stats_fns/index.html
Oracle Business Intelligence Solutions
• oracle.com/bi
Questions & Answers
“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”