SIDA – AN OVERVIEW
Focus of research: method development
• Wavelets in statistics (David Donald, Lachlan McKinna, Yvette Everingham)
• Tree-based methods (Tim Hancock)
• Assembling methods for improved performance (Christine, Lewis)
• Categorical data analysis (Mike)
• Grid computing (Nigel Sim)
• Management statistics (Daniel Zamykal)
• Software development (Stefan)
Danny Coomans
Regression problem
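Data: a response vector of length n and an n × d predictor matrix

$$\mathbf{y} = (y_1, \ldots, y_n)^{\top}, \qquad \mathbf{X} = \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nd} \end{pmatrix}$$

$$F^{*} = \arg\min_{F} E_{y,\mathbf{x}}\, \Psi(y, F(\mathbf{x})) = \arg\min_{F} E_{\mathbf{x}}\left[ E_{y}\left( \Psi(y, F(\mathbf{x})) \right) \mid \mathbf{x} \right]$$

where Ψ is the loss function.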
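For squared-error loss Ψ(y, F) = (y − F)², the inner expectation above is minimised separately at each x by the conditional mean, F*(x) = E[y | x]. For absolute-error loss, the minimiser is the conditional median instead. The minimal sketch below checks the unconditional version of this numerically by grid-searching the constant with the smallest average loss on a simulated, skewed sample. The data, the grid and the helper name `empirical_risk_minimiser` are illustrative assumptions.

```python
import numpy as np

def empirical_risk_minimiser(y, loss, grid):
    """Return the candidate constant in `grid` with the smallest average loss over y."""
    risks = np.array([loss(y, c).mean() for c in grid])
    return grid[np.argmin(risks)]

def squared_loss(y, c):
    return (y - c) ** 2

def absolute_loss(y, c):
    return np.abs(y - c)

# Hypothetical skewed sample, chosen so that the mean and median clearly differ.
rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=1000)
grid = np.linspace(0.0, 10.0, 10001)  # candidate constants, spacing 0.001

c_squared = empirical_risk_minimiser(y, squared_loss, grid)
c_absolute = empirical_risk_minimiser(y, absolute_loss, grid)

print(f"squared loss : {c_squared:.3f} (sample mean   {y.mean():.3f})")
print(f"absolute loss: {c_absolute:.3f} (sample median {np.median(y):.3f})")
```

The choice of loss therefore decides what the "best" predictor estimates. This is one reason the methods listed next can give different answers on the same data.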
Activity of Tamiflu = genes + lifestyle + …
What to do?
Linear regression
Polynomial regression
Partial least squares
Least median of squares
Multivariate regression splines
Neural networks
Support vector machines
Regression trees
Bayesian linear regression
Generalised linear models
Projection pursuit regression
Deep regression
Local regression
Weighted least squares
Try them all and take the best one?
WRONG
Answer: Combine Them!
Christine Smyth, Nigel Sim and Lewis Anderson
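One simple way to "combine them" is to average the predictions of several different regressors. The sketch below fits a straight line, a cubic polynomial and a k-nearest-neighbour regressor to simulated data and then averages their predictions with equal weights. It is an illustrative assumption, not the group's actual ensemble method; the data, the model choices and the helper names `fit_polynomial` and `fit_knn` are all hypothetical.

```python
import numpy as np

# Hypothetical one-predictor data set standing in for a real regression problem.
rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=200)
y = np.sin(x) + 0.3 * x + rng.normal(scale=0.3, size=200)
x_train, y_train, x_test, y_test = x[:150], y[:150], x[150:], y[150:]

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns a prediction function."""
    coef = np.polyfit(x, y, deg=degree)
    return lambda z: np.polyval(coef, z)

def fit_knn(x, y, k=10):
    """k-nearest-neighbour regression: average y over the k closest training x."""
    def predict(z):
        dist = np.abs(z[:, None] - x[None, :])
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

models = {
    "linear": fit_polynomial(x_train, y_train, 1),
    "cubic": fit_polynomial(x_train, y_train, 3),
    "knn": fit_knn(x_train, y_train),
}
preds = {name: model(x_test) for name, model in models.items()}
mse = {name: np.mean((p - y_test) ** 2) for name, p in preds.items()}

# Combine: equal-weight average of the individual predictions.
ensemble = np.mean(np.stack(list(preds.values())), axis=0)
mse["average"] = np.mean((ensemble - y_test) ** 2)

for name, value in mse.items():
    print(f"{name:8s} test MSE = {value:.4f}")
```

Squared error is convex, so the averaged prediction's test MSE can never exceed the mean of the individual MSEs. It need not beat the best single model, however. Weighted schemes such as stacking try to close that gap.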
GRID computing
• Ensemble learning
• Randomisation methods:
Bootstrapping
Permutation tests
Genetic algorithms
Monte Carlo cross-validation
MCMC, etc.
Nigel Sim
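Monte Carlo cross-validation repeats many independent random train/test splits and averages the test error. Because every split is independent, the work divides naturally across grid nodes. The minimal sketch below runs the splits serially for ordinary least squares. The function name `monte_carlo_cv`, the 80/20 split, the 200 repetitions and the simulated data are assumptions made for illustration.

```python
import numpy as np

def monte_carlo_cv(X, y, n_splits=200, test_fraction=0.2, seed=0):
    """Repeated random train/test splits for ordinary least squares.

    Returns the test mean squared error from every split. The splits are
    independent of each other, so the loop could be farmed out to separate workers.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_fraction * n))
    errors = np.empty(n_splits)
    for s in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        A_train = np.column_stack([np.ones(len(train)), X[train]])  # add intercept
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([np.ones(n_test), X[test]])
        errors[s] = np.mean((A_test @ beta - y[test]) ** 2)
    return errors

# Hypothetical data: 100 samples, 3 predictors, noise variance 0.25.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

errors = monte_carlo_cv(X, y)
print(f"MCCV test MSE: {errors.mean():.3f} +/- {errors.std(ddof=1):.3f} over {len(errors)} splits")
```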
NIR spectrometry
• Rapid assessment
• Quality monitoring of avocados, sandalwood, wine and sugar cane
Yvette, Mike, Danny, Ron
[Figure: NIR spectra, normalised absorbance (about -1.5 to 1) against wavelength (400 to 2400 nm)]
Sucrose, fructose and glucose concentrations
250 wavelengths of NIR spectra
125 training samples
21 evaluation samples
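Partial least squares (one of the methods listed earlier) is a common choice for calibrating NIR spectra against concentrations. The sketch below implements single-response PLS (PLS1) with the NIPALS algorithm and applies it to synthetic spectra that share the data set's dimensions: 250 wavelengths, 125 training samples and 21 evaluation samples. The simulated band positions, the baseline shifts, the noise level and the choice of 5 components are all assumptions. The real sugar-cane spectra are not reproduced here.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # B = W (P'W)^-1 q
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef

# Synthetic stand-in for the NIR data: 250 wavelengths, 125 + 21 samples.
rng = np.random.default_rng(2)
wl = np.linspace(400, 2400, 250)
band = lambda centre, width: np.exp(-0.5 * ((wl - centre) / width) ** 2)
pure = np.array([
    band(1200, 60) + 0.5 * band(2100, 80),  # hypothetical "sucrose" spectrum
    band(1450, 60) + 0.7 * band(2250, 70),  # hypothetical "fructose" spectrum
    band(1700, 50) + 0.4 * band(1950, 90),  # hypothetical "glucose" spectrum
])
conc = rng.uniform(0, 20, size=(146, 3))
baseline = rng.uniform(-1, 1, size=(146, 1))  # additive baseline shift per sample
X = conc @ pure + baseline + rng.normal(scale=0.05, size=(146, 250))

X_train, X_eval = X[:125], X[125:]
y_train, y_eval = conc[:125, 0], conc[125:, 0]  # predict "sucrose"
model = pls1_fit(X_train, y_train, n_components=5)
rmsep = np.sqrt(np.mean((pls1_predict(model, X_eval) - y_eval) ** 2))
print(f"RMSEP (sucrose, 21 evaluation samples): {rmsep:.3f}")
```

With real spectra, the number of components would normally be chosen by cross-validation rather than fixed in advance.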
Wavelets
• Demonstrate the use of adaptive wavelets in different situations: regression, experimental design, clustering, classification, etc.
David Donald and Lachlan McKinna
2D Wavelet Transform of SST Anomalies – Lachlan McKinna
[Figure: one-level 2D wavelet transform (2D WT) of an SST anomaly field, with panels Approximation A1 (smooth), Horizontal Detail H1, Vertical Detail V1 and Diagonal Detail D1]
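A one-level 2D wavelet transform splits a field into one approximation panel and three detail panels, each half the original size in both directions. The minimal sketch below uses the orthonormal Haar wavelet, which is fixed and is not one of the adaptive wavelets mentioned above. It also assumes a random array in place of the SST anomaly field. Software packages differ on which detail they label "horizontal" and "vertical". Here, H1 is the top-row-minus-bottom-row contrast within each 2 × 2 block.

```python
import numpy as np

def haar2d(field):
    """One level of the orthonormal 2D Haar wavelet transform.

    Returns (A1, H1, V1, D1), each half the size of `field` in both directions.
    `field` must have an even number of rows and columns.
    """
    a = field[0::2, 0::2]  # top-left of each 2x2 block
    b = field[0::2, 1::2]  # top-right
    c = field[1::2, 0::2]  # bottom-left
    d = field[1::2, 1::2]  # bottom-right
    A = (a + b + c + d) / 2  # smooth approximation
    H = (a + b - c - d) / 2  # top row minus bottom row
    V = (a - b + c - d) / 2  # left column minus right column
    D = (a - b - c + d) / 2  # diagonal contrast
    return A, H, V, D

def ihaar2d(A, H, V, D):
    """Invert haar2d exactly."""
    field = np.empty((2 * A.shape[0], 2 * A.shape[1]))
    field[0::2, 0::2] = (A + H + V + D) / 2
    field[0::2, 1::2] = (A + H - V - D) / 2
    field[1::2, 0::2] = (A - H + V - D) / 2
    field[1::2, 1::2] = (A - H - V + D) / 2
    return field

# Hypothetical stand-in for a gridded SST anomaly field.
rng = np.random.default_rng(3)
field = rng.normal(size=(128, 288))
A1, H1, V1, D1 = haar2d(field)
print(A1.shape, H1.shape, V1.shape, D1.shape)
```

Because the transform is orthonormal, it preserves total energy (the sum of squares) and can be inverted exactly. This makes it easy to threshold or study each panel separately and then reconstruct the field.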
Sustaining our natural resources:
Dairying for tomorrow.
• Analysis of survey results examining natural resource management on Australian dairy farms
• Investigating the specific management practices responsible for producing greater-than-expected yield, given the nutritional inputs
Daniel Zamykal
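One way to make "greater than expected yield, given the nutritional inputs" concrete is to regress yield on the inputs and examine farms with large positive residuals. The sketch below is an assumption about how such an analysis could be set up, not the survey's actual method. The variables, units, effect sizes, the residual cut-off of 1 and the single binary "practice" indicator are all hypothetical.

```python
import numpy as np

# Hypothetical survey data; variable names, units and effect sizes are invented.
rng = np.random.default_rng(5)
n = 300
feed = rng.uniform(2, 8, n)         # hypothetical: t supplementary feed per cow
nitrogen = rng.uniform(50, 250, n)  # hypothetical: kg N fertiliser per ha
practice = rng.random(n) < 0.4      # hypothetical: farm uses some management practice
milk = 2000 + 600 * feed + 3 * nitrogen + 400 * practice + rng.normal(0, 300, n)

# Expected yield given the nutritional inputs: ordinary least squares.
A = np.column_stack([np.ones(n), feed, nitrogen])
beta, *_ = np.linalg.lstsq(A, milk, rcond=None)
resid = milk - A @ beta
z = resid / resid.std(ddof=A.shape[1])  # scaled residuals

above = z > 1.0  # farms yielding clearly more than their inputs predict
rate_above = practice[above].mean()
rate_rest = practice[~above].mean()
print(f"{above.sum()} farms above expectation")
print(f"practice adopted by {rate_above:.0%} of them vs {rate_rest:.0%} of the rest")
```

A higher adoption rate among the over-performing farms suggests, but does not prove, that the practice contributes to yield. Survey data are observational, so confounding remains possible.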
Mike Steele
Main research area:
• Looking at the power of goodness-of-fit tests (e.g. chi-square, Kolmogorov-Smirnov, …)
• Using Monte Carlo simulation techniques for this, as it is easy!
• Also doing minor statistical consulting work; this has led to publications with a couple of cardiologists, and more are coming.
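The power of a goodness-of-fit test can be estimated by Monte Carlo simulation in two steps. First, simulate the test statistic under the null hypothesis to get a critical value. Then simulate data from an alternative distribution and count how often the statistic exceeds that value. The sketch below does this for the Kolmogorov-Smirnov test of uniformity on (0, 1), coded directly in NumPy. The sample size of 50, the 20 000 replications and the Beta(2, 2) alternative are illustrative assumptions.

```python
import numpy as np

def ks_stat_uniform(samples):
    """Kolmogorov-Smirnov statistic against U(0, 1) for each row of `samples`."""
    u = np.sort(samples, axis=1)
    n = u.shape[1]
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u, axis=1)
    d_minus = np.max(u - (i - 1) / n, axis=1)
    return np.maximum(d_plus, d_minus)

rng = np.random.default_rng(4)
n, reps, alpha = 50, 20000, 0.05

# Step 1: Monte Carlo critical value under H0 (data uniform on (0, 1)).
d_null = ks_stat_uniform(rng.random((reps, n)))
crit = np.quantile(d_null, 1 - alpha)

def rejection_rate(sampler):
    """Fraction of simulated samples whose KS statistic exceeds the critical value."""
    return np.mean(ks_stat_uniform(sampler((reps, n))) > crit)

# Step 2: size (should be close to alpha) and power against a Beta(2, 2) alternative.
size = rejection_rate(lambda shape: rng.random(shape))
power = rejection_rate(lambda shape: rng.beta(2.0, 2.0, size=shape))
print(f"critical value {crit:.3f}, size {size:.3f}, power vs Beta(2,2) {power:.3f}")
```

Repeating step 2 over a range of sample sizes or alternatives gives a power curve. This makes it possible to compare tests such as chi-square and Kolmogorov-Smirnov on equal terms.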
Multivariate profiling
Tim Hancock
Applications
• Climate variability, NIR (Yvette)
• Forensic, NIR, … (Mike)
• Clinical symptom profiling (Danny) …
• Aerodrome weather (Keith Ross)
Research Advancement Programs
• Computational Life Sciences
• AVANTI = ageing, veins, arteria, nutrition, trials, information
• ($750,000, 3 years)
Computational Life Sciences
• Data mining/modeling
Tropical diseases: Q fever, melioidosis, viral characterisation
Images: retina/diabetes data analysis
• Grid computing: genetic data
Aims@jcu: 4th program
• Statistical data mining in drug discovery
• Prediction of biological activities on the basis of chemical fingerprints
• Prediction of biological activities on the basis of molecular descriptors
Regression methods
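One simple regression method for predicting activity from chemical fingerprints is k-nearest-neighbour averaging under the Tanimoto similarity, defined as shared "on" bits divided by bits "on" in either molecule. The slide does not say which methods the program uses, so this is an assumed illustration only. The 256-bit random fingerprints, the activities and the helper names `tanimoto_matrix` and `knn_activity` are hypothetical.

```python
import numpy as np

def tanimoto_matrix(Q, T):
    """Tanimoto similarity between each row of Q and each row of T (binary fingerprints)."""
    Q = Q.astype(np.int64)
    T = T.astype(np.int64)
    common = Q @ T.T  # bits set in both fingerprints
    union = Q.sum(axis=1)[:, None] + T.sum(axis=1)[None, :] - common
    return np.divide(common, union, out=np.zeros(common.shape), where=union > 0)

def knn_activity(Q, T, activity, k=5):
    """Predict activity as the mean over the k most similar training compounds."""
    sim = tanimoto_matrix(Q, T)
    nearest = np.argsort(-sim, axis=1)[:, :k]
    return activity[nearest].mean(axis=1)

# Hypothetical data: 200 compounds with 256-bit fingerprints and measured activities.
rng = np.random.default_rng(6)
fingerprints = rng.random((200, 256)) < 0.1
activity = rng.normal(size=200)
train_fp, query_fp = fingerprints[:180], fingerprints[180:]
pred = knn_activity(query_fp, train_fp, activity[:180], k=5)
print(np.round(pred, 2))
```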
Micro-arrays and Drug Discovery
• Drug activity patterns
• Different tumor cell lines
• Drug molecules
• Molecular structure descriptors
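AVANTI: statistical modeling
• Chronic diseases: diabetes, aortic aneurysm, chronic inflammation
Symptom profiles == biological inf
Image features == disease status
Time profiles == disease status

$$Y_{ij} = \sum_{h=0}^{p} \beta_h x_{hij} + \sum_{h=0}^{m} U_{hi} z_{hij} + R_{ij}$$

with fixed effects β_h, random effects U_hi for subject i, and residual R_ij at measurement j.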
Aortic Aneurysm
Prediction of the time to reach a critical threshold (5 cm)
[Figure: time to critical threshold]
Tim Hancock and Danny Coomans
Time profile
Risk factors:
Tobacco
HT
CAD
COPD
CRF
PAD
Diabetes etc
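Take the simplest case of the mixed model, with one time covariate in both parts (x_0ij = z_0ij = 1 and x_1ij = z_1ij = t_ij, so p = m = 1). Patient i's expected diameter is then (β_0 + U_0i) + (β_1 + U_1i)t. The 5 cm threshold c is therefore reached at t*_i = (c − β_0 − U_0i)/(β_1 + U_1i), provided the patient's slope is positive. The sketch below computes this time. To keep it short, it estimates the effects with a two-stage approximation (a least-squares line per patient, then splitting each fit into mean and deviation) instead of a proper REML mixed-model fit. The cohort, scan times and effect sizes are hypothetical.

```python
import numpy as np

THRESHOLD_CM = 5.0  # critical diameter from the slide

def time_to_threshold(b0, b1, u0, u1, threshold=THRESHOLD_CM):
    """Time at which (b0 + u0) + (b1 + u1) * t first reaches `threshold`.

    Returns 0 if a subject is already at or above the threshold at t = 0,
    and inf if the subject's slope is not positive (threshold never reached).
    """
    intercept = b0 + np.asarray(u0, dtype=float)
    slope = b1 + np.asarray(u1, dtype=float)
    t = np.full(slope.shape, np.inf)
    growing = slope > 0
    t[growing] = (threshold - intercept[growing]) / slope[growing]
    t[intercept >= threshold] = 0.0
    return t

# Hypothetical cohort: seven scans over 3 years, diameters in cm.
rng = np.random.default_rng(7)
n_patients = 50
times = np.linspace(0.0, 3.0, 7)
beta0, beta1 = 3.5, 0.3                 # hypothetical fixed effects
u0 = rng.normal(0.0, 0.4, n_patients)   # random intercepts
u1 = rng.normal(0.0, 0.08, n_patients)  # random slopes
Y = ((beta0 + u0)[:, None] + (beta1 + u1)[:, None] * times
     + rng.normal(0.0, 0.1, (n_patients, times.size)))

# Stage 1: least-squares line per patient (polyfit returns slope, intercept).
fits = np.array([np.polyfit(times, Y[i], deg=1) for i in range(n_patients)])
# Stage 2: population mean as fixed effect, deviations as random effects.
b1_hat, b0_hat = fits[:, 0].mean(), fits[:, 1].mean()
u1_hat, u0_hat = fits[:, 0] - b1_hat, fits[:, 1] - b0_hat

t_pred = time_to_threshold(b0_hat, b1_hat, u0_hat, u1_hat)
t_true = time_to_threshold(beta0, beta1, u0, u1)
print(np.round(np.column_stack([t_true, t_pred])[:5], 2))
```

A full mixed-model fit would shrink each patient's estimates toward the population line. This matters when patients have few scans, and the per-patient fits above ignore it.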
Are the military in Iraq properly protected against chemical attacks?
• Are the antidote drugs they carry resistant to large temperature variations?
• Longitudinal analysis
New PhD student co-supervision.
Work-Flows and Data Mining
[Figure: work-flow diagram with nodes Input Dataset, Method 1, Method 2, 1st Output Data and 2nd Output Data]
• Work-flows make combining statistical algorithms easy.
• The results from each algorithm flow from one node to the next, making a combination of techniques easy.
• Output and graphics can be viewed at each intermediate stage in the work-flow.
Stefan Aberhard
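The work-flow idea can be mimicked in a few lines of plain Python: each node takes the previous node's output as its input, and every intermediate result is kept so it can be inspected. The slide does not name the tool it describes, so this is not its API. The function `run_workflow` and the two example methods (standardisation, then principal component scores) are hypothetical choices for illustration.

```python
import numpy as np
from typing import Callable, List, Tuple

Step = Callable[[np.ndarray], np.ndarray]

def run_workflow(data: np.ndarray, nodes: List[Tuple[str, Step]]):
    """Pass data through each node in turn, keeping every intermediate output."""
    outputs = []
    current = data
    for name, step in nodes:
        current = step(current)
        outputs.append((name, current))  # inspectable at each stage
    return outputs

def standardise(X: np.ndarray) -> np.ndarray:
    """Centre each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_scores(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Scores on the first k principal components, via the SVD."""
    U, s, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return U[:, :k] * s[:k]

rng = np.random.default_rng(8)
input_dataset = rng.normal(size=(100, 6))  # hypothetical input dataset
outputs = run_workflow(input_dataset, [("Method 1", standardise), ("Method 2", pca_scores)])
for name, out in outputs:
    print(name, out.shape)
```

To swap or add a method, you only edit the node list. The outputs list then plays the role of the "1st Output Data" and "2nd Output Data" boxes in the diagram.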