How Mpgs are Affected in Vehicles:
A Model Using WEKA Supervised and
Unsupervised Analysis Tools
IT523-01N:
DATA WAREHOUSING AND DATA MINING
FINAL PROJECT
INSTRUCTOR: DR. SHEILA FOURNIERBONILLA
ELEISHA BARNETT
How Mpgs are Affected in
Vehicles
THE MODEL:
A DATASET OF 398 AUTOMOBILES WITH 8
ATTRIBUTES THAT COULD POSSIBLY AFFECT
A VEHICLE’S GAS CONSUMPTION (MILES
PER GALLON) PERFORMANCE
Which Gets Better Gas Mileage?
1908 Model T Ford?
1961 Chevrolet Corvette?
The Attributes
• Number of Cylinders
• Engine Displacement
• Horsepower
• Weight
• Acceleration
• Model (model year)
• Origin (where the car was made)
• Class (luxury, sports, sedan, coupe, etc.)
PART Analysis
• I first ran a PART rule classification in WEKA on all 398 instances,
with cylinders as the output attribute. Many car manufacturers use
cylinder count as an indicator of power and gas mileage: in general,
the fewer the cylinders, the better the gas mileage but the lower the
power, especially in terms of horsepower.
• Horsepower is a somewhat archaic term. It originally indicated the
number of horses it would take to put out the same amount of power as
an engine.
PART Analysis
• The PART rule generator used engine displacement to generate the
rules for cylinders. This is important because engine displacement
plays a part in determining gas mileage. Engine displacement is the
volume swept by all the pistons inside the cylinders of an internal
combustion engine in a single movement from top dead center to bottom
dead center. It is commonly specified in cubic centimeters (cc),
liters (L), or (mainly in North America) cubic inches (CID). Engine
displacement does not include the total volume of the combustion
chamber (Wikipedia, 2011).
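As a rough illustration of the definition above, swept volume can be computed from bore, stroke, and cylinder count with the standard formula (pi / 4) x bore^2 x stroke x cylinders. The bore and stroke below (3.75 in by 4.0 in) are assumed values that do not appear in these slides; they are chosen because they give a figure close to the 177 cubic inch Model T engine mentioned in the conclusion. The function names are my own.

```python
import math

CUBIC_INCH_IN_LITERS = 0.016387064  # exact, because 1 inch = 2.54 cm


def displacement_cubic_inches(bore_in, stroke_in, cylinders):
    """Swept volume of all cylinders: (pi / 4) * bore^2 * stroke * cylinders."""
    return math.pi / 4 * bore_in ** 2 * stroke_in * cylinders


def cubic_inches_to_liters(cid):
    """Convert cubic inches (CID) to liters."""
    return cid * CUBIC_INCH_IN_LITERS


# Assumed bore and stroke (not taken from the slides): 3.75 in x 4.0 in, 4 cylinders.
cid = displacement_cubic_inches(3.75, 4.0, 4)
print(f"{cid:.1f} cu in = {cubic_inches_to_liters(cid):.2f} L")
```

The same conversion helper can be used to check the cubic inch and liter figures quoted for the two cars in the conclusion.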
PART Analysis
As you can see, 6 rules were generated based on the given attributes
and output. In general, the greater the displacement, the more
cylinders a vehicle has and the higher its gas consumption. For
example, vehicles covered by the rule displacement > 70:4 (191.0/3.0)
have a smaller engine, and therefore less horsepower and a higher mpg
(miles per gallon) rating. Conversely, vehicles covered by the rule
displacement > 258:8 (104.0/1.0) have a larger engine, more
horsepower, and a lower mpg.
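The pair of numbers in parentheses after each rule follows WEKA's usual convention: the first is the number of instances the rule covers and the second is how many of those it misclassifies. A minimal sketch of reading that notation, assuming the rules are written exactly as quoted on this slide (the regular expression and the function name rule_stats are my own, not part of WEKA):

```python
import re

# Layout assumed from the rules quoted above: "condition:label (covered/errors)".
RULE = re.compile(r"\s*(.+?):\s*(\S+?)\s*\(([\d.]+)(?:/([\d.]+))?\)\s*")


def rule_stats(rule):
    """Split a rule string into its condition, predicted label, and counts."""
    match = RULE.fullmatch(rule)
    if match is None:
        raise ValueError(f"unrecognised rule: {rule!r}")
    condition, label, covered, errors = match.groups()
    covered = float(covered)
    errors = float(errors) if errors else 0.0  # WEKA omits "/errors" when it is zero
    return {
        "condition": condition.strip(),
        "label": label,
        "covered": covered,
        "errors": errors,
        "purity": (covered - errors) / covered,  # share of covered instances that match
    }


for text in ["displacement > 70:4 (191.0/3.0)", "displacement > 258:8 (104.0/1.0)"]:
    stats = rule_stats(text)
    print(f"{stats['condition']} -> {stats['label']} cylinders: {stats['purity']:.1%} correct")
```

Both rules are correct for more than 98% of the vehicles they cover, which is consistent with the high overall accuracy reported for PART.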
PART Analysis
The number of correctly classified instances is 384 of 398, an
accuracy rate of 96.4824%. The other 14 instances were incorrectly
classified, an error rate of 3.5176%. It’s possible that the
inaccuracies came from the odd European cars that have 3 and 5
cylinders and thus do not fit the usual profiles. In practice this
applies only to 3 cylinders, as no 5-cylinder vehicles were
represented. The 3-cylinder vehicles were represented in 2 rules:
origin = 1:4 (15.0/1.0) and displacement > 107:3 (4.0/1.0). The
interesting item to note is that these 3-cylinder engines fall in the
same displacement range as a smaller 6-cylinder engine
(displacement > 107:6 (6.0/1.0)) and presumably have the same mpg
rating.
J48 Decision Tree Analysis
As we can see from this J48 decision tree, the analysis breaks the
dataset down further to show how the origin of a vehicle might
influence mpg. The data indicate that there is little merit to this,
but we will examine it further in the cluster analysis. In the
meantime, J48 bears out the same analysis as PART, breaks it down
further, and presents a slightly more accurate picture.
J48 Analysis
In this case, 386 (96.9849%) instances are correctly classified and
only 12 (3.0151%) are incorrectly classified. This sets our true
positive (TP) rate at 1 versus a false positive (FP) rate of 0.003,
which gives us high confidence in the rules IF displacement <= 144
AND cylinder < 6 THEN high mpg, and IF displacement > 156 AND
cylinder <= 6 THEN low mpg. The TP and FP rates are calculated from
the confusion matrix. For a given class, the TP rate is the number of
instances correctly assigned to that class divided by the number of
instances that actually belong to it (true positives plus false
negatives). The FP rate is the number of instances wrongly assigned
to the class divided by the number that do not belong to it (false
positives plus true negatives).
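To make the confusion-matrix arithmetic concrete, here is a small sketch of how per-class TP and FP rates and overall accuracy can be derived. The 3 x 3 matrix below is made up purely for illustration and is not the matrix WEKA produced for this dataset; rows are taken to be actual classes and columns predicted classes, which is how WEKA lays out its "classified as" table.

```python
def per_class_rates(matrix, k):
    """Return (TP rate, FP rate) for class k; rows = actual, columns = predicted."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # actual k, predicted as another class
    fp = sum(row[k] for row in matrix) - tp  # predicted k, actually another class
    tn = total - tp - fn - fp
    return tp / (tp + fn), fp / (fp + tn)


def accuracy(matrix):
    """Share of all instances that sit on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Made-up counts for illustration only; NOT the matrix from the slides.
example = [
    [50, 2, 0],
    [3, 40, 1],
    [0, 0, 30],
]
tp_rate, fp_rate = per_class_rates(example, 2)
print(f"class 2: TP rate {tp_rate:.3f}, FP rate {fp_rate:.3f}")
print(f"overall accuracy {accuracy(example):.4f}")
print(f"accuracy reported on the slide: {386 / 398:.4%}")
```

A TP rate of 1 for a class means that every instance of that class was recognised. It does not by itself mean that every prediction of the class was right, which is what the FP rate measures.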
Cluster Analysis
In cluster analysis, we must
decide if there are
associations and if they are
worth further study. In this
case, we use a rough measure
of attribute significance to
accomplish this. Specifically,
for each attribute, subtract
the attribute means for the
two clusters and divide the
absolute value of this result
by the domain standard
deviation for the attribute.
Computations near or
greater than one indicate
attributes that have been
clearly differentiated by the
clustering. If there are no
such attributes, the
clustering is of little interest.
Cluster Analysis
As we can see on the next slide, none of the attribute differentials
came out at or near 1, so we must conclude that this cluster analysis
is not worth exploring. However, as we will see in the final
analysis, this may be a faulty line of reasoning.
Cluster Analysis
CYLINDER AS THE OUTPUT ATTRIBUTE
DISPLACEMENT = |241.249 - 193.4259| / 104.2698 = 0.46
HORSEPOWER = |118.181 - 104.4694| / 38.1992 = 0.36
WEIGHT = |3342.1622 - 2970.4246| / 846.8418 = 0.44
ACCELERATION = |15.0564 - 15.5681| / 2.7577 = 0.19
CLASS = |20.0135 - 23.5146| / 7.816 = 0.45
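The attribute significance measure described earlier (absolute difference of the two cluster means, divided by the domain standard deviation) can be checked with a few lines of Python. The numbers are copied from this slide; the function name and the dictionary layout are my own. Because the measure uses an absolute value, every score is non-negative, so class comes out at 0.45.

```python
def attribute_significance(mean_a, mean_b, domain_std):
    """|mean of cluster A - mean of cluster B| / domain standard deviation."""
    return abs(mean_a - mean_b) / domain_std


# (cluster mean, cluster mean, domain standard deviation), copied from the slide.
slide_values = {
    "displacement": (241.249, 193.4259, 104.2698),
    "horsepower": (118.181, 104.4694, 38.1992),
    "weight": (3342.1622, 2970.4246, 846.8418),
    "acceleration": (15.0564, 15.5681, 2.7577),
    "class": (20.0135, 23.5146, 7.816),
}

scores = {name: attribute_significance(*v) for name, v in slide_values.items()}
for name, score in scores.items():
    print(f"{name:<13}{score:.2f}")

# Values near or above 1 would indicate a clearly differentiated attribute.
print("largest score:", round(max(scores.values()), 2))
```

The largest score is about 0.46, well short of 1, which is why the clustering is judged to be of little interest under this measure.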
Linear Regression Analysis
In our final analysis, we will be looking at linear regression. The
purpose of regression analysis is to come up with the equation of a
line that fits through a cluster of points with the minimal amount of
deviation from the line. The deviation of the points from the line is
called "error." Once I have this regression equation, I could use it
to predict class. Simple linear regression is essentially equivalent
to a bivariate correlation between the independent and dependent
variables (Princeton, 2011).
Linear Regression Analysis
I can use linear regression
to predict values of one
variable, given values of
another variable. If I plot
the values on a graph, with
cylinder on the x axis and
displacement on the y axis,
for example, then the result
is a linear relationship
between cylinder and
displacement showing a
cluster of points on the
graph which slopes upward.
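A minimal sketch of fitting such a line by ordinary least squares, with cylinder on the x axis and displacement on the y axis. The six data points are made up for illustration and are not taken from the dataset; the helper fit_line is my own and simply applies the textbook formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative (cylinders, displacement) pairs; made up, NOT from the dataset.
cylinders = [4, 4, 6, 6, 8, 8]
displacement = [100, 120, 200, 240, 320, 340]

slope, intercept = fit_line(cylinders, displacement)
print(f"displacement = {slope:.1f} * cylinders + {intercept:.1f}")

# Deviations of the points from the fitted line (the "error" mentioned earlier).
errors = [y - (slope * x + intercept) for x, y in zip(cylinders, displacement)]
print("sum of squared errors:", sum(e ** 2 for e in errors))
```

A positive slope corresponds to the upward-sloping cluster of points described above.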
Linear Regression Analysis
However, some very interesting results are presented here. While the
cylinder/displacement relationship held true, following the slope
upward, the results indicate that there are other factors in
determining mpg. The clusters grow stronger through horsepower,
weight, and acceleration, weaken in model year and origin, and become
strong again in class.
Linear Regression Analysis
Due to incompatibility issues between the WEKA autompg.arff file and
Excel, I was unable to copy and paste the data into Excel and run a
LINEST analysis, which is why I ran the WEKA visualization instead.
However, I was able to snip and paste the data into this presentation
to show the instances and attributes used.
Conclusion
WHAT CAN WE CONCLUDE FROM THESE ANALYSES?
• ENGINE SIZE DOES PLAY A ROLE IN GASOLINE CONSUMPTION
• HOWEVER, OTHER ATTRIBUTES NEED TO BE CONSIDERED IN
DETERMINING GAS MILEAGE OR MPG.
• THESE ATTRIBUTES INCLUDE WEIGHT, ACCELERATION,
HORSEPOWER, AND CLASS OF VEHICLE
• IT IS PRUDENT TO USE MORE THAN ONE ANALYSIS TOOL
• WHILE NEITHER THE MODEL T NOR THE CORVETTE SHOWN
IN SLIDE 3 WAS PART OF THE DATASET, THE MODEL T WINS
AT 25 MPG VERSUS THE CORVETTE AT 8 MPG
Conclusion
THE FORD MODEL T USED A 177 CUBIC INCH (2.9 L)
INLINE 4 CYLINDER ENGINE. IT WAS PRIMARILY A
GASOLINE ENGINE, BUT IT HAD MULTIFUEL ABILITY AND
COULD ALSO BURN KEROSENE OR ETHANOL. IT
PRODUCED 20 HP FOR A TOP SPEED OF 45 MPH.
THE CHEVROLET CORVETTE USED A 327 CU IN (5.36 L) V8
ENGINE AND WAS STRICTLY A GAS ENGINE.
IT PRODUCED 340 HP FOR A TOP SPEED OF 130 MPH.
References
http://en.wikipedia.org/wiki/Engine_displacement (accessed 29 May 2011)
Roiger, R. J., & Geatz, M. W. (2003). Data Mining: A Tutorial-Based Primer. Boston, MA: Addison Wesley.
Marakas, G. M. (2003). Modern data warehousing, mining, and visualization: Core concepts. Upper Saddle River, NJ: Prentice Hall.
The University of Waikato (WEKA). http://www.cs.waikato.ac.nz/ml/weka/
http://tunedit.org/search?q=arff (accessed 27 May 2011)
Barnett, Eleisha (2011)
Photos courtesy of Eleisha Barnett
http://en.wikipedia.org/wiki/Chevrolet_Corvette (accessed 30 May 2011)
http://en.wikipedia.org/wiki/Ford_Model_T_engine (accessed 30 May 2011)