The Dawning of the Age of Infinite Storage
William Perrizo
Dept of Computer Science
North Dakota State Univ.
Googol 10^100
Tera Bytes are Here
 1 TB costs  1k$ to buy
 1 TB costs 300k$/y to own
Management & curation are expensive
 Searching 1TB takes hours
 I’m Terrified by TeraBytes
 I’m Petrified by PetaBytes
 I’ll soon be Exafied by ExaBytes
 I’m too old to ever be Zettafied by ZettaBytes
But you may be in your lifetime
You may even be Yottafied by YottaBytes
You probably won’t ever be Googified by GoogiBytes
But one should “never say never”.
We are here
...
Yotta 10^24
Zetta 10^21
Exa 10^18
Peta 10^15
Tera 10^12
Giga 10^9
Mega 10^6
Kilo 10^3
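The claim that searching 1 TB takes hours can be checked with simple arithmetic: scan time is size divided by sustained read rate. The sketch below is illustrative only; the 50 MB/s rate is an assumed single-disk figure that does not come from the slides, and `scan_hours` is a hypothetical helper name.

```python
# Sequential scan time for 1 <prefix>byte at an assumed sustained read rate.
PREFIXES = {"Kilo": 3, "Mega": 6, "Giga": 9, "Tera": 12,
            "Peta": 15, "Exa": 18, "Zetta": 21, "Yotta": 24}

# Assumption: 50 MB/s sequential read from one disk (not stated in the slides).
ASSUMED_SCAN_RATE = 50e6  # bytes per second


def scan_hours(prefix: str, rate: float = ASSUMED_SCAN_RATE) -> float:
    """Hours needed to read 1 <prefix>byte sequentially at `rate` bytes/second."""
    size_bytes = 10 ** PREFIXES[prefix]
    return size_bytes / rate / 3600


for name in ("Tera", "Peta", "Exa"):
    print(f"1 {name}byte: {scan_hours(name):,.1f} hours")
```

Under this assumption a terabyte takes several hours to scan, and each step up the prefix scale multiplies that by a thousand, which is why the slides stress indexing and summarization over brute-force search.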
How much information is there?
 Soon everything can be recorded and indexed.
 Most bytes will never be seen by humans.
 Data summarization, trend detection, anomaly detection, data mining, are key technologies
[Figure: storage scale from Kilo to Yotta, marked with A Book, A Photo, A Movie, All books (words), All Books MultiMedia, and Everything Recorded!]
Small-scale prefixes: 10^-24 yocto, 10^-21 zepto, 10^-18 atto, 10^-15 femto, 10^-12 pico, 10^-9 nano, 10^-6 micro, 10^-3 milli
First Disk 1956
 IBM 305 RAMAC
 4 MB
 50x24” disks
 1200 rpm
 100 ms access
 35k$/y rent
 Included computer & accounting software (tubes not transistors)
Me, at 13.
1.6 meters
10 years later: 30 MB
The Cost of Storage about 1K$/TB
[Figures: "Price vs disk capacity" charts for SCSI and IDE drives, dated 12/1/1999, 9/1/2000, 9/1/2001, 4/1/2002, and 11/4/2003. The charts plot price ($) and raw k$/TB against raw disk unit size (GB). Linear fits shown on the charts: y = 17.9x, y = 13x, y = 7.2x, y = 6.7x, y = 6x, y = 3.8x, y = 2x, y = x.]
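The price-versus-capacity charts carry linear fits of the form y = kx, with price in dollars and disk size in GB. A slope of k $/GB is numerically the same as k k$/TB, so a fit of y = x corresponds to the 1 k$/TB in the slide title. As a minimal sketch of how such a slope could be obtained, the code below does a least-squares fit through the origin. The data points are invented for illustration and `fit_through_origin` is a hypothetical helper, not the method used to produce the charts.

```python
def fit_through_origin(sizes_gb, prices_usd):
    """Least-squares slope k for the model price = k * size (no intercept)."""
    num = sum(x * y for x, y in zip(sizes_gb, prices_usd))
    den = sum(x * x for x in sizes_gb)
    return num / den


# Hypothetical (size in GB, price in $) points, invented for illustration only.
sizes = [40, 80, 120, 250]
prices = [80, 160, 240, 500]

k = fit_through_origin(sizes, prices)  # slope in $/GB
# $/GB * 1000 GB/TB = 1000k $/TB, i.e. k thousand dollars per TB.
print(f"y = {k:.1f}x, i.e. about {k:.1f} k$/TB")
```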
E.g., A recent Purchase Order
Company: NDSU
Date: 8/7/03
System Board: Intel D865 GBFL system board w/LAN 800 MHz FSB
Processor: Intel Pentium 4 2.6 GHz
Hard Drives: 4 x 250 GB IDE (total = 1 TB)
Controller: Onboard IDE Controller
2nd IDE Controller:
Video: Integrated
Diskette Drive: 1.44 MB
Memory: 4 GB 400 MHz memory
CD/DVD Drive: DVD/CDRW
Sound: Integrated AC97 Audio w/Soundmax
Case: Performance Minitower ATX w/300 Watt PS
Keyboard: Microsoft 104 Internet keyboard
Mouse: Microsoft Intellimouse Optical
Operating System: none
Network Cards: Integrated Intel 10/100 Ethernet w/D845GEBV2L board
Price: $2,899.00
(Slide annotation: Main expense is here)
Disk Evolution
[Figure: scale from Kilo, Mega, Giga, Tera, Peta, Exa, Zetta to Yotta]
Memex
As We May Think, Vannevar Bush, 1945
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”
“yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can enter material freely”
Trying to fill a terabyte in a year
Item                           Items/TB    Items/day
300 KB JPEG                    3M          9,800
1 MB Doc                       1M          2,900
1 hour 256 kb/s MP3 audio      9K          26
1 hour 1.5 Mb/s MPEG video     290         0.8
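The arithmetic behind a table like this is items per terabyte equals 1 TB divided by item size, and items per day equals that count divided by 365. The sketch below works it through under stated assumptions: a decimal terabyte (10^12 bytes), decimal KB and MB, and stream sizes taken as bit rate times one hour. The computed values will not match the slide's rounded figures exactly, and the video row in particular differs, so the slide presumably rests on size assumptions that are not stated. The names `fill_rate` and `hour_of_stream_bytes` are hypothetical.

```python
TB = 10 ** 12          # assumption: decimal terabyte, in bytes
DAYS_PER_YEAR = 365


def hour_of_stream_bytes(bits_per_second: float) -> float:
    """Size in bytes of one hour of a constant-bit-rate stream."""
    return bits_per_second * 3600 / 8


# Item sizes in bytes, using decimal units (assumption).
ITEMS = {
    "300 KB JPEG": 300e3,
    "1 MB Doc": 1e6,
    "1 hour 256 kb/s MP3 audio": hour_of_stream_bytes(256e3),
    "1 hour 1.5 Mb/s MPEG video": hour_of_stream_bytes(1.5e6),
}


def fill_rate(item_bytes: float):
    """Return (items per TB, items per day needed to fill 1 TB in a year)."""
    per_tb = TB / item_bytes
    return per_tb, per_tb / DAYS_PER_YEAR


for name, size in ITEMS.items():
    per_tb, per_day = fill_rate(size)
    print(f"{name:28s} {per_tb:12,.0f} items/TB {per_day:10,.1f} items/day")
```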
The Personal Terabyte
How Will We Find Anything?
 Need Queries, Indexing, Data Mining, Pivoting, Scalability, Backup, Replication, Online update, Set-oriented access.
 If you don’t use a DBMS, you will implement one!
 Need Data Mining, Machine Learning!
 80% of data is personal/individual
 20% is Corporate, Governmental
SQL ++
DBMS
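To make the DBMS point concrete, here is a minimal sketch of indexing and set-oriented querying over a personal store, using Python's built-in sqlite3 module. The table layout, column names, and sample rows are all invented for illustration; the slides do not prescribe a schema or a particular database.

```python
import sqlite3

# In-memory database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL,
        title TEXT NOT NULL,
        size_bytes INTEGER NOT NULL,
        created TEXT NOT NULL
    )
""")
# Index so lookups by kind and date do not scan every row.
conn.execute("CREATE INDEX idx_items_kind_created ON items (kind, created)")

# Invented sample rows.
rows = [
    ("photo", "beach.jpg", 300_000, "2003-08-01"),
    ("doc", "budget.doc", 1_000_000, "2003-08-03"),
    ("photo", "lab.jpg", 280_000, "2003-08-07"),
]
conn.executemany(
    "INSERT INTO items (kind, title, size_bytes, created) VALUES (?, ?, ?, ?)",
    rows,
)

# Set-oriented summary: count and total size per kind.
summary = conn.execute(
    "SELECT kind, COUNT(*), SUM(size_bytes) FROM items GROUP BY kind ORDER BY kind"
).fetchall()

# Indexed query: photos created on or after a given date.
recent_photos = conn.execute(
    "SELECT title FROM items WHERE kind = ? AND created >= ? ORDER BY created",
    ("photo", "2003-08-05"),
).fetchall()

print(summary)
print(recent_photos)
```

Queries, indexing, and aggregation come for free here; writing the same lookups by hand over flat files is the "implementing your own DBMS" the slide warns about.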