The Dawning of the Age of Infinite Storage
William Perrizo
Dept of Computer Science
North Dakota State Univ.

Google 10^100

Tera Bytes are Here
1 TB costs 1k$ to buy.
1 TB costs 300k$/y to own.
Management & curation are expensive.
Searching 1 TB takes hours.

I’m Terrified by TeraBytes.
I’m Petrified by PetaBytes.
I’ll soon be Exafied by ExaBytes.
I’m too old to ever be Zettafied by ZettaBytes, but you may be in your lifetime.
You may even be Yottafied by YottaBytes.
You probably won’t ever be Googified by GoogiBytes, but one should “never say never”.

Yotta 10^24, Zetta 10^21, Exa 10^18, Peta 10^15, Tera 10^12 (we are here), Giga 10^9, Mega 10^6, Kilo 10^3

How much information is there?
Soon everything can be recorded and indexed.
Most bytes will never be seen by humans.
Data summarization, trend detection, anomaly detection and data mining are key technologies.
[Figure: how much information there is, on a scale from Kilo to Yotta: A Book, A Photo, A Movie, All books (words), All Books MultiMedia, Everything! Recorded]
Small prefixes: 10^-24 yocto, 10^-21 zepto, 10^-18 atto, 10^-15 femto, 10^-12 pico, 10^-9 nano, 10^-6 micro, 10^-3 milli

First Disk, 1956
IBM 305 RAMAC: 4 MB, 50 x 24” disks, 1200 rpm, 100 ms access, 35k$/y rent
Included computer & accounting software (tubes, not transistors)
Me, at 13.
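The “Tera Bytes are Here” slide’s claim that searching 1 TB takes hours can be checked with back-of-envelope arithmetic. The sketch below is a minimal illustration, not a method from the talk: it encodes the decimal prefixes listed above and assumes a single disk scanning sequentially at a constant rate of 50 or 100 MB/s. Those two rates, and the helper names `prefix_bytes` and `scan_hours`, are hypothetical choices made for this example.

```python
# Back-of-envelope arithmetic for the "Tera Bytes are Here" slide.
# Assumption (not from the slides): one disk scans sequentially at a
# constant rate; the rates used below are hypothetical.

# Decimal (SI) prefixes as listed on the slide: name -> power of ten.
PREFIXES = {
    "Kilo": 3, "Mega": 6, "Giga": 9, "Tera": 12,
    "Peta": 15, "Exa": 18, "Zetta": 21, "Yotta": 24,
}


def prefix_bytes(name: str) -> int:
    """Number of bytes in one unit of the given decimal prefix."""
    return 10 ** PREFIXES[name]


def scan_hours(n_bytes: int, bytes_per_second: float) -> float:
    """Hours needed to read n_bytes sequentially at a constant rate."""
    return n_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    one_tb = prefix_bytes("Tera")
    for mb_per_s in (50, 100):  # hypothetical sequential read rates
        hours = scan_hours(one_tb, mb_per_s * 10**6)
        print(f"1 TB at {mb_per_s} MB/s: {hours:.1f} hours")
    # Yearly cost to own (300k$) divided by purchase price (1k$), per TB.
    print("own/buy ratio per year:", 300_000 // 1_000)
```

Under these assumed rates one full pass over 1 TB takes roughly 2.8 to 5.6 hours, which is consistent with the slide’s “hours”.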
10 years after the RAMAC: 30 MB [photo label: 1.6 meters]

The Cost of Storage: about 1K$/TB
[Figure: five panels titled “Price vs disk capacity” (12/1/1999, 9/1/2000, 9/1/2001, 4/1/2002, 11/4/2003), each plotting price ($) and raw k$/TB against raw disk unit size (GB) for SCSI and IDE drives, with linear fits]

E.g., a recent purchase order:
Company: NDSU
Date: 8/7/03
System Board: Intel D865 GBFL system board w/LAN, 800 MHz FSB
Processor: Intel Pentium 4 2.6 GHz
Hard Drives: 4 x 250 GB IDE (total = 1 TB); main expense is here
Controller: Onboard IDE Controller
2nd IDE Controller:
Video: Integrated
Diskette Drive: 1.44 MB
Memory: 4 GB 400 MHz memory
CD/DVD Drive: DVD/CDRW
Sound: Integrated AC97 Audio w/Soundmax
Case: Performance Minitower ATX w/300 Watt PS
Keyboard: Microsoft 104 Internet keyboard
Mouse: Microsoft Intellimouse Optical
Operating System: none
Network Cards: Integrated Intel 10/100 Ethernet w/D845GEBV2L board
Price: $2,899.00

[Figure: Disk Evolution, capacity scale from Kilo to Yotta]

Memex
“As We May Think”, Vannevar Bush, 1945
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.”
“yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can enter material freely”

Trying to fill a terabyte in a year
Item                          Items/TB   Items/day
300 KB JPEG                   3M         9,800
1 MB Doc                      1M         2,900
1 hour 256 kb/s MP3 audio     9K         26
1 hour 1.5 Mb/s MPEG video    290        0.8

The Personal Terabyte: How Will We Find Anything?
Need Queries, Indexing, Data Mining, Pivoting, Scalability, Backup, Replication, Online update, Set-oriented access.
If you don’t use a DBMS, you will implement one!
Need Data Mining, Machine Learning!
80% of data is personal/individual; 20% is Corporate, Governmental.
SQL ++ DBMS
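The “Trying to fill a terabyte in a year” table is plain division: items per TB is the terabyte size divided by the item size, and items per day is that count divided by 365. The slide does not state its units, so the minimal sketch below assumes binary storage units (1 TB = 2^40 bytes, 1 KB = 2^10, 1 MB = 2^20), decimal bit rates (1 kb/s = 1,000 bits/s) and a 365-day year. These assumptions, and the helper names `stream_bytes` and `fill_rate`, are ours and chosen for illustration.

```python
# Reproduce the "Trying to fill a terabyte in a year" table by division.
# Assumptions (not stated on the slide): binary storage units
# (1 TB = 2**40 bytes, 1 KB = 2**10, 1 MB = 2**20), decimal bit rates
# (1 kb/s = 1000 bits/s) and 365 days per year.

TB = 2**40
KB = 2**10
MB = 2**20
DAYS_PER_YEAR = 365


def stream_bytes(bits_per_second: float, seconds: float = 3600) -> float:
    """Bytes produced by recording a stream for `seconds` at a bit rate."""
    return bits_per_second * seconds / 8


# Item name -> size in bytes of one item.
ITEMS = {
    "300 KB JPEG": 300 * KB,
    "1 MB Doc": 1 * MB,
    "1 hour 256 kb/s MP3 audio": stream_bytes(256_000),
    "1 hour 1.5 Mb/s MPEG video": stream_bytes(1_500_000),
}


def fill_rate(item_bytes: float) -> tuple[float, float]:
    """Return (items per TB, items per day needed to fill 1 TB in a year)."""
    per_tb = TB / item_bytes
    return per_tb, per_tb / DAYS_PER_YEAR


if __name__ == "__main__":
    for name, size in ITEMS.items():
        per_tb, per_day = fill_rate(size)
        print(f"{name:<28} {per_tb:>12,.0f} {per_day:>10,.1f}")
```

Under these assumptions the JPEG, Doc and MP3 rows come out at about 3,579,139, 1,048,576 and 9,544 items per TB (about 9,806, 2,873 and 26 per day), matching the slide’s rounded figures. The MPEG row comes out at about 1,629 items per TB (about 4.5 per day) rather than 290 and 0.8, so that row appears to assume a larger per-hour video size than 1.5 Mb/s implies.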