IBM Research Division

The 50B Transistor Challenge

Mikko Lipasti
Department of Electrical and Computer Engineering
University of Wisconsin - Madison

IBM T.J. Watson Research Center
July 22 and 23, 2008

© 2007 IBM Corporation


50B Transistors on a Chip?

History
– 1997 IEEE Computer Special Issue, 1B T/chip by 2007
  • 3 papers advocate single fast core – CMU, Michigan, Wisconsin
  • IRAM – Berkeley
  • RAW – MIT
  • SMT – Washington
  • Multicore – Stanford

11 years later, 50x more transistors
– We still need faster cores: computation
  • Fundamentally constrained by power
– Will get more than one core: communication
  • Need efficient interconnects and coherent caches
– Will get lots of on-chip memory
  • Need to think about new algorithms and new approaches to use it


(1) What Will We Do With 50B Transistors?

50B transistors/chip dramatically alters data centers

E.g. Nokia moving aggressively into services
– Google, Yahoo, MSN each provision ~1M servers
– Now provision for 10x installed base (phone vs. PC)
  • Witness recent problems with iPhone/MobileMe

Impossible to anticipate applications
– YouTube/Facebook/Flickr/Twitter
– Unstructured real-world data
– Organize, search, extract semantic knowledge, mashups, …

Existing and future server apps all benefit


(2) How Will We Design Chips with 50B Transistors?

Three things that processors need to be good at:
– Computation
– Communication
– Storage/Memory

Focus on cost and nature of computation
Focus on cost of communication
Shift emphasis to memory


Cost of Computation

Less than 10% of energy spent on useful work
– EPI (energy per instruction) overhead has gotten out of hand
– Need to rethink operand delivery [ICCD’07], queues [ISLPED’07], caches, register files, control, …

Exploit program attributes
– Solve hard problems via elimination
  • Macro-ops: no single-cycle operations [MICRO’03, HPCA’06]
– Do the hard parts with narrow values [JILP’07]

Eliminate redundancy, excessive pipelines
– Clever clock gating [ISLPED’06, ICCD’07]
– Remove renaming, register file, clocked scheduler, pipelines [submitted]

Goal: reduce EPI by 10x at fixed process technology and MIPS


Cost of Communication

Reduce coherence overhead and speculation
– Region coherence [ISCA’05, ASPLOS’06, HPCA’08]

Exploit locality of communication patterns
– Switched circuits [CALetters’07, NOCS’08]
– On-chip multicasting [ISCA’08]
– Multicast coherence [submitted]

New technologies
– Nanophotonic rings [HP Labs collaboration]
– Massive bandwidth, speed-of-light latency
– Lots of interesting problems to solve


Emphasis on Memory

In future processes, memory will be easier than logic
– Reliability, variability: well-known solutions (ECC, sparing)
– Interesting new technologies (PCRAM, etc.)
– Not caches -- diminishing returns

Return to more regular, “memory-like” devices and logic?
– Gate array, LUT, PLA

Majority of 50B T must not be switching
– Remembering is cheaper than computing
  • Revisit value locality/reuse/memoization?
– New search algorithms:
  • TCAM accelerator [ICCD’08]: logic in memory, but not IRAM!


Unstructured Real-World Data

Internet is exploding with data
– Text
– Semantic knowledge
– Photo, video, audio

It is all in digital form, but all we can do is view and copy it

Algorithms for analysis range from poor to nonexistent
– Machine learning?

Why not learn from nature?


Brains

Human brain Von Neumann machine
– Face recognition: <500ms
– Neurons are slow:
– Critical path is a handful of “gates”

Fundamentally different computational model

Made of shoddy, unreliable parts
“…neurons are noisy, unreliable devices, … the nervous system averages over many cells to compensate for these shoddy components.” - Christof Koch

We can build it. We have the technology.

Dec. 3, 2007, MICRO-40 Panel: Computing Beyond Von Neumann


Brains (2)

Human neocortex:
– ~20B neurons, ~200T synapses
– Structurally homogeneous
– Hypothesis: runs common algorithm

Apply architecture 101?
– Abstraction layers
– Hierarchy and replication
– Simulation/analysis/synthesis

Let’s Build Brains!
– Massively parallel fault-tolerant hardware

Best news: no need for parallel programming
– Train vs. program


Summary

Computation
– Reduce cost (EPI) by 10x
– New algorithms

Communication
– Streamline coherence protocols, interconnects
– Exploit new technologies

Storage/Memory
– Reliability/variability
– Logic in memory/new algorithms

Brain computing for unstructured real-world data


Questions?

http://www.ece.wisc.edu/~pharm
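
Illustration (not part of the original slides): the "Remembering is cheaper than computing" bullet on the Emphasis on Memory slide asks whether to revisit value locality, reuse, and memoization. A minimal software sketch of that idea follows. It is only an analogy for the hardware techniques the talk has in mind; the function `expensive` and the input list are hypothetical, chosen so that repeated input values stand in for value locality.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive function body actually runs


@lru_cache(maxsize=None)
def expensive(x: int) -> int:
    """Hypothetical stand-in for a costly computation (assumption)."""
    global calls
    calls += 1
    return sum(i * i for i in range(x))


# Repeated input values model value locality: the same operands recur.
inputs = [1000, 2000, 1000, 1000, 2000]
results = [expensive(x) for x in inputs]

# Only the distinct inputs are computed; the repeats are served from memory.
print(f"requests: {len(inputs)}, computations: {calls}")
print(expensive.cache_info())
```

Here five requests trigger only two computations; the other three are table lookups. Whether that trade pays off depends on how often values recur and on the cost of the lookup relative to the computation, which is the question the slide raises.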