Download POWER Roadmap

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Fault tolerance wikipedia , lookup

Immunity-aware programming wikipedia , lookup

Transcript
IBM System p
POWER Roadmap
Bradley McCredie
IBM Systems &Technology Group, Development
IBM Fellow
© 2006 IBM Corporation
IBM System p
IBM POWER SYSTEMS
Consistent Predictable Delivery
2007
2006
2004
2003
2001
POWER6
POWER5+
POWER5
POWER4+
POWER4
2
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
POWER6™
Innovations in all areas of system design:
ƒ Technology
ƒ Performance
ƒ Flexibility
ƒ RAS
ƒ Features and function
ƒ System Energy Management
ƒ Current development Status
3
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
Key 65nm technology features
ƒ 30% performance improvement over 90nm at constant power
– 65nm SOI technology with 10 levels of metal (low-k dielectric on first 8 levels)
– Pitches: 180nm M1, 200nm M2 to M4, 400nm M5-M6, 800nm M7-M8, and 1600nm M9M10
– 250nm contacted gate pitch
– 35nm gate length, 1.05nm gate dielectric thickness
– dual stress liners
– 0.65um2 high-performance SRAM cell
4
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
Technology power/performance features
ƒ Additional technology features to optimize low power design:
– Tailored transistors for ultra low power
– Tailored array cells for high performance and high density applications
– All array cells use separate voltage supply to enable low voltage application
– Broad voltage range of technology operation for low power and high performance
application
Functional Vmin vs. FO3 Inverter Delay
1
0.95
Vmin (V)
0.9
Vmin
0.85
0.8
0.75
0.7
FO3 Inverter Delay
5
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
POWER6 Core
ƒ POWER6 processor is ~2X frequency of POWER5 (4-5GHz)
ƒ POWER6 instruction pipeline depth equivalent to POWER5
– Minimize power
– Scale performance with frequency
Instruction Fetch
Instruction Buffer/Decode
Instruction Dispatch/Issue
Data Fetch/Execute
~6ns/instr
~3ns/instr
FXU Dependent execution
Load Dependent execution
ƒ POWER6 Extends functionality of POWER5 Core
–
–
–
–
–
6
64K I Cache, 64K D Cache, 2 FXU, 2 FPU, 1 Branch execution unit
Two way SMT with 7 instruction dispatch from 2 threads (maximum of 5 instructions per thread)
Decimal Unit
VMX Unit
Recovery Unit
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
POWER6 scales chip capabilities with core performance
ƒ Cache highlights
– 4MB Private L2 Cache per Core
– 32MB Non-sectored L3 Cache per chip
ƒ Fabric highlights
– Three Intra-Node SMP buses for 8-way Node
– Two Inter-Node SMP buses for up to 8 Nodes
– Multiplexed Address/Data SMP buses
ƒ New prefetching capabilities
80GB/s
– Coherent Multi-Cacheline Data Prefetch Operations
Core
Core
(SMT+, VMX)
(SMT+, VMX)
Corona
2X2B A
– Prefetching on stores
L3
Cache
32MB
2X8B D
2X8B D
4X1B
4X1B
4X2B
4X1B
4X1B
4X2B
A
D
D
A
D
D
L2 Cache
(8.0MB)
L3 Dir
L3 Cntl
20GB/s
GX+ Cntl
Mem Cntl
Mem Cntl
4B
4B
A/D
A/D
Fabric
A/D
8B
A/D
8B
GX+ Bus
A/D
75 GB/s
8B
A/D
A/D A/D
A/D A/D
8B 8B
8B 8B
A/D A/D
8B
8B
8B
Intra-Node
8W SMP Buses
Total = 300 GB/s
7
© 2006 IBM Corporation
50GB/s
80GB/s
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
Flex System to optimize low end to
high end server designs
ƒ SMP busses can be configured in two modes
2 socket
– Cost/performance trade-offs
– On node busses are 8B or 2B
– Off node busses are 8B or 4B
ƒ Numerous memory controller BW options
– 1 or 2 memory controllers are available
– Memory controllers can be configured to full width or ½
width
ƒ L3 cache is supported in three configurations
4 socket
– On module High Bandwidth configuration
– Optional off module configuration
– No L3 option
ƒ Fully interconnected two-tier SMP fabric
– Reduced latencies vs. POWER5
– New two tier memory coherency protocol
32 socket
8
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
Bullet-proof computing
Core
ƒ Recovery Capability
– Array error
• Error correction (ECC)
• Arrays with parity
– Processor restarts
– Instruction flow and Data flow Error
• Processor restarts
– Control Error
• Processor restarts
Instruction Fetch
Decode
Execution Units
Core error collection
Load/
Store
Recovery
unit
Core restart
ƒ System Resiliency
– Processor states are check pointed and protected with ECC
– Processor states can be moved from one processor to another upon
unsuccessful recovery restart
9
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
Bullet-proof computing
ƒ System RAS with recovery unit
– Every measure possible taken to preserve application execution
– Retry soft errors
– Change hardware for hard errors
Processor architected state check pointed
Every 1 cycle
ECC & Non-ECC protected circuitry checked
Every cycle
No error found
Error found
Processor restarts from last saved checkpoint
Error found
No error found
Soft error case
Processor workload moved to another CPU
Hard error case
10
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
Features and functions
IN-CORE
IN-COREHARDWARE
HARDWAREACCELERATORS
ACCELERATORS
Decimal
DecimalFloating
Floatingpoint
pointand
andAltivec
Altivec(VMX)
(VMX)
VIRTUALIZATION
VIRTUALIZATIONENHANCEMENTS
ENHANCEMENTS
33rdrdGENERATION
GENERATIONMULTI-THREADING
MULTI-THREADING
11
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
Decimal Floating Point accelerators
ƒ Binary floating-point unsuitable for commercial
or human-centric applications
– Cannot meet legal and financial requirements
ƒ Survey of numeric data in commercial databases is largely decimal data
– 55% of numeric data in databases is BCD data
– The next 43% of data is integers, often held as decimal integers
ƒ Example: performance improvement of decimal hardware vs. decimal software
– Telco billing application -- 1 million calls (2 minutes) read from file, priced, taxed, and
printed:
Java BigDecimal
w/ DEC number
C, C#
packages
Integer
hand-tuned
% execution time in
decimal operations
93.2%
72 – 78%
45%
Speedup with
hardware DFP*
< 7X
4X
2X
* IBM projection
12
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
Decimal Floating-Point Unit
Architecture:
Added ~50 instruction to POWER ISA:
Add, Multiply, Divide
Conversions (to/from Integer, BCD & DFP)
Insert Exponent (scaling)
Quantize (single unit)
Reround (round to less precision once)
Supports IEEE 754R standard which is going to ballot this month
Adder
2 by 18 digit Mult
1 by 36 digit Quad
2 cycle Pipelineable
Expand DPD to BCD
Expand DPD to BCD
Every digit is conditional sum
A+B+7 A+B+6 A+B+1 A+B
Minimal Hardware Implementation:
Separate unit from Binary Floating-Point Unit
Reuses BFUs Register File (FPRs) to minimize area
Reuses FPSCR (status and control)
Separate Decimal Rounding Mode ( Bankers rounding)
Quadprecision dataflow
34 digit + 2 guard, 144 bit wide, compresses to 128bit in memory
Both 34 digit and 16 digit datatypes supported in hardware
36 digit adder
2 cycle latency
1 cycle throughput at CPU cycle
13
© 2006 IBM Corporation
Multiple creator (2X, 5X)
Rotate cycle1
R Register
Rotate cycle2
AH register
BH register
AL register
BL register
4D 4D 4D 4D2D 4D 4D 4D 4D2D
ADD register
ADD register
+1 +1 +1 +1 +1 +1 +1 +1 +1 +1
WH register
WL register
Compress BCD to DPD
CH register
CL register
144 bits or 36 digits wide
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
Standards & software rallying around DFP standardization
ƒ Numerous software and standards activities in flight inside and outside IBM
–
–
–
–
–
–
–
–
14
Java BigDecimal (compatible with 754r)
C# and .Net ECMA and ISO standards arithmetic changed to match 745r decimal128
XML Schema 1.1 draft now has pDecimal compatible with 754r
ISO C and C++ are jointly adding decimal floating-point as first-class primitive types
GCC almost complete
Cobal is adding a datatype to support 754r
ANSI/ISO SQL … new types accepted in principle (draft about to be submitted)
Strong support expressed by Microsoft, SHARE, academia, SAP and many others
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
Other core features
ƒ Virtualization enhancements
– Hardware support for increased virtualization granularity
• Up to 1024 partitions are supported
– Support virtual partition of memory
• Memory can be reconfigured and moved for partitions (memory packing)
• Allow concurrent maintenance by reconfigure and isolate memory failures
– Support virtual page key protection
• Data page protected by keys to prevent unauthorized access
ƒ Simultaneous Multithreading Enhancements
– SMT provides excellent power performance
• Reuse of existing transistors vs. performance from additional transistors
– Second thread/core delivers tremendous throughput
• 55% performance on OLTP
• 40% performance on integer application
– Performance improvements achieved with architectural enhancements
• Enhanced cache sizes and associativity to minimize thrashing effect from the 2 threads
• Improve dispatch bandwidth for SMT
15
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
PowerExecutive™ Extensions for POWER6
Energy Management Policies
Example Energy Management Policies:
PowerExecutive
ƒ Energy cost management
– Monitor System workloads/power consumption
• If: System utilization reduces reduce system power/performance
• If: Multiple Systems go below utilization threshold consolidate workloads
• If: System power budgets exceed allocation cap power
ƒ Acoustic optimization
– Monitor Systems temperature
• If system temperatures go below threshold reduce fan speeds
ƒ Performance optimization
– Monitor system temperature/power consumption
• If temperatures/power consumption go below threshold increase performance
Energy Management Policies Enable Customers To Maximize The Compute
Capability Of Their Datacenter Or Minimize Energy Costs
16
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
POWER6 EMPATH System Control
Extended System Functions For PowerExecutive Policies
ƒ Thermal / Power Measurement
IBM Director
PowerExecutive
Hardware
Management
Module
– Read thermal data from processor chip thermal sensors
– Measure power data from system level sensors
– Report data via PowerExecutive
ƒ Power Capping
– Use of Hardware controls to keep system power under a specified limit
IBM
POWER6
ƒ Power Saving
Service
Processor
Power Control Firmware
AME API
Power Mgmt Policies
Power
EMPATH
Controller
Power
Measurement
Circuits
– Operation at reduced power when workload and policy allows
• Can be a static policy (e.g. overnight reduction)
• Can be dynamic (when absolute max performance is not
always required)
ƒ System health monitoring
– Use of hardware sensors to ensure system is operating within safe
predefined bounds
ƒ Performance-Aware Power Management
– Use of dedicated performance counters to guide power and thermal
management tradeoffs
Modules
17
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
POWER6 implementation status
ƒ Production-level chip design is in the lab
ƒ Numerous systems are in various stages of bringup, debug and test
– High End
– Midrange
P6 Fmax vs. Channel Length
– Low End
– Cluster/Scientific
Frequency (MHz)
ƒ On target for
mid ’07 GA
fmax
target
Channel Length
18
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM
IBM System p
Summary & Conclusions
ƒ POWER6 development and delivery is on schedule
ƒ POWER6 doubles frequency and bandwidth of POWER5™
– Same pipe depth
– Same power envelope
ƒ POWER6 scales chip/system performance with core performance
ƒ POWER6 provides new capabilities
– Decimal Floating Point
– Processor recovery
ƒ System p™ will begin delivery of system power management with
POWER6
ƒ POWER6 is on track to deliver high frequency capabilities on schedule
19
© 2006 IBM Corporation
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals
and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create
liability or obligation for IBM.
DRAFT: IBM Confidential
IBM Systems
Systems
IBM