Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Teradata Architectural Summit 2007 Yasir Hassan Maken Professional Services EMEA (GCC – Pakistan) What Data Skew can cause? Why I cannot insert 50 GB evenly distributed data in a database having 500 GB unused space? Why we are not able to move only 10% of the unused space from a database? Why most of my requests are Spooling out? Why most of the queries running on this database usually have high CPU SKEW? 2 > 5/22/2017 Teradata Confidential Because • Most of the database space is being wasted • Almost all allocated space for a database on at least one of the AMPs has been occupied • The chunk of upcoming data that needs to go on the highly congested AMP, after distribution, do not finds a space for accommodation • If space needs to be moved, the chunk of space that is supposed to be fetched from the congested AMP would not be allowed. As, equal space will be fetched from each of the AMPs and the activity will fail if one of the AMPs does not have FREE SPACE=“Total Space to be moved/# of AMPs” • During processing of any request on the database, there would be high chances of spooling out on the congested AMP as “MaxPerm – CurrPerm” will be the available Spool which will be less due to high Skew on the AMP • In some cases, the CPU Skew of the sql request on the skewed tables of the database will be more due to uneven distribution of data and different CPU required on different AMPs 3 > 5/22/2017 Teradata Confidential How to make things work normally? Identify the Skewed tables Identify the Skewed tables and plan Primary Index Revision process for the tables having high skew and huge Skew savings Analyze the DBQL to find the most frequently used columns in the sql executed on highly skewed tables (Choose the columns in WHERE clause and JOIN conditions) Check the row distribution per AMP for the most frequently used column or group of columns (Composite) being used in the WHERE cause and JOIN conditions. Also check the new columns for Hash collisions Check the row distribution per amp and find the Skew with the suggested Index SkewFactor = 100 - (AVG(CurrentPerm)/MAX(CurrentPerm)*100) Suggest the Primary Index on the basis of best distribution and high frequency of use in order to make sure that data gets evenly distributed and almost equal amount of space is free/utilized across all AMPs 4 > 5/22/2017 Check for Hash collisions Teradata Confidential Improvements Now I can easily insert enough data in the database Now the Spool out issue has improved a lot Now I can easily move space from the database if there is any available space The CPU Skew has decreased a lot BEFORE xx GB xx GB xx GB AFTER xx GB xx GB xx GB xx GB xx GB xx GB xx GB 10 0 10 0 90 90 80 80 70 70 60 60 50 50 U sed S p ace U sed S p ace 40 40 30 30 20 20 10 10 0 0 A M P1 5 > 5/22/2017 A M P3 A M P1 A M P5 Teradata Confidential A M P3 A M P5