BasevsDIStudio

50 %
50 %
Information about BasevsDIStudio
Education

Published on February 5, 2008

Author: Berta

Source: authorstream.com

Base SAS® vs. SAS® Data Integration Studio:  Base SAS® vs. SAS® Data Integration Studio Greg Nelson and Danny Grasse Slide2:  Outline Overview of Understanding ETL What SAS approaches do we have? 38 Best Practices Key Areas for Comparison Design and data profiling. Source data extraction. Transformation and loading. Change data capture. Quality review, auditing and exception handling/management. Integration with the production environment and business process components. Summary and Conclusions Overview:  Overview ETL Data Warehousing 101 Data Integration Studio “Consistent” version of the truth Credible information versus Data quality Corporate Information Factory:  Corporate Information Factory Ralph Kimball History Excellence:  Ralph Kimball History Excellence Father of “dimensional” data warehouse design The Data Warehouse Toolkit (I and II) The Data Warehouse Lifecycle Toolkit The Data Warehouse ETL Toolkit The Data Integration Process:  The Data Integration Process 38 Subsystems:  38 Subsystems 38 Sub-systems define your ETL strategy Design and data profiling Source data extraction Transformation and loading Change data capture Quality review, auditing and exception handling/management Integration with the production environment and business process components 38 Subsystems: Category 1:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 1 Productionizatation Design and Data Profiling:  Design and Data Profiling Design is often played by “soft skills” in the SAS ecosystem Data profiling Base SAS – frequencies, crosstabs, macros and toolkits DataFlux software – data profiling on steroids DI Studio – data profiling currently supported through generalized metadata Data Profiling is an analysis exercise, not a technical one 38 Subsystems: Category 2:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 2 Productionizatation Source Data Extraction:  Source Data Extraction Source data adapters (including data conversions and filters) SAS/Access products Data Step, SQL and DI Studio Push/pull/dribble How we move the data Filtering & Sorting How we select the data to be moved Data staging (versus accessing) How we land the data 38 Subsystems: Category 3:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 3 Productionizatation Transformation and Loading:  Transformation and Loading 5. Data conformer. 9. Surrogate key creation system. 12. Fixed hierarchy dimension builder. 13. Variable hierarchy dimension builder. 14. Multivalued dimension bridge table builder. 15. Junk dimension builder. 16. Transaction grain fact table loader. 17. Periodic snapshot grain fact table loader. 18. Accumulating snapshot grain fact table loader. 19. Surrogate key pipeline. 20. Late arriving fact handler. 21. Aggregate builder. 22. Multidimensional cube builder. 23. Real-time partition builder. 24. Dimension manager system. 25. Fact table provider system. Transformation and Loading:  Transformation and Loading Conforming dimensions and facts Good old SAS code (Data Step, SQL, Formats) – all available in DI Studio Good design Creation of surrogate keys Hand coded in Base SAS “Automagic” in DI Studio Building summary tables and cubes Just another “target” in DI Studio 38 Subsystems: Category 4:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 4 Productionizatation Change Data Capture:  Change Data Capture Change Data Capture:  Change Data Capture When we see new data coming in from the operational system, we have to make a decision about how to handle that change. We have three options: We can overwrite or update the old value (Type I) We can create a new record and create some mechanism for recreating historical references to that data – depending on the date for the report that is being requested (Type II). We can retain both as alternative “realities”. For the latter, we usually create a new column and put the old value in the new column to allow for alternatives to reporting. (Type III) Change Data Capture:  Change Data Capture SAS Approaches Base SAS – very robust using macros Can control everything about the load DI Studio has limited coverage SAS does support CRC-based record comparisons (MD5 function) DI Studio 3 types of loading techniques: update, refresh, append Type I & II are dropdowns; Type II SCD is a transform Doesn’t support Type 3 outside of transform code 38 Subsystems: Category 5:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 5 Productionizatation Quality Handling:  Quality Handling Quality Handling:  Quality Handling Detecting errors Handling them Providing audit records Quality Management:  Quality Management Detecting errors SAS errors versus data errors DataFlux Data rationalization At the point of data entry Base SAS If then else routines (lookup tables, formats) DI Studio Not much other than BASE SAS Audit trail:  Audit trail Base SAS Log parsing routines DI Studio Workspace server logs Event System Detailed logs, summary logs and event triggers Exception Handling:  Exception Handling Base SAS Macros, put statements in log file DI Studio Simple email, exception tables and log file Event System Subscribe to events Responds to errors, warnings, notes and custom assertions 38 Subsystems: Category 6:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 6 Productionizatation Productionization of SAS ETL:  Productionization of SAS ETL 26. Job scheduler. 27. Workflow monitor. 28. Recovery and restart system. 29. Parallelizing/pipelining system. 30. Problem escalation system. 31. Version control system. 32. Version migration system. 33. Lineage and dependency analyzer. 34. Compliance reporter. 35. Security system. 36. Backup system. 37. Metadata repository manager. 38. Project management system. Productionization of SAS:  Productionization of SAS Version control, change control, promotion, backup and recovery None is available in BASE SAS Version control – minimal for multi-developer access Change management –partially available in DI Studio Automated promotion – weak and/or not available Backup – metadata server can be backed up (but no rollback feature) Productionization of SAS:  Productionization of SAS Scheduling, dependency management and restartability, including parallelization. Provided by LSF Scheduler Managed by person doing the scheduling not writing the code LSF provides parallelization, but also 'grid' computing with the associated 'pipelining' of steps Productionization of SAS:  Productionization of SAS Metadata management and impact analysis. Very good in DI Studio Productionization of SAS:  Productionization of SAS Project management and problem escalation. Not in scope for DI Studio Summary:  Summary Di Studio is a code generator Can do everything Base SAS can do Metadata and security is the key for why we want to use DI Studio DI Studio writes “better” code in some cases Challenges: change control and what happens when things go bad Slide32:  ThotWave Technologies Thinking Data Danny Grasse Senior Consultant dgrasse@thotwave.com Greg Nelson CEO and Founder greg@thotwave.com

Add a comment

Related presentations

Related pages

BasevsDIStudio (1) - scribd.com

BasevsDIStudio (1) - Download as Powerpoint Presentation (.ppt), PDF File (.pdf), Text File (.txt) or view presentation slides online.
Read more

File:BasevsDIStudio.ppt - sasCommunity

File:BasevsDIStudio.ppt. From sasCommunity. Jump to: navigation, search. File; File history; File usage; BasevsDIStudio.ppt ...
Read more

Sas data integration studio tutorial pdf - WordPress.com

sas data integration studio tutorial pdf BasevsDIStudio.pptA robust data integration DI process had to be developed that.
Read more

Base SAS versus SAS Data Integration Studio - sasCommunity

Base SAS versus SAS Data Integration Studio. From sasCommunity. Jump to: navigation, search. Paper 099-31. ... MEDIA:BasevsDIStudio.ppt.
Read more

Resume Komal - scribd.com

Resume Komal - Download as Word Doc (.doc), PDF File (.pdf), Text File (.txt) or read online.
Read more

Sas di studio tutorial pdf - WordPress.com

Sas di studio tutorial pdf The correct bibliographic citation for this manual is as follows: SAS Institute Inc. Metadata Import and Export in SAS Data ...
Read more

Sas Questions - pt.scribd.com

Scribd is the world's largest social reading and publishing site.
Read more

Sas Questions - es.scribd.com

NO. 1 The following SAS program is submitted: date view=sauser.ranch; describe; run; What is the result? A. The program creates a DATA step view called ...
Read more

grc - es.scribd.com

GRC AC 10. 0 Certification Exam Questions and Answers - part1 These questions and answers are collected from web and friends who is preparing for GRC 10.0
Read more