Data Quality Testing Generic (http://www.geektester.blogspot.com/)

Technology

Published on June 17, 2008

Author: raj.kamal13

Source: slideshare.net

Description

http://www.geektester.blogspot.com/

Data Quality Testing

“Virtually everything in business today is an undifferentiated commodity, except how a company manages its information. How you manage information determines whether you win or lose.” Bill Gates

[email_address]

- Narendra Parihar

- Bhoomika Goyal

- Raj Kamal (rajkamal)

Agenda

Data Quality Overview

Testing :: DQ Categories / Checks

Testing :: DQ Case Study

DQ Test Management

DQ Benefits & Challenges

Q & A

Overview: DQ Definition

Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J.M. Juran).

Data quality is the state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use.

DQ Impact: Organizations with poor data quality spend time reconciling conflicting reports and flawed business plans, and make erroneous decisions based on outdated, inconsistent, and invalid data.

Overview: DQ Stats

“End users spend as much as 40-50% of a typical IT budget reworking data in one application to make it work with another.” The high cost of low data quality.

The Data Warehousing Institute estimates that bad customer data costs American companies upwards of $600 billion per year (Wayne W. Eckerson).

Poor data quality can kill your business!

Testing :: DQ CheatSheet

Rule #1: Row Counts

The count of records at Source and Target should be the same at a given point in time. A difference in counts points to Missing Records or Extra Records.

Example 1

Source_Dept

DeptID  DeptName  DeptStartDate
1       HR        22-Aug-2007
2       Finance   12-June-1988
4       Admin     1-May-1999
5       IT        2-June-1997

Target_Dept

DeptID  DeptName        DeptStartDate
1       Human Resource  22-Aug-2007
2       Finance         12-June-1978
3       Operations      11-May-1752

Rule #1: Row Counts

Missing Records: Records which are present only at Source

DeptID  DeptName  DeptStartDate
4       Admin     1-May-1999
5       IT        2-June-1997

Extra Records: Records which are present only at Target

DeptID  DeptName    DeptStartDate
3       Operations  11-May-1752
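The row-count rule can be sketched in Python as below. This is only an illustration, not the tooling used by the presenters: the rows are the Example 1 tables held as in-memory dictionaries keyed by DeptID, and the function name `row_count_check` is made up for this sketch.

```python
# Example 1 rows keyed by DeptID (in-memory stand-ins for real tables).
source_dept = {
    1: ("HR", "22-Aug-2007"),
    2: ("Finance", "12-June-1988"),
    4: ("Admin", "1-May-1999"),
    5: ("IT", "2-June-1997"),
}
target_dept = {
    1: ("Human Resource", "22-Aug-2007"),
    2: ("Finance", "12-June-1978"),
    3: ("Operations", "11-May-1752"),
}


def row_count_check(source, target):
    """Compare row counts, then list the keys present on only one side."""
    return {
        "source_count": len(source),
        "target_count": len(target),
        "counts_match": len(source) == len(target),
        "missing": sorted(source.keys() - target.keys()),  # only at Source
        "extra": sorted(target.keys() - source.keys()),    # only at Target
    }


result = row_count_check(source_dept, target_dept)
print(result)
```

Equal counts alone do not prove the tables match (two missing rows can be offset by two extra rows), which is why the sketch also reports the keys on each side.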

Rule #2: Completeness

All the data under consideration at Source and Target should be the same at a given point in time, and should satisfy the business rules.

Rule #2: Completeness

Missing Records: Records which are present only at Source

DeptID  DeptName  DeptStartDate
4       Admin     1-May-1999
5       IT        2-June-1997

Extra Records: Records which are present only at Target

DeptID  DeptName    DeptStartDate
3       Operations  11-May-1752

Mismatched Records: Records which contain at least one different value for the same record between Source and Target

DeptID  DeptName  DeptStartDate  DifferenceType
2       Finance   12-June-1988   At Source
2       Finance   12-June-1978   At Target
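A mismatched-record check might look like the following sketch. It assumes the Example 1 rows are available as dictionaries keyed by DeptID; `mismatched_records` and `COLUMNS` are illustrative names. Note that a strict value comparison also flags DeptID 1 (HR versus Human Resource), a difference the slides discuss under Consistency.

```python
COLUMNS = ("DeptName", "DeptStartDate")

# Example 1 rows keyed by DeptID (in-memory stand-ins for real tables).
source_dept = {
    1: ("HR", "22-Aug-2007"),
    2: ("Finance", "12-June-1988"),
    4: ("Admin", "1-May-1999"),
    5: ("IT", "2-June-1997"),
}
target_dept = {
    1: ("Human Resource", "22-Aug-2007"),
    2: ("Finance", "12-June-1978"),
    3: ("Operations", "11-May-1752"),
}


def mismatched_records(source, target, columns=COLUMNS):
    """For keys present on both sides, report each column whose values differ."""
    diffs = {}
    for key in sorted(source.keys() & target.keys()):
        changed = {
            col: (src_val, tgt_val)
            for col, src_val, tgt_val in zip(columns, source[key], target[key])
            if src_val != tgt_val
        }
        if changed:
            diffs[key] = changed
    return diffs


mismatches = mismatched_records(source_dept, target_dept)
for dept_id, changed in mismatches.items():
    print(dept_id, changed)
```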

Rule #3: Consistency

Consistency ensures that each user observes a consistent view of the data, including changes made by transactions.

There is data inconsistency between Source and Target if the same data is stored in different formats or contains different values at different places.

Example 2

Source_Dept

DeptID  DeptName  Revenue ($)  DeptStartDate
1       HR        100          22-Aug-2007
2       Finance   200          12-June-1988

Warehouse_Dept

DeptID  DeptName  Revenue (Euro)  DeptStartDate
1       HR        70              22/08/2007
2       Finance   140             12/06/1978

Data Mart_Dept

DeptID  DeptName        Revenue (Euro)  DeptStartDate
1       Human Resource  70              22/08/2007
2       Finance         999999          12/06/1978

Rule #3: Consistency

Example #1: Zip code / Date / Currency formats

a)

DeptID  DeptName  Revenue ($ or Euro)  DeptStartDate
1       HR        100                  22-Aug-2007
1       HR        70                   22/08/2007

Difference Point: Same data, inconsistent due to Revenue & Currency format

b)

DeptID  DeptName        Revenue ($ or Euro)  DeptStartDate
1       HR              100                  22-Aug-2007
1       Human Resource  70                   22/08/2007

Difference Point: Same data, inconsistent due to different format of Department name

Rule #3: Consistency

Example #2: Regional Setting e.g. Language

DeptID  DeptName        Revenue ($ or Euro)  DeptStartDate
1       Human Resource  100                  22/08/2007
1       人的資源         100                  22/08/2007

Difference Point: Same data, inconsistent due to different language used

Example #3: Different values at different points

DeptID  DeptName  Revenue ($ or Euro)  DeptStartDate
2       Finance   140                  12/06/1978
2       Finance   999999               12/06/1978

Difference Point: Same data, inconsistent value for Revenue between Warehouse & Mart
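One way to separate format differences from value differences is to normalise each value before comparing, as in the sketch below. The list of date layouts is an assumption based only on the formats seen in Example 2, the month names assume Python's default (English) locale, and the helper names are illustrative.

```python
from datetime import datetime

# Date layouts seen in Example 2 (assumed list; extend per system).
DATE_FORMATS = ("%d-%b-%Y", "%d-%B-%Y", "%d/%m/%Y")


def parse_date(text):
    """Try each known layout and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def same_date(a, b):
    """True when two differently formatted strings denote the same day."""
    return parse_date(a) == parse_date(b)


# Format-only difference: same day written two ways.
format_only = same_date("22-Aug-2007", "22/08/2007")

# Value difference: the year really differs between Source and Warehouse.
value_differs = not same_date("12-June-1988", "12/06/1978")

# Different values at different points: Warehouse versus Data Mart revenue.
warehouse_revenue = {1: 70, 2: 140}
mart_revenue = {1: 70, 2: 999999}
inconsistent_ids = [
    dept_id
    for dept_id, revenue in warehouse_revenue.items()
    if mart_revenue.get(dept_id) != revenue
]

print(format_only, value_differs, inconsistent_ids)
```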

Rule #4: Validity

Validity is the correctness and reasonableness of data.

A valid measure must be reliable, but a reliable measure need not be valid.

Questions:

-> Is the information reliable?

-> How is the information measured?

Rule #4: Validity

Example #1: Measuring “Unemployment” in a country

-> Statistics are collected reliably month-on-month

-> The definition used for collecting “Unemployment” must remain the same. e.g. The definition of “unemployment” has changed in the past 25 years, hence old data cannot be compared with current data; the comparison is not valid.

Example #2: Values falling outside a range

DeptID  DeptName        Revenue (Euro)  DeptStartDate
1       Human Resource  70              22/08/2255
2       Finance         999999          12/06/1752

Rule #4: Validity

Example #3: Dates having valid MM, DD, YYYY

DeptID  DeptName        Revenue (Euro)  DeptStartDate
1       Human Resource  70              13/13/2007

Example #4: Birth date > Death Date

EmpId  EmpName  DOB         DOE
1      Jack     13/01/2008  24/11/1996
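The validity examples above could be automated roughly as follows. The range limits (`EARLIEST_START`, `LATEST_START`, `MAX_REVENUE`) are hypothetical business rules invented for this sketch; real limits would come from the business.

```python
from datetime import date, datetime

# Hypothetical business limits, for illustration only.
EARLIEST_START = date(1900, 1, 1)
LATEST_START = date(2008, 12, 31)
MAX_REVENUE = 100_000


def parse_ddmmyyyy(text):
    """Return a date, or None when DD/MM/YYYY is not a real calendar date."""
    try:
        return datetime.strptime(text, "%d/%m/%Y").date()
    except ValueError:
        return None


def start_date_is_valid(text):
    """Valid only if the date parses and falls inside the allowed range."""
    parsed = parse_ddmmyyyy(text)
    return parsed is not None and EARLIEST_START <= parsed <= LATEST_START


def revenue_is_valid(revenue):
    return 0 <= revenue <= MAX_REVENUE


def birth_after_death(dob_text, doe_text):
    """True when the birth date is later than the death date (Example #4)."""
    return parse_ddmmyyyy(dob_text) > parse_ddmmyyyy(doe_text)


print(start_date_is_valid("22/08/2255"))   # date in the future
print(start_date_is_valid("12/06/1752"))   # date too far in the past
print(start_date_is_valid("13/13/2007"))   # month 13 does not exist
print(revenue_is_valid(999999))
print(birth_after_death("13/01/2008", "24/11/1996"))
```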

Rule #5: Redundancy

Physical Duplicates: All the column values repeat for at least 2 records in a table

Logical Duplicates: Business Key (list of columns) values repeat for at least 2 records in a table

Example 3

Employee

EmpID  EmpName  EmpAddress                          Age  DeptID
1      Jim      #22, Jackson St., NY                23   1
2      Sam      A302, Woodsvilla, WA                28   2
4      Samuel   No. AA, Andrew Street, Redmond, WA  22   999
5      Jim      #22, Jackson St., NY                23   1
2      Sam      A302, Woodsvilla, WA                28   2
7      Jack     #23, Jackson St., NY                41   NULL

Example #1: Physical Duplicates

EmpID  EmpName  EmpAddress            Age  DeptID
2      Sam      A302, Woodsvilla, WA  28   2
2      Sam      A302, Woodsvilla, WA  28   2

Example #2: Logical Duplicates

EmpID  EmpName  EmpAddress            Age  DeptID
1      Jim      #22, Jackson St., NY  23   1
5      Jim      #22, Jackson St., NY  23   1
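Both kinds of duplicate in Example 3 can be found with a few lines of Python, sketched below. The business key (EmpName, EmpAddress) is an assumption made for this illustration; the slides do not say which columns form the key.

```python
from collections import Counter

# Example 3 rows: (EmpID, EmpName, EmpAddress, Age, DeptID); None stands for NULL.
employees = [
    (1, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (4, "Samuel", "No. AA, Andrew Street, Redmond, WA", 22, 999),
    (5, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (7, "Jack", "#23, Jackson St., NY", 41, None),
]


def physical_duplicates(rows):
    """Rows whose every column value occurs more than once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]


def logical_duplicates(rows, business_key):
    """Distinct rows that share the same business key values."""
    groups = {}
    for row in rows:
        groups.setdefault(business_key(row), set()).add(row)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


def name_and_address(row):
    # Assumed business key: (EmpName, EmpAddress).
    return (row[1], row[2])


physical = physical_duplicates(employees)
logical = logical_duplicates(employees, name_and_address)
print(physical)
print(logical)
```

Grouping distinct rows in a set means an exact copy (the two Sam rows) is reported only as a physical duplicate, not as a logical one.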

Rule #6: RI (Referential Integrity)

Child records that have no corresponding parent record are called “Orphan Records”.

Logical relationship rules between parent and child tables should be defined by the business.

Example 4

Child Table:: Employee

EmpID  EmpName  EmpAddress                          Age  DeptID (FK)
1      Jim      #22, Jackson St., NY                23   1
2      Sam      A302, Woodsvilla, WA                28   2
4      Samuel   No. AA, Andrew Street, Redmond, WA  22   999
5      Jim      #22, Jackson St., NY                23   1
7      Jack     #23, Jackson St., NY                41   NULL

Parent Table:: Department

DeptID (PK)  DeptName    DeptStartDate
1            HR          22-Aug-2007
2            Finance     12-June-1988
3            Operations  11-May-1752

Orphan Records

EmpID  EmpName  EmpAddress                          Age  DeptID
4      Samuel   No. AA, Andrew Street, Redmond, WA  22   999
7      Jack     #23, Jackson St., NY                41   NULL
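An orphan-record check for Example 4 can be sketched as below. Following the slide, a NULL foreign key is reported as an orphan; whether NULL should count is a business decision, so treat that as an assumption of this sketch. The helper name `orphan_records` is illustrative.

```python
# Parent keys from the Department table in Example 4.
department_ids = {1, 2, 3}

# Child rows: (EmpID, EmpName, DeptID); None stands for NULL.
employees = [
    (1, "Jim", 1),
    (2, "Sam", 2),
    (4, "Samuel", 999),
    (5, "Jim", 1),
    (7, "Jack", None),
]


def orphan_records(children, parent_keys, fk_index):
    """Child rows whose foreign key has no matching parent key."""
    return [row for row in children if row[fk_index] not in parent_keys]


orphans = orphan_records(employees, department_ids, fk_index=2)
print(orphans)
```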

Rule #7: Domain Integrity

Domain integrity defines the possible values that are allowed in a data element.

Rule #7: Domain Integrity

Example #1: Invalid Lookup Table Values (Valid:: HR, Finance, Operations)

DeptID (PK)  DeptName
1            HR
2            Finance
3            Operations
4            Invalid Dept

Example #2: Truncation:: Data Types, Data Length etc.

Source Table

DeptID (PK)  DeptName (Varchar(50))
1            HR
2            Finance
3            Operations

Target Table

DeptID (PK)  DeptName (Varchar(2))
1            HR
2            Fi
3            Op

Rule #7: Domain Integrity

Example #3: Constraints: NOT NULL, CHECK, PK, UK etc.

Source Table

DeptID (PK)  DeptName (NOT NULL)
1            HR
2            Finance
3            Operations
4            Invalid Dept

Target Table

DeptID (PK)  DeptName (NOT NULL)
1            HR
2            Finance
3            NULL
4            NULL
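The three domain-integrity examples might be checked along these lines. This is a minimal sketch over in-memory dictionaries keyed by DeptID; the function names are illustrative, and the truncation test (the target value is a shorter prefix of the source value) is a simplifying assumption.

```python
VALID_DEPT_NAMES = {"HR", "Finance", "Operations"}


def invalid_lookup_values(rows, valid_values):
    """Rows whose value is not in the allowed lookup list."""
    return {key: value for key, value in rows.items() if value not in valid_values}


def truncated_values(source, target):
    """Keys where the target value is a shorter prefix of the source value."""
    return {
        key: (source[key], target[key])
        for key in sorted(source.keys() & target.keys())
        if target[key] is not None
        and len(target[key]) < len(source[key])
        and source[key].startswith(target[key])
    }


def null_violations(rows):
    """Keys whose NOT NULL column holds NULL (None)."""
    return sorted(key for key, value in rows.items() if value is None)


lookup_table = {1: "HR", 2: "Finance", 3: "Operations", 4: "Invalid Dept"}
source_names = {1: "HR", 2: "Finance", 3: "Operations"}   # Varchar(50)
target_names = {1: "HR", 2: "Fi", 3: "Op"}                # Varchar(2)
target_not_null = {1: "HR", 2: "Finance", 3: None, 4: None}

bad_lookups = invalid_lookup_values(lookup_table, VALID_DEPT_NAMES)
truncated = truncated_values(source_names, target_names)
nulls = null_violations(target_not_null)
print(bad_lookups, truncated, nulls)
```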

Rule #8: Accuracy

Accuracy is the degree to which data reflects real-world objects.

Accuracy is generally measured by comparing data against something defined as the “true” source of information.

Rule #9: Usability

Usability describes the relevance and the meaning of data.

Example: Usability denotes the ease with which data can be used.

Mart Table

DeptID (PK)  DeptName
1            HR
2            Fin
3            Ops

Represented as

Reporting Table

DeptID (PK)  DeptName
1            Human Resources
2            Finance
3            Operations

Rule #10: Timeliness

Timeliness defines whether the required data is available when required, as per the SLA.

Example #1: Data Freshness

If data is pulled 24 times every day and the target misses even one cycle, “data freshness” is impacted and users see old data, which can impact business decisions.

For decision-making and mission-critical systems, timely availability of information is a must.
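A data-freshness check for the 24-pulls-per-day example could be sketched as follows. It assumes the target records the timestamp of its last successful load; the timestamps below are made up, and counting every full interval since the last load as a missed cycle is a simplification.

```python
from datetime import datetime, timedelta

PULLS_PER_DAY = 24
EXPECTED_INTERVAL = timedelta(days=1) / PULLS_PER_DAY  # one hour


def missed_cycles(last_loaded_at, now, interval=EXPECTED_INTERVAL):
    """Number of full load intervals that have elapsed since the last load."""
    return (now - last_loaded_at) // interval


def is_fresh(last_loaded_at, now, interval=EXPECTED_INTERVAL):
    """Fresh means no expected load cycle has been missed."""
    return missed_cycles(last_loaded_at, now, interval) == 0


# Hypothetical timestamps for illustration.
last_load = datetime(2008, 6, 17, 10, 0)
on_time = datetime(2008, 6, 17, 10, 45)
late = datetime(2008, 6, 17, 12, 30)

print(is_fresh(last_load, on_time), missed_cycles(last_load, on_time))
print(is_fresh(last_load, late), missed_cycles(last_load, late))
```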

Testing :: DQ Case Study

ADQC (Automated Data Quality Check) v2.0

DQ Test Management

DQTM: Test Planning

DQ Test Management: Planning

1. Ensure DQ Requirements are covered in the following documents:

BRD

FSD

Test Plan

2. Ensure DQ Requirements are clarified by Business / PDMs

DQTM: Test Design

DQ Test Management: Test Case Design

1. Ensure DQ Requirements are covered in Test Scenarios and Test Cases

2. Ensure DQ Test Cases are automated.

DQTM: Test Execution

DQ Test Management: Test Execution

1. Ensure Test Cases related to DQ Requirements are executed in Test cycles

2. Ensure DQ Test results and DQ Bugs are shared with the Business / PDM in the triage meeting, so that each bug gets the correct priority based on its impact.

DQTM: Test Monitoring

DQ Test Management: Test Monitoring

1. Regularly collect DQ Metrics to depict the trend

2. If the DQ Issues trend is upward, immediate action needs to be taken
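The monitoring steps above can be sketched as a simple trend check. The definition of "upward" used here (each of the last few measurements exceeds the one before it) and the weekly bug counts are assumptions for illustration; a real project would choose its own metric and threshold.

```python
def is_trending_up(counts, window=3):
    """True when each of the last `window` values exceeds the one before it."""
    recent = counts[-(window + 1):]
    if len(recent) < window + 1:
        return False  # not enough history to judge a trend
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical weekly counts of open DQ issues.
rising_weeks = [12, 9, 10, 14, 19]
stable_weeks = [12, 9, 10, 8, 11]

print(is_trending_up(rising_weeks))
print(is_trending_up(stable_weeks))
```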

DQ Challenges

DQ Best Practices

DQ Jargons

DATA GOVERNANCE

Data governance (DG) refers to the overall management of the availability, usability, integrity, and security of the data employed in an enterprise.

A data governance program includes a governing body or council, a defined set of procedures, and a plan to execute those procedures.

DATA STEWARDS

Data Stewards are the individuals ultimately responsible for the definition, management, control, integrity or maintenance of Enterprise data.

DATA INTEGRITY

Data integrity is the assurance that data is correct and consistent, that is, that the data correctly reflects the "real" world.

References

www.infoimpact.com

http://www.idma.org/valuePropositionGeneral.pdf

http://www.intelligententerprise.com/showArticle.jhtml?articleID=17701630

http://www.sociology.org.uk/p1mc5n1a.htm

http://blogs.sun.com/emmyp/entry/ensuring_the_validity_of_your

http://www.dmreview.com/dmdirect/20021108/6019-1.html

Questions & Answers

Thank you
