TeamStation AI System Report LATAM IT Salaries 2024
How you manage information determines whether you win or lose
1.
2.
3.
4.
5. Testing :: DQ CheatSheet
6. Rule #1: Row Counts
The count of records at Source and Target should be the same at a given point in time.
(Diagram labels: Missing Records, Extra Records)
7. # Example 1

Source_Dept
DeptID  DeptName        DeptStartDate
1       HR              22-Aug-2007
2       Finance         12-June-1988
4       Admin           1-May-1999
5       IT              2-June-1997

Target_Dept
DeptID  DeptName        DeptStartDate
1       Human Resource  22-Aug-2007
2       Finance         12-June-1978
3       Operations      11-May-1752
8. Rule #1: Row Counts
Missing Records: Records which are present only at Source
Extra Records: Records which are present only at Target

Missing Records
DeptID  DeptName  DeptStartDate
4       Admin     1-May-1999
5       IT        2-June-1997

Extra Records
DeptID  DeptName    DeptStartDate
3       Operations  11-May-1752
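
The row-count rule can be sketched in a few lines of Python. This is a minimal illustration and not the ADQC implementation: the dictionaries copy the Source_Dept and Target_Dept rows from Example 1, and the function name `row_count_check` and the choice of DeptID as the comparison key are assumptions made for the sketch.

```python
# Rows keyed by DeptID, copied from the Source_Dept / Target_Dept example.
source_dept = {
    1: ("HR", "22-Aug-2007"),
    2: ("Finance", "12-June-1988"),
    4: ("Admin", "1-May-1999"),
    5: ("IT", "2-June-1997"),
}
target_dept = {
    1: ("Human Resource", "22-Aug-2007"),
    2: ("Finance", "12-June-1978"),
    3: ("Operations", "11-May-1752"),
}


def row_count_check(source, target):
    """Compare row counts and list the keys found on only one side."""
    return {
        "source_count": len(source),
        "target_count": len(target),
        "counts_match": len(source) == len(target),
        "missing": sorted(source.keys() - target.keys()),  # only at Source
        "extra": sorted(target.keys() - source.keys()),    # only at Target
    }


result = row_count_check(source_dept, target_dept)
print(result)
```

Equal counts alone do not prove the two sides hold the same rows, which is why the sketch also reports the keys present on only one side.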
9. Rule #2: Completeness
All the data under consideration at the Source and Target should be the same at a given point in time and should satisfy the business rules.
(Diagram labels: Source Table, Target Table)
10. Rule #2: Completeness
Missing Records: Records which are present only at Source
Extra Records: Records which are present only at Target
Mismatched Records: Records which contain at least one different value for the same record between Source and Target

Missing Records
DeptID  DeptName  DeptStartDate
4       Admin     1-May-1999
5       IT        2-June-1997

Extra Records
DeptID  DeptName    DeptStartDate
3       Operations  11-May-1752

Mismatched Records
DeptID  DeptName  DeptStartDate  DifferenceType
2       Finance   12-June-1988   At Source
2       Finance   12-June-1978   At Target
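
A mismatched-record check can be sketched as a column-by-column comparison of rows that share a key. The helper name `find_mismatches` and the decision to compare only DeptStartDate are assumptions made for this illustration. Comparing DeptName as well would also flag DeptID 1 (HR vs. Human Resource), a difference the slides treat under Consistency.

```python
source_dept = [
    {"DeptID": 1, "DeptName": "HR", "DeptStartDate": "22-Aug-2007"},
    {"DeptID": 2, "DeptName": "Finance", "DeptStartDate": "12-June-1988"},
    {"DeptID": 4, "DeptName": "Admin", "DeptStartDate": "1-May-1999"},
    {"DeptID": 5, "DeptName": "IT", "DeptStartDate": "2-June-1997"},
]
target_dept = [
    {"DeptID": 1, "DeptName": "Human Resource", "DeptStartDate": "22-Aug-2007"},
    {"DeptID": 2, "DeptName": "Finance", "DeptStartDate": "12-June-1978"},
    {"DeptID": 3, "DeptName": "Operations", "DeptStartDate": "11-May-1752"},
]


def find_mismatches(source, target, key, columns):
    """Return (key value, differing columns) for rows present on both sides."""
    target_by_key = {row[key]: row for row in target}
    mismatches = []
    for src in source:
        tgt = target_by_key.get(src[key])
        if tgt is None:
            continue  # a Missing Record, not a Mismatched Record
        diffs = [col for col in columns if src[col] != tgt[col]]
        if diffs:
            mismatches.append((src[key], diffs))
    return mismatches


date_mismatches = find_mismatches(source_dept, target_dept, "DeptID", ["DeptStartDate"])
all_mismatches = find_mismatches(
    source_dept, target_dept, "DeptID", ["DeptName", "DeptStartDate"]
)
print(date_mismatches)
print(all_mismatches)
```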
11. Rule #3: Consistency
This ensures that each user observes a consistent view of the data, including changes made by transactions.
Data is inconsistent between the Source & Target if the same data is stored in different formats or contains different values in different places.
12. # Example 2

Source_Dept
DeptID  DeptName  Revenue ($)  DeptStartDate
1       HR        100          22-Aug-2007
2       Finance   200          12-June-1988

Warehouse_Dept
DeptID  DeptName  Revenue (Euro)  DeptStartDate
1       HR        70              22/08/2007
2       Finance   140             12/06/1978

Data Mart_Dept
DeptID  DeptName        Revenue (Euro)  DeptStartDate
1       Human Resource  70              22/08/2007
2       Finance         999999          12/06/1978
13. Rule #3: Consistency
Example #1: Zip code / Date / Currency formats

a)
DeptID  DeptName  Revenue ($ or Euro)  DeptStartDate
1       HR        100                  22-Aug-2007
1       HR        70                   22/08/2007
Difference Point (both rows): Same data, inconsistent due to Revenue & Currency format

b)
DeptID  DeptName        Revenue ($ or Euro)  DeptStartDate
1       HR              100                  22-Aug-2007
1       Human Resource  70                   22/08/2007
Difference Point (both rows): Same data, inconsistent due to different format of Department name
14. Rule #3: Consistency
Example #2: Regional setting, e.g. language
DeptID  DeptName        Revenue ($ or Euro)  DeptStartDate
1       Human Resource  100                  22/08/2007
1       人的資源        100                  22/08/2007
Difference Point (both rows): Same data, inconsistent due to different language used

Example #3: Different values at different points
DeptID  DeptName  Revenue ($ or Euro)  DeptStartDate
2       Finance   140                  12/06/1978
2       Finance   999999               12/06/1978
Difference Point (both rows): Same data, inconsistent value for Revenue between Warehouse & Mart
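
Two of the consistency checks above can be sketched in Python: normalising date formats before comparing them, and comparing a value held at two points. The list of accepted formats, the helper names `to_iso_date` and `inconsistent_keys`, and the reliance on English month names are assumptions made for this illustration.

```python
from datetime import datetime

# Formats seen in Example 2; month names assume an English locale.
DATE_FORMATS = ("%d-%b-%Y", "%d-%B-%Y", "%d/%m/%Y")


def to_iso_date(text):
    """Parse a date in any of the known formats and return YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def inconsistent_keys(left, right):
    """Keys present on both sides whose values differ."""
    return sorted(k for k in left.keys() & right.keys() if left[k] != right[k])


# Revenue (Euro) by DeptID, copied from Warehouse_Dept and Data Mart_Dept.
warehouse_revenue = {1: 70, 2: 140}
mart_revenue = {1: 70, 2: 999999}

same_hr_date = to_iso_date("22-Aug-2007") == to_iso_date("22/08/2007")
same_finance_date = to_iso_date("12-June-1988") == to_iso_date("12/06/1978")
revenue_diffs = inconsistent_keys(warehouse_revenue, mart_revenue)
print(same_hr_date, same_finance_date, revenue_diffs)
```

After normalisation the HR start dates agree, so that difference was one of format only. The Finance dates still differ, which points to a genuine value mismatch.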
15.
16. Rule #4: Validity
Example #1: Measuring “Unemployment” in a country
-> Statistics are collected reliably month-on-month
-> The definition used for collecting “Unemployment” remains the same
e.g. The definition of “unemployment” has changed in the past 25 years, hence old data can’t be compared with current data, as the comparison is not valid.

Example #2: Values falling outside a range
DeptID  DeptName        Revenue (Euro)  DeptStartDate
1       Human Resource  70              22/08/2255
2       Finance         999999          12/06/1752
17. Rule #4: Validity
Example #3: Dates having valid MM, DD, YYYY
DeptID  DeptName        Revenue (Euro)  DeptStartDate
1       Human Resource  70              13/13/2007

Example #4: Birth date > Death Date
EmpId  EmpName  DOB         DOE
1      Jack     13/01/2008  24/11/1996
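
The validity examples can be sketched as three small checks: calendar validity, a permitted range, and date ordering. The range bounds `EARLIEST` and `LATEST` are hypothetical values chosen for this illustration (the slides do not state the allowed range), and the helper names are likewise assumptions.

```python
from datetime import date, datetime

# Hypothetical business range; the slides do not specify the bounds.
EARLIEST = date(1900, 1, 1)
LATEST = date(2024, 12, 31)


def parse_ddmmyyyy(text):
    """Return a date, or None when DD/MM/YYYY is not a real calendar date."""
    try:
        return datetime.strptime(text, "%d/%m/%Y").date()
    except ValueError:
        return None


def in_range(value, earliest=EARLIEST, latest=LATEST):
    """True when the date lies inside the permitted range."""
    return earliest <= value <= latest


invalid_date = parse_ddmmyyyy("13/13/2007")          # month 13 does not exist
future_ok = in_range(parse_ddmmyyyy("22/08/2255"))   # Example #2, row 1
past_ok = in_range(parse_ddmmyyyy("12/06/1752"))     # Example #2, row 2

dob = parse_ddmmyyyy("13/01/2008")
doe = parse_ddmmyyyy("24/11/1996")
birth_after_end = dob > doe                          # Example #4 violation
print(invalid_date, future_ok, past_ok, birth_after_end)
```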
18. Rule #5: Redundancy
Physical Duplicates: All the column values repeat for at least 2 records in a table
Logical Duplicates: Business Key (list of columns) values repeat for at least 2 records in a table
(Diagram labels: Logical Dups, Physical Dups)
19. # Example 3

Employee
EmpID  EmpName  EmpAddress                          Age  DeptID
1      Jim      #22, Jackson St., NY                23   1
2      Sam      A302, Woodsvilla, WA                28   2
4      Samuel   No. AA, Andrew Street, Redmond, WA  22   999
5      Jim      #22, Jackson St., NY                23   1
2      Sam      A302, Woodsvilla, WA                28   2
7      Jack     #23, Jackson St., NY                41   NULL

Example #1: Physical Duplicates
EmpID  EmpName  EmpAddress            Age  DeptID
2      Sam      A302, Woodsvilla, WA  28   2
2      Sam      A302, Woodsvilla, WA  28   2

Example #2: Logical Duplicates
EmpID  EmpName  EmpAddress            Age  DeptID
1      Jim      #22, Jackson St., NY  23   1
5      Jim      #22, Jackson St., NY  23   1
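
Both kinds of duplicate can be detected with a short Python sketch over the Employee rows. The business key used here, (EmpName, EmpAddress), is an assumption, since the slides do not name the key columns, and the function names are illustrative. Exact copies are collapsed before the logical check so that physical duplicates are not reported twice.

```python
from collections import Counter, defaultdict

# (EmpID, EmpName, EmpAddress, Age, DeptID) rows from the Employee table.
employees = [
    (1, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (4, "Samuel", "No. AA, Andrew Street, Redmond, WA", 22, 999),
    (5, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (7, "Jack", "#23, Jackson St., NY", 41, None),
]


def physical_duplicates(rows):
    """Rows whose every column value occurs more than once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]


def logical_duplicates(rows, key_indexes):
    """Distinct rows that share the same business-key values."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[i] for i in key_indexes)].add(row)  # set drops exact copies
    return {
        key: sorted(members, key=lambda r: r[0])
        for key, members in groups.items()
        if len(members) > 1
    }


physical = physical_duplicates(employees)
logical = logical_duplicates(employees, key_indexes=(1, 2))  # assumed business key
print(physical)
print(logical)
```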
20. Rule #6: RI (Referential Integrity)
Child records for which no corresponding parent record exists are called “Orphan Records”.
The logical relationship rules between parent & child tables should be defined by the business.
21. # Example 4

Child Table:: Employee
EmpID  EmpName  EmpAddress                          Age  DeptID (FK)
1      Jim      #22, Jackson St., NY                23   1
2      Sam      A302, Woodsvilla, WA                28   2
4      Samuel   No. AA, Andrew Street, Redmond, WA  22   999
5      Jim      #22, Jackson St., NY                23   1
7      Jack     #23, Jackson St., NY                41   NULL

Parent Table:: Department
DeptID (PK)  DeptName    DeptStartDate
1            HR          22-Aug-2007
2            Finance     12-June-1988
3            Operations  11-May-1752

Orphan Records
EmpID  EmpName  EmpAddress                          Age  DeptID
4      Samuel   No. AA, Andrew Street, Redmond, WA  22   999
7      Jack     #23, Jackson St., NY                41   NULL
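
An orphan-record check can be sketched as a membership test of each child's foreign key against the parent's primary keys. The function name `find_orphans` is an assumption made for this illustration. Following the Orphan Records table, a NULL DeptID is reported as an orphan here. Many databases accept NULL in a foreign key column, so whether to flag it is a business decision.

```python
# (EmpID, EmpName, EmpAddress, Age, DeptID) rows from the child table.
employees = [
    (1, "Jim", "#22, Jackson St., NY", 23, 1),
    (2, "Sam", "A302, Woodsvilla, WA", 28, 2),
    (4, "Samuel", "No. AA, Andrew Street, Redmond, WA", 22, 999),
    (5, "Jim", "#22, Jackson St., NY", 23, 1),
    (7, "Jack", "#23, Jackson St., NY", 41, None),
]

# DeptID (PK) values from the parent table.
department_ids = {1, 2, 3}


def find_orphans(children, parent_keys, fk_index):
    """Child rows whose foreign key has no matching parent key."""
    return [row for row in children if row[fk_index] not in parent_keys]


orphans = find_orphans(employees, department_ids, fk_index=4)
orphan_ids = [row[0] for row in orphans]
print(orphan_ids)
```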
22.
23.
24.
25. Rule #8: Accuracy
The degree to which data reflects real-world objects.
Accuracy is generally measured by comparing the data against something defined as the “true” source of information.
26. Rule #9: Usability
Describes the relevance and the meaning of data
Denotes the ease with which data can be used
Example: Mart Table values and how they are represented in ReportingTable

Mart Table
DeptID (PK)  DeptName
1            HR
2            Fin
3            Ops

Represented As: ReportingTable
DeptID (PK)  DeptName
1            Human Resources
2            Finance
3            Operations
27.
28. Testing :: DQ Case Study ADQC (Automated Data Quality Check) v2.0
29. DQ Test Management
30.
31.
32.
33.
34. DQ Challenges
35. DQ Best Practices