2. Data Quality
Data becomes outdated over time and degrades until it is unusable. Data Quality tools and processes act on the data to correct this deterioration.
What is Data Quality?
Data Quality consists of validating all the data held on individuals and companies (customers, users, members of a club or association, employees, etc.), correcting it whenever possible, standardizing it according to Post Office standards (to ensure delivery), and eliminating duplicates so that a single instance of each person remains.
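The standardize-and-deduplicate step can be sketched in Python as below. This is a minimal illustration, not a postal standardization engine: the `Person` record layout and the `standardize` and `deduplicate` helpers are hypothetical, and the normalization rules (strip punctuation, collapse whitespace, uppercase, remove spaces from postcodes) are assumptions standing in for real Post Office reference data.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; real systems hold many more fields.
    name: str
    address: str
    postcode: str


def standardize(p: Person) -> Person:
    """Apply simple, assumed normalization rules (not an official postal standard)."""
    def clean(s: str) -> str:
        s = re.sub(r"[^\w\s]", "", s)                  # drop punctuation
        return re.sub(r"\s+", " ", s).strip().upper()  # collapse whitespace, uppercase
    return Person(clean(p.name), clean(p.address), p.postcode.replace(" ", "").upper())


def deduplicate(people: list[Person]) -> list[Person]:
    """Standardize every record and keep only the first instance of each person."""
    seen: set[Person] = set()
    result: list[Person] = []
    for p in map(standardize, people):
        if p not in seen:  # frozen dataclasses are hashable and compare by field values
            seen.add(p)
            result.append(p)
    return result


records = [
    Person("Tina  Jones", "12 High St.", "ab1 2cd"),
    Person("tina jones", "12 High St", "AB12CD"),
]
print(deduplicate(records))
# [Person(name='TINA JONES', address='12 HIGH ST', postcode='AB12CD')]
```

Exact matching on standardized fields only catches duplicates that normalize to identical values; spelling variants such as "Jones" and "Jonse" would need fuzzy matching, which is outside this sketch.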
What do I gain with Data Quality?
- Cost savings
- Efficient cooperation
- Consistent basis for decision making
- Risk minimization
- Revenue boost
Finally, all my customer data is validated and clean: I know the exact number of customers, I can segment them correctly, I can locate them on maps with geographic tools, and I can get to know them better in order to offer services fully tailored to their needs.
No more duplicate customers: no more duplicate letters, no more unnecessary returned mail, no more wrong phone calls, no more money thrown away.
To solve the data quality problem, an analysis must be performed for each data quality dimension; this resolves open questions along the way and reduces the risk of failure in projects of this type.
The six core dimensions of data quality are:
- Completeness: Completeness is defined as expected comprehensiveness. Data can be complete even if optional data is missing; as long as the data meets expectations, it is considered complete.
Example: a customer’s first name and last name are mandatory but the middle name is optional, so a record can be considered complete even if a middle name is not available.
- Timeliness: The degree to which data represents reality from the required point in time.
Example: Tina Jones provides details of an updated emergency contact number on 1st June 2013, which is then entered into the Student database by the admin team on 4th June 2013. This indicates a delay of 3 days, which breaches the timeliness constraint because the service level agreement for changes is 2 days.
- Consistency: The absence of difference when comparing two or more representations of a thing against a definition.
Example (school administration): a student’s date of birth has the same value and format in the school register as in the Student database.
- Validity: Validity (also called conformity) means the data follows a set of standard data definitions such as data type, size and format. For example, a customer’s date of birth is in the format “mm/dd/yyyy”.
- Integrity: Integrity means the validity of data across relationships; it ensures that all data in a database can be traced and connected to other data.
For example, a customer database should contain valid customers, their addresses, and the relationships between them. An address record that is not linked to any customer is not valid and is considered an orphaned record.
- Accuracy: The degree to which data correctly reflects the real-world object or event being described. Examples:
The sales recorded for a business unit match the real sales value.
The address of an employee in the employee database is the employee’s real address.
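The completeness rule from the example (first and last name mandatory, middle name optional) can be expressed as a small check. A minimal sketch, assuming records are Python dictionaries; the field names and the `MANDATORY_FIELDS` tuple are illustrative, and treating a blank string the same as a missing value is an assumption.

```python
# Hypothetical rule taken from the example: first and last name are mandatory,
# middle name is optional and therefore not checked.
MANDATORY_FIELDS = ("first_name", "last_name")


def missing_fields(record: dict, mandatory: tuple[str, ...] = MANDATORY_FIELDS) -> list[str]:
    """Return the mandatory fields that are absent, None or blank."""
    return [f for f in mandatory if not str(record.get(f) or "").strip()]


def is_complete(record: dict) -> bool:
    """A record is complete when no mandatory field is missing."""
    return not missing_fields(record)


print(is_complete({"first_name": "Tina", "last_name": "Jones"}))  # True (no middle name needed)
print(missing_fields({"first_name": "Tina", "last_name": " "}))   # ['last_name']
```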
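The arithmetic in the timeliness example can be checked with a few lines of Python. This sketch assumes the delay is counted in calendar days, as the example does, and that only a delay greater than the 2-day service level agreement counts as a breach; `entry_delay`, `breaches_sla` and `SLA_DAYS` are illustrative names.

```python
from datetime import date

SLA_DAYS = 2  # service level agreement for changes, taken from the example


def entry_delay(provided: date, entered: date) -> int:
    """Calendar days between the data being provided and being entered."""
    return (entered - provided).days


def breaches_sla(provided: date, entered: date, sla_days: int = SLA_DAYS) -> bool:
    """Assumed rule: a delay longer than the SLA breaches the timeliness constraint."""
    return entry_delay(provided, entered) > sla_days


# Tina Jones example: provided on 1 June 2013, entered on 4 June 2013.
print(entry_delay(date(2013, 6, 1), date(2013, 6, 4)))   # 3
print(breaches_sla(date(2013, 6, 1), date(2013, 6, 4)))  # True
```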
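A consistency check compares two representations of the same thing. The sketch below assumes each source (the school register and the Student database) can be loaded as a mapping from a hypothetical student ID to the date of birth as stored text; the sample data is made up. Comparing the stored strings exactly flags a difference in value as well as a difference in format.

```python
def inconsistent_ids(register: dict[str, str], database: dict[str, str]) -> list[str]:
    """Return the student IDs whose stored date of birth differs between the sources.

    Exact string comparison catches both different values and different formats.
    A student present in only one source is also reported, since get() returns None.
    """
    all_ids = register.keys() | database.keys()
    return sorted(i for i in all_ids if register.get(i) != database.get(i))


register = {"S1": "01/06/2001", "S2": "15/03/2002", "S3": "20/11/2001"}
database = {"S1": "01/06/2001", "S2": "2002-03-15", "S3": "21/11/2001"}
print(inconsistent_ids(register, database))  # ['S2', 'S3']: S2 differs in format, S3 in value
```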
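The validity rule for dates of birth (data type, size and the “mm/dd/yyyy” format) could be checked as follows. This is a minimal sketch built on Python's `datetime.strptime`; the function name `is_valid_dob` is hypothetical, and also requiring a real calendar date (rejecting 02/30) is an assumption that goes beyond the format itself.

```python
from datetime import datetime


def is_valid_dob(value: object) -> bool:
    """True if value is a string holding a real calendar date written as mm/dd/yyyy."""
    if not isinstance(value, str) or len(value) != 10:  # data type and size checks
        return False
    try:
        parsed = datetime.strptime(value, "%m/%d/%Y")    # format check, rejects impossible dates
    except ValueError:
        return False
    # Round-trip so that only the exact zero-padded form is accepted.
    return parsed.strftime("%m/%d/%Y") == value


print(is_valid_dob("06/01/2001"))  # True
print(is_valid_dob("2001-06-01"))  # False (wrong format)
print(is_valid_dob("02/30/2001"))  # False (no such date)
```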
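Orphaned records like the address in the integrity example can be found by checking every reference against the parent table. This is a sketch with made-up in-memory data and a hypothetical `orphaned_addresses` helper; in a relational database, an enforced foreign key constraint would normally prevent such rows from being inserted in the first place.

```python
# Made-up sample data: customers keyed by customer_id, addresses pointing to them.
customers = {101: "Tina Jones", 102: "Sam Lee"}
addresses = [
    {"address_id": 1, "customer_id": 101, "street": "12 High St"},
    {"address_id": 2, "customer_id": 999, "street": "3 Mill Lane"},  # no customer 999
    {"address_id": 3, "customer_id": 102, "street": "8 Park Road"},
]


def orphaned_addresses(addresses: list[dict], customer_ids) -> list[dict]:
    """Return address records whose customer_id has no matching customer."""
    valid = set(customer_ids)  # iterating a dict yields its keys
    return [a for a in addresses if a["customer_id"] not in valid]


print([a["address_id"] for a in orphaned_addresses(addresses, customers)])  # [2]
```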
To summarize, data quality refers to the overall utility of a dataset as a function of how easily it can be processed and analyzed for other uses, usually by a database, data warehouse, or data analytics system. High-quality data can be processed and analyzed easily, leading to insights that help the organization make better decisions. It is essential to business intelligence and other types of data analytics, as well as to operational efficiency.