Learning-Based Data Quality Enhancement
Application Description:
Data must be consistent, complete, and accurate before it can be analyzed; otherwise, the result is "garbage in, garbage out." Data science techniques are used to de-duplicate records, handle missing values, and make information consistent across the dataset.
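The three cleaning steps above can be sketched in a few lines. This is a minimal illustration that assumes records arrive as Python dicts. The normalization rule (trim and lowercase strings), the choice of key fields for de-duplication, mean imputation for missing numbers, and the function name `clean_records` are all hypothetical choices, not the application's actual method.

```python
from statistics import mean


def clean_records(records, key_fields, numeric_fields):
    """Normalize, de-duplicate, and impute a list of dict records.

    Hypothetical helper for illustration; the real pipeline may use
    different rules for each step.
    """
    # 1. Consistency: trim whitespace and lowercase every string value
    #    (assumed normalization rule). Builds new dicts, so inputs are untouched.
    normalized = [
        {k: v.strip().lower() if isinstance(v, str) else v for k, v in r.items()}
        for r in records
    ]

    # 2. De-duplication: keep the first record seen for each combination
    #    of the chosen key fields.
    seen = set()
    unique = []
    for r in normalized:
        key = tuple(r.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Missing values: fill None in numeric fields with the column mean
    #    (mean imputation is an assumed strategy, one of many possible).
    for f in numeric_fields:
        present = [r[f] for r in unique if r.get(f) is not None]
        fill = mean(present) if present else None
        for r in unique:
            if r.get(f) is None:
                r[f] = fill

    return unique
```

For example, `" Alice "` in Paris and `"alice"` in `"paris"` collapse into one record after normalization, and a missing age is filled with the mean of the remaining ages. Running normalization before de-duplication matters: without it, records that differ only in case or spacing would survive as duplicates.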
What’s different
- An end-to-end data quality enhancement pipeline
- Data quality score tracking to ensure the score stays above a set threshold
- Data processing, transformation, and integration from heterogeneous sources
- Efficient handling of missing data using a variety of approaches
- Robust and rapid quality enhancement with minimal impact on downstream steps
- Advanced machine learning methods that continually improve the quality score
- Consistent de-duplication that removes as many duplicates as possible
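Quality score tracking against a threshold might look like the following minimal sketch. The metric itself is an assumption: here the score is the average of completeness (share of non-empty cells) and uniqueness (share of distinct rows). The function names `quality_score` and `check_threshold` and the default threshold of 0.9 are hypothetical.

```python
def quality_score(rows, fields):
    """Return a score in [0, 1]: the mean of completeness and uniqueness.

    Hypothetical metric for illustration; a real system may weight
    additional dimensions such as validity or consistency.
    """
    if not rows or not fields:
        return 0.0

    # Completeness: fraction of cells that are neither None nor empty.
    total_cells = len(rows) * len(fields)
    filled = sum(1 for r in rows for f in fields if r.get(f) not in (None, ""))
    completeness = filled / total_cells

    # Uniqueness: fraction of rows that are distinct across the given fields.
    distinct = len({tuple(r.get(f) for f in fields) for r in rows})
    uniqueness = distinct / len(rows)

    return (completeness + uniqueness) / 2


def check_threshold(rows, fields, threshold=0.9):
    """Return (score, passed) so a pipeline step can alert or halt on low quality."""
    score = quality_score(rows, fields)
    return score, score >= threshold
```

On the rows `{"id": 1, "email": "a@x.com"}` (twice) and `{"id": 2, "email": None}`, completeness is 5/6 and uniqueness is 2/3, giving a score of 0.75, which falls below the assumed 0.9 threshold. Returning the score alongside the pass/fail flag lets the same check feed both a dashboard that tracks the score over time and a gate that stops bad data from reaching downstream steps.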