The questionnaire data collected from cohort participants may contain errors introduced during collection, digitization, or incorporation into our databases. For small datasets, such errors can conventionally be detected by visually inspecting the original paper questionnaires. For our biobank-scale data, however, we need a more efficient and less labor-intensive way to detect and flag candidate errors.
We develop methods based on machine learning and artificial intelligence to clean our biobank-scale data. We also develop methods to detect outliers in the other types of data in our large-scale genomic cohort.
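To illustrate what flagging candidate errors can look like, here is a minimal sketch that uses a simple robust z-score (median and median absolute deviation) in place of the machine learning methods described above. It is not our actual pipeline: the function name, the threshold of 3.5, and the example height values are all hypothetical.

```python
from statistics import median


def flag_candidate_errors(values, threshold=3.5):
    """Return indices of values whose robust z-score exceeds the threshold.

    Hypothetical helper for illustration only. Missing answers (None)
    are skipped and never flagged.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return []

    med = median(observed)
    # Median absolute deviation: a spread estimate that is itself
    # resistant to the extreme values we are trying to find.
    mad = median(abs(v - med) for v in observed)
    if mad == 0:
        return []  # no spread to measure against

    flagged = []
    for i, v in enumerate(values):
        if v is None:
            continue
        # 0.6745 rescales the MAD so the score is comparable to a
        # standard z-score under a normal distribution.
        score = 0.6745 * (v - med) / mad
        if abs(score) > threshold:
            flagged.append(i)
    return flagged


# Made-up self-reported heights in cm, with two plausible entry errors
# (a duplicated digit and a misplaced decimal point) and one missing answer.
heights_cm = [158.0, 162.5, 171.0, 1655.0, 149.5, None, 167.0, 16.3]
print(flag_candidate_errors(heights_cm))  # [3, 7]
```

The flagged indices are only candidates: a rule like this marks values for review against the original questionnaire rather than correcting them automatically.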