Purpose

Our goal is to build a simple web app that lets users seamlessly integrate data from different silos into a single data lake. With disjoint silos such as these, we often run into the issue of duplicate data types: column headers from different silos use slightly different names for the same data, for example 'Date of Birth' in one silo and 'DOB' in another. Combing through hundreds or even thousands of column headers by hand to find these duplicates is tedious for a user.

Enter the Lucky7 Data Classifier. The Data Classifier uses machine learning to pick out column headers from different silos that appear to hold the same type of data. Once similar headers have been found, we group them into a classification. A classification is a group of column headers that appear to describe the same data. Later, a user reviews each classification to check whether the fields in it are accurate.
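
This page does not say which machine-learning model the classifier uses. To make the idea of a classification concrete, here is a minimal Python sketch that swaps in simple string heuristics instead of machine learning: separator-insensitive matching, acronym detection ('DOB' versus 'Date of Birth'), and a character-level similarity ratio from the standard library's `difflib.SequenceMatcher`. The `Classification` type, the function names, and the 0.8 threshold are hypothetical illustrations, not the Lucky7 implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Classification:
    """Hypothetical group of column headers that appear to hold the same data."""
    headers: list[tuple[str, str]] = field(default_factory=list)  # (silo, header)
    reviewed: bool = False  # would be set once a user has checked the group


def words(header: str) -> list[str]:
    # Lowercase and split on spaces, underscores and hyphens: 'E-mail' -> ['e', 'mail'].
    return header.lower().replace("_", " ").replace("-", " ").split()


def is_acronym_of(short: str, long: str) -> bool:
    # True when `short` spells the initials of a multi-word header,
    # e.g. 'DOB' and 'Date of Birth'.
    long_words = words(long)
    return len(long_words) > 1 and "".join(words(short)) == "".join(w[0] for w in long_words)


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    # Assumption: these heuristics stand in for the real ML similarity model.
    key_a, key_b = "".join(words(a)), "".join(words(b))
    if key_a == key_b or is_acronym_of(a, b) or is_acronym_of(b, a):
        return True
    # Fall back to a character-level similarity ratio in [0, 1].
    return SequenceMatcher(None, key_a, key_b).ratio() >= threshold


def classify(silos: dict[str, list[str]]) -> list[Classification]:
    # Greedily place each header into the first classification whose
    # first (representative) header looks similar; otherwise start a new one.
    groups: list[Classification] = []
    for silo, headers in silos.items():
        for header in headers:
            for group in groups:
                if similar(header, group.headers[0][1]):
                    group.headers.append((silo, header))
                    break
            else:
                groups.append(Classification(headers=[(silo, header)]))
    return groups


silos = {
    "crm": ["Date of Birth", "Email", "Phone"],
    "billing": ["DOB", "e-mail", "Invoice Total"],
}
for group in classify(silos):
    print([header for _, header in group.headers])
# ['Date of Birth', 'DOB']
# ['Email', 'e-mail']
# ['Phone']
# ['Invoice Total']
```

Comparing each header only against a group's first member keeps the sketch short, but it makes the result depend on input order. A learned model would presumably score similarity more robustly, and the user review step is where a wrong grouping would be caught.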
