We live in a world of data. Our devices are smart, our surroundings are full of switches and sensors, and our data is being analysed and used in many different ways. However, much of this data is collected and processed without human oversight, and a lot of it is actually dirty. How do we sift through it to find what actually matters? Scientists at the University of Waterloo, the University of Wisconsin and Stanford University have developed a tool called HoloClean, which can recognize and correct dirty data.
Dirty data is essentially noise that is collected by various sensors or algorithms. Imagine a system that analyses data from your websites. It can access all kinds of information, but not all of it is relevant. In fact, some of it is not even real: it is noise, which naturally occurs in all electronic systems. HoloClean is the world’s first artificial intelligence-based technology designed to recognize dirty data and correct it before passing it on for processing. Scientists say that this tool could be useful for organizations that work with vast amounts of data.
Scientists note that banks, utility companies and many other enterprises work with a lot of data. Inevitably, some of it is bad: it can be inaccurate, false or simply irrelevant. HoloClean can be trained to find errors and correct them on its own. Of course, training AI is a long process in itself, but eventually HoloClean would work through that data, pick out the errors and correct them, or exclude them from the data pool if that is the best decision. This would provide users with a cleaner dataset for their analytics. The end goal is an easier analysis with more accurate, dependable results.
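To make the idea of correcting or excluding bad records more concrete, here is a minimal Python sketch. It is not HoloClean's code or API, and it uses no machine learning. It assumes a simple hypothetical rule (every record with the same zip code should name the same city) and uses a plain majority vote in place of a trained model. The function name `clean`, the field names and the sample records are all invented for illustration.

```python
from collections import Counter, defaultdict


def clean(records, key="zip", attr="city"):
    """Toy repair: within each `key` group, trust the majority `attr` value.

    Returns (cleaned, excluded). Rows in a group with a clear majority are
    corrected to that value; rows in a tied group are excluded instead.
    This is an illustrative assumption, not how HoloClean itself works.
    """
    # Count how often each attribute value appears for each key.
    groups = defaultdict(Counter)
    for row in records:
        groups[row[key]][row[attr]] += 1

    cleaned, excluded = [], []
    for row in records:
        ranked = groups[row[key]].most_common(2)
        # A tie between the top two values means there is no safe repair.
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            excluded.append(row)
        else:
            cleaned.append({**row, attr: ranked[0][0]})
    return cleaned, excluded


# Hypothetical sample data.
records = [
    {"zip": "60608", "city": "Chicago"},
    {"zip": "60608", "city": "Chicago"},
    {"zip": "60608", "city": "Cicago"},     # typo, outvoted two to one
    {"zip": "53703", "city": "Madison"},
    {"zip": "94305", "city": "Stanford"},
    {"zip": "94305", "city": "Palo Alto"},  # tie, no clear majority
]

cleaned, excluded = clean(records)
print(len(cleaned), "rows kept,", len(excluded), "rows excluded")
```

In this toy version the misspelled "Cicago" is corrected because two other records agree on "Chicago", while the two conflicting 94305 records are excluded because nothing indicates which one is right. A real system would weigh far more evidence than a simple vote.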
Until now, incorrect data has had to be identified and corrected manually. It is a long and expensive process, and it is not even entirely accurate. Scientists hope that HoloClean could make this job faster, easier and more accurate. Ihab Ilyas, one of the developers of HoloClean, said: “This system addresses the problem where the information is out there, and people are using it to run analytics, but it is not correct. It doesn’t provide information that was not there, but instead corrects information you assume is correct”.
Operating on accurate data is hugely important. Only in this way can you hope to reach accurate results and make meaningful decisions. This is one of those jobs that is probably better off in the hands of artificial intelligence. Such a system can be trained to sift through a lot of data, recognize errors and correct them, and this process can be both speedy and accurate.
Source: University of Waterloo