Hello! My name is Ishaan Jain, an Information Technology undergrad at Manipal University Jaipur. I will be working on developing an information-theoretic approach to separate artificial information from real information in geospatial datasets for Xbitinfo during Google Summer of Code 2023.
What did I do this week?
This week, I focused on the issue of artificial information in datasets, specifically cases where it appears in the trailing mantissa bits. To tackle this problem, I used two indicators, namely the CDF (cumulative distribution function) and the bitinformation, to develop mitigation strategies.
For variables where the CDF flattens out, I used this behavior as an indicator of potential artificial information. I took the point where the CDF becomes constant as a threshold and removed the trailing mantissa bits beyond that point.
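As a rough illustration of the CDF idea, here is a minimal sketch using only NumPy. It assumes the per-bit information of the mantissa is already available as an array; the function name `keepbits_from_cdf_plateau`, the tolerance `tol`, and the sample values are hypothetical and not part of Xbitinfo.

```python
import numpy as np


def keepbits_from_cdf_plateau(bit_info, tol=1e-3):
    """Count the mantissa bits before the CDF of information first goes flat.

    bit_info: per-bit information, most significant mantissa bit first
              (hypothetical input, assumed to be computed elsewhere).
    tol:      CDF increments at or below this value count as "constant"
              (an assumed tolerance, not a recommended setting).
    """
    info = np.asarray(bit_info, dtype=float)
    total = info.sum()
    if total <= 0:
        return 0  # no information at all, nothing worth keeping

    # Normalized cumulative distribution of information across the bits.
    cdf = np.cumsum(info) / total
    # How much each bit adds to the CDF.
    increments = np.diff(cdf, prepend=0.0)

    # First bit that adds (almost) nothing marks the start of the plateau.
    flat = np.nonzero(increments <= tol)[0]
    return int(flat[0]) if flat.size else info.size


# Made-up values: information decays, stays at zero, then reappears
# in the trailing bits (the suspected artificial information).
example = [0.9, 0.6, 0.3, 0.1, 0.0, 0.0, 0.0, 0.2, 0.3]
keepbits = keepbits_from_cdf_plateau(example)
```

In this sketch, everything from the first flat bit onward is treated as trailing bits to discard, including any information that shows up again after the plateau.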
In cases where the bitinformation drops to zero, I used this observation to determine the keepbits. The point where the bitinformation becomes zero is a potential threshold for separating real from artificial information. By setting the keepbits at the bit where the bitinformation drops off, I discarded the trailing bits, assuming they likely contained artificial information.
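The bitinformation-based rule can be sketched in a similar way. This is a minimal example assuming float32 data (23 mantissa bits) and a precomputed per-bit information array; the helper names `keepbits_from_zero_information` and `truncate_mantissa`, the `threshold` value, and the sample numbers are hypothetical. For simplicity it zeroes the discarded bits instead of rounding.

```python
import numpy as np


def keepbits_from_zero_information(bit_info, threshold=1e-6):
    """Count the mantissa bits before the information first reaches zero.

    threshold is an assumed cutoff for "effectively zero".
    """
    info = np.asarray(bit_info, dtype=float)
    below = np.nonzero(info <= threshold)[0]
    return int(below[0]) if below.size else info.size


def truncate_mantissa(values, keepbits):
    """Zero every float32 mantissa bit after the first `keepbits` bits."""
    if not 0 <= keepbits <= 23:
        raise ValueError("float32 has 23 mantissa bits")
    # Reinterpret the floats as raw 32-bit integers.
    bits = np.asarray(values, dtype=np.float32).view(np.uint32)
    # Ones over sign, exponent and the kept mantissa bits; zeros elsewhere.
    mask = np.uint32((0xFFFFFFFF << (23 - keepbits)) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)


# Made-up per-bit information: zero from the fourth mantissa bit on,
# with a small amount reappearing in a trailing bit.
keepbits = keepbits_from_zero_information([0.8, 0.5, 0.2, 0.0, 0.0, 0.1])

data = np.array([1.0 + 2.0**-23, 1.5], dtype=np.float32)
cleaned = truncate_mantissa(data, keepbits)
```

Here the last mantissa bit of the first value is cleared, while `1.5` is unchanged because its only set mantissa bit lies within the kept bits.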
What is coming up next?
Next, I will try to tackle artificial information in variables where it doesn't necessarily show up only in the trailing mantissa bits.