Weekly Blog Post #12
Ishaanj18
Published: 08/15/2023
Hello! My name is Ishaan Jain, an Information Technology undergraduate at Manipal University Jaipur. During Google Summer of Code 2023, I will be working on an information-theoretic approach to separate artificial information from real information in geospatial datasets for Xbitinfo.
What did I do this week?
This week, I focused on understanding the datasets better and looking for potential artificial information. I approached this from two angles. First, I experimented with a new way of chunking variables, to see whether splitting the data differently would reveal hidden patterns or anomalies that indicate artificial information. Second, I used matplotlib to plot the variables' data points on a map, which gives a spatial perspective that can reveal geospatial trends or irregularities.
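The chunking idea can be illustrated with a minimal sketch, assuming a variable is a 2D latitude/longitude grid held as nested Python lists. The helpers `tiles`, `zero_trailing_bits`, and `scan_chunks`, and the tile size, are hypothetical and not part of Xbitinfo. For each tile, the sketch counts the trailing float32 mantissa bits that are zero in every value, which is one simple signal that a region was quantized or rounded earlier and may carry artificial structure.

```python
import struct


def float32_mantissa(x):
    """Return the 23 mantissa bits of x after converting it to float32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits & 0x7FFFFF


def tiles(grid, size):
    """Yield (row, col, values) for non-overlapping size x size tiles of a 2D list."""
    nrows, ncols = len(grid), len(grid[0])
    for r in range(0, nrows, size):
        for c in range(0, ncols, size):
            values = [
                grid[i][j]
                for i in range(r, min(r + size, nrows))
                for j in range(c, min(c + size, ncols))
            ]
            yield r, c, values


def zero_trailing_bits(values):
    """Count trailing mantissa bits that are zero in every value of a chunk (0-23)."""
    combined = 0
    for v in values:
        combined |= float32_mantissa(v)  # a bit survives if any value sets it
    if combined == 0:
        return 23
    count = 0
    while not (combined >> count) & 1:
        count += 1
    return count


def scan_chunks(grid, size):
    """Map each tile's top-left corner to its count of all-zero trailing bits."""
    return {(r, c): zero_trailing_bits(vals) for r, c, vals in tiles(grid, size)}


print(scan_chunks([[1.0, 1.5], [1.25, 1.75]], 1))
```

A tile whose count differs sharply from its neighbours would be a candidate for closer inspection, for example on a map plot.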
What is coming up next?
In the upcoming week, I plan to delve deeper into the findings from these experiments.
Weekly Blog Post #11
Ishaanj18
Published: 08/07/2023
What did I do this week?
This week, I significantly improved how my function removes artificial information from datasets. Previously, it tracked the frequency of CDF values and selected the mantissa bit with the highest frequency as the keepbit. I replaced this with a gradient-based approach: the function now computes the gradient at each mantissa bit and takes the bit where the gradient becomes zero as the keepbit. This new approach proved more accurate at retaining relevant data while removing artificial information.
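A minimal sketch of gradient-based keepbit selection, assuming the gradient is taken over the CDF of bit information (the cumulative fraction of information up to each mantissa bit) and that "zero" means below a small tolerance. The names `central_gradient`, `keepbits_from_gradient`, and `tol` are hypothetical. The gradient follows the default finite-difference scheme of `numpy.gradient` (one-sided at the ends, central inside), written in plain Python to stay self-contained.

```python
def central_gradient(values):
    """Unit-spacing gradient: one-sided differences at the ends, central inside."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    grad = [0.0] * n
    grad[0] = values[1] - values[0]
    grad[-1] = values[-1] - values[-2]
    for i in range(1, n - 1):
        grad[i] = (values[i + 1] - values[i - 1]) / 2
    return grad


def keepbits_from_gradient(cdf, tol=1e-3):
    """Return the index of the first mantissa bit whose CDF gradient is ~zero.

    cdf[i] is assumed to be the cumulative fraction of information in
    mantissa bits 0..i, so the returned index equals the number of mantissa
    bits to keep. len(cdf) means no flat region was found.
    """
    for i, g in enumerate(central_gradient(cdf)):
        if abs(g) <= tol:
            return i
    return len(cdf)


print(keepbits_from_gradient([0.5, 0.8, 0.95, 1.0, 1.0, 1.0]))   # 4
print(keepbits_from_gradient([0.5, 0.8, 0.95, 0.95, 0.95, 1.0]))  # 3
```

Taking the first flat point means that information reappearing in the trailing bits (the final rise in the second example) does not push the keepbit further out.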
What is coming up next?
I will add maps showing the distribution of variables across the United States, and a feature that marks the true keepbit in the bitinformation plot.
Weekly Blog Post #10
Ishaanj18
Published: 07/31/2023
What did I do this week?
This week, my main focus was testing the function that removes artificial information from various datasets. The function uses two quantities, the Cumulative Distribution Function (CDF) and the bit information, to identify and eliminate artificial information, and returns the true keepbits.
I then applied the function to each dataset and analyzed how the CDF and the bit information behaved in each case. Where the CDF started to become constant, I used the function to cut the trailing bits beyond the point where the CDF stabilized. In datasets where the bit information dropped to zero, I used the true keepbits to truncate the trailing bits, removing the artificial information from those datasets.
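Cutting out trailing bits can be sketched for a single float32 value as follows. This is an illustration under assumptions: `truncate_mantissa` is a hypothetical helper that simply zeroes every mantissa bit after the first `keepbits`. Bitrounding as used in Xbitinfo rounds to nearest instead, so results can differ in the last kept bit.

```python
import struct


def truncate_mantissa(x, keepbits):
    """Zero all float32 mantissa bits after the first `keepbits` bits.

    Sign and exponent are untouched, so the value is truncated toward zero.
    """
    if not 0 <= keepbits <= 23:
        raise ValueError("keepbits must be between 0 and 23 for float32")
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    drop = 23 - keepbits
    mask = (0xFFFFFFFF >> drop) << drop  # ones everywhere except the dropped bits
    (out,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return out


values = [1.75, -1.75, 3.0]
print([truncate_mantissa(v, 1) for v in values])  # [1.5, -1.5, 3.0]
```

Applying the helper element-wise with the true keepbits removes everything past the cut-off while leaving the retained bits exactly as they were.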
The insights from this testing phase will be important as we integrate the function into the project's data processing pipeline.
What is coming up next?
I will try to tackle artificial information in variables where it does not appear only in the trailing mantissa bits.
Weekly Blog Post #9
Ishaanj18
Published: 07/25/2023
What did I do this week?
This week, I focused on artificial information in datasets, specifically cases where it appears in the trailing mantissa bits. To tackle this, I used two quantities, the CDF (cumulative distribution function) and the bitinformation, to develop mitigation strategies.
For variables where the CDF starts to become constant, I treated this behavior as an indicator of potential artificial information: the point where the CDF becomes constant serves as a threshold, and I removed the trailing mantissa bits beyond it.
In cases where the bitinformation starts to become zero, I used this to determine the keepbits. The point where the bitinformation reaches zero is a potential threshold for separating real from artificial information. By setting the keepbits at the bit where the bitinformation declines to zero, I discarded the trailing bits, on the assumption that they most likely contain artificial information.
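The two rules above can be combined in a short sketch. Everything here is an assumption for illustration: the helper names, the shared tolerance `tol`, and the choice to return the smaller of the two keepbits (the earlier cut-off) when both rules apply. Both inputs are lists indexed by mantissa bit, with `cdf` holding cumulative information and `bitinfo` the information per bit.

```python
def cdf_plateau(cdf, tol=1e-3):
    """Keepbits from the first mantissa bit where the CDF stops increasing."""
    for i in range(1, len(cdf)):
        if cdf[i] - cdf[i - 1] <= tol:
            return i
    return len(cdf)


def bitinfo_zero(bitinfo, tol=1e-3):
    """Keepbits from the first mantissa bit whose information is (near) zero."""
    for i, info in enumerate(bitinfo):
        if info <= tol:
            return i
    return len(bitinfo)


def true_keepbits(cdf, bitinfo, tol=1e-3):
    """Apply both rules and keep the earlier cut-off (the smaller keepbits)."""
    return min(cdf_plateau(cdf, tol), bitinfo_zero(bitinfo, tol))


print(true_keepbits([0.6, 0.9, 1.0, 1.0], [0.6, 0.3, 0.0005, 0.0]))  # 2
```

In the usage line, the CDF rule alone would keep 3 bits, but the third bit's information (0.0005) is already below the tolerance, so the bitinformation rule cuts one bit earlier.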
What is coming up next?
I will try to tackle artificial information in variables where it does not appear only in the trailing mantissa bits.
Weekly Blog Post #8
Ishaanj18
Published: 07/20/2023
What did I do this week?
This week, I categorized the variables in the CONUS404 dataset based on the presence of artificial information, and then worked on methods to address the artificial information in each category.
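One way a categorization step like this could be expressed, as a sketch only. The category names, the tolerance, the variable names in the usage line, and the helpers `categorize` and `group_variables` are all hypothetical and are not necessarily the categories used for CONUS404. The sketch sorts variables by whether their per-bit information ever drops to zero and, if so, whether information reappears afterwards.

```python
from collections import defaultdict


def categorize(bitinfo, tol=1e-3):
    """Assign a hypothetical category from the shape of the per-bit information."""
    first_zero = next((i for i, v in enumerate(bitinfo) if v <= tol), None)
    if first_zero is None:
        return "no_zero_region"  # information never drops to zero
    if any(v > tol for v in bitinfo[first_zero:]):
        return "reappearing"  # information returns after a zero region
    return "trailing_zero"  # zero region extends to the last mantissa bit


def group_variables(profiles, tol=1e-3):
    """Group variable names by category; `profiles` maps name -> per-bit information."""
    groups = defaultdict(list)
    for name, bitinfo in profiles.items():
        groups[categorize(bitinfo, tol)].append(name)
    return dict(groups)


print(group_variables({
    "var_a": [0.5, 0.2, 0.0, 0.0],
    "var_b": [0.5, 0.2, 0.0, 0.1],
    "var_c": [0.5, 0.4, 0.3],
}))
```

Variables in the "trailing_zero" group are the ones a simple trailing-bit cut-off handles; the "reappearing" group is where information shows up again after a zero region.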
What is coming up next?
I will work on methods that apply to most of the categories.