The first part of GSoC 2018 is over now with the completion of the first evaluations, and I am really thankful to my mentor for passing me.
I had planned my project in an organized manner while writing the proposal, and I am happy to be on track now with only a few parts remaining. According to the plan, the first month was to focus on the SARIMAX model and its selection, while the second month, which is now, focuses on the Exponential Smoothing models and their selection. Last week was spent deciding which parameters would be valuable to keep when selecting an ES model for automatically forecasting time-series data.
To get into the details, I have referred to the ets function of the forecast package in R and ran a few unit tests to check whether the results match. Furthermore, we are following a brute-force approach: fit the candidate Exponential Smoothing models, compare their in-sample information criteria (like the AIC), and choose the model that scores best. The best model is then returned to be used for the forecasts.
Apart from these, one of my tasks is to figure out a way to connect the various modules and classes that I built during the first month with the ES modules so that they all work together.
I’ll keep posting more on this project as I complete the module and make it a fully functional project.
It’s almost one month since I officially started coding for Statsmodels as a part of Google Summer of Code. The journey so far has been challenging and thrilling. The milestones I cover every week have taught me a lot about coding practice, statistics, and open source. I am sharing some of the work from the last two weeks which I feel were the most challenging milestones of my first month of contribution.
The third week of my code contribution was targeted at expanding my auto_order function (created during the first week) to support computing the seasonal order and intercepts. This included developing code to check all the different combinations of AR and MA parameters, along with the seasonal parameters, and to pick the one that yields the lowest AIC for a given input time series.
The fourth week was focused on building an auto_transformation module that would help in automatically transforming a time series into a stationary one. Since statsmodels already includes the Box-Cox transformation functionality, my focus was on creating a module that estimates the parameter for this transformation. The book by Draper and Smith, “Applied Regression Analysis”, provided some useful techniques for doing that. The parameter (lambda) of the Box-Cox transformation was estimated by finding the value of lambda that maximizes the likelihood of a linear regression.
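As a simplified illustration of the profile-likelihood idea (using the marginal Box-Cox likelihood available in SciPy, rather than the regression-based likelihood from Draper and Smith that my module follows):

```python
import numpy as np
from scipy import stats

# Grid search for the lambda that maximizes the Box-Cox log-likelihood.
# scipy.stats.boxcox with lmbda=None performs the same maximization
# internally via an optimizer.
rng = np.random.default_rng(2)
y = np.exp(rng.normal(size=200))  # positive, right-skewed data

lambdas = np.linspace(-2, 2, 401)
llf = [stats.boxcox_llf(lmb, y) for lmb in lambdas]
lam_grid = lambdas[int(np.argmax(llf))]

# Compare with SciPy's built-in maximum-likelihood estimate
_, lam_mle = stats.boxcox(y)
print(round(lam_grid, 2), round(lam_mle, 2))
```

For log-normal data like this, both estimates land near lambda = 0, i.e. the log transformation.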
The functions and modules that I have developed are now to be tested on real-life examples against other packages (like the forecast package in R).