Project status after a month of coding

It has been almost a month since I officially started coding for Statsmodels as part of Google Summer of Code. The journey so far has been both challenging and thrilling. The milestones I cover every week have taught me a lot about coding practice, statistics and open source. Here I am sharing some of the work from the last two weeks, which I feel were the most challenging milestones of my first month of contribution.

The third week of my code contribution was targeted at expanding my auto_order function (created during the first week) to support computing the seasonal order and intercepts. This included developing code that checks all the different combinations of AR and MA parameters, along with the seasonal parameters, to find the one that gives the lowest AIC for a particular input time series.
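
A minimal sketch of what such a search might look like, not the code from my branch. The names auto_order_sketch, sarimax_aic and aic_func, the default ranges and the seasonal period s=12 are all assumptions made for illustration. The statsmodels import is done lazily inside sarimax_aic, so the search loop itself can be tried with any function that returns an AIC value.

```python
from itertools import product


def sarimax_aic(endog, order, seasonal_order, trend):
    """AIC of a single SARIMAX fit (needs statsmodels installed)."""
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    model = SARIMAX(endog, order=order, seasonal_order=seasonal_order,
                    trend=trend)
    return model.fit(disp=False).aic


def auto_order_sketch(endog, max_p=2, max_q=2, max_P=1, max_Q=1,
                      d=0, D=0, s=12, aic_func=sarimax_aic):
    """Try every (p, q, P, Q, trend) combination and keep the lowest AIC.

    Hypothetical helper: the argument names and defaults are assumptions.
    trend="n" means no intercept, trend="c" means a constant intercept.
    Returns (aic, order, seasonal_order, trend), or None if nothing fitted.
    """
    best = None
    grid = product(range(max_p + 1), range(max_q + 1),
                   range(max_P + 1), range(max_Q + 1), ("n", "c"))
    for p, q, P, Q, trend in grid:
        order = (p, d, q)
        seasonal_order = (P, D, Q, s)
        try:
            aic = aic_func(endog, order, seasonal_order, trend)
        except ValueError:
            # Skip combinations that the model refuses to fit.
            continue
        if best is None or aic < best[0]:
            best = (aic, order, seasonal_order, trend)
    return best
```

With statsmodels installed, auto_order_sketch(series) would fit every candidate on a sequence of observations called series and return the winning orders. The exhaustive grid is the simplest choice but it grows quickly with the maximum orders, so the ranges are kept small here.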

The fourth week was focused on building an auto_transformation module which would help in automatically transforming a time series into a stationary one. Since statsmodels already includes the Box-Cox transformation functionality, my focus was to create a module which would estimate the parameter for this transformation. The book “Applied Regression Analysis” by Draper and Smith provided some useful techniques to do that. The parameter (lambda) for the Box-Cox transformation is estimated by finding the value of lambda that maximizes the likelihood of a linear regression fitted to the transformed data.
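
The idea can be sketched in plain Python as a grid search over lambda. This is only an illustration under my own assumptions, not the module from my branch: the function names are hypothetical, the regression is a simple straight line on a single predictor x (for a time series, the time index), and the data must be strictly positive. For each lambda the data is transformed, a line is fitted by least squares, and the profile log-likelihood -n/2 * log(RSS/n) + (lambda - 1) * sum(log y) is evaluated.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of strictly positive values."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def ols_rss(x, z):
    """Residual sum of squares of the least-squares line z = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, z))


def profile_loglik(x, y, lam):
    """Box-Cox profile log-likelihood (up to a constant) for one lambda."""
    n = len(y)
    rss = ols_rss(x, boxcox(y, lam))
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(rss / n) + jacobian


def best_lambda(x, y, grid):
    """Return the lambda in grid with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(x, y, lam))


# Toy data: roughly exponential growth, so a lambda near 0 (log) is expected.
x = list(range(1, 31))
y = [math.exp(0.1 * t + 0.02 * math.sin(t)) for t in x]
grid = [i / 10 for i in range(-20, 21)]
print(best_lambda(x, y, grid))
```

A grid search is crude but easy to check by eye; a numerical optimizer could replace it once the likelihood function is trusted.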

The functions and modules that I have developed now need to be tested on real-life examples against other modules and packages (like the forecast package in R).

 

Two weeks into Google Summer of Code

Now that I have completed two full weeks as a Google Summer of Code student, things have got so much better. I have learned quite a lot about open source communities and how good software is written. Most importantly, I have learned a lot about the Python programming language.

As I mentioned in my previous posts, my project is about building an automatic forecasting module for the Statsmodels package which would help in automatically setting up time series models. I have been able to meet my targets so far. During the first week, my objective was to complete a simple module that takes a given range of parameters for SARIMAX models and selects the best combination of parameters (p and q, i.e., the autoregressive and moving average orders respectively) based on AIC (Akaike Information Criterion) values.

The target for my second week was to design the classes and the supporting functions that the end user would require to use the models. For this part, we split our dataset into two sets: the training set (on which the models are built) and the testing set (on which the models are validated). This also included calculating different accuracy measures like MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), MAPE (Mean Absolute Percentage Error), etc. These measures are computed on the testing set and used to validate the models.
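
As a rough illustration of these measures (a sketch with hypothetical function names and made-up numbers, not the classes from my pull request), the split and the three error metrics can be written in plain Python like this:

```python
import math


def split_series(series, test_size):
    """Hold out the last test_size observations as the testing set."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Squared Error."""
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse)


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Undefined when an actual value is zero.
    """
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(actual)


# Made-up example: the last three points are held out and compared
# against hypothetical forecasts for the same period.
series = [90.0, 95.0, 100.0, 200.0, 400.0]
train, test = split_series(series, 3)
forecast = [110.0, 180.0, 400.0]
print(mae(test, forecast), rmse(test, forecast), mape(test, forecast))
```

The split keeps the time order intact (the testing set is always the tail of the series), which matters for forecasting because a random split would leak future values into the training set.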

All my commits are present in a single pull request, and the fork to which I am pushing my changes can be found at:

https://github.com/abhijeetpanda12/statsmodels/tree/auto-forecast-1

Please send me any feedback that would help me do better in my GSoC project and on my blog.

Check out my code contributions

In response to the recent requirement of putting up a blog for posting my code publicly (with new code at least once a week), I would like to provide the link to the branch where I commit my code:

https://github.com/abhijeetpanda12/statsmodels/tree/auto-forecast-1

This branch contains all my code contributions to my forked statsmodels repository. So far I am on track with my first-week target and looking forward to working on my second-week milestone.

My project is about building an automatic forecasting module for the statsmodels package. This module would help in automatically determining the parameters for different time series models (SARIMAX and ES).

Say hello to the summer of code

This summer is going to be great. This was my first attempt at GSoC and I’m glad I made it through.
I have been selected as a Google Summer of Code 2018 student at Statsmodels under the Python Software Foundation, where I’ll be responsible for developing an automatic forecasting model for time-series data.
The aim of the project is to implement an automatic forecasting infrastructure for statsmodels similar to auto.arima()/ets() of the ‘forecast’ package in R. The goal is to use the existing models of statsmodels, like SARIMAX and ES, to build a forecasting method that automatically detects the best model and forecasts values based on that model.
Automatic forecasting algorithms determine an appropriate time series model, estimate the parameters and compute the forecasts. They are appropriate for various time series patterns, and applicable to large numbers of series without user intervention.
As of now, I plan to start my project by first creating a modular infrastructure for the complete automatic forecasting process, into which I should be able to fit any new models or variations as required.
I have prepared myself with the basic requirements in terms of theory (background knowledge of statistics) and good hands-on experience with the Python language, which should help me kickstart my project.
I’ll be posting updates on this project here on this blog.