Now that I have completed two full weeks as a Google Summer of Code student, things have become much easier. I have learned quite a lot about open source communities and how good software is written. Most importantly, I have learned a lot about the Python programming language.
As I mentioned in my previous posts, my project is to build an automatic forecasting module for the statsmodels package that helps set up time series models automatically. I have met my targets so far. During the first week, my objective was to complete a simple module that takes a given range of parameters for SARIMAX models and selects the best combination of parameters (p and q, the autoregressive and moving average orders respectively) based on AIC (Akaike Information Criterion) values.
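To illustrate the idea, here is a minimal sketch of AIC-based order selection. It is not the code from my pull request: the names `select_order_by_aic`, `sarimax_aic`, and `fit_aic` are hypothetical and chosen only for this example. The search itself is a plain grid search over (p, q) pairs, and the only statsmodels pieces it assumes are the `SARIMAX` class and the `aic` attribute of its fitted results.

```python
from itertools import product


def select_order_by_aic(fit_aic, p_values, q_values):
    """Return ((p, q), aic) for the lowest AIC over the grid of (p, q) pairs.

    fit_aic is any callable that takes (p, q) and returns the AIC of the
    model fitted with those orders (hypothetical interface for this sketch).
    """
    best_order, best_aic = None, float("inf")
    for p, q in product(p_values, q_values):
        try:
            aic = fit_aic(p, q)
        except Exception:
            # Some order combinations may fail to fit; skip them.
            continue
        if aic < best_aic:
            best_order, best_aic = (p, q), aic
    return best_order, best_aic


def sarimax_aic(endog, d=0):
    """Build a fit_aic callable backed by statsmodels' SARIMAX.

    The differencing order d is held fixed here; that is an assumption
    made to keep the sketch short.
    """
    # Imported lazily so the grid search above has no hard dependency.
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    def fit_aic(p, q):
        return SARIMAX(endog, order=(p, d, q)).fit(disp=False).aic

    return fit_aic
```

With a series `y`, a call such as `select_order_by_aic(sarimax_aic(y), range(3), range(3))` would try every (p, q) pair with p and q in 0 to 2 and return the pair with the lowest AIC.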
The target for my second week was to design the classes and supporting functions that the end user would need to use the models. For this part, we split our dataset into two sets: the training set (on which the models are built) and the testing set (on which the models are validated). This also included calculating accuracy measures such as MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and MAPE (Mean Absolute Percentage Error). These measures are computed on the testing set and used to validate the models.
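As a rough sketch of what such a split and these measures look like, here is a small pure-Python version. The function names are hypothetical and this is not the implementation in my branch. The split keeps the time order and holds out the last observations, which is one common choice for time series.

```python
import math


def train_test_split_series(series, test_size):
    """Hold out the last test_size observations, preserving time order."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero.
    """
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

The forecasts for the held-out period are compared against the testing set, for example `mae(test, forecasts)`, where `forecasts` comes from a model fitted on the training set only.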
All my commits are present in a single pull request, and the fork to which I am pushing my changes can be found here at
Please share any feedback that would help me do better on my GSoC project and on this blog.
In response to the recent requirement of keeping a blog and posting my code publicly (with new code at least once a week), I would like to provide the link to the branch where I commit my code
This branch contains all my code contributions to my fork of the statsmodels repository. So far I am on track with my first-week target and looking forward to working on my second-week milestone.
My project is about building an automatic forecasting module for the statsmodels package. This module would help in automatically determining the parameters for different time series models (SARIMAX and exponential smoothing, ES).
This summer is going to be great. It was my first attempt at GSoC and I’m glad I made it through.
I have been selected as a Google Summer of Code 2018 student at Statsmodels under the Python Software Foundation, where I’ll be responsible for developing an automatic forecasting model for time series data.
The aim of the project is to implement an automatic forecasting infrastructure for statsmodels similar to auto.arima()/ets() of the ‘forecast’ package in R. The goal is to use the existing statsmodels models, such as SARIMAX and ES, to build a forecasting method that automatically detects the best model and forecasts values based on that model.
Automatic forecasting algorithms determine an appropriate time series model, estimate the parameters and compute the forecasts. They are appropriate for various time series patterns, and applicable to large numbers of series without user intervention.
For now, I plan to start my project by creating a modular infrastructure for the complete automatic forecasting process, into which I should be able to fit any new models or variations as required.
I have prepared myself with the basic requirements in terms of theory (background knowledge in statistics) and good hands-on experience with the Python language, which should help me kickstart my project.
I’ll be posting updates on my progress with this project here on this blog.