Check-in: 13th and final week of GSoC (Aug 19 - Aug 25)
JoseAlanis
Published: 08/26/2019
1. What did you do this week?
- As this was the final week of GSoC, I have written and posted a final report of the project here.
- In addition, I made a major overhaul of the project's website, which now contains a "gallery of examples" for some of the major advancements and tools developed during the GSoC period.
- See this PR for a more detailed list of contributions made this week.
2. What is coming up next?
- There are a couple of open questions that concern the integration of these tools and analysis techniques into MNE's API.
- For instance, we've been using scikit-learn's linear regression module to fit the models. One of the main advantages of this approach is that it yields a linear regression "object" as output, which increases the flexibility for manipulating the linear model results while leaving MNE's linear regression function untouched (for now); a minimal sketch of this approach follows below. However, we believe that using a machine learning package for linear regression might lead to confusion among users in the long run.
- Thus, the next step is to discuss possible ways of integrating these tools into MNE-Python: do we want to modify, simplify, or completely replace MNE's linear regression function to obtain similar output?
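For illustration, here is a minimal sketch of the scikit-learn-based approach; the data, dimensions, and design matrix below are randomly generated stand-ins rather than the project's actual pipeline. The channel-by-time data space is vectorized so all outcomes are fitted in a single call, and the fitted object keeps the results available for further manipulation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy stand-ins for an mne.Epochs data array (n_epochs, n_channels,
# n_times); real data would come from epochs.get_data().
rng = np.random.default_rng(42)
n_epochs, n_channels, n_times = 100, 64, 200
data = rng.standard_normal((n_epochs, n_channels, n_times))

# Design matrix: intercept plus one continuous predictor.
design = np.column_stack([np.ones(n_epochs), rng.standard_normal(n_epochs)])

# Vectorize the data so every channel/time point is a separate outcome
# and fit all of them in a single call.
Y = data.reshape(n_epochs, -1)
linear_model = LinearRegression(fit_intercept=False).fit(design, Y)

# The fitted object keeps the results around for further manipulation;
# reshape the coefficients back to (n_predictors, n_channels, n_times).
betas = linear_model.coef_.T.reshape(design.shape[1], n_channels, n_times)
```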
I really enjoyed working on this project during the summer and would be glad to continue working on extending the linear regression functionality of MNE-Python after GSoC.
3. Did you get stuck anywhere?
- Not really, although the final week included a lot of thinking about what the most practical API might be for the tools developed during the GSoC period. We want to continue this discussion online (see here) and hopefully be able to fully integrate these advancements into the released version of MNE-Python soon.
Thanks for reading and please feel free to contribute, comment or post further ideas!
Blogpost: 12th week of GSoC (Aug 12 - Aug 18)
JoseAlanis
Published: 08/19/2019
The final week of Google Summer of Code is almost here. Time to make some final adjustments and wrap up the project.
What are the major achievements?
The goal of the GSoC project was to enhance the capabilities of MNE-Python for fitting linear regression models and computing the inference measures that these deliver. Of particular importance for the project was the estimation of group-level effects.
During the first part of the GSoC, I focused on implementing a series of examples for fitting linear models to single-subject data in general. This was meant to provide a perspective for future API-related questions, such as: what kind of output should be delivered by a low-level linear regression function in MNE-Python? Do we want to compute inference measures such as t- or p-values within this function? Or should the output be simpler (i.e., just the beta coefficients) in order to allow it to interface with 1) other, more specific functions (e.g., bootstrap) and 2) group-level analysis pipelines? Thus, the first part of the GSoC provided a basis for testing and deriving considerations for group-level analysis techniques. Feel free to look at some of the examples on the GitHub site of the project for more information on how to fit linear models to single subjects’ data.
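For context on this API question, MNE-Python's existing mne.stats.linear_regression already bundles inference measures with the betas. A short sketch using the LIMO dataset loader (assuming, as in the project's examples, that the epochs metadata contains a 'phase-coherence' column; the first call downloads the data):

```python
import mne
from mne.datasets import limo

# Load one subject from the LIMO dataset used throughout this project.
epochs = limo.load_data(subject=1)

# Design matrix: the continuous predictor plus an intercept column.
design = epochs.metadata[['phase-coherence']].copy()
design['intercept'] = 1

# MNE's current function returns, per predictor, a container bundling
# beta, stderr, t_val, p_val, and mlog10_p_val as Evoked objects.
lm = mne.stats.linear_regression(epochs, design_matrix=design.values,
                                 names=['phase-coherence', 'intercept'])
betas = lm['phase-coherence'].beta
t_vals = lm['phase-coherence'].t_val
```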
In the second part of the GSoC, I focused on translating and extending these single-subject analysis tools to allow group-level analyses, i.e., the estimation of linear regression effects over a series of subjects. Here, a typical pipeline for second-level analysis works as follows (a code sketch follows the list):
- Fit a single-subject regression and extract the beta coefficients for the desired predictor. This is done for each subject in the dataset, thus creating an array of single-subject beta coefficients.
- Sample (with replacement) random subjects from this array of single-subject beta coefficients and compute t-values based on the random sample.
- Transform the t-values to F-values (for a single-df contrast, F = t²).
- Use spatiotemporal clustering to find clusters containing effects higher than some arbitrary threshold (for instance, an F-value equivalent to an effect significant at p < 0.05) and record the cluster mass (i.e., the sum of F-values within a cluster) of the cluster with the maximum cluster mass.
- Repeat this procedure several times, resulting in a distribution of maximum cluster-mass values (the cluster-mass H0).
- Finally, use spatiotemporal clustering to find clusters in the original data. The trick is to threshold the observed clusters from the original data based on their mass, using the previously computed cluster-mass H0.
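Here is a self-contained sketch of this pipeline on randomly generated stand-in data. For illustration it uses a simple lattice clustering from scipy (MNE's clustering routines with a proper channel adjacency would be used in practice), and it centers the betas before resampling so the bootstrap samples are drawn under the null hypothesis, as in LIMO-style bootstrap tests:

```python
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(7)
n_subjects, n_channels, n_times = 20, 32, 100
# Stand-in for the stacked single-subject beta coefficients (step 1).
betas = rng.standard_normal((n_subjects, n_channels, n_times))

# Cluster-forming threshold: the F-value equivalent to p < 0.05
# for a one-sample test (dfn=1, dfd=n_subjects - 1).
f_thresh = stats.f.ppf(1 - 0.05, dfn=1, dfd=n_subjects - 1)

def max_cluster_mass(sample):
    """One-sample t-values -> F-values -> threshold -> largest cluster mass."""
    t_vals = sample.mean(axis=0) / (sample.std(axis=0, ddof=1)
                                    / np.sqrt(len(sample)))
    f_vals = t_vals ** 2  # F = t^2 for a single-df contrast
    # Simple lattice clustering for illustration only.
    labels, n_clusters = ndimage.label(f_vals > f_thresh)
    if n_clusters == 0:
        return 0.0
    return max(f_vals[labels == i].sum() for i in range(1, n_clusters + 1))

# Center the betas so bootstrap samples are drawn under the null
# hypothesis of "no effect" (an assumption of this sketch).
betas_h0 = betas - betas.mean(axis=0)

# Steps 2-5: resample subjects with replacement and record the maximum
# cluster mass of every bootstrap sample to build the H0 distribution.
h0 = [max_cluster_mass(betas_h0[rng.integers(0, n_subjects, n_subjects)])
      for _ in range(1000)]

# Step 6: observed clusters are significant if their mass exceeds
# the 95th percentile of the bootstrap cluster-mass H0 distribution.
mass_thresh = np.percentile(h0, 95)
obs_mass = max_cluster_mass(betas)  # mass of the largest observed cluster
```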
What still needs to be done?
There are a couple of open issues (see, for instance, the questions stated above) that concern the integration of these tools into MNE's API, which we want to make a subject of discussion during the final week of GSoC. However, I feel we have made some very cool and useful advances in enhancing the statistical inference capabilities of the linear regression module in MNE-Python.
Check-in: 11th week of GSoC (Aug 5 - Aug 11)
JoseAlanis
Published: 08/12/2019
1. What did you do this week?
Last week, I took a little break from GSoC and spent a couple of days away on vacation. Therefore, I wasn't able to get much of the actual coding work done during the week. However, as in other projects, there is a wide variety of tasks not directly related to coding, on which I was able to focus during the week. These included, for instance:
1. Reviewing the documentation and code of other, similar projects, with the purpose of gaining helpful insights for our own data analysis pipelines.
2. Reading literature of particular relevance for the project (e.g., implementation of correction methods for multiple testing in the field of neuroscience).
3. Last but not least, testing different approaches for implementing these methods.
In addition, I was able to set up a short tutorial for implementing spatiotemporal clustering using threshold-free cluster enhancement (TFCE) and bootstrap in MNE-LIMO. See this PR for instance.
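For reference, here is a minimal sketch of how such a TFCE-based cluster test can be run with MNE's statistics module; the betas below are randomly generated stand-ins, and note that the adjacency argument was named connectivity in older MNE versions:

```python
import numpy as np
from mne.stats import spatio_temporal_cluster_1samp_test

rng = np.random.default_rng(3)
n_subjects, n_times, n_channels = 20, 100, 32
# Stand-in for the stacked subject-level beta coefficients, shaped
# (n_observations, n_times, n_spaces) as the cluster test expects.
betas = rng.standard_normal((n_subjects, n_times, n_channels))

# TFCE is requested by passing a dict instead of a fixed cluster-forming
# threshold: cluster support is integrated over all thresholds starting
# at `start` in increments of `step`.
tfce = dict(start=0, step=0.2)

# adjacency=None assumes a regular lattice; for real sensor layouts,
# derive one with mne.channels.find_ch_adjacency instead.
t_obs, clusters, cluster_pv, h0 = spatio_temporal_cluster_1samp_test(
    betas, threshold=tfce, n_permutations=1000, adjacency=None)

# With TFCE, every channel/time point gets its own corrected p-value.
sig_points = cluster_pv < 0.05
```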
2. What is coming up next?
Next week I will continue working on the spatiotemporal clustering technique for dealing with the multiple testing issue.
3. Did you get stuck anywhere?
I didn’t feel stuck with anything in particular, although it was somewhat complicated to understand the implementation of spatiotemporal clustering techniques for time-series neural data. After doing some further reading and playing around with the LIMO dataset, I think I was able to set up a good basis for the final weeks.
Blogpost: 10th week of GSoC (Jul 29 - Aug 04)
JoseAlanis
Published: 08/05/2019
During this last week, I focused on an implementation of the classical bootstrap, as well as the bootstrap-t technique (see the previous post for a detailed description of the latter), to provide a robust estimate of significance for the results of the group-level linear regression analysis framework for neural time series that we've been working on during the last few weeks.
In particular, this week I was able to put together a set of functions in a tutorial that shows how the second-level (i.e., group-level) regression analysis can be extended to estimate the moderating effects of a continuous covariate on subject-level predictors. In other words, it shows how variability in the strength of the effect of a primary predictor can be attributed to inter-subject variability on another, putatively secondary variable (the subjects’ age, for instance).
In a first step, the linear model is fitted to each subject’s data (i.e., first-level analysis) and the regression coefficients are extracted for the predictor in question. The approach then consists in sampling, with replacement, n second-level design matrices, with n being the number of subjects in the original sample. Here, the link between subjects and covariate values must be maintained, so for simplicity the subject indices (or IDs) are sampled. The linear model is then fitted on the previously estimated subject-level regression coefficients of a given predictor variable, this time, however, with the covariate values on the predicting side of the equation.
Next, the second-level coefficients are sorted in ascending order and the 95% confidence interval is computed. In the added tutorial (see here), we use 2000 bootstraps, although as few as 599 bootstraps have previously been shown to be enough to control for false positives in the inference process (see for instance here).
One challenge, however, is that no p-values can be computed with this technique. One way to derive a decision on the statistical significance of an effect is via the confidence interval of the regression coefficients: a regression coefficient is significant if the confidence interval does not contain zero.
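A minimal sketch of this procedure on randomly generated stand-in data (the covariate here stands in for the subjects' age, and all names are illustrative): subject IDs are resampled so each subject keeps its covariate value, the second-level model is refitted per bootstrap sample, and the percentile confidence interval is checked against zero.

```python
import numpy as np

rng = np.random.default_rng(11)
n_subjects, n_channels, n_times = 25, 32, 100

# Stand-ins: per-subject first-level betas for one predictor, and a
# continuous covariate (e.g., the subjects' age).
betas = rng.standard_normal((n_subjects, n_channels, n_times))
age = rng.uniform(20, 60, n_subjects)

Y = betas.reshape(n_subjects, -1)

def second_level_coefs(idx):
    """Regress the resampled betas on the covariate (intercept + age)."""
    X = np.column_stack([np.ones(len(idx)), age[idx]])
    coefs, *_ = np.linalg.lstsq(X, Y[idx], rcond=None)
    return coefs[1]  # slope of the covariate

# Resample subject IDs so each subject keeps its covariate value.
boot = np.stack([second_level_coefs(rng.integers(0, n_subjects, n_subjects))
                 for _ in range(2000)])

# Percentile CI: take the 2.5th and 97.5th percentiles of the sorted
# bootstrap coefficients at every channel/time point.
ci_low, ci_high = np.percentile(boot, [2.5, 97.5], axis=0)

# A coefficient is deemed significant where the 95% CI excludes zero.
significant = ((ci_low > 0) | (ci_high < 0)).reshape(n_channels, n_times)
```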
Check-in: 9th week of GSoC (Jul 22 - Jul 28)
JoseAlanis
Published: 07/29/2019
1. What did you do this week?
This week I was able to make some good progress on the group-level inference part of my GSoC project.
This included:
- Estimating group-level effects of a continuous variable for the full data space using linear regression.
- First, this approach required fitting a linear regression model for each subject in the dataset (i.e., first-level analysis) and extracting the estimated beta coefficients for the variable in question. This part of the analysis picked up on the tools we've been working on during the last few weeks.
- Second, with the beta coefficients from each subject (i.e., the original betas), a bootstrap-t (or studentized bootstrap, see for instance here) was carried out. Here, t-values were calculated for each "group of betas" sampled with replacement from the original betas. These t-values were then used to estimate more robust confidence intervals and provide a measure of "significance", or consistency, of the observed effects on a group level (see the sketch after this list).
- For further discussion, see this PR on GitHub.
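A minimal sketch of the bootstrap-t idea for a single channel/time point (the data below are randomly generated; in the actual analysis this runs across the whole channel-by-time space): the bootstrap distribution of t-values, computed around the observed mean, replaces the textbook t-distribution quantiles when forming the confidence interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_subjects = 25
# Stand-in for one channel/time point of the stacked subject-level betas.
betas = rng.standard_normal(n_subjects)

obs_mean = betas.mean()
obs_se = stats.sem(betas)

# Bootstrap-t: for every resample, compute a t-value of the resampled
# mean *around the observed mean*, using the resample's own SE.
t_star = np.empty(2000)
for i in range(2000):
    sample = betas[rng.integers(0, n_subjects, n_subjects)]
    t_star[i] = (sample.mean() - obs_mean) / stats.sem(sample)

# The quantiles of t* yield a CI that adapts to the data's actual
# (possibly skewed) distribution instead of assuming normality.
q_low, q_high = np.percentile(t_star, [2.5, 97.5])
ci = (obs_mean - q_high * obs_se, obs_mean - q_low * obs_se)
```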
2. What is coming up next?
Next week, I will be working on an extension of this analysis technique for other group-level hypothesis-testing scenarios that can be derived from the linear regression framework.
In addition, one challenge for the next few weeks lies in the estimation of group-level p-values (significance testing) and correcting these for multiple testing. In particular, we want to use spatiotemporal clustering techniques and bootstrap to achieve this.
3. Did you get stuck anywhere?
I wouldn't say I was stuck with anything in particular, although understanding the bootstrap-t technique and its implementation for the analysis of neural time-series data was somewhat challenging and required more reading than usual. However, after discussion and review with my mentors, I feel confident and believe our advancements are going in the right direction.