The final week of Google Summer of Code is almost here. Time to make some final adjustments and wrap up the project.
What are the major achievements?
The goal of the GSoC project was to enhance the capabilities of MNE-Python for fitting linear regression models and computing the inference measures that these models deliver. Of particular importance for the project was the estimation of group-level effects.
During the first part of the GSoC, I focused on implementing a series of examples for fitting linear models to single-subject data in general. This was meant to provide a perspective for future API-related questions, such as: what kind of output should a low-level linear regression function in MNE-Python deliver? Do we want to compute inference measures such as t- or p-values within this function? Or should the output be simpler (i.e., just the beta coefficients), so that it can interface with 1) other, more specific functions (e.g., bootstrap) and 2) group-level analysis pipelines? Thus, the first part of the GSoC provided a basis for testing group-level analysis techniques and deriving considerations for them. Feel free to look at some of the examples on the project's GitHub site for more information on how to fit linear models to single subjects' data.
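To make the "just the beta coefficients" option concrete, here is a minimal sketch of what such a low-level fit could look like. It uses plain NumPy rather than MNE-Python; the function name `fit_betas`, the array shapes, and the simulated data are assumptions made for illustration and are not part of the MNE-Python API.

```python
import numpy as np


def fit_betas(data, design):
    """Ordinary least squares for every channel and time point.

    Hypothetical helper, not an MNE-Python function.

    data   : array, shape (n_epochs, n_channels, n_times)
    design : array, shape (n_epochs, n_predictors)
    Returns beta coefficients, shape (n_predictors, n_channels, n_times).
    """
    n_epochs, n_channels, n_times = data.shape
    # Flatten channels x times so that each column is one regression target.
    y = data.reshape(n_epochs, n_channels * n_times)
    betas, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    return betas.reshape(design.shape[1], n_channels, n_times)


# Simulated single-subject data (random noise, not real EEG/MEG).
rng = np.random.default_rng(0)
n_epochs, n_channels, n_times = 200, 4, 10
predictor = rng.normal(size=n_epochs)
design = np.column_stack([np.ones(n_epochs), predictor])  # intercept + predictor
data = rng.normal(size=(n_epochs, n_channels, n_times))
data[:, 0, 5] += 2.0 * predictor  # plant an effect at channel 0, time point 5

betas = fit_betas(data, design)
print(betas.shape)  # (n_predictors, n_channels, n_times)
```

Returning only the betas keeps the function small, and anything built on top of it (bootstrap, t-values, group-level statistics) can decide for itself which inference measure to compute.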
In the second part of the GSoC, I focused on translating and extending these single-subject analysis tools to allow group-level analyses, i.e., the estimation of linear regression effects over a series of subjects. Here, a typical pipeline for second-level analysis would work as follows:
- Fit a single-subject regression and extract the beta coefficients for the desired predictor. This is done for each subject in the dataset, thus creating an array of single-subject beta coefficients.
- Sample random subjects (with replacement) from this array of single-subject beta coefficients and compute t-values based on the random sample.
- Transform the t-values to F-values.
- Use spatiotemporal clustering to find clusters of effects that exceed some chosen threshold (for instance, the F-value equivalent to an effect significant at p < 0.05), and record the cluster mass (the sum of F-values within a cluster) of the cluster with the maximum cluster mass (i.e., the cluster mass under H0).
- This approach is repeated several times, resulting in a distribution of max cluster-mass values.
- Finally, spatiotemporal clustering is used to find clusters in the original data. The trick is to threshold the observed clusters from the original data based on their mass, using the previously computed distribution of cluster mass under H0.
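The steps above can be sketched in a few lines of NumPy and SciPy. This is a simplified illustration under several assumptions, not the code used in the project: the data are simulated, clustering is done over time only (one "channel") instead of spatiotemporally, and the helper names `f_values` and `cluster_masses` are hypothetical. In addition, the betas are centered before resampling so that the bootstrap reflects H0; this centering step is my assumption and is not spelled out in the list above.

```python
import numpy as np
from scipy import ndimage, stats


def f_values(betas):
    """One-sample t-test against zero across subjects, squared to give F."""
    n = betas.shape[0]
    t = betas.mean(axis=0) / (betas.std(axis=0, ddof=1) / np.sqrt(n))
    return t ** 2


def cluster_masses(f_obs, threshold):
    """Label runs of supra-threshold time points and sum F within each run."""
    labels, n_clusters = ndimage.label(f_obs > threshold)
    masses = np.array([f_obs[labels == k].sum()
                       for k in range(1, n_clusters + 1)])
    return labels, masses


# Simulated array of single-subject betas, shape (n_subjects, n_times).
rng = np.random.default_rng(42)
n_subjects, n_times = 20, 50
betas = rng.normal(size=(n_subjects, n_times))
betas[:, 20:30] += 1.5  # planted group-level effect

# Cluster-forming threshold: F-value equivalent to p < 0.05.
threshold = stats.f.ppf(0.95, 1, n_subjects - 1)

# Assumption: center the betas so that resampling happens under H0.
centered = betas - betas.mean(axis=0)

n_boot = 1000
h0 = np.zeros(n_boot)
for i in range(n_boot):
    # Sample subjects with replacement.
    idx = rng.integers(0, n_subjects, size=n_subjects)
    _, masses = cluster_masses(f_values(centered[idx]), threshold)
    # Record the maximum cluster mass of this resample (0 if no cluster).
    h0[i] = masses.max() if masses.size else 0.0

# Cluster the original data and threshold the clusters by their mass.
labels, masses = cluster_masses(f_values(betas), threshold)
critical_mass = np.quantile(h0, 0.95)
significant = [k + 1 for k, m in enumerate(masses) if m > critical_mass]
print("critical cluster mass:", critical_mass)
print("significant cluster labels:", significant)
```

Because only the maximum cluster mass of each resample enters the H0 distribution, comparing every observed cluster against the same critical mass controls for the number of clusters tested.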
What still needs to be done?
There are a couple of open issues (see for instance the questions stated above) that concern the integration of these tools into MNE's API, which we want to discuss during the final week of GSoC. However, I feel like we have made some very cool and useful advances in enhancing the statistical inference capabilities of the linear regression module in MNE-Python.