JoseAlanis's Blog

Blogpost: 12th week of GSoC (Aug 12 - Aug 18)

Published: 08/19/2019

The final week of Google Summer of Code is almost here. Time to make some final adjustments and wrap up the project.

What are the major achievements?

The goal of the GSoC project was to enhance the capabilities of MNE-Python for fitting linear-regression models and computing the inference measures these models deliver. Of particular importance for the project was the estimation of group-level effects.

During the first part of the GSoC, I focused on implementing a series of examples for fitting linear models to single-subject data. This was meant to provide a perspective for future API-related questions, such as: What kind of output should a low-level linear-regression function in MNE-Python deliver? Do we want to compute inference measures such as t- or p-values within this function? Or should the output be simpler (i.e., just the beta coefficients) so that it can interface with 1) other, more specific functions (e.g., bootstrap) and 2) group-level analysis pipelines? Thus, the first part of the GSoC provided a basis for testing and deriving considerations for group-level analysis techniques. Feel free to look at some of the examples on the GitHub site of the project for more information on how to fit linear models to single subjects’ data.
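As an illustration of the "simpler output" option, here is a minimal sketch of a first-level fit that returns only the beta coefficients, using scikit-learn. Names, data shapes, and the simulated data are made up for illustration; this is not the actual MNE-LIMO API:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_first_level(epochs_data, design_matrix):
    """Fit a linear model at every channel/time point.

    epochs_data : array, shape (n_epochs, n_channels, n_times)
    design_matrix : array, shape (n_epochs, n_predictors)

    Returns only the beta coefficients, shape (n_predictors, n_channels,
    n_times), leaving inference (t-/p-values) to downstream functions.
    """
    n_epochs, n_channels, n_times = epochs_data.shape
    # Flatten channels x times into one mass-univariate "outcome" axis.
    y = epochs_data.reshape(n_epochs, n_channels * n_times)
    model = LinearRegression(fit_intercept=False).fit(design_matrix, y)
    return model.coef_.T.reshape(-1, n_channels, n_times)

# Simulated example: 50 epochs, 4 channels, 20 time points, 2 predictors.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # intercept + covariate
data = rng.normal(size=(50, 4, 20)) + 0.5 * X[:, 1, None, None]
betas = fit_first_level(data, X)
print(betas.shape)  # (2, 4, 20)
```

The betas can then be passed on to bootstrap routines or stacked across subjects for a group-level analysis.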

In the second part of the GSoC, I focused on translating and extending these single-subject analysis tools to allow group-level analyses, i.e., the estimation of linear-regression effects over a series of subjects. Here, a typical pipeline for second-level analysis works as follows:

  • Fit a single-subject regression and extract the beta coefficients for the desired predictor. This is done for each subject in the dataset, creating an array of single-subject beta coefficients.
  • Sample (with replacement) random subjects from this array of beta coefficients and compute t-values based on the random sample.
    1. Transform the t-values to F-values.
    2. Spatiotemporal clustering is then used to find clusters containing effects above some arbitrary threshold (for instance, an F-value equivalent to an effect significant at p < 0.05), and the cluster mass (the sum of F-values within a cluster) of the cluster with the maximum cluster mass is recorded.
    3. This procedure is repeated several times, resulting in a distribution of maximum cluster-mass values (the cluster-mass H0).
  • Finally, spatiotemporal clustering is used to find clusters in the original data. The trick is to threshold the observed clusters from the original data based on their mass, using the previously computed cluster-mass H0 distribution.
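The steps above can be sketched in simplified form. This is an illustrative toy, not the project's implementation: it clusters on a regular channel × time grid with scipy.ndimage.label (real sensor data needs a channel adjacency structure, which MNE's clustering functions provide), and it centers the betas before resampling so the bootstrap approximates the null distribution:

```python
import numpy as np
from scipy import ndimage, stats

def max_cluster_mass(f_map, f_thresh):
    """Sum of F-values within the heaviest suprathreshold cluster."""
    labels, n_clusters = ndimage.label(f_map > f_thresh)
    if n_clusters == 0:
        return 0.0
    return max(f_map[labels == k].sum() for k in range(1, n_clusters + 1))

def cluster_h0(betas, f_thresh, n_boot=200, seed=0):
    """Bootstrap distribution of the maximum cluster mass (the H0).

    betas : array, shape (n_subjects, n_channels, n_times)
    """
    rng = np.random.default_rng(seed)
    n_subjects = betas.shape[0]
    # Center the betas so resampling approximates the null distribution.
    centered = betas - betas.mean(axis=0)
    h0 = np.empty(n_boot)
    for b in range(n_boot):
        sample = centered[rng.integers(0, n_subjects, n_subjects)]
        t = sample.mean(0) / (sample.std(0, ddof=1) / np.sqrt(n_subjects))
        h0[b] = max_cluster_mass(t ** 2, f_thresh)  # F = t**2 for 1 predictor
    return h0

# Simulated betas: 20 subjects, 8 channels x 30 time points, one true effect.
rng = np.random.default_rng(1)
betas = rng.normal(size=(20, 8, 30))
betas[:, 2:5, 10:20] += 0.8
f_thresh = stats.f.ppf(0.95, 1, 19)  # F equivalent to p < 0.05
h0 = cluster_h0(betas, f_thresh)
t_obs = betas.mean(0) / (betas.std(0, ddof=1) / np.sqrt(20))
observed = max_cluster_mass(t_obs ** 2, f_thresh)
print(observed > np.quantile(h0, 0.95))  # does the effect survive the H0 threshold?
```

Observed clusters whose mass exceeds the 95th percentile of the H0 distribution are the ones declared significant.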

What still needs to be done?

There are a couple of open issues (see, for instance, the questions stated above) concerning the integration of these tools into MNE's API, which we want to make a subject of discussion during the final week of GSoC. However, I feel we have made some very cool and useful advances in enhancing the statistical-inference capabilities of the linear-regression module in MNE-Python.



Check-in: 11th week of GSoC (Aug 5 - Aug 11)

Published: 08/12/2019

1. What did you do this week?

Last week, I took a little break from GSoC and spent a couple of days away on vacation. Therefore, I wasn't able to get much of the actual coding work done during the week. However, as in other projects, there is a wide variety of tasks not directly related to coding, on which I was able to focus during the week. These included, for instance:

1. Reviewing the documentation and code of other, similar projects, with the purpose of gaining helpful insights for our own data analysis pipelines.
2. Reading literature of particular relevance for the project (e.g., implementation of correction methods for multiple testing in the field of neuroscience).
3. And, last but not least, testing different approaches for implementing these methods.

In addition, I was able to set up a short tutorial implementing spatiotemporal clustering using threshold-free cluster enhancement (TFCE) and bootstrap in MNE-LIMO. See this PR for instance.
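In MNE-Python, TFCE is available in the cluster-permutation functions by passing a dict threshold (start/step). To illustrate the idea itself, here is a toy 1-D TFCE implementation, not the tutorial's code; the parameter defaults follow the commonly used E = 0.5, H = 2:

```python
import numpy as np

def tfce_1d(stat, dh=0.1, extent_power=0.5, height_power=2.0):
    """Threshold-free cluster enhancement for a 1-D statistic map.

    Each sample's score integrates extent**E * height**H over all
    thresholds, enhancing effects that are both strong and extended
    without committing to a single arbitrary cluster-forming threshold.
    """
    scores = np.zeros(stat.shape, dtype=float)
    for height in np.arange(dh, stat.max() + dh, dh):
        # Find contiguous runs (clusters) of suprathreshold samples.
        padded = np.concatenate(([0], (stat >= height).astype(int), [0]))
        starts = np.flatnonzero(np.diff(padded) == 1)
        stops = np.flatnonzero(np.diff(padded) == -1)
        for start, stop in zip(starts, stops):
            extent = stop - start
            scores[start:stop] += (extent ** extent_power) * (height ** height_power) * dh
    return scores

# A broad, strong bump gets a high TFCE score; an isolated spike stays lower.
t_map = np.zeros(100)
t_map[40:60] = 3.0   # broad effect
t_map[5] = 3.5       # isolated spike
scores = tfce_1d(t_map)
print(scores[50] > scores[5])  # True: extent boosts the broad effect
```

The TFCE scores then replace the raw statistic map inside the permutation/bootstrap loop, so inference proceeds via the maximum TFCE score instead of the maximum cluster mass.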

2. What is coming up next?

Next week I will continue working on the spatiotemporal clustering technique for dealing with the multiple testing issue.

3. Did you get stuck anywhere?

I didn’t feel stuck with anything in particular, although it was somewhat complicated to understand the implementation of spatiotemporal clustering techniques for time-series neural data. After doing some further reading and playing around with the LIMO dataset, I think I was able to set up a good basis for the final weeks.


Blogpost: 10th week of GSoC (Jul 29 - Aug 04)

Published: 08/05/2019

During this last week, I focused on an implementation of the classical bootstrap, as well as the bootstrap-t technique (see the previous post for a detailed description of the latter), to provide a robust estimate of significance for the results of the group-level linear-regression analysis framework for neural time series we've been working on during the last few weeks.

In particular, this week I was able to put together a set of functions in a tutorial that shows how the second-level (i.e., group-level) regression analysis can be extended to estimate the moderating effects of a continuous covariate on subject-level predictors. In other words, it shows how variability in the strength of the effect of a primary predictor can be attributed to inter-subject variability on another, putatively secondary variable (the subject’s age, for instance).

In a first step, the linear model is fitted to each subject’s data (i.e., the first-level analysis) and the regression coefficients are extracted for the predictor in question. The approach then consists of sampling, with replacement, n second-level design matrices, with n being the number of subjects in the original sample. Here, the link between subjects and covariate values is maintained, so for simplicity the subject indices (or IDs) are sampled. The linear model is then fitted on the previously estimated subject-level regression coefficients of the given predictor variable, this time, however, with the covariate values on the predicting side of the equation.
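This resampling scheme can be sketched as follows (illustrative names and simulated data, not the tutorial's code): subject IDs are drawn with replacement, preserving each subject's covariate value, and the second-level model is refitted on every sample:

```python
import numpy as np

def bootstrap_second_level(betas, covariate, n_boot=2000, seed=0):
    """Bootstrap second-level regression of first-level betas on a covariate.

    betas : array, shape (n_subjects, n_features) - first-level coefficients
    covariate : array, shape (n_subjects,) - e.g., the subjects' age

    Resampling subject indices (rather than rows of X and y separately)
    keeps the subject-covariate link intact in every bootstrap sample.
    """
    rng = np.random.default_rng(seed)
    n_subjects = len(covariate)
    coefs = np.empty((n_boot, betas.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n_subjects, n_subjects)  # sampled subject IDs
        X = np.column_stack([np.ones(n_subjects), covariate[idx]])
        # The first-level betas sit on the outcome side of the equation.
        coef, *_ = np.linalg.lstsq(X, betas[idx], rcond=None)
        coefs[b] = coef[1]  # slope of the covariate
    return coefs

# Simulated: 30 subjects, 10 features, effect strength scales with age.
rng = np.random.default_rng(3)
age = rng.uniform(20, 60, 30)
betas = 0.05 * age[:, None] + rng.normal(size=(30, 10))
boot_slopes = bootstrap_second_level(betas, age, n_boot=500)
print(boot_slopes.shape)  # (500, 10)
```

The array of bootstrap slopes is what the confidence-interval step below operates on.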

Next, the second-level coefficients are sorted in ascending order and the 95% confidence interval is computed. In the added tutorial (see here), we use 2000 bootstraps, although as few as 599 bootstraps have previously been shown to be enough to control for false positives in the inference process (see for instance here).

One challenge, however, is that no p-values can be computed with this technique. One way to derive a decision on the statistical significance of an effect is via the confidence interval of the regression coefficients: a regression coefficient is considered significant if its confidence interval does not contain zero.
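That decision rule can be sketched as follows (a toy example with simulated bootstrap replicates, not the tutorial's code):

```python
import numpy as np

def bootstrap_ci_significant(boot_coefs, alpha=0.05):
    """Percentile confidence interval and a zero-exclusion significance call.

    boot_coefs : array, shape (n_boot,) - bootstrap replicates of one
    second-level regression coefficient. The replicates are sorted in
    ascending order and the (alpha/2, 1 - alpha/2) percentiles form the CI;
    the effect is deemed significant if the CI does not contain zero.
    """
    lo, hi = np.quantile(np.sort(boot_coefs), [alpha / 2, 1 - alpha / 2])
    return (lo, hi), not (lo <= 0.0 <= hi)

rng = np.random.default_rng(7)
# A coefficient whose bootstrap distribution sits clearly above zero...
(lo, hi), sig = bootstrap_ci_significant(rng.normal(0.5, 0.1, 2000))
print(sig)   # True
# ...and one whose distribution straddles zero.
(_, _), sig0 = bootstrap_ci_significant(rng.normal(0.0, 0.1, 2000))
print(sig0)  # False
```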


Check-in: 9th week of GSoC (Jul 22 - Jul 28)

Published: 07/29/2019

1. What did you do this week?

This week I was able to make some good progress on the group-level inference part for my GSoC project.

This included:

  • Estimating group-level effects of a continuous variable for the full data space using linear regression.
    • First, this approach required fitting a linear-regression model for each subject in the dataset (i.e., the first-level analysis) and extracting the estimated beta coefficients for the variable in question. This part of the analysis picked up on the tools we've been working on during the last few weeks.
    • Second, with the beta coefficients from each subject (i.e., the original betas), a bootstrap-t (or studentized bootstrap, see for instance here) was carried out. Here, t-values were calculated for each "group of betas" sampled with replacement from the original betas. These t-values were then used to estimate more robust confidence intervals and to provide a measure of "significance", or consistency, of the observed effects on the group level.
    • For further discussion, see this PR on GitHub.
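For a single channel/time point, the bootstrap-t might be sketched as follows (simulated data; an illustrative sketch, not the project's implementation). Each bootstrap sample is studentized with its own standard error, and the resulting t* quantiles are used to build the interval:

```python
import numpy as np

def bootstrap_t_ci(betas, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap-t (studentized bootstrap) CI for the mean of subject betas.

    betas : array, shape (n_subjects,) - one channel/time point's betas.
    Studentizing each bootstrap sample with its own standard error tends to
    give more accurate intervals than the plain percentile bootstrap.
    """
    rng = np.random.default_rng(seed)
    n = len(betas)
    mean, se = betas.mean(), betas.std(ddof=1) / np.sqrt(n)
    t_star = np.empty(n_boot)
    for b in range(n_boot):
        sample = betas[rng.integers(0, n, n)]
        se_b = sample.std(ddof=1) / np.sqrt(n)
        t_star[b] = (sample.mean() - mean) / se_b
    t_lo, t_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    # Note the reversed quantiles: the upper t* bounds the lower CI limit.
    return mean - t_hi * se, mean - t_lo * se

rng = np.random.default_rng(9)
betas = rng.normal(0.4, 1.0, 25)  # simulated subject-level betas
lo, hi = bootstrap_t_ci(betas)
print(f"95% bootstrap-t CI: [{lo:.2f}, {hi:.2f}]")
```

In practice this is run at every channel/time point of the beta arrays, which is what makes the multiple-testing correction discussed below necessary.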

2. What is coming up next?

Next week, I will be working on an extension of this analysis technique for other group-level hypothesis-testing scenarios that can be derived from the linear regression framework.

In addition, one challenge for the next few weeks lies in the estimation of group-level p-values (significance testing) and correcting these for multiple testing. In particular, we want to use spatiotemporal clustering techniques and bootstrap to achieve this.

3. Did you get stuck anywhere?

I wouldn't say I was stuck with anything in particular, although understanding the bootstrap-t technique and its implementation for the analysis of neural time-series data was somewhat challenging and required some more reading than usual. However, after discussion and review with my mentors, I feel confident and believe our advances are going in the right direction.




Blogpost: 8th week of GSoC (Jul 15 - Jul 21)

Published: 07/22/2019

With this week, the second month of GSoC also comes to an end. It feels like a good moment for providing a summary of the progress made so far and also an outlook, considering the tasks and challenges that remain ahead.

Building MNE-LIMO, a "workbench" for linear regression analysis in MNE-Python

At the beginning of this GSoC project, we set the primary goal of extending the functionality of the linear-regression module in MNE-Python. During the last two months, I've focused my work on developing functions, example code, and methodological tutorials with the purpose of providing MNE users with a learning-oriented environment, or "workbench", for specifying and fitting linear-regression models for experimental designs that are commonly used by the neuroscience community.

Here, we have achieved some important milestones, such as showing how the Python package scikit-learn can be used to fit linear-regression models on neural time-series data, and how MNE-Python can interface with the output of these tools to compute and visualize measures of inference and of the quality of the model's results.

In addition, we have recently launched a website, MNE-LIMO (short for MNE-LInear Regression MOdel), where we provide more background information and details on these tools and methods.

Bringing MNE-LIMO to the next level.

The challenge for the next month is to extend the tools and methods we've developed so far to allow the estimation of linear-regression effects over multiple subjects. The goal here is to use a mass-univariate analysis approach, where the analysis does not only focus on data averaged over a few recording sites, but rather takes the full data space into account.

This approach also represents the greatest challenge of the project, as it raises a series of implications for further analyses and assessment of the results, such as controlling for an increased number of potential false positives due to multiple testing.

In the following weeks I'll focus on adapting the tutorials and examples developed during the last few weeks to allow this kind of analyses.



