Articles on JoseAlanis's Blog

Check-in: 13th and final week of GSoC (Aug 19 - Aug 25)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 26 Aug 2019 14:15:16 +0000

1. What did you do this week?

As this was the final week of GSoC, I have written and posted a final report of the project here.
In addition, I made a major overhaul of the project's website, which now contains a "gallery of examples" for some of the major advancements and tools developed during the GSoC period.
See this PR for a more detailed list of contributions made this week.

2. What is coming up next?

There are a couple of open questions concerning the integration of these tools and analysis techniques into MNE's API.
For instance, we've been using scikit-learn's linear regression module to fit the models. One of the main advantages of this approach is that it returns a linear regression "object" as output, which increases the flexibility for manipulating the linear model results while leaving MNE's linear regression function untouched (for now). However, we believe that using a machine learning package for linear regression might lead to confusion among users in the long run.
Thus, the next step is to discuss possible ways of integrating these tools into MNE-Python. Do we want to modify, simplify, or completely replace MNE's linear regression function to obtain similar output?

I really enjoyed working on this project during the summer and would be glad to continue working on extending the linear regression functionality of MNE-Python after GSoC.

3. Did you get stuck anywhere?

Not really, although the final week included a lot of thinking about what the most practical API might be for the tools developed during the GSoC period. We want to continue this discussion online (see here) and hopefully be able to fully integrate these advancements in the released version of MNE-Python soon.

Thanks for reading and please feel free to contribute, comment or post further ideas!

Blogpost: 12th week of GSoC (Aug 12 - Aug 18)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 19 Aug 2019 21:37:58 +0000

The final week of Google Summer of Code is almost here. Time to make some final adjustments and wrap up the project.

What are the major achievements?

The goal of the GSoC project was to enhance the capabilities of MNE-Python in terms of fitting linear regression models and the inference measures that these deliver. Of particular importance for the project was the estimation of group-level effects.

During the first part of the GSoC, I focused on implementing a series of examples for fitting linear models on single-subject data in general. This was meant to provide a perspective for future API-related questions, such as: What kind of output should be delivered by a low-level linear regression in MNE-Python? Do we want to compute inference measures such as t- or p-values within this function? Or should the output be simpler (i.e., just the beta coefficients) in order to allow it to interface with 1) other, more specific functions (e.g., bootstrap) and 2) group-level analysis pipelines? Thus, the first part of the GSoC provided a basis for testing and deriving considerations for group-level analysis techniques. Feel free to look at some of the examples on the GitHub site of the project for more information on how to fit linear models to single subjects’ data.

In the second part of the GSoC, I focused on translating and extending these single-subject analysis tools to allow group-level analyses, i.e., the estimation of linear regression effects over a series of subjects. Here, a typical pipeline for second-level analysis would work as follows:

1. Fit a single-subject regression and extract the beta coefficients for the desired predictor. This is done for each subject in the dataset, thus creating an array of single-subject beta coefficients.
2. Sample (with replacement) random subjects from this single-subject beta coefficient array and compute t-values based on the random sample.
   a. Transform t-values to F-values.
   b. Use spatiotemporal clustering to find clusters containing effects higher than some arbitrary threshold (for instance, an F-value equivalent to an effect significant at p < 0.05) and record the cluster mass (sum of F-values within a cluster) of the cluster with the maximum cluster mass (i.e., the cluster mass under H0).
   c. Repeat this approach several times, resulting in a distribution of max cluster-mass values.
3. Finally, use spatiotemporal clustering to find clusters in the original data. The trick is to threshold the observed clusters from the original data based on their mass, using the previously computed cluster-mass distribution under H0.
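
The pipeline above can be sketched in a few lines of NumPy. This is only a simplified illustration and not the code used in the project: it clusters along time only (a single channel), works on simulated betas, uses a fixed F threshold, and centers the betas before resampling so that the bootstrap reflects H0 (an assumption that is not spelled out above). The function names f_values and find_clusters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated single-subject betas: 20 subjects x 50 time points,
# with a true effect between samples 20 and 30 (illustrative data only).
n_subjects, n_times = 20, 50
betas = rng.normal(size=(n_subjects, n_times))
betas[:, 20:30] += 2.0


def f_values(sample):
    """One-sample t-values across subjects, squared to give F-values."""
    n = sample.shape[0]
    t = sample.mean(axis=0) / (sample.std(axis=0, ddof=1) / np.sqrt(n))
    return t ** 2


def find_clusters(stat, threshold):
    """Return (start, stop, mass) for each run of values above threshold."""
    clusters, start = [], None
    for i, above in enumerate(stat > threshold):
        if above and start is None:
            start = i
        elif not above and start is not None:
            clusters.append((start, i, stat[start:i].sum()))
            start = None
    if start is not None:
        clusters.append((start, len(stat), stat[start:].sum()))
    return clusters


# Assumed threshold: roughly the critical F(1, 19) value for p < 0.05.
f_threshold = 4.38

# Center the betas so that resampling is done under H0 (assumption).
centered = betas - betas.mean(axis=0)

n_boot = 1000
max_mass_h0 = np.zeros(n_boot)
for b in range(n_boot):
    # Sample subjects with replacement and record the largest cluster mass.
    idx = rng.integers(0, n_subjects, size=n_subjects)
    boot_clusters = find_clusters(f_values(centered[idx]), f_threshold)
    if boot_clusters:
        max_mass_h0[b] = max(mass for _, _, mass in boot_clusters)

# Clusters in the original data are kept if their mass exceeds the
# 95th percentile of the max cluster-mass distribution under H0.
mass_threshold = np.percentile(max_mass_h0, 95)
observed = find_clusters(f_values(betas), f_threshold)
significant = [c for c in observed if c[2] > mass_threshold]
```

For real EEG data the clustering would also have to take the spatial neighborhood of the sensors into account, which this one-dimensional sketch leaves out.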

What still needs to be done.

There are a couple of open issues (see for instance the questions stated above) that concern the integration of these tools into MNE's API, which we want to make a subject of discussion during the final week of GSoC. However, I feel like we have made some very cool and useful advances in enhancing the statistical inference capabilities of the linear regression module in MNE-Python.

Check-in: 11th week of GSoC (Aug 5 - Aug 11)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 12 Aug 2019 21:12:12 +0000

1. What did you do this week?

Last week, I took a little break from GSoC and spent a couple of days away on vacation. Therefore, I wasn't able to get much of the actual coding work done during the week. However, as in other projects, there is a wide variety of tasks not directly related to coding, which I was able to focus on during the week. These included, for instance:

1. Reviewing the documentation and code of other, similar projects, with the purpose of gaining helpful insights for our own data analysis pipelines.
2. Reading literature of particular relevance for the project (e.g., implementation of correction methods for multiple testing in the field of neuroscience).
3. Last but not least, testing different approaches for implementing these methods.

In addition, I was able to set up a short tutorial for implementing the spatiotemporal clustering using threshold free cluster enhancement (TFCE) and bootstrap in MNE-LIMO. See this PR for instance.

2. What is coming up next?

Next week I will continue working on the spatiotemporal clustering technique for dealing with the multiple testing issue.

3. Did you get stuck anywhere?

I didn’t feel stuck with anything in particular, although it was somewhat complicated to understand the implementation of spatiotemporal clustering techniques for time-series neural data. However, after doing some further reading and playing around with the LIMO dataset, I think I was able to set up a good basis for the final weeks.

Blogpost: 10th week of GSoC (Jul 29 - Aug 04)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 05 Aug 2019 20:13:57 +0000

During this last week I focused on an implementation of the classical bootstrap, as well as the bootstrap-t technique (see previous post for a detailed description of the latter), to provide a robust estimate of significance for the results of the group-level linear regression analysis framework for neural time-series we've been working on during the last few weeks.

In particular, this week I was able to put together a set of functions in a tutorial that shows how the second-level (i.e., group-level) regression analysis can be extended to estimate the moderating effects of a continuous covariate on subject-level predictors. In other words, how variability in the strength of the effect of a primary predictor can be attributed to inter-subject variability on another, putative secondary variable (the subject’s age, for instance).

In a first step, the linear model is fitted to each subject’s data (i.e., first-level analysis) and the regression coefficients are extracted for the predictor in question. The approach then consists of drawing, with replacement, a bootstrap sample of n subjects to build each second-level design matrix, with n being the number of subjects in the original sample. Here, the link between subjects and covariate values is maintained, so for simplicity the subject indices (or IDs) are sampled. The linear model is then fitted on the previously estimated subject-level regression coefficients of a given predictor variable, this time, however, with the covariate values on the predicting side of the equation.

Next, the second-level coefficients are sorted in ascending order and the 95% confidence interval is computed. In the added tutorial (see here), we use 2000 bootstraps, although "as few as" 599 bootstraps have previously been shown to be enough to control for false positives in the inference process (see for instance here).

One challenge, however, is that no p-values can be computed with this technique. One way to decide on the statistical significance of this effect is via the confidence interval of the regression coefficients: a regression coefficient is significant if its confidence interval does not contain zero.
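
The bootstrap procedure described in this post can be sketched as follows. This is a minimal illustration with simulated data for a single channel and time point, not the code from the tutorial. The variable names (age, betas, boot_slopes) are made up for the example, and np.percentile stands in for the explicit sorting step.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated first-level results: one beta per subject for the predictor
# of interest (single channel and time point), plus each subject's age.
n_subjects = 30
age = rng.uniform(20, 60, size=n_subjects)
betas = 0.5 * age + rng.normal(size=n_subjects)  # illustrative data only

n_boot = 2000
boot_slopes = np.zeros(n_boot)
for b in range(n_boot):
    # Sample subject indices so each beta stays paired with its covariate.
    idx = rng.integers(0, n_subjects, size=n_subjects)
    design = np.column_stack([np.ones(n_subjects), age[idx]])
    coefs, *_ = np.linalg.lstsq(design, betas[idx], rcond=None)
    boot_slopes[b] = coefs[1]  # second-level effect of the covariate

# 95% confidence interval from the bootstrap distribution of the slope
# (np.percentile orders the values internally).
lower, upper = np.percentile(boot_slopes, [2.5, 97.5])

# The effect is deemed significant if the interval does not contain zero.
significant = not (lower <= 0 <= upper)
```

In a full analysis the same loop would run over every channel and time point, so boot_slopes would carry extra dimensions.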

Check-in: 9th week of GSoC (Jul 22 - Jul 28)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 29 Jul 2019 21:45:13 +0000

1. What did you do this week?

This week I was able to make some good progress on the group-level inference part for my GSoC project.

This included:

Estimating group-level effects of a continuous variable for the full data space using linear regression.
- First, this approach required fitting a linear regression model for each subject in the data set (i.e., first-level analysis) and extracting the estimated beta coefficients for the variable in question. This part of the analysis picked up on the tools we've been working on during the last few weeks.
- Second, with the beta coefficients from each subject (i.e., the original betas), a bootstrap-t (or studentized bootstrap, see for instance here) was carried out. Here, t-values were calculated for each "group of betas" sampled with replacement from the original betas. These t-values were then used to estimate more robust confidence intervals and provide a measure of "significance" or consistency of the observed effects on a group level.
- For further discussion, see this PR on GitHub.
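
The bootstrap-t step can be sketched like this. It is a minimal illustration on simulated betas for a single channel and time point, using the textbook form of the studentized bootstrap; the code in the PR may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "original betas": one coefficient per subject (illustrative).
n_subjects = 25
betas = rng.normal(loc=2.0, scale=1.0, size=n_subjects)

mean_obs = betas.mean()
se_obs = betas.std(ddof=1) / np.sqrt(n_subjects)

n_boot = 2000
t_boot = np.zeros(n_boot)
for b in range(n_boot):
    # Draw a "group of betas" with replacement from the original betas.
    sample = rng.choice(betas, size=n_subjects, replace=True)
    se_b = sample.std(ddof=1) / np.sqrt(n_subjects)
    # Studentize each bootstrap mean around the observed mean.
    t_boot[b] = (sample.mean() - mean_obs) / se_b

# Quantiles of the bootstrap t-distribution replace the tabled t-values.
t_lo, t_hi = np.percentile(t_boot, [2.5, 97.5])
ci_lower = mean_obs - t_hi * se_obs
ci_upper = mean_obs - t_lo * se_obs
```

Because the quantiles come from the data rather than from a theoretical t-distribution, the resulting interval can be asymmetric around the observed mean.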

2. What is coming up next?

Next week, I will be working on an extension of this analysis technique for other group-level hypothesis-testing scenarios that can be derived from the linear regression framework.

In addition, one challenge for the next few weeks lies in the estimation of group-level p-values (significance testing) and correcting these for multiple testing. In particular, we want to use spatiotemporal clustering techniques and bootstrap to achieve this.

3. Did you get stuck anywhere?

I wouldn't say I was stuck with anything in particular, although understanding the bootstrap-t technique and its implementation for the analysis of neural time-series data was somewhat challenging and required more reading than usual. However, after discussion and review with my mentors, I feel confident and believe our advancements are going in the right direction.

Blogpost: 8th week of GSoC (Jul 15 - Jul 21)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 22 Jul 2019 16:22:57 +0000

With this week, the second month of GSoC also comes to an end. It feels like a good moment for providing a summary of the progress made so far and also an outlook, considering the tasks and challenges that remain ahead.

Building MNE-LIMO, a "workbench" for linear regression analysis in MNE-Python

At the beginning of this GSoC project we set the primary goal of extending the functionality of the linear regression module in MNE-Python. During the last two months, I've focussed my work on developing functions, example code, and methodological tutorials with the purpose of providing MNE-users with a learning-oriented environment, or "workbench", for specifying and fitting linear regression models for experimental designs that are commonly used by the neuroscience community.

Here, we have achieved some important milestones, such as showing how the Python package scikit-learn can be used to fit linear regression models on neural time-series data and how MNE-Python can interface with the output of these tools to compute and visualize measures of inference and quality of the model's results.

In addition, we have recently launched a website, MNE-LIMO (short for MNE-LInear Regression MOdel), where we provide more background information and details on these tools and methods.

Bringing MNE-LIMO to the next level.

The challenge for the next month is to extend the tools and methods we've developed so far to allow the estimation of linear regression effects over multiple subjects. The goal here is to use a mass-univariate analysis approach, where the analysis methods do not only focus on average data for a few recording sites, but rather take the full data space into account.

This approach also represents the greatest challenge of the project, as it raises a series of implications for further analyses and assessment of the results, such as controlling for an increased number of potential false positive results due to multiple testing.

In the following weeks I'll focus on adapting the tutorials and examples developed during the last few weeks to allow these kinds of analyses.

Check-in: 7th week of GSoC (Jul 08 - Jul 14)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 15 Jul 2019 21:34:30 +0000

1. What did you do this week?

This week I continued to work on the documentation website for my GSoC project.

This included:

Adding configuration files and changing the structure of the repo (see here) to be more learning-oriented and to show some of the basic functionality of MNE-LIMO (short for "using LInear regression MOdels in MNE-Python").
Currently, we are implementing a series of examples to fit linear models on single-subject data.
In addition, we have started to develop methods to translate single-subject analysis to group-level analysis, i.e., estimating linear regression effects over a series of subjects.
These sections in the documentation, single subject vs. group-level analyses, cover useful analysis techniques that can be used to estimate linear regression effects and derive inferential measures to evaluate the estimated effects.

2. What is coming up next?

Next week, I'll focus on extending the group-level section of the repo.

3. Did you get stuck anywhere?

Aside from minor issues and initial difficulties while writing the configuration files needed to build the Sphinx documentation, it all went pretty smoothly.

Blog post: 6th week of GSoC (Jul 01 - Jul 07)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 08 Jul 2019 21:29:21 +0000

This week I was able to make good progress and finalize a couple of PRs on my GSoC project repository on GitHub.

In particular, this week's work was focused on visualization of the results of the linear regression analysis for neural time series data that I've been working on during the last few weeks, with special concentration on inferential measures, such as p-values, t-values, and indices of goodness of fit for the model, such as R-squared.

In addition I started to build up the documentation for my GSoC project. The idea is to use the documentation site as a gallery of examples and practical guides or "walkthroughs" that focus on typical analysis scenarios and research questions. This way MNE-users can easily adapt the code to their needs and make use of the linear regression framework more flexibly.

Next week, I will continue to build up the site and documentation for my GSoC project repository on GitHub and extend the examples to perform second-level analyses.

Check-in: 5th week of GSoC (June 24 - June 30)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 01 Jul 2019 21:41:57 +0000

Most of this week's work went into continuing the development of scripts and example code files that look at possible ways of computing interesting parameters and visualizing the results of linear regression algorithms for neural time series data.

1. What did you do this week?

Thus, this week's progress is easily summarized with the following bullets and links:

Write an example script that shows how to compute and visualize the coefficient of determination, i.e., the proportion of variance explained by a set of predictors in a linear model. The nice thing about this is that we can visualize these effects in an "EEG" fashion, i.e., in a way that MNE-users will probably find appealing (see here).
Other work was related to computing inferential measures for the same effects, such as p- and t-values, that might help interpret the significance of effects in a more straightforward manner.
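
As a rough sketch of the first point, the coefficient of determination can be computed for every channel and time point at once. The data below are simulated and the variable names are only illustrative; the example script linked above works on real EEG epochs.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated single-subject data: 100 epochs, 4 channels, 30 time points.
n_epochs, n_channels, n_times = 100, 4, 30
predictor = rng.normal(size=n_epochs)
design = np.column_stack([np.ones(n_epochs), predictor])  # intercept + predictor

data = rng.normal(size=(n_epochs, n_channels, n_times))
data[:, 0, 10:20] += 1.5 * predictor[:, np.newaxis]  # effect on channel 0

# Fit one model per channel and time point (mass-univariate OLS).
Y = data.reshape(n_epochs, -1)
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)
predicted = design @ coefs

# R-squared: 1 - residual sum of squares / total sum of squares.
ss_res = ((Y - predicted) ** 2).sum(axis=0)
ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
r_squared = (1 - ss_res / ss_tot).reshape(n_channels, n_times)
```

The resulting channels x time points array has the same layout as an averaged EEG response, which is what makes it possible to plot it in an "EEG" fashion.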

2. What is coming up next?

I believe that we have established a good basis of analyses during the last few weeks and hope to be able to tackle second-level analysis next week (i.e., the estimation of linear regression effects on a set of data from different individuals).

3. Did you get stuck anywhere?

I don't feel like I got stuck with anything in particular, although the more I get into this GSoC project, the more I find myself needing to recap some basic linear and matrix algebra lessons. Although I remember some of these things from my schoolwork, it is challenging from time to time. Nevertheless, I feel like I'm "relearning" a lot and deepening my understanding of the mathematical basis of the tools we're trying to implement during this GSoC.

Blog post: 4th week of GSoC (June 17 - June 23)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 24 Jun 2019 21:36:56 +0000

This week I focused on setting up a couple of example scripts to demonstrate the functionality of the bootstrap procedure for the analysis of time-series neural data using MNE-Python.

Bootstrap refers to a widespread and acknowledged method for statistical inference. In essence, bootstrapping relies on the idea that using the data at hand can help draw better (i.e., more robust) conclusions about the effects we are trying to estimate.

Compared to more traditional methods, which often rely on (sometimes unjustified) assumptions about the distributional characteristics of the data, bootstrapping provides a method for accounting for biases specific to the nature of the observed data (e.g., skewness, clustering).

The main idea in bootstrapping relies on the drawing of random samples with replacement from the data set. Ideally, when we carry out an experiment, we would like to replicate its results in a series of further experiments and this way derive a certain measure of confidence for the effects observed. With bootstrap we can use the data at hand to simulate what the results of these experiments might be if the experiment was repeated over and over again with random samples.

During this week, my work focused on implementing a bootstrap procedure to derive confidence intervals for an average scalp-recorded brain signal over a certain period of time. In other words, the goal was to plot the average time series of brain activity along with the range of possible average values one could expect if the experiment was repeated a certain number of times (e.g. 2000 times). An example of the results is shown below.

In the image above, time is depicted on the X-axis and the strength of activation on the Y-axis. The average time series is depicted by the solid line. The 95 % confidence interval is depicted by the shaded ribbon surrounding the line. In other words, 95 % of the averages computed from the bootstrap samples were within this area.
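
A minimal sketch of this procedure, using simulated trials for a single channel instead of real EEG data (the variable names are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated epochs for one channel: 40 trials x 100 time points.
n_epochs, n_times = 40, 100
times = np.linspace(-0.1, 0.5, n_times)
signal = np.exp(-((times - 0.2) ** 2) / 0.002)  # a peak at 200 ms
epochs = signal + rng.normal(scale=1.0, size=(n_epochs, n_times))

evoked = epochs.mean(axis=0)  # the observed average time series

n_boot = 2000
boot_means = np.zeros((n_boot, n_times))
for b in range(n_boot):
    # Resample trials with replacement and average them.
    idx = rng.integers(0, n_epochs, size=n_epochs)
    boot_means[b] = epochs[idx].mean(axis=0)

# 95% of the bootstrap averages fall between these two bounds.
ci_lower, ci_upper = np.percentile(boot_means, [2.5, 97.5], axis=0)
```

Plotting evoked as a line and filling the area between ci_lower and ci_upper gives the kind of ribbon plot described above.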

After some initial difficulties, which I believe were mainly rooted in my initial misinterpretation of the bootstrapping results, I was able to make some progress on two example scripts for deriving bootstrap confidence intervals for the average evoked brain response in a particular condition (see here) and for the coefficients of a linear model estimator (see here). I look forward to continuing this work next week.

Check-in: 3rd week of GSoC (June 10 - June 16)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 17 Jun 2019 16:49:37 +0000

1. What did you do this week?

This week, I continued working on a script that looks at alternative methods for fitting a linear model to data from a single subject (i.e. "first level" analysis).

In particular, I focussed on using the linear_model module from scikit-learn to replicate the functionality of the current linear_regression function provided by MNE-Python (see this PR on my project's sandbox repository on GitHub for further details).
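
To give a rough idea of what this looks like, here is a minimal sketch with simulated data. It is not the code from the PR: the array shapes, the design matrix, and the names in the betas dictionary are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)

# Simulated single-subject epochs: 80 trials, 3 channels, 20 time points.
n_epochs, n_channels, n_times = 80, 3, 20
data = rng.normal(size=(n_epochs, n_channels, n_times))

# Design matrix with an intercept column and one continuous predictor.
predictor = rng.normal(size=n_epochs)
design = np.column_stack([np.ones(n_epochs), predictor])

# scikit-learn expects a 2D target: trials x (channels * time points).
Y = data.reshape(n_epochs, -1)

# The intercept is already a column of the design matrix.
model = LinearRegression(fit_intercept=False)
model.fit(design, Y)

# coef_ has shape (n_targets, n_predictors); reshape each predictor's
# coefficients back into a channels x time points array.
betas = {
    name: model.coef_[:, i].reshape(n_channels, n_times)
    for i, name in enumerate(["intercept", "predictor"])
}
```

The fitted model object can be reused afterwards, for example to compute predictions with model.predict(design).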

2. What is coming up next?

During the upcoming week I'll be working with Stefan Appelhoff, another GSoC student working with MNE-Python this summer. We'll focus on restructuring the LIMO dataset to comply with the Brain Imaging Data Structure (BIDS) specification. BIDS refers to a set of standards used for organizing and sharing brain imaging study data within and between laboratories (see the preprint on PsyArXiv, or visit Stefan's Blog to learn more about BIDS for EEG data).
I'll also keep working on the "second-level" analysis (i.e., group-level inference) part of my GSoC Project.

3. Did you get stuck anywhere?

This week was a little bit slow, but I didn't feel like I got stuck with anything in particular.

I'm definitely looking forward to working with Stefan on the BIDS adaption of the LIMO dataset and making some progress on the second-level analysis part of my project.

Blog post: 2nd week of GSoC (June 3 - June 9)

alanis.jcg@gmail.com (JoseAlanis) — Mon, 10 Jun 2019 11:08:35 +0000

1. What did you do this week?

This week I focused on putting up a GitHub repository for my GSoC project and adding some example code and auxiliary functions that will help me test and validate additions to MNE-Python’s API.

Furthermore, I made improvements to the code featured in last week’s post and extended some of its features.

Here is a quick summary of this week’s progress:

Set up MNE-stats repository on GitHub.
Add plot_design_matrix function for visualization of design matrices.
Add example code for the inspection of group-level effects in the LIMO dataset.
Propose changes to mne.stats.linear_regression function (e.g., fitting of the linear model).
Extend mne.datasets.limo module in MNE-Python to allow download of the complete dataset.

2. What is coming up next?

Next, I'll continue extending the mne.stats.linear_regression function, mainly focusing on group level inference.

In particular I will work on the following aspects:

Extend mne.stats.linear_regression function to allow for example:
- The fitting of a "first-level" or subject-level linear model over a series of time samples and recording sites for each individual in a group,
- Using this first-level information to carry out inference on a "second-level" or group-level.
Visualize group-level effects and confidence of prediction.
Improve the "robustness" of this method, for instance, by enabling the algorithm to account for outliers in the dataset.

3. Did you get stuck anywhere?

Not really, this week was pretty straightforward and I feel like I'm getting a little bit of "flow" and progressing well with my project.

However, going from first- to second-level analyses in linear regression might represent a sticking point in the project, in particular because this has rarely been implemented (at least in Python) for the analysis of entire magneto- and electroencephalography datasets, which often include a wide variety of samples, recording sites and subjects.

Overview of project and summary of first week

alanis.jcg@gmail.com (JoseAlanis) — Tue, 04 Jun 2019 19:59:37 +0000

Hello everyone,

In this post, I would like to tell you a little bit more about my Google Summer of Code (GSoC) Project and give you a quick summary of the progress I’ve made so far.

About my project.

I’m a PhD student from Germany. In my PhD work I focus on the analysis of brain activity patterns and how these are influenced by individuals' personality and other situational factors. Thus, some of the ideas for my GSoC project are, at least to some extent, rooted in issues I've come across while analyzing data, looking for ways to describe the relationship between a set of variables and patterns of brain activity.

In short, the core of my project consists of developing a set of tools and tutorials that extend the capabilities for regression analysis in MNE-Python, the premier toolbox for analyzing neural time series in Python. In statistics, linear regression is typically used for describing the relationship between predictors and response variables or targets. In particular, by determining the strength of the relationship between these variables, linear regression algorithms can help identify variables and/or subsets of data that contain relevant information about the things we would like to predict (e.g., in my case, patterns of brain activity).

To date, linear regression functionality in MNE-Python is capable of handling regression designs mostly characterized by the introduction of categorical predictors based on ordinary least squares estimation. Even though this approach can be used to inspect relationships between a wide variety of predictors and targets, the limited options for specifying more complex regression models, such as those based on robust and hierarchical estimation algorithms (see for instance here), are currently preventing users from making use of the functionality of MNE’s linear regression at a larger scale, and from using more elaborate regression tools commonly implemented in multiple scientific fields for which MNE is relevant.

The major goal of my GSoC Project is to provide a certain degree of flexibility and allow users to fit different types of models in accordance with their research questions. Feel free to visit the wiki-page on GitHub, if you’d like to learn more about the project.

What I've done so far.

During the first week of GSoC, I worked on integrating open data resources in MNE-Python and wrote a set of functions that allow for easy handling of these resources (see previous post and this PR on GitHub for further details). These kinds of "open data sets" are fundamental to my project, since I plan to validate new implementations of the linear regression framework on them. Furthermore, I added some initial example code to explain linear regression functionality on the basis of this newly integrated data set (see here).

There were some issues along the way, especially when it came down to integrating my code in MNE's API. Thus, a big chunk of work from last week was related to fixing errors and making improvements to my code. However, I believe I learned a lot during the process and I'm looking forward to further consolidating my proposal of the API for statistical modeling in MNE-Python next week.

Stay tuned!

Weekly check-in: 1st week (May 27 - June 2)

alanis.jcg@gmail.com (JoseAlanis) — Sun, 02 Jun 2019 21:56:32 +0000

1. What did you do this week?

Most of this week's work was focused on finalizing a set of functions that provide support for the LIMO dataset inside MNE-Python. The LIMO dataset is an openly available collection of files that contain neural time series data (i.e., EEG data). One goal of my GSoC project is to use this kind of open data resources for testing and validating new tools for linear regression analysis.

This week's work included:

Setting up an OSF project to facilitate the download of individual files.
Writing a python function that accesses OSF's API to retrieve the files.
Writing a python function that brings the retrieved files into MNE-Python compatible data structures.
Setting up a tutorial to explain the functionality of the added functions.
- Here, I've included some first linear regression results to replicate analyses of the LIMO dataset, which have been documented elsewhere.
- In addition, I started formulating a set of functions that allow the visualization of design matrices.
Fixing bugs and making improvements to the code after discussion with mentors and community.

2. What is coming up next?

One important step for the next week is to create a new repository: MNE-Stats. The plan is to carry out most of the development work for statistical modeling tools in this repo. This way, we hope to improve flexibility during the project.

Furthermore, I aim to further consolidate my proposal of the API for statistical modeling in MNE. I will focus on the following issues during the next week:

Facilitate the building of the design matrix.
- Here, I will continue to work on this week’s code (see 4. above).
- Handling of predictor and target variables (e.g., scaling).
- Dealing with interaction terms and visualization of the effects.
- Regularization and robustness of prediction.
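
To make these points more concrete, here is a small sketch of how a design matrix with a scaled predictor and an interaction term could be put together. The predictors, their names, and the coding scheme are hypothetical choices for illustration, not the API proposal itself.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical trial-wise predictors: a two-level condition and a
# continuous variable (e.g., a stimulus property).
n_trials = 60
condition = rng.integers(0, 2, size=n_trials)  # coded 0 / 1
continuous = rng.uniform(0, 1, size=n_trials)

# Scale the continuous predictor (z-score) so coefficients are comparable.
continuous_z = (continuous - continuous.mean()) / continuous.std()

# Effect-code the condition (-0.5 / 0.5) and build the interaction term.
condition_c = condition - 0.5
interaction = condition_c * continuous_z

names = ["intercept", "condition", "continuous", "interaction"]
design = np.column_stack(
    [np.ones(n_trials), condition_c, continuous_z, interaction]
)
```

Each column of design corresponds to one entry in names, so the matrix can be inspected or plotted column by column before fitting a model.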

3. Did you get stuck anywhere?

This week brought some challenging tasks, but I didn't feel particularly stuck during the process of solving them. For instance, one milestone of this week was that I was able to merge my first “big” pull request on MNE-Python. Of course, some changes I made introduced a couple of minor bugs and errors. So, from time to time, I felt like most of my work was focused on fixing errors and getting the code to actually run smoothly. It probably sounds worse than it actually is, but I think one challenge for the next few weeks will be to stay focused on the overall goal and not get too frustrated by errors and other issues.

Hello World!

alanis.jcg@gmail.com (JoseAlanis) — Mon, 20 May 2019 09:49:24 +0000

Hello everyone, my name is José García Alanis from Germany. I will be participating in this year's GSoC with MNE Python, a sub-org of the Python Software Foundation and provider of the homonymous toolbox for the analysis of neural time-series data in Python. My project will focus on improving linear regression analysis in MNE-Python.

During the upcoming weeks I will be documenting my progress here, so stay tuned!