lakshmi97's Blog

Week 4 - June 19th, 2023

lakshmi97
Published: 06/22/2023

Diffusion research continues: Week 4
====================================

 

What I did this week
~~~~~~~~~~~~~~~~~~~~

As discussed last week, I completed my research on Stable Diffusion (SD). We are currently looking at unconditional image reconstruction/denoising/generation using SD. I finished putting together a Keras implementation of unconditional SD. Since I couldn't find an official implementation of unconditional SD, I collated the DDPM diffusion-model codebase and the VQ-VAE codebase separately.

The DDPM code uses an attention-based U-Net for noise prediction. The basic building blocks of the U-Net are the ResidualBlock and the AttentionBlock. The ResidualBlock is additionally conditioned on the diffusion timestep: the DDPM implementation adds the timestep embedding to the features inside the block, whereas the DDIM implementation concatenates it with the input. Downsampling and upsampling in the U-Net are each performed four times, with decreasing and increasing widths respectively. Each downsampling layer consists of two ResidualBlocks, an optional AttentionBlock and a convolutional downsampling (stride=2) layer. Each upsampling layer consists of a concatenation with the corresponding downsampling layer, three ResidualBlocks, an optional AttentionBlock, a keras.layers.UpSampling2D layer and a Conv2D layer. The middle layer consists of two ResidualBlocks with an AttentionBlock in between, resulting in no change in output size. The final upsampling output is followed by a GroupNormalization layer, a Swish activation layer and a Conv2D layer to produce an output with the desired dimensions.
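
For reference, here is a minimal Keras sketch of such a timestep-conditioned ResidualBlock, loosely following the public Keras DDPM example. The widths, group count and helper names are my own assumptions (it also assumes a Keras version that ships layers.GroupNormalization), so treat it as an illustration rather than the exact implementation::

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    def residual_block(width, groups=8):
        """Timestep-conditioned residual block for the noise-prediction U-Net."""

        def apply(inputs):
            x, t_emb = inputs  # t_emb: (batch, emb_dim) sinusoidal timestep embedding
            input_width = x.shape[-1]
            residual = x if input_width == width else layers.Conv2D(width, 1)(x)

            # Project the timestep embedding and broadcast it over the spatial dims.
            temb = layers.Dense(width, activation=keras.activations.swish)(t_emb)
            temb = temb[:, None, None, :]

            x = layers.GroupNormalization(groups=groups)(x)
            x = keras.activations.swish(x)
            x = layers.Conv2D(width, 3, padding="same")(x)
            x = x + temb  # conditioning: add the timestep embedding to the features

            x = layers.GroupNormalization(groups=groups)(x)
            x = keras.activations.swish(x)
            x = layers.Conv2D(width, 3, padding="same")(x)
            return x + residual

        return apply

Each downsampling stage stacks two such blocks before a stride-2 convolution, and each upsampling stage stacks three of them after concatenating the skip connection, as described above.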

Due to personal reasons, I took a couple of days off this week and will continue the rest of the work next week.

 

What is coming up next week
~~~~~~~~~~~~~~~~~~~~~~~~~~~

I will be running Stable Diffusion experiments on CIFAR-10 and 3D VQ-VAE experiments on the NFBS dataset.

 


Week 3 - June 12th, 2023

lakshmi97
Published: 06/22/2023

VQ-VAE results and study on Diffusion models: Week 3
=====================================================
What I did this week
~~~~~~~~~~~~~~~~~~~~
I continued my experiments with VQ-VAE on MNIST data to see the effect of prior training on the generated outputs. For every input image, the encoder delivers a categorical index into the codebook for every pixel of the latent output. As discussed in the previous blog post, the prior has been trained separately using PixelCNN (without any conditioning) in the latent space.

If PixelCNN is just a stack of convolutions, what makes it a generative model? This is an important question to ask, and the answer is the sampling layer applied to the PixelCNN outputs during inference. The official Keras code uses a tfp.layers.DistributionLambda(tfp.distributions.Categorical) layer as its sampling layer. Without this sampling layer, the PixelCNN outputs are deterministic and collapse to a single output. Similarly, the sampling layer alone, i.e. without any trained PixelCNN prior, applied to the pre-determined encoder outputs is also deterministic: the latent distances are correctly estimated by the pre-trained encoder, so during inference the categorical sampling layer would always sample the closest latent, i.e. the one nearest to the input. Therefore, the autoregressive nature of PixelCNN combined with a per-pixel sampling layer delivers an effective generative model. The outputs of all my experiments are shown in the image below:
[Figure: VQ-VAE results]
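
For reference, here is a minimal sketch of that sampling setup, modelled on the Keras VQ-VAE example. The codebook size, latent grid size and the stand-in prior below are assumptions for illustration; a real PixelCNN prior uses masked convolutions so that each position only depends on the positions before it::

    import numpy as np
    import tensorflow as tf
    import tensorflow_probability as tfp
    from tensorflow.keras import layers

    num_embeddings = 128   # codebook size (assumed)
    latent_shape = (7, 7)  # size of the latent code grid (assumed)

    # Stand-in for the trained PixelCNN prior: anything that maps a (batch, 7, 7)
    # grid of code indices to (batch, 7, 7, num_embeddings) logits works here.
    pixelcnn = tf.keras.Sequential([
        layers.Lambda(lambda x: tf.one_hot(tf.cast(x, tf.int32), num_embeddings)),
        layers.Conv2D(num_embeddings, 3, padding="same"),
    ])

    # Wrap the prior with the categorical sampling layer. Without this layer the
    # outputs are deterministic logits and every generation collapses to one image.
    inputs = tf.keras.Input(shape=latent_shape)
    logits = pixelcnn(inputs)
    outputs = tfp.layers.DistributionLambda(tfp.distributions.Categorical)(logits)
    sampler = tf.keras.Model(inputs, outputs)

    # Autoregressive generation: sample the latent grid one position at a time.
    codes = np.zeros((4,) + latent_shape, dtype="float32")
    for row in range(latent_shape[0]):
        for col in range(latent_shape[1]):
            sampled = sampler.predict(codes, verbose=0)
            codes[:, row, col] = sampled[:, row, col]
    # `codes` can then be embedded with the codebook and decoded by the VQ-VAE decoder.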

Based on qualitative analysis, the PixelCNN outputs may require some extra work. This leads me to the next step in my research: exploring diffusion models.

The first breakthrough paper on diffusion models is DDPM (Denoising Diffusion Probabilistic Models). Inspired by previous work on nonequilibrium thermodynamics, the authors show that training a diffusion model by maximizing the posterior likelihood in an image-generation task is mathematically equivalent to denoising score matching. In simple terms, there are two processes in diffusion modelling: forward and reverse. The forward process iteratively produces noisy images using a noise scheduler; thanks to the reparameterization trick, the noisy image at any timestep can be computed in a single step. During training, in the reverse process, a U-Net is trained to estimate the noise in the noisy image. During inference/sampling, noise is iteratively estimated and removed from a random noisy image to generate a new, unseen image. The L2 loss used to estimate the noise during training is mathematically equivalent to maximizing the posterior likelihood, i.e. maximizing the likelihood of the final denoised image. You can find more details in the paper.

The Stable Diffusion paper moves the needle by making diffusion models more accessible, scalable and trainable on a single Nvidia A100 GPU. Earlier diffusion models were difficult to train, required hundreds of GPU-days, suffered from instability issues and were restricted to the image modality. Stable Diffusion achieves training stability, with conditioning on multimodal data, by working in a latent space. A pre-trained image encoder, such as that of a VQ-VAE, is used to downsample the input image and strip away imperceptible details. These latents are then used to train the diffusion model discussed above. Doing so separates perceptual compression from the generative part of the network. The denoised latents can later be passed through the trained VQ-VAE decoder to reconstruct images in pixel space. This results in a less complex model, faster training and high-quality generated samples.
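
To make the forward process and the noise-prediction loss concrete, here is a small illustrative sketch. The linear beta schedule, the number of timesteps and the unet model are assumptions for illustration, not values taken from the papers or from my implementation::

    import tensorflow as tf

    T = 1000
    betas = tf.linspace(1e-4, 0.02, T)    # linear noise schedule (assumed)
    alphas = 1.0 - betas
    alpha_bars = tf.math.cumprod(alphas)  # cumulative products, i.e. alpha_bar_t

    def q_sample(x0, t, noise):
        """One-step forward process via the reparameterization trick:
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
        a_bar = tf.gather(alpha_bars, t)[:, None, None, None]
        return tf.sqrt(a_bar) * x0 + tf.sqrt(1.0 - a_bar) * noise

    def noise_prediction_loss(unet, x0):
        """Simple L2 loss between the true and the predicted noise."""
        t = tf.random.uniform((tf.shape(x0)[0],), 0, T, dtype=tf.int32)
        noise = tf.random.normal(tf.shape(x0))
        x_t = q_sample(x0, t, noise)
        predicted_noise = unet([x_t, t], training=True)
        return tf.reduce_mean(tf.square(noise - predicted_noise))

During sampling the same schedule is used in reverse: starting from pure noise, the U-Net's noise estimate is subtracted step by step until a clean image remains.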


What is coming up next week
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Setting up a Big Red 200 HPC account. Training a diffusion model in TensorFlow on the MNIST latents from the VQ-VAE, without any conditioning.


Week 2 blog - June 5th, 2023

lakshmi97
Published: 06/22/2023

Deep Dive into VQ-VAE: Week 2
=============================

What I did this week
~~~~~~~~~~~~~~~~~~~~

This week I took a deep dive into the VQ-VAE code. Here's a little bit about VQ-VAE:

VQ-VAE is a VAE with a discretized latent space, which helps in achieving high-quality outputs. It differs from a VAE in two ways: the use of a discrete latent space, and a separately trained prior. VQ-VAE has shown impressive generative capabilities across data modalities - images, video, audio.

By using a discrete latent space, VQ-VAE bypasses the 'posterior collapse' failure mode seen in traditional VAEs. Posterior collapse is when the latent space is not utilized properly and collapses to similar vectors independent of the input, resulting in little variation in the generated outputs.

The encoder and decoder weights are trained jointly, along with L2-based updates of the codebook embedding vectors. A categorical distribution is assumed over these latent embeddings, and to truly capture the distribution of these vectors, the latents are further modelled with a PixelCNN prior.
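
As a reference for how those updates look in code, here is a minimal sketch of a vector-quantization layer, loosely following the Keras VQ-VAE example. The codebook size, embedding dimension and commitment weight are assumptions::

    import tensorflow as tf

    class VectorQuantizer(tf.keras.layers.Layer):
        """Nearest-neighbour quantization with a straight-through estimator."""

        def __init__(self, num_embeddings=64, embedding_dim=16, beta=0.25, **kwargs):
            super().__init__(**kwargs)
            self.embedding_dim = embedding_dim
            self.num_embeddings = num_embeddings
            self.beta = beta  # commitment loss weight
            self.embeddings = self.add_weight(
                name="codebook",
                shape=(embedding_dim, num_embeddings),
                initializer="uniform",
                trainable=True,
            )

        def call(self, x):
            # Flatten spatial dimensions: (batch * h * w, embedding_dim)
            flat = tf.reshape(x, [-1, self.embedding_dim])
            # Squared L2 distances between encoder outputs and codebook vectors
            distances = (
                tf.reduce_sum(flat ** 2, axis=1, keepdims=True)
                - 2 * tf.matmul(flat, self.embeddings)
                + tf.reduce_sum(self.embeddings ** 2, axis=0, keepdims=True)
            )
            indices = tf.argmin(distances, axis=1)
            quantized = tf.nn.embedding_lookup(tf.transpose(self.embeddings), indices)
            quantized = tf.reshape(quantized, tf.shape(x))
            # Codebook and commitment losses: the "L2 updates" of the embeddings
            self.add_loss(tf.reduce_mean((tf.stop_gradient(x) - quantized) ** 2))
            self.add_loss(self.beta * tf.reduce_mean((x - tf.stop_gradient(quantized)) ** 2))
            # Straight-through estimator: pass decoder gradients to the encoder
            return x + tf.stop_gradient(quantized - x)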

In the original paper, PixelCNN was shown to capture the distribution of the data while also delivering rich detail in the generated output images. In image space, a PixelCNN decoder reconstructs a given input image with varying visual aspects such as colors, angles, lighting, etc. This is achieved through autoregressive training with the help of masked convolutions. Autoregressive training, coupled with categorical distribution sampling at the end of the pipeline, makes PixelCNN an effective generative model.
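
Here is a minimal sketch of such a masked convolution; the class name and mask construction are my own illustration rather than the exact layer used in the Keras example::

    import numpy as np
    import tensorflow as tf

    class MaskedConv2D(tf.keras.layers.Conv2D):
        """Conv2D whose kernel is masked so each pixel only depends on the
        pixels above it and to its left (the autoregressive constraint)."""

        def __init__(self, mask_type, **kwargs):
            super().__init__(**kwargs)
            assert mask_type in ("A", "B")
            self.mask_type = mask_type

        def build(self, input_shape):
            super().build(input_shape)
            kh, kw = self.kernel_size
            mask = np.zeros(tuple(self.kernel.shape), dtype="float32")
            mask[: kh // 2, :, :, :] = 1.0        # rows above the centre pixel
            mask[kh // 2, : kw // 2, :, :] = 1.0  # pixels left of the centre
            if self.mask_type == "B":             # type B may also see the centre
                mask[kh // 2, kw // 2, :, :] = 1.0
            self.mask = tf.constant(mask)

        def call(self, inputs):
            self.kernel.assign(self.kernel * self.mask)  # re-mask before each call
            return super().call(inputs)

    # The first PixelCNN layer uses a type "A" mask (it cannot see the current pixel),
    # while the deeper layers use type "B" masks.
    first_layer = MaskedConv2D(mask_type="A", filters=64, kernel_size=7, padding="same")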

A point to be noted here is that the VQ-VAE prior is trained with PixelCNN in latent space rather than image space. So PixelCNN doesn't replace the decoder as discussed in the original paper; rather, it is trained independently to reconstruct the latent space. This raises the first questions that come to my mind: How does latent reconstruction help in image generation? Is prior training required at all? What happens if it is skipped?

My findings on MNIST data show that the trained prior works well only with the right sampling layer (tfp.layers.DistributionLambda), which helps with uncertainty estimation. Therefore, PixelCNN's autoregressive capabilities are as important as defining a distribution layer on top of its outputs. Apart from this, I've also been researching and collating different MRI datasets to work on in the future.

 

What is coming up next week
~~~~~~~~~~~~~~~~~~~~~~~~~~~

My work for next week includes checking these insights on the CIFAR dataset and brushing up on diffusion models.

 

Did I get stuck anywhere
~~~~~~~~~~~~~~~~~~~~~~~~

Working with the VQ-VAE code required digging in a little before drawing conclusions from the results obtained. I reached out to the author of the Keras implementation blog to verify a couple of things, conducted a couple more experiments than estimated, and presented this work at the weekly meeting.


 


Week 1 blog - 29th May, 2023

lakshmi97
Published: 06/01/2023

The Community Bonding period ended last week, and my first blog is based on the work carried out in that week. My meeting with the GSoC mentors at the start of the week helped me chalk out an agenda for the week. As the first step, I familiarized myself with TensorFlow operations, functions and distribution strategies. My previous experience with PyTorch, as well as website tutorials on basic deep learning models, helped me quickly pick up TensorFlow. As the next step, I read the VQ-VAE paper and went through its open-source TensorFlow implementation. VQ-VAE addresses the 'posterior collapse' seen in traditional VAEs and overcomes it by discretizing the latent space. This in turn also improves the generative capability by producing less blurry images than before. Familiarizing myself with VQ-VAE early on helps in understanding the latents used in diffusion models in later steps. I also explored a potential dataset - IXI (T1 images) - and performed some exploratory data analysis, such as age and sex distributions. Since the images contain the entire skull, they may require brain extraction and registration; it may be more useful to use existing preprocessed datasets and align them to a template. For next week, I'll be conducting a further literature survey on diffusion models.


Week 0 blog - 19th May, 2023

lakshmi97
Published: 06/01/2023

GSOC 2023 week 0 blog
=====================

 

While applying for the GSOC 2023 DIPY sub-project titled “Creating Synthetic MRI”, I knew this would be the right one for me for two reasons. Keep reading to know more!
 

Nervous and not-so-optimistic as I am about applying for academic competitions, I pushed myself to apply for GSoC out of a need for a summer job more than anything. This got me out of my comfort zone, and I ventured into open-source development. At the time of application I was a Master's student at NYU (current status: graduated) with a focus on deep learning applications in healthcare. I was so involved in computer vision research during school that I decided to pursue a career in the same field going forward. Fortunately, around that time I came across a college senior's LinkedIn post about being accepted as a mentor for GSoC 2023. This prompted me to look further into GSoC and its list of projects for this year. I had only heard of GSoC during my undergrad, when I could never muster the courage to pursue something outside college. But this time around, I decided to put up a confident front and take the leap.

 

As I searched through the list of available projects, I became iteratively more definitive about what I wanted to work on: I looked for Python projects first, filtered for machine learning projects next, and narrowed those down to a couple of relevant projects. In the process, I came across the list of DIPY projects. Firstly, I was looking to further my research knowledge in ML by exploring generative AI. Secondly, I have worked with MRI datasets in a deep learning context before, so the 'Creating Synthetic MRI' project seemed the right fit. These reasons got me hooked on the DIPY sub-organization. I thoroughly enjoyed exploring DIPY's applications and soon began preparing my application. With wonderful help from the mentors, I successfully submitted an application, later got an interview call and, voila, I got in!

 

I am very happy about participating in GSoC this year. What started out as a necessity has now become a passion project. I hope to enjoy the journey ahead and look forward to learning and implementing a few things along the way!

 
