Articles on lakshmi97's Blog
https://blogs.python-gsoc.org
Updates on different articles published on lakshmi97's Blog
en
Wed, 23 Aug 2023 02:50:13 +0000

Week 12 & Week 13 - August 21, 2023
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-12-week-13-august-21-2023/
<p>Finalized experiments using both datasets: Week 12 &amp; Week 13</p>
<p>============================================================</p>
<p>What I did this week</p>
<p>~~~~~~~~~~~~~~~~~~~~</p>
<p>MONAI's VQVAE results on the T1-weighted NFBS dataset (125 samples, batch_size=5) were qualitatively and quantitatively superior to all previous results. I continued the same experiments on the T1-weighted CC359 (Calgary-Campinas-359) public dataset, which consists of 359 anatomical MRI volumes of healthy individuals. I preprocessed the data using the existing `transform_img` function, which -</p>
<p>1. skull-strips the volume using the respective mask</p>
<p>2. scales the volume to (128,128,128,1) shape using dipy's `resize` &amp; scipy's `affine_transform`</p>
<p>3. applies MinMax normalization to limit the range of intensities to (0,1)</p>
<p>Using the existing training parameters, I carried out two experiments, one on CC359 alone &amp; another on both datasets combined. Additionally, I made a slight modification to the loss definition by assigning different weights of 0.5 &amp; 1 to background &amp; foreground pixels respectively, compared to the equal weights used in previous experiments.
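</p>
<p>The weighting described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the project's training code: the helper name `weighted_mse` and the foreground mask (voxels that are non-zero after skull-stripping and MinMax normalization) are assumptions made for the example.</p>

```python
import numpy as np


def weighted_mse(original, reconstruction, bg_weight=0.5, fg_weight=1.0):
    """Mean squared error with separate weights for background and foreground.

    Assumption: after skull-stripping and MinMax normalization, background
    voxels are exactly 0 in the original volume, so `original != 0` is used
    as a hypothetical foreground mask.
    """
    original = np.asarray(original, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    # Per-voxel weight map: fg_weight inside the brain, bg_weight outside.
    weights = np.where(original != 0, fg_weight, bg_weight)
    squared_error = (original - reconstruction) ** 2
    return float(np.mean(weights * squared_error))


# Toy 1D "volume": two background voxels followed by two foreground voxels.
original = np.array([0.0, 0.0, 0.5, 1.0])
reconstruction = np.array([0.2, 0.0, 0.5, 0.8])
loss = weighted_mse(original, reconstruction)
```

<p>With both weights set to 1, the same function reduces to the plain mean squared error used in the earlier experiments.</p>
<p>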
This modification resulted in faster convergence, as shown by the red, blue &amp; purple lines in the combined plot below -</p>
<p><img alt="Combined training plots for all experiments" src="https://github.com/lb-97/dipy/blob/blog_branch_week_12_13/doc/_static/vqvae3d-monai-training-plots.png"></p>
<p>Inference results for the best-performing model, B12-both, are as follows -</p>
<p><img alt="VQVAE-Monai-B12-both reconstructions &amp; originals showing equally spaced 5 slices for 2 samples" src="https://github.com/lb-97/dipy/blob/blog_branch_week_12_13/doc/_static/vqvae-monai-B12-both.png"></p>
<p>This shows that our training not only converged quickly but also produced visually better reconstructions. Here's a comparison of our current best-performing model, VQVAE-Monai-B12-both, &amp; the previous one, VQVAE-Monai-B5-NFBS. Their test reconstruction losses are 0.0013 &amp; 0.0015 respectively.</p>
<p><img alt="VQVAE reconstruction comparison for B12-both &amp; B5-NFBS" src="https://github.com/lb-97/dipy/blob/blog_branch_week_12_13/doc/_static/vqvae-reconstructions-comparison.png"></p>
<p>I also carried out Diffusion Model training using the best-performing B12-both model for 300 &amp; 500 diffusion steps, and the training curves obtained are as follows -</p>
<p><img alt="Diffusion Model training plots for 300 &amp; 500 diffusion steps" src="https://github.com/lb-97/dipy/blob/blog_branch_week_12_13/doc/_static/dm3d-monai-training-curves.png"></p>
<p>These curves seemed to converge pretty quickly, but the sampling outputs in the generation pipeline are still pure noise.</p>
<p>What is coming up next week</p>
<p>~~~~~~~~~~~~~~~~~~~~~~~~~~~</p>
<p>Wrapping up documentation &amp; final report</p>
<p>Did I get stuck anywhere</p>
<p>~~~~~~~~~~~~~~~~~~~~~~~~</p>
<p>Yes, I carried out debugging to understand the generation pipeline of the Diffusion Model.
I cross-checked the implementations of posterior mean &amp; variance in the code base against the respective formulas from the paper, as well as against MONAI's DDPM implementation. I didn't come across any error, yet the generated samples are erroneous.</p>
lakshmibayanagari@gmail.com (lakshmi97)
Wed, 23 Aug 2023 02:50:13 +0000
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-12-week-13-august-21-2023/

Week 10 & Week 11 - August 7, 2023
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-10-week-11-august-7-2023/
<p>Carbonate issues, GPU availability, Tensorflow errors: Week 10 &amp; Week 11</p>
<p>=========================================================</p>
<p>What I did this week</p>
<p>~~~~~~~~~~~~~~~~~~~~</p>
<p>Recently, I've been assigned an RP (Research Project) account on Indiana University Bloomington's HPC cluster - Carbonate. This account lets me access multiple GPUs for my experiments in a dedicated account.</p>
<p>Once I started configuring my sbatch file accordingly, I started running into issues like GPU access. My debug print statements revealed that I was accessing 1 CPU despite configuring the sbatch job for more than one GPU. I double-checked my dataloader definition, DistributionStrategy and train function. I read through IU's blogs as well as other resources online to see if I was missing something.</p>
<p>Nothing worked. My mentor eventually asked me to raise an IT request on Carbonate, but the IT personnel couldn't help either. This could only mean that Tensorflow was not picking up the assigned GPUs. So, on my mentor's suggestion, I loaded an older version of the deeplearning module, 2.9.1 (I used 2.11.1 earlier). This worked!</p>
<p>This also meant using a downgraded version of tensorflow (2.9), so I ran into errors again - time-consuming yet resolvable. I made some architectural changes - replaced GroupNorm with BatchNorm layers and the tensor_slices based DataLoader with a DataGenerator - to accommodate the older tensorflow version.
Additionally, I had to change the model structure from a list of layers to a ``tensorflow.keras.Sequential`` set of layers with input_shape information defined in the first layer. Without this last change, I ran into ``None`` object errors.</p>
<p>Once all my new code was in place, the week ended, hahahah. GPUs were also scarce that week. I'm glad I got some work done though.</p>
<p>What is coming up next week</p>
<p>~~~~~~~~~~~~~~~~~~~~~~~~~~~</p>
<p>Run more experiments!</p>
<p>Did I get stuck anywhere</p>
<p>~~~~~~~~~~~~~~~~~~~~~~~~</p>
<p>All I did was get stuck again &amp; again :P</p>
<p>But all is well now.</p>
lakshmibayanagari@gmail.com (lakshmi97)
Wed, 23 Aug 2023 02:42:55 +0000
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-10-week-11-august-7-2023/

Week 8 & Week 9 - July 24, 2023
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-8-week-9-july-24-2023/
<p>VQVAE MONAI models &amp; checkerboard artifacts: Week 8 &amp; Week 9<br> ============================================================</p>
<p><br> What I did this week<br> ~~~~~~~~~~~~~~~~~~~~</p>
<p>We observed in our previous results that the Diffusion Model's performance may depend on better and more effective latents from VQVAE. After playing around with convolutional &amp; residual components in the existing architecture, which yielded unsatisfactory results, we decided to move to a model that is better proven on 3D MRIs. A model that worked well on the MNIST dataset would not necessarily deliver similar results on 3D MRI datasets, owing to the differences in complexity of the data distributions. Changing the convolutions to 3D filters alone clearly did not do the job.</p>
<p><br> MONAI is an open source organization for Machine Learning in Medical Imaging; it has repositories and tutorials for various high-performing networks tested on multiple Medical Image datasets.
We adopted the deep learning architecture for VQVAE from MONAI's PyTorch implementation that was trained &amp; tested on BRATS (400 data elements). The predominant difference is that the encoder &amp; the decoder of VQVAE use Residual units differently than our existing setup. These Residual units are alternated with downsampling/upsampling convolutions in the encoder/decoder. Additionally, MONAI's VectorQuantizer uses non-trainable embeddings with statistical updates (Laplace Smoothing) on them at every iteration.</p>
<p><br> I implemented MONAI's VQVAE architecture in Tensorflow from scratch, excluding the VectorQuantizer. This architecture has 46.5M trainable parameters. The training objective is to minimize the sum of reconstruction &amp; quantization losses - the same training paradigm as our previous experiments. In addition, to address the checkerboard artifacts, I referred to the <a href="https://arxiv.org/abs/1707.02937">Sub-Pixel Convolution paper</a>. This paper proposes two methods to overcome the deconvolution overlap, a phenomenon that causes checkerboarded outputs in deconvolution/upsampling layers. These two methods are Sub Pixel Convolution &amp; NN Resize Convolution. For an upsampling rate :math:`r`, Sub Pixel Convolution outputs :math:`3r^2` channels &amp; later reshuffles the channel dimension along the spatial dimensions (upsampling each by :math:`r`), resulting in the 3 desired output channels. NN Resize, in contrast, performs interpolation on the kernel to upsample its size by :math:`r` before carrying out a convolution that outputs 3 channels. The former method relies on shuffling &amp; the latter relies on nearest neighbor interpolation to obtain an upsampled output. Both methods have been shown to perform better qualitatively in dealing with the checkerboards, on random initialization. The authors also prove mathematically that, with an efficient initialization, both methods are equivalent.
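</p>
<p>The equivalence mentioned above can be checked numerically on a toy example. The sketch below is a 2D NumPy illustration under stated assumptions, not the 3D Tensorflow code used in the project: `pixel_shuffle` and `nn_upsample` are hypothetical helpers, and the channel ordering inside `pixel_shuffle` is one of several possible conventions. It shows that when every group of r*r consecutive channels is identical, which is what shared sub-kernels would produce, the shuffle yields exactly a nearest-neighbour upsampled output.</p>

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange (H, W, C*r*r) into (H*r, W*r, C), as in sub-pixel convolution."""
    h, w, c_in = x.shape
    c = c_in // (r * r)
    x = x.reshape(h, w, c, r, r)      # split channels into (c, row offset, col offset)
    x = x.transpose(0, 3, 1, 4, 2)    # reorder to (h, row offset, w, col offset, c)
    return x.reshape(h * r, w * r, c)


def nn_upsample(x, r):
    """Nearest-neighbour upsampling of (H, W, C) by a factor r."""
    return np.repeat(np.repeat(x, r, axis=0), r, axis=1)


r = 2
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 3))   # stand-in for a conv output with 3 channels

# ICNR-style channels: every group of r*r consecutive channels is identical,
# which is what a convolution produces when its r*r sub-kernels share weights.
icnr_like = np.repeat(features, r * r, axis=-1)

shuffled = pixel_shuffle(icnr_like, r)
resized = nn_upsample(features, r)
same = np.array_equal(shuffled, resized)
```

<p>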
The authors call it the ICNR initialization - Initialization of Convolution with NN Resize.</p>
<p><br> I ran multiple experiments with batch_size=5, 10 &amp; 10 (with ICNR). The training loss curves obtained are as follows; all of them were trained on 1 GPU for 24hrs. We see that all of them converge except the last one (B=10 with ICNR).</p>
<p><img alt="VQVAE3D Monai training curve" src="https://github.com/lb-97/dipy/blob/blog_branch_week_8_9/doc/posts/2023/assets/vqvae3d-monai-training.png"><br> <br> The best training checkpoint has been used to reconstruct test images. The following images depict 2 such reconstructions in 2 rows, where 5 slices from each of these reconstructions are displayed in columns.</p>
<p><br> The first one is for B=10; the best training checkpoint had training loss=0.0037. Compared to our previous VQVAE model, we see better performance in capturing the outer structure of the brain. Moreover, we don't see white blobs or artifacts as inner matter, but rather some curvatures contributing to the inner microstructure of a human brain.</p>
<p><img alt="VQVAE3D Monai, B=10" src="https://github.com/lb-97/dipy/blob/blog_branch_week_8_9/doc/posts/2023/assets/vqvae3d-monai-B10.png"></p>
<p><br> The second one is for B=10 with ICNR kernel initialization; the best training checkpoint had training loss=0.0067. However, the test results do not look complete. I implemented ICNR through DIPY's resize function to achieve NN resize equivalent output on the kernel filters. This initialization didn't work as intended, further suggesting that the training has yet to converge.</p>
<p><img alt="VQVAE3D Monai, B=10, ICNR initialization" src="https://github.com/lb-97/dipy/blob/blog_branch_week_8_9/doc/posts/2023/assets/vqvae3d-monai-B10-ICNR.png"></p>
<p><br> The next &amp; last image is for B=5; the best training checkpoint had training loss = 0.0031. It is by far the best one quantitatively as well as visually.
The test loss for the reconstructions below is 0.0013. The superior performance of this batch size may be attributed to the Batch Normalization (BN) layers in the architecture, which calculate the mean &amp; variance of the batch and normalize all batch elements using these statistics. A smaller batch size may contribute to less variation in the output of the layer &amp; help in reaching converged outputs faster. This explanation stems from the concept of Contrastive Learning, where BN layers are used as the source of implicit negative loss learners: the higher the batch size, the more implicit negative samples to move away from. Since our objective is instead to minimize the reconstruction loss, a smaller batch size may consequently help by reducing variation.</p>
<p><img alt="VQVAE3D, B=5" src="https://github.com/lb-97/dipy/blob/blog_branch_week_8_9/doc/posts/2023/assets/vqvae3d-monai-B5.png"></p>
<p><br> What is coming up next week<br> ~~~~~~~~~~~~~~~~~~~~~~~~~~~</p>
<p>As the next step, I can focus on training the LDM (Latent Diffusion Model) using the best-performing model from the above experiments.</p>
<p><br> Did I get stuck anywhere<br> ~~~~~~~~~~~~~~~~~~~~~~~~</p>
<p>In both weeks, I had issues accessing resources, specifically multiple GPUs.</p>
lakshmibayanagari@gmail.com (lakshmi97)
Fri, 28 Jul 2023 20:55:25 +0000
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-8-week-9-july-24-2023/

Week 6 & Week 7 - July 10, 2023
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-6-week-7-july-10-2023/
<p>Diffusion Model results on pre-trained VQVAE latents of NFBS MRI Dataset: Week 6 &amp; Week 7<br> ========================================================================</p>
<p><br> What I did this week<br> ~~~~~~~~~~~~~~~~~~~~</p>
<p><br> My current code for VQVAE &amp; DM is well tested on the MNIST dataset, as shown in the previous blog posts.
I extended the current codebase to the MRI dataset by using 3D convolutions instead of 2D ones, which resulted in 600k parameters for VQVAE for a downsampling factor f=3. I used a preprocess function to transform MRI volumes to the desired shape (128,128,128,1) through DIPY's reslice and scipy's affine_transform functions, followed by MinMax normalization. I trained the VQVAE architecture with batch_size=10 and the Adam optimizer (lr=2e-4) for 100 epochs. I followed suit for downsampling factor f=2 as well and got the following training curves -</p>
<p><img alt="VQVAE3D Training Curves" src="https://github.com/lb-97/dipy/blob/blog_branch_week_6_7/doc/posts/2023/assets/vqvae3d-training-curves.png"><br> <br> The reconstructed brain volumes on the test dataset from the best-performing model are shown below. As seen in the first image (f=3), there are black artifacts in the captured blurry brain structure, whereas the second image (f=2) does a better job of producing a less blurry brain structure. Nonetheless, we only see the outline of the brain being captured, with no micro-structural information inside it.</p>
<p><img alt="VQVAE3D, f=3, reconstructions" src="https://github.com/lb-97/dipy/blob/blog_branch_week_6_7/doc/posts/2023/assets/vqvae3d-reconst-f3.png"></p>
<p><img alt="VQVAE3D, f=2, reconstructions" src="https://github.com/lb-97/dipy/blob/blog_branch_week_6_7/doc/posts/2023/assets/vqvae3d-reconst-f2.png"><br> <br> Later, the 3D Diffusion Model was trained for approximately 200 epochs with 200 &amp; 300 diffusion time steps in two different experiments. The training curves and the obtained generations are shown below, in that order. Both generations are noisy and not convincing.</p>
<p><img alt="DM3D Training curves" src="https://github.com/lb-97/dipy/blob/blog_branch_week_6_7/doc/posts/2023/assets/dm3d-training-curves.png"></p>
<p><img alt="DM3D reconstructions for 200 &amp; 300 diffusion steps" src="https://github.com/lb-97/dipy/blob/blog_branch_week_6_7/doc/posts/2023/assets/dm3d-reconst-D200-D300.png"><br> <br> Given the noisy generations, I decided to train VQVAE for a higher number of epochs. These results may also indicate that the performance of DM hinges on good latent representations, i.e., a trained encoder capable of perfect reconstructions. So I trained the f=3 VQVAE for a higher number of epochs, as shown below.</p>
<p><img alt="VQVAE3D, f=3 further training" src="https://github.com/lb-97/dipy/blob/blog_branch_week_6_7/doc/posts/2023/assets/vqvae-f3-higher-epochs.png"><br> <br> The reconstructions obtained with the best VQVAE seemed to have produced a better volumetric brain structure. However, a common theme across all reconstructions is a pixelated output for the last few slices, with checkerboard-like artifacts. Anyhow, I ran a couple more experiments with a more complex VQVAE model that has residual blocks to carry forward information. Neither the reconstructions nor the DM generations made any qualitative progress.</p>
<p><br> What is coming up next week<br> ~~~~~~~~~~~~~~~~~~~~~~~~~~~</p>
<p><br> One idea is to improve VQVAE's effectiveness by playing around with architecture components and hyper-parameter tuning.
Alongside, I can also look into the checkerboard artifacts seen in the reconstructions.</p>
lakshmibayanagari@gmail.com (lakshmi97)
Fri, 28 Jul 2023 20:51:43 +0000
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-6-week-7-july-10-2023/

Week 5 - June 26, 2023
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-5-june-26-2023/
<pre>Carbonate Account Setup, Experiment, Debug and Repeat: Week 5
=============================================================

What I did this week
~~~~~~~~~~~~~~~~~~~~

I finally got my hands on IU's HPC - Carbonate &amp; Big Red 200. I quickly set up a virtual remote connection to Carbonate's Slate on VS Code with Jong's help. Later, I started looking up Interactive jobs on Carbonate to have GPUs on the go for coding and testing. I spent a ton of time reading up on Carbonate's Interactive SLURM jobs information. Using X11 forwarding, I was able to spin up an interactive job inside the login node using the command prompt. It popped up a Firefox browser window from the login node, which ended up being slow and not very user friendly. The same goes for Big Red 200. Eventually my efforts were in vain and I resorted to installing a jupyter notebook server in my home directory. Although I can't request a GPU with this notebook, it allows me to debug syntax errors, visualize outputs, plot loss values etc.

Continuing my MNIST experiments, I ran into distributed-training issues while training the unconditional Diffusion Model (DM). Without getting into too many details, I can summarize that having a custom train_step function in tensorflow, without any default loss reduction such as *tf.reduce_mean* or *tf.keras.losses.Reduction.SUM*, requires more work than *model.fit()*. My current loss function used for training DM is reduced on the last channel, while the rest of the shape of each batch is kept intact. When using distributed training, tensorflow requires the user to take care of gradient accumulation if the loss is unreduced.
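
The scaling concern above can be illustrated with plain NumPy, leaving Tensorflow out entirely. This is only a sketch under assumptions: the array shapes are made up, the two "replicas" are simulated by splitting the batch, and the scaling factor is taken to count the elements of the global (not per-replica) unreduced loss.

```python
import numpy as np

rng = np.random.default_rng(0)
global_batch = 8
num_replicas = 2

# Hypothetical noise targets and predictions with shape (batch, height, width, channels).
target = rng.normal(size=(global_batch, 7, 7, 4))
predicted = rng.normal(size=(global_batch, 7, 7, 4))

# Loss reduced on the last (channel) axis only; shape (batch, height, width).
unreduced = np.mean((target - predicted) ** 2, axis=-1)

# Reference value: the mean over every remaining element of the global batch.
reference = float(unreduced.mean())

# Each replica sees an equal slice of the batch, sums its own losses (SUM reduction),
# and divides by the number of elements in the global unreduced loss.
scale = np.prod(unreduced.shape)
per_replica = [chunk.sum() / scale for chunk in np.split(unreduced, num_replicas)]

# Summing across replicas (what an all-reduce does with gradients) recovers the reference.
combined = float(np.sum(per_replica))
```

If each replica instead averaged over its own slice, the summed result would be num_replicas times too large, which is the kind of mismatch a custom train_step has to guard against.
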
So, I tried to learn from Tensorflow tutorials. Alas, all their multi distributed strategy examples were based on functional API models, whereas my approach is based on an object oriented implementation. This led to design issues. For the sake of time management, I did a little bit of tweaking. While compiling the model under *tf.distribute.MirroredStrategy*, I passed the *tf.keras.losses.Reduction.SUM* parameter to the loss function and divided the loss by a pre-decided factor, *np.prod(out.shape[:-1])*, i.e., the number of elements in the output shape excluding the last channel, which is reduced in the loss function. This tweak worked and does not have any unexpected impact on the architecture or the training paradigm.

I followed the architecture described in my previous blog for the DM. I trained this on VQ-VAE latents of the MNIST dataset for 200 diffusion steps, on 2 Nvidia V100 GPUs, with the Adam optimizer at a 2e-4 learning rate and a batch size of 200 per GPU, for 100+ epochs. For the generative process, I denoised random samples for 50, 100 and 200 steps on the best performing model (112 epochs). Here are the results I achieved -

<img alt="DM-MNIST-112Epoch" src="https://github.com/lb-97/dipy/blob/blog_branch_week5/doc/posts/2023/assets/DM-MNIST-112epoch.png">

We see some resemblance to digit shapes in the generated outputs. On further training with 300 diffusion timesteps, the best performing model (108 epochs), i.e., the one with the least training loss, produced drastically improved visuals -

<img alt="DM-MNIST-DDIM300-108epoch" src="https://github.com/lb-97/dipy/blob/blog_branch_week5/doc/posts/2023/assets/DM-MNIST-DDIM300-108epoch.png">

These outputs show the effectiveness of the model architecture, training parameters and the codebase.

What is coming up next week
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Work on T1 weighted MRI datasets with the modified 3D conv code. Hyperparameter tuning for the best results. If time permits, work on the FID evaluation metric.
Did I get stuck anywhere
~~~~~~~~~~~~~~~~~~~~~~~~

Most of the work conducted this week included setting up the environment, debugging and researching documentation. In the little time that remained, I ran experiments. Having the code for both VQ-VAE and DM ready before I got hold of GPUs helped me save a lot of time. This week's work was a great learning experience for me.</pre>
lakshmibayanagari@gmail.com (lakshmi97)
Fri, 28 Jul 2023 20:35:42 +0000
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-5-june-26-2023/

Week 4 - June 19th, 2023
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-4-june-19th-2023/
<p>Diffusion research continues: Week 4</p>
<p>============================</p>
<p>What I did this week</p>
<p>~~~~~~~~~~~~~~~~~~~~</p>
<p>As discussed last week, I completed my research on StableDiffusion (SD). Currently we're looking at unconditional image reconstruction/denoising/generation using SD. I finished putting together a keras implementation of unconditional SD. Since I couldn't find an official implementation of unconditional SD, I collated the DDPM diffusion model codebase and the VQ-VAE codebase separately. The DDPM code uses an Attention based U-Net for noise prediction. The basic code blocks of the U-Net are ResidualBlock &amp; AttentionBlock. ResidualBlock is additionally conditioned on the diffusion timestep; DDPM implements this conditioning by adding the diffusion timestep to the input image, whereas DDIM performs a concatenation. Downsampling &amp; Upsampling in the U-Net are performed 4 times with decreasing &amp; increasing widths respectively. Each downsampling layer consists of two ResidualBlocks, an optional AttentionBlock and a convolutional downsampling (stride=2) layer. Each upsampling layer consists of a concatenation from the respective downsampling layer, three ResidualBlocks, an optional AttentionBlock, a keras.layers.UpSampling2D layer and a Conv2D layer.
The Middle layer consists of two ResidualBlocks with an AttentionBlock in between, resulting in no change in the output size. The final output of the Upsampling layers is followed by a GroupNormalization layer, a Swish Activation layer and a Conv2D layer to provide an output with the desired dimensions.</p>
<p>Due to personal reasons, I took a couple of days off this week and will be continuing the rest of the work next week.</p>
<p>What is coming up next week</p>
<p>~~~~~~~~~~~~~~~~~~~~~~~~~~~</p>
<p>I will be running experiments on CIFAR10 with SD and on NFBS with 3D VQ-VAE.</p>
lakshmibayanagari@gmail.com (lakshmi97)
Thu, 22 Jun 2023 15:12:14 +0000
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-4-june-19th-2023/

Week 3 - June 12th, 2023
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-3-june-12th-2023/
<pre>VQ-VAE results and study on Diffusion models : Week 3
=====================================================
</pre>
<pre>What I did this week
~~~~~~~~~~~~~~~~~~~~

I continued my experiments with VQ-VAE on MNIST data to see the efficacy of Prior training in the generated outputs. For every input image, the encoder output delivers a categorical index of a latent vector for every pixel in the output. As discussed in the previous blog post, the prior has been trained separately using PixelCNN (without any conditioning) in the latent space.

If PixelCNN is a bunch of convolutions, then what makes it a generative model? This is an important question to ask, and the answer is the sampling layer used on PixelCNN outputs during inference. The official code in Keras uses a tfp.layers.DistributionLambda(tfp.distributions.Categorical) layer as its sampling layer. Without this sampling layer, PixelCNN outputs are deterministic and collapse to a single output. Similarly, the sampling layer alone, i.e., without any PixelCNN trained prior, applied to the pre-determined outputs of the encoder is deterministic.
This is because latent distances are correctly estimated by the pre-trained encoder, and during inference the categorical sampling layer would always sample the least-distance latent, i.e., the one closest to the input. Therefore, the autoregressive nature of PixelCNN combined with a sampling layer for every pixel delivers an effective generative model. The outputs for all my experiments are shown in the image below -</pre>
<div style="text-align: center;"><img alt="VQ-VAE results" height="103" src="https://github.com/lb-97/dipy/blob/blog_branch/doc/posts/2023/assets/vq-vae-results.png" width="487"></div>
<pre>Based on qualitative analysis, PixelCNN outputs may require some extra work. This leads me to the next step in my research - to explore Diffusion models. The first breakthrough paper on Diffusion models is DDPM - Denoising Diffusion Probabilistic Models. Inspired by previous work on nonequilibrium thermodynamics, the authors show that training diffusion models while maximizing the posterior likelihood in an image generation task is mathematically equivalent to denoising score matching.

In simple terms, there are two processes in diffusion modelling - forward &amp; reverse. The forward process iteratively produces noisy images using noise schedulers. This can be reduced to a one-step noisy image through the reparametrization technique. During training of the reverse process, a U-Net is trained to estimate the noise in the final noisy image. During inference/sampling, noise is iteratively estimated and removed from a random noisy image to generate a new unseen image.
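
The one-step shortcut for the forward process can be sketched in a few lines of NumPy. This is an illustrative sketch, not the project's code: the linear schedule endpoints (1e-4 to 0.02 over 1000 steps) are the values reported in the DDPM paper, while the helper name q_sample and the 28x28 stand-in image are assumptions.

```python
import numpy as np

num_steps = 1000                                  # number of diffusion steps (DDPM paper setting)
betas = np.linspace(1e-4, 0.02, num_steps)        # linear noise schedule
alphas_cumprod = np.cumprod(1.0 - betas)          # running product, often written alpha-bar_t


def q_sample(x0, t, noise):
    """Jump straight to step t: sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    signal_scale = np.sqrt(alphas_cumprod[t])
    noise_scale = np.sqrt(1.0 - alphas_cumprod[t])
    return signal_scale * x0 + noise_scale * noise


rng = np.random.default_rng(0)
x0 = rng.uniform(size=(28, 28))                   # stand-in for a normalized image or latent
noise = rng.normal(size=x0.shape)

slightly_noisy = q_sample(x0, 0, noise)           # first step: still close to x0
almost_pure_noise = q_sample(x0, num_steps - 1, noise)   # last step: dominated by noise
```

During training, a step t would be drawn at random for each sample and the network asked to predict the noise array from the output of q_sample.
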
The L2 loss used to estimate the noise during training is mathematically equivalent to maximizing the posterior likelihood, i.e., maximizing the distribution of the final denoised image. You can find more details in <a href="https://arxiv.org/pdf/2006.11239.pdf">this</a> paper.

The <a href="https://arxiv.org/pdf/2112.10752.pdf">Stable Diffusion</a> paper moves the needle by making diffusion models more accessible, scalable and trainable using a single Nvidia A100 GPU. Earlier diffusion models were difficult to train, requiring 100s of training days, suffering from instability issues and being restricted to the image modality. Stable Diffusion achieved training stability with conditioning on multimodal data by working in latent space. A pre-trained image encoder such as that of VQ-VAE is used to downsample and extract imperceptible details of an input image. These latents are used to train the Diffusion model discussed above. Doing so separates the notion of perceptual compression from the generative nature of the whole network. Later, the denoised latents can be passed through a VQ-VAE trained decoder to reconstruct images in pixel space. This results in a less complex model, faster training and high quality generative samples.

What is coming up next week
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Setting up of Big Red 200 HPC account. Training of Diffusion model using MNIST latents from VQ-VAE in tensorflow without any conditioning.</pre>
lakshmibayanagari@gmail.com (lakshmi97)
Thu, 22 Jun 2023 15:05:22 +0000
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-3-june-12th-2023/

Week 2 blog - June 5th, 2023
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-2-blog-june-5th-2023/
<p>Deep Dive into VQ-VAE : Week 2</p>
<p>What I did this week</p>
<p>~~~~~~~~~~~~~~~~~~~~</p>
<p>This week I took a deep dive into VQ-VAE code.
Here's a little bit about VQ-VAE -</p>
<p>VQ-VAE is a VAE with a discretized latent space, which helps in achieving high quality outputs. It differs from VAE in two respects - the use of a discrete latent space and separate Prior training. VQ-VAE has also shown impressive generative capabilities across data modalities - images, video, audio.</p>
<p>By using a discrete latent space, VQ-VAE bypasses the 'posterior collapse' mode seen in traditional VAEs. Posterior collapse is when the latent space is not utilized properly and collapses to similar vectors independent of the input, thereby resulting in few variations when generating outputs.</p>
<p>Encoder and Decoder weights are trained along with L2 updates of the embedding vectors. A categorical distribution is assumed over these latent embeddings and, to truly capture the distribution of these vectors, the latents are further trained using a PixelCNN model.</p>
<p>In the original paper, PixelCNN has been shown to capture the distribution of data while also delivering rich detail in generated output images. In the image space, the PixelCNN decoder reconstructs a given input image with varying visual aspects such as colors, angles, lighting etc. This is achieved through autoregressive training with the help of masked convolutions. Autoregressive training coupled with categorical distribution sampling at the end of the pipeline enables PixelCNN to be an effective generative model.</p>
<p>A point to be noted here is that the prior of VQ-VAE is trained in latent space rather than image space through PixelCNN. So, it doesn't replace the decoder as discussed in the original paper; rather, it is trained independently to reconstruct the latent space. So, the first questions that come to my mind - How does latent reconstruction help in image generation? Is prior training required at all?
What happens if it is not done?</p>
<p>My findings on MNIST data show that a trained prior works well only with the right sampling layer (tfp.layers.DistributionLambda), which helps with uncertainty estimation. Therefore, PixelCNN's autoregressive capabilities are as important as defining a distribution layer on top of them. Apart from this, I've also been researching and collating different MRI datasets to work on in the future.</p>
<p>What is coming up next week</p>
<p>~~~~~~~~~~~~~~~~~~~~~~~~~~~</p>
<p>My work for next week includes checking insights on the CIFAR dataset and brushing up on Diffusion Models.</p>
<p>Did I get stuck anywhere</p>
<p>~~~~~~~~~~~~~~~~~~~~~~~~</p>
<p>Working with VQ-VAE code required digging in a little bit before drawing conclusions from the results obtained. I reached out to the author of the Keras implementation blog to verify a couple of things. I also conducted a couple more experiments than estimated and presented the work at the weekly meeting.</p>
lakshmibayanagari@gmail.com (lakshmi97)
Thu, 22 Jun 2023 15:01:29 +0000
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-2-blog-june-5th-2023/

Week 1 blog - 29th May, 2023
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-1-blog-29th-may-2023/
<p>Community Bonding period ended last week and my first blog is based on the work carried out in the last week. My meeting with GSOC mentors at the start of the week helped me chalk out an agenda for the week. As the first step, I familiarized myself with Tensorflow operations, functions and distribution strategies. My previous experience with PyTorch as well as <a href="https://www.tensorflow.org/tutorials/images/cnn">website tutorials</a> on basic Deep Learning models helped me quickly learn Tensorflow. As the next step, I read the VQ-VAE paper &amp; understood the tensorflow open source implementation. VQ-VAE addresses 'posterior collapse' seen in traditional VAEs and overcomes it by discretizing the latent space.
This in turn also improved the generative capability by producing less blurry images than before. Familiarizing myself with VQ-VAE early on helps in understanding the latents used in Diffusion models in later steps. I also explored a potential dataset - <a href="https://brain-development.org/ixi-dataset/">IXI (T1 images)</a> - and performed some exploratory data analysis, such as age &amp; sex distribution. The images contain entire skull information, so they may require brain extraction &amp; registration. It may be more useful to use existing preprocessed datasets &amp; align them to a template. For next week, I'll be conducting a further literature survey on Diffusion models.</p>
lakshmibayanagari@gmail.com (lakshmi97)
Thu, 01 Jun 2023 02:01:50 +0000
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-1-blog-29th-may-2023/

Week 0 blog - 19th May, 2023
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-0-blog-19th-may-2023/
<p>GSOC 2023 week 0 blog</p>
<p>While applying for the GSOC 2023 DIPY sub-project titled “Creating Synthetic MRI”, I knew this would be the right one for me for two reasons. Keep reading to know more!</p>
<p>As nervous and not-so-optimistic as I am about applying for academic competitions, I pushed myself to apply for GSOC out of a necessity for a summer job more than anything. This got me out of my comfort zone and I ventured into open source development. At the time of application I was a Master’s student at NYU (current status - graduated) with a focus on Deep Learning Applications in Healthcare. I was so involved in Computer Vision research during school that I decided to pursue a career in the same field going forward. Fortunately, during that time I came across a college senior’s post on LinkedIn about getting accepted as a mentor for GSOC 2023. This prompted me to look further into GSOC and its list of projects for this year.
I had only heard of GSOC during my undergrad, when I could never muster the courage to pursue something outside college. But this time around, I decided to put on a confident front and take the leap.</p>
<p>As I searched through the list of available projects, I got iteratively definitive about what I wanted to work on - I looked for python projects first, filtered for machine learning projects next, and narrowed down to a couple of relevant projects. In the process, I came across the list of DIPY projects. Firstly, I was looking to further my research knowledge in ML by exploring Generative AI. Secondly, I have worked with MRI datasets in the context of Deep Learning previously, so the ‘Creating Synthetic MRI’ project seemed the right fit. These reasons got me hooked on the DIPY sub-organization. I thoroughly enjoyed exploring DIPY applications and soon began preparing my application. With the wonderful help from the mentors, I successfully submitted an application, later got an interview call and voila, I got in!</p>
<p>I am very happy about participating in GSOC this year. What started out as a necessity has now become a passion project. I hope to enjoy the journey ahead, looking forward to learning and implementing a few things along the way!</p>
lakshmibayanagari@gmail.com (lakshmi97)
Thu, 01 Jun 2023 01:55:46 +0000
https://blogs.python-gsoc.org/en/lakshmi97s-blog/week-0-blog-19th-may-2023/