Articles on epassaro's Bloghttps://blogs.python-gsoc.orgUpdates on different articles published on epassaro's BlogenFri, 23 Aug 2019 14:49:04 +0000Check in: Finalhttps://blogs.python-gsoc.org/en/epassaros-blog/check-in-final/<p><strong>1. What did you do this week?</strong></p> <ul> <li>Polished my <a href="https://epassaro.github.io/gsoc19">final evaluation</a></li> <li>Started working on how to add Chianti levels and lines to the atomic files.</li> <li><strong>GSoC'19 </strong>has ended!</li> </ul> <p> </p> <p><strong>2. What is coming up next?</strong></p> <p>I'll keep working with the TARDIS team and contributing to their codebase :)<br>  </p> <p><strong>3. Did you get stuck anywhere?</strong></p> <p>No, I didn't.</p>epassaro15@gmail.com (epassaro)Fri, 23 Aug 2019 14:49:04 +0000https://blogs.python-gsoc.org/en/epassaros-blog/check-in-final/Blog post: Week 12https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-12/<p>Hi everyone!</p> <p>The end of summer is close, and we are doing some interesting stuff here :)</p> <p>This week we mainly focused on running some integrity tests on our new atomic files, and finally running some TARDIS simulations with them! My objective was to ensure we get identical atomic files with my new module.</p> <p style="text-align: center;"><img alt="" src="https://i.imgur.com/VPoW2Ih.jpg"></p> <p>Next week I'll run more simulations and polish the <a href="https://carsus.readthedocs.org">documentation</a>. Also, I started to write my <a href="https://epassaro.github.io/gsoc19">final evaluation</a> for <strong>GSoC'19</strong>.</p> <p>The last few months have been really fun and exciting!</p>epassaro15@gmail.com (epassaro)Sun, 18 Aug 2019 16:37:25 +0000https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-12/Check in: Week 11https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-11/<p><strong>1. 
What did you do this week?</strong></p> <p>Finally the new parser for the <em>Kurucz line list</em> is ready! (see <a href="https://github.com/tardis-sn/carsus/pull/146">PR #146</a>). I ran several tests to check whether the new output matches the old one. I also started working on an up-to-date notebook for the section <em>"Creating the TARDIS example database"  </em>in the Carsus documentation.</p> <p> </p> <p><strong>2. What is coming up next?</strong></p> <p>I have to work on how to add Chianti levels and lines and then run some simulations.</p> <p> </p> <p><strong>3. Did you get stuck anywhere?</strong></p> <p>No, everything worked out :)</p>epassaro15@gmail.com (epassaro)Fri, 09 Aug 2019 17:52:48 +0000https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-11/Blog post: Week 10https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-10/<p>Hi everyone!</p> <p>We made huge progress with the new Kurucz parser (it's almost ready!). As I said last week, this is a difficult task that involves restructuring different pieces of existing code and requires some knowledge of atomic structure and physics.</p> <p>In <a href="https://nbviewer.jupyter.org/gist/epassaro/1d1a26f1d9d643e6c94005c1e7774eb0?flush_cache=true">this notebook</a> you can see a demonstration of how the new GFALL class returns the same <em>DataFrames</em> as the old API. The next step consists of adding <em>metastable </em>flags for lines and levels.</p> <p>Once this work is completed we should discuss how to move on. We probably want to automate the process of making new atomic files: every time a source is updated (for example, NIST atomic weights), a new build is triggered.</p>epassaro15@gmail.com (epassaro)Sun, 04 Aug 2019 17:28:40 +0000https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-10/Check in: Week 9https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-9/<p><strong>1. What did you do this week?</strong></p> <p>This week I worked on the new Kurucz line list parser. 
Also, I finished the CMFGEN pipeline (merged PR <a href="https://github.com/tardis-sn/carsus/pull/143">#143</a>).</p> <p>NIST (PR <a href="https://github.com/tardis-sn/carsus/pull/144">#144</a>) and Knox Long's recombination zeta (PR <a href="https://github.com/tardis-sn/carsus/pull/145">#145</a>) are almost finished too.</p> <p> </p> <p><strong>2. What is coming up next?</strong></p> <p>I will continue working on the Kurucz parser.</p> <p> </p> <p><strong>3. Did you get stuck anywhere?</strong></p> <p>The Kurucz parser is a really difficult one! It's taking me longer than I expected.</p>epassaro15@gmail.com (epassaro)Sat, 27 Jul 2019 21:55:46 +0000https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-9/Blog post: Week 8https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-8/<p>This week I focused on <em>two</em> things:</p> <ul> <li>I started writing new classes for previously existing atomic sources to bypass the SQL database and store data directly in HDF5 format. This is a much simpler approach and will speed things up for TARDIS developers when they need to build new atomic files. For an example, see <a href="https://github.com/tardis-sn/carsus/pull/144">PR #144</a>.</li> <li>We started setting up a pipeline to download, extract and convert the entire CMFGEN database to HDF5. See <a href="https://github.com/tardis-sn/carsus/pull/143">PR #143</a>.</li> </ul> <p>Also in the process I learned <em>a couple</em> of things:</p> <ul> <li>How to use the <em>logging </em>module from the Python standard library (and why it's a good idea to use it).</li> <li>Why you should never use a bare `except` statement. 
Yep, learned this in the worst possible way.</li> <li>I'm getting good at writing <em>regular expressions.</em></li> </ul> <p>I will continue working on these two items next week.</p> <p> </p> <p>The second part of GSoC is almost over and we already have some good results, but a lot of work is ahead!</p>epassaro15@gmail.com (epassaro)Fri, 19 Jul 2019 19:11:04 +0000https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-8/Check in: Week 7https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-7/<p><strong>1. What did you do this week?</strong></p> <p>This week I successfully recreated the standard <em>TARDIS</em> atomic file and ran some simulations! :)</p> <p>The transition to Python 3 is complete and the <em>Carsus</em> package is finally fully operational.</p> <p>I've also updated some documentation, which can be accessed here: <a href="https://tardis-sn.github.io/carsus/notebooks/quickstart.html">https://tardis-sn.github.io/carsus/notebooks/quickstart.html</a></p> <p> </p> <p><strong>2. What is coming up next?</strong></p> <p>Now we're going to add methods to bypass the SQL database and store data directly in HDF5. This would give future <em>Carsus</em> users/developers a simpler workflow.</p> <p><br> <strong>3. Did you get stuck anywhere?</strong></p> <p>Not really. I spotted a couple of bugs in <em>Pandas</em> and <em>SQLAlchemy </em>which gave me headaches, but everything worked out with a lot of effort and my mentors' support. I'm going to open tickets on GitHub for these issues!</p>epassaro15@gmail.com (epassaro)Fri, 12 Jul 2019 17:40:50 +0000https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-7/Blog post: Week 6https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-6/<p>This week I worked on how to make a TARDIS atomic file. 
This is a necessary intermediate step towards the work scheduled for the next weeks.</p> <p><em>Carsus</em> is the subpackage in charge of <em>parsing and </em><em>ingesting </em>atomic data from different sources into a SQL database. Once the data is ingested we can dump it into the HDF5 file required by TARDIS to run the simulations.</p> <p>The <em>Carsus </em>data model includes classes like <em>Atom, Ion, Level, </em>and more. It was a bit hard for me to understand at the beginning, but it worked out.</p> <p>By now, we can ingest atomic data from three different sources: the <i>National Institute of Standards and Technology </i>(NIST), the <em>Kurucz line list</em> (GFALL<i>), </i>and the <i>Chianti </i>atomic database. Our main goal for the next weeks is to write code to ingest data obtained from the CMFGEN parsers (the code we wrote during the first half of GSoC) into the SQL database.</p> <p>I found an annoying bug which made it impossible to ingest data from GFALL without adding NIST data first. Debugging this error was very time-consuming and we have not found a solution yet.</p>epassaro15@gmail.com (epassaro)Sun, 07 Jul 2019 23:31:28 +0000https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-6/Check in: Week 5https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-5/<p><strong>1. What did you do this week?</strong></p> <p>I started writing <em>docstrings </em>and <em>unit tests </em>for the new classes and functions I've created.</p> <p><br> <strong>2. What is coming up next?</strong></p> <p>I successfully set up the Travis continuous integration pipeline in the first weeks of GSoC, so I'm going to work a bit more on the output methods.</p> <p><br> <strong>3. 
Did you get stuck anywhere?</strong></p> <p>It's the first time I've worked with <i>unit tests</i>, so it was difficult at the beginning, but everything is going just fine :)</p>epassaro15@gmail.com (epassaro)Thu, 27 Jun 2019 20:24:45 +0000https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-5/Blog post: Week 4https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-4/<h2 style="text-align: justify;">Storing Pandas objects into HDF5 files 💾</h2> <p style="text-align: justify;">After successfully parsing more than 1,000 <em>plain text</em> files, it's now time to store the data in an appropriate way.</p> <h3 style="text-align: justify;">What is HDF5?</h3> <p style="text-align: justify;">HDF stands for <strong>'Hierarchical Data Format'</strong> and it was designed to store enormous amounts of data. It was originally developed at the <em>National Center for Supercomputing Applications</em> and is now supported by <em>The HDF Group</em>, a non-profit corporation.</p> <h3 style="text-align: justify;">Why use HDF5?</h3> <ul> <li> <p style="text-align: justify;">At its core, HDF5 is a binary file type specification.</p> </li> <li> <p style="text-align: justify;">It can <strong>store many datasets</strong> and user-defined <strong>metadata</strong>, offers optimized I/O, and lets you query its contents.</p> </li> <li> <p style="text-align: justify;">Many programming languages have tools to work with HDF.</p> </li> <li> <p style="text-align: justify;">HDF allows datasets to live in a nested tree structure. 
<strong>In effect, HDF5 is a file system within a file.</strong> The 'folders' inside this file system are called <em>groups</em>, and sometimes <em>nodes</em> or <em>keys</em> (or at least these terms are used interchangeably).</p> </li> </ul> <h3 style="text-align: justify;">Toolbox</h3> <p style="text-align: justify;">There are at least three Python packages which can handle HDF5 files: <strong>pytables</strong>, <strong>h5py</strong> and <strong>pandas.HDFStore</strong>. Also, there are a few tools to visualize them: <strong>HDFViewer</strong> (Java), <strong>HDFCompass</strong> (Python) and <strong>Vitables</strong> (Python). They can be found in the Ubuntu repositories, but often they don't work as expected.</p> <p style="text-align: justify;">Fortunately, <strong>Vitables</strong> is available as a <strong>conda-forge</strong> package and works flawlessly.</p> <h3 style="text-align: justify;">Example #1: Dump a DataFrame</h3> <p style="text-align: justify;"><span style="font-size: 11px;"><code><code>import pandas as pd </code></code></span></p> <p style="text-align: justify;"><span style="font-size: 11px;"><code><code>data = {'A': [1,2,3], 'B': [4,5,6]} </code></code></span></p> <p style="text-align: justify;"><span style="font-size: 11px;"><code><code>df = pd.DataFrame.from_records(data) </code></code></span></p> <p style="text-align: justify;"><span style="font-size: 11px;"><code><code>with pd.HDFStore('test.h5', mode='w') as f: </code></code></span></p> <p style="text-align: justify;"><span style="font-size: 11px;"><code><code>    f.append('/new_dataset', df, format='table', data_columns=df.columns) </code></code></span></p> <p style="text-align: justify;"><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oWASJvIm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/26icvzyzbr63x8osxdl5.png"><img alt="" 
src="https://res.cloudinary.com/practicaldev/image/fetch/s--oWASJvIm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/26icvzyzbr63x8osxdl5.png"></a></p> <p style="text-align: justify;"> </p> <h3 style="text-align: justify;">Example #2: Include metadata</h3> <p style="text-align: justify;">Maybe one of the most interesting aspects of HDF is the ability to store metadata*. This was a bit hard to find in the Pandas documentation.</p> <p style="text-align: justify;"><span style="font-size: 11px;"><code><code>meta = { 'date': '21/06/2019', 'comment': 'Watch Evangelion on Netflix'} </code></code></span></p> <p style="text-align: justify;"><span style="font-size: 11px;"><code><code>with pd.HDFStore('test.h5', mode='a') as f: </code></code></span></p> <p style="text-align: justify;"><span style="font-size: 11px;"><code><code>    f.get_storer('/new_dataset').attrs.metadata = meta </code></code></span></p> <p style="text-align: justify;">*The FITS format can do this as well ;)</p> <p style="text-align: justify;"> </p> <h2 style="text-align: justify;">What's next?</h2> <p style="text-align: justify;">Next week I'll be working on <strong>unit testing</strong>.</p> <p style="text-align: justify;"> </p> <p style="text-align: justify;"><strong>This entry can also be found at <a href="https://dev.to/epassaro">dev.to/epassaro</a></strong></p>epassaro15@gmail.com (epassaro)Sat, 22 Jun 2019 01:19:43 +0000https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-4/Check in: Week 3https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-3/<p><strong>1. What did you do this week?</strong></p> <p>In the third week of the coding period I wrote the <i>photoionization cross section </i>parser, as stipulated in my proposal.</p> <p><br> <strong>2. What is coming up next?</strong></p> <p>Next week I will work on <em>unit testing</em> and <em>documentation </em>of all the parsers.</p> <p><br> <strong>3. 
Did you get stuck anywhere?</strong></p> <p>No, I didn't.</p>epassaro15@gmail.com (epassaro)Wed, 12 Jun 2019 23:39:03 +0000https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-3/Blog post: Week 2https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-2/<p><span style="font-size: 18px;"><strong>Pandas + regex = ♥</strong></span></p> <p><span style="font-size: 16px;">I always avoid working with regular expressions, but sometimes they are the right tool to use.</span></p> <p><span style="font-size: 16px;">I had to write a parser for a variety of files which are <strong>almost identical</strong> in format. These files are the output of Fortran routines dated from 1995 to the present, and contain atomic measurements made by physicists. The subtle differences between them make it impossible to use whitespace as a separator.</span></p> <p><span style="font-size: 16px;">Fortunately, Pandas allows you to use <strong>regular expressions</strong> as the <em>'sep' </em>argument of the <em>pandas.read_csv</em> function.</span></p> <p><span style="font-size: 16px;">Also, one of my mentors is <strong>really good</strong> at regular expressions, so after a few tries we had our perfect parser.</span></p> <p><span style="font-size: 16px;"><a href="https://nbviewer.jupyter.org/gist/epassaro/5ba3fabe81827ad04d8362783a655cbd">See an example</a></span></p> <p><br> <span style="font-size: 16px;">Now we're capable of extracting data from 300+ files in a simple and homogeneous way!</span></p> <p><span style="font-size: 16px;">In <strong>Week 2</strong> I had to move from these Jupyter Notebooks to the actual codebase. This was a challenge for me because I'm not so confident about my <em>object oriented programming </em>skills, but it worked out! 
I successfully wrote new classes for parsers which can read files and dump data in the HDF5 format.</span></p> <p><span style="font-size: 16px;"><a href="https://github.com/tardis-sn/carsus/pull/121/commits/d37f8c69de5178592677f4c49df8370fd2dde80f">See an example</a></span></p> <p> </p> <p><span style="font-size: 18px;"><strong>Moving to Python 3, continuous integration and more:</strong></span></p> <p><span style="font-size: 16px;">When I decided to learn Python I went for 3.5, so I skipped Python 2. The only thing I knew about <em>"legacy" </em>Python was the use of the <em>print </em>statement without parentheses. </span></p> <p><span style="font-size: 16px;">At the beginning of the coding period I was told to get Travis CI to work again. <em>Unit testing </em>and<em> continuous integration </em>were things I'd heard about but never had the chance to use. So porting our codebase to Python 3 was absolutely necessary in order to move on. </span></p> <p><span style="font-size: 16px;">A few things I've learned in the process:</span></p> <ul> <li><span style="font-size: 16px;">Look for calls to <strong>range()</strong>, <strong>zip()</strong>, and <strong>map()</strong> and wrap them in<strong> list() </strong>where a list is needed.</span></li> <li><span style="font-size: 16px;">Sometimes it is good to pin package versions close to the ones that worked when the package was built.</span></li> <li><span style="font-size: 16px;">Some <strong>itertools</strong> functions (for example <strong>izip()</strong> and <strong>imap()</strong>) were removed in Python 3, look for them!</span></li> <li><span style="font-size: 16px;">Of course, use parentheses in print calls.</span></li> </ul> <p><span style="font-size: 16px;">Fortunately, Travis CI is "easy" to configure, especially if you have experience with <em>bash</em>.</span></p> <p><span style="font-size: 16px;">  </span></p> <p> </p> <p><span style="font-size: 16px;"><strong>This entry can also be found at <a 
href="https://dev.to/epassaro">dev.to/epassaro</a></strong></span></p>epassaro15@gmail.com (epassaro)Sun, 02 Jun 2019 20:59:37 +0000https://blogs.python-gsoc.org/en/epassaros-blog/blog-post-week-2/Check in: Week 1https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-1/<p><strong>1. What did you do this week?</strong></p> <p>In the first week of the coding period I wrote the <i>energy levels and oscillator strengths </i>parser, as stipulated in my proposal. I also had to port the <em>Carsus package </em>from Python 2 to Python 3 and build it with Travis CI.</p> <p><br> <strong>2. What is coming up next?</strong></p> <p>Next week I will work on the <em>collisional energies </em>parser.</p> <p><br> <strong>3. Did you get stuck anywhere?</strong></p> <p>No, I didn't.</p>epassaro15@gmail.com (epassaro)Fri, 31 May 2019 00:49:49 +0000https://blogs.python-gsoc.org/en/epassaros-blog/check-in-week-1/Winter is codinghttps://blogs.python-gsoc.org/en/epassaros-blog/hello-world-1/<p><span style="color: null;"><span style="font-family: Georgia,serif;"><span style="font-size: 18px;">Hi! I'm Ezequiel from Argentina and during the next 12 weeks (southern hemisphere winter) I will be working with the TARDIS sub-organization on the project called "Expansion of the TARDIS Atomic Database" as part of the Google Summer of Code 2019 program.</span></span></span></p> <p> </p> <blockquote> <p><span style="color: null;"><span style="font-family: Georgia,serif;"><span style="font-size: 18px;">TARDIS is a Monte Carlo radiative transfer code whose primary goal is the calculation of theoretical spectra for supernovae based on a number of input parameters, such as the supernova brightness and the abundances of the different chemical elements present in the ejecta. 
The main idea for this procedure is that by finding a close match between theoretical and observed spectra the parameters that actually describe the supernovae can be identified.</span></span></span></p> <p><span style="color: null;"><span style="font-family: Georgia,serif;"><span style="font-size: 18px;">The objective of this proposal is to incorporate new atomic data into the TARDIS database. In order to accomplish this job several tasks are required: writing parsers for different file types, unit testing, full integration with the TARDIS codebase and more. Finally, it will be crucial to determine how new atomic data affects the synthetic spectra.</span></span></span></p> <p><span style="color: null;"><span style="font-family: Georgia,serif;"><span style="font-size: 18px;">The result of this work will not only be of great value for TARDIS, but also for many researchers who require atomic measurements.</span></span></span></p> </blockquote> <p> </p> <p><span style="color: null;"><span style="font-family: Georgia,serif;"><span style="font-size: 18px;">The coding period starts on May 27th, so now we're in the middle of something called the "bonding period", where organizations and students do some preliminary work. <strong>Stay tuned for more updates!</strong></span></span></span></p>epassaro15@gmail.com (epassaro)Tue, 21 May 2019 22:36:45 +0000https://blogs.python-gsoc.org/en/epassaros-blog/hello-world-1/