Articles on epassaro's Blog

Check in: Final

epassaro15@gmail.com (epassaro) — Fri, 23 Aug 2019 14:49:04 +0000

1. What did you do this week?

Polished my final evaluation
Started working on how to add Chianti levels and lines to the atomic files.
GSoC'19 has ended!

2. What is coming up next?

I'll keep working with the TARDIS team and contributing to their codebase :)

3. Did you get stuck anywhere?

No, I didn't.

Blog post: Week 12

epassaro15@gmail.com (epassaro) — Sun, 18 Aug 2019 16:37:25 +0000

Hi everyone!

The end of summer is close, and we are doing some interesting stuff here :)

This week we mainly focused on running integrity tests on our new atomic files, and finally ran some TARDIS simulations with them! My goal was to ensure that my new module produces atomic files identical to the old ones.

Next week I'll run more simulations and polish the documentation. Also, I started to write my final evaluation for GSoC'19.

The last few months have been really fun and exciting!

Check in: Week 11

epassaro15@gmail.com (epassaro) — Fri, 09 Aug 2019 17:52:48 +0000

1. What did you do this week?

The new parser for the Kurucz line list is finally ready! (see PR #146). I ran several tests to check that the new output matches the old one. I also started working on an up-to-date notebook for the section "Creating the TARDIS example database" in the Carsus documentation.

2. What is coming up next?

I have to work on how to add Chianti levels and lines and then run some simulations.

3. Did you get stuck anywhere?

No, everything worked out :)

Blog post: Week 10

epassaro15@gmail.com (epassaro) — Sun, 04 Aug 2019 17:28:40 +0000

Hi everyone!

We made huge progress with the new Kurucz parser (it's almost ready!). As I said last week, this is a difficult task: it involves restructuring different pieces of existing code and requires some knowledge of atomic structure and physics.

In this notebook you can see a demonstration of how the new GFALL class returns the same DataFrames as the old API. The next step consists of adding metastable flags for lines and levels.
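
For readers who want to try this kind of comparison themselves, here is a minimal sketch using `pandas.testing.assert_frame_equal`. The two DataFrames are made-up stand-ins, not real GFALL output, and the column names are only assumptions for illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-ins for the output of the old API and the new class.
old = pd.DataFrame({"atomic_number": [1, 2], "energy": [0.0, 159855.9]})
new = pd.DataFrame({"energy": [0.0, 159855.9], "atomic_number": [1, 2]})

# Raises AssertionError with a readable report if the frames differ.
# check_like=True ignores the order of the index and columns.
assert_frame_equal(old, new, check_like=True)
print("DataFrames match")
```

A check like this is stricter than eyeballing the tables: dtypes, index labels and values all have to agree.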

Once this work is completed we should discuss how to move on. We probably want to automate the process of making new atomic files: every time a source is updated (for example, NIST atomic weights), trigger a new build.
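
One possible way to detect that a source changed is to compare checksums. This is only a sketch of the idea, not something implemented in Carsus: the function names `file_checksum` and `needs_rebuild` and the file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def needs_rebuild(source_path, checksum_path):
    """Return True if the source changed since the last recorded checksum."""
    current = file_checksum(source_path)
    stored = Path(checksum_path)
    if stored.exists() and stored.read_text().strip() == current:
        return False
    stored.write_text(current)  # remember the new state for the next run
    return True


# Demo with a temporary, made-up source file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "weights.txt"
    record = Path(tmp) / "weights.sha256"
    source.write_text("H 1.008\n")
    first = needs_rebuild(source, record)   # no checksum recorded yet
    second = needs_rebuild(source, record)  # nothing changed
    source.write_text("H 1.0080\n")
    third = needs_rebuild(source, record)   # contents changed

print(first, second, third)  # True False True
```

In a real pipeline the "rebuild" step would run whenever the function returns True.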

Check in: Week 9

epassaro15@gmail.com (epassaro) — Sat, 27 Jul 2019 21:55:46 +0000

1. What did you do this week?

This week I worked on the new Kurucz line list parser. Also, I finished the CMFGEN pipeline (merged PR #143).

NIST (PR #144) and Knox Long's recombination zeta (PR #145) are almost finished too.

2. What is coming up next?

I will continue working on the Kurucz parser.

3. Did you get stuck anywhere?

The Kurucz parser is a really difficult one! It's taking me longer than I expected.

Blog post: Week 8

epassaro15@gmail.com (epassaro) — Fri, 19 Jul 2019 19:11:04 +0000

This week I focused on two things:

Writing new classes for the existing atomic sources to bypass the SQL database and store data directly in HDF5 format. This is a much simpler approach and will speed things up for TARDIS developers when they need to build new atomic files. For an example, see PR #144.
Setting up a pipeline to download, extract and convert the entire CMFGEN database to HDF5. See PR #143.

Also in the process I learned a couple of things:

How to use the `logging` module from the Python standard library (and why it's a good idea to use it).
Why you should never use a bare `except` statement. Yep, learned this in the worst possible way.
I'm getting good at writing regular expressions.
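
The first two lessons fit in a few lines. This is a minimal sketch using only the standard library; the function `parse_energy` and its inputs are made up for illustration and are not taken from Carsus.

```python
import logging

# Module-level logger; the application decides where the messages end up.
logger = logging.getLogger(__name__)


def parse_energy(text):
    """Convert a string to float, returning None if it is not a number."""
    try:
        return float(text)
    except ValueError:
        # Catch only the error we expect. A bare `except:` would also
        # swallow KeyboardInterrupt, SystemExit and genuine bugs.
        logger.warning("Could not parse %r as a number", text)
        return None


logging.basicConfig(level=logging.INFO)
good = parse_energy("13.6")
bad = parse_energy("n/a")
print(good, bad)
```

Unlike `print`, the logger records where the message came from and can be silenced or redirected without touching the parsing code.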

I will continue working on these two items next week.

The second part of GSoC is almost over and we already have some good results, but a lot of work lies ahead!

Check in: Week 7

epassaro15@gmail.com (epassaro) — Fri, 12 Jul 2019 17:40:50 +0000

1. What did you do this week?

This week I successfully recreated the standard TARDIS atomic file and ran some simulations! :)

The transition to Python 3 is complete and the Carsus package is finally fully operational.

I've also updated some documentation, which can be accessed here: https://tardis-sn.github.io/carsus/notebooks/quickstart.html

2. What is coming up next?

Now we're going to add methods to bypass the SQL database and store data directly in HDF5. This would give future Carsus users/developers a simpler workflow.

3. Did you get stuck anywhere?

Not really. I spotted a couple of bugs in Pandas and SQLAlchemy that gave me headaches, but everything worked out with a lot of effort and my mentors' support. I'm going to open tickets on GitHub for these issues!

Blog post: Week 6

epassaro15@gmail.com (epassaro) — Sun, 07 Jul 2019 23:31:28 +0000

This week I worked on how to make a TARDIS atomic file. This is a necessary intermediate step towards the work scheduled for the next weeks.

Carsus is the subpackage in charge of parsing and ingesting atomic data from different sources into a SQL database. Once the data is ingested, we can dump it into the HDF5 file that TARDIS requires to run the simulations.

The Carsus data model includes classes like Atom, Ion, Level, and more. It was a bit hard for me to understand at the beginning, but it worked out.

So far, we can ingest atomic data from three different sources: the National Institute of Standards and Technology (NIST), the Kurucz line list (GFALL), and the Chianti atomic database. Our main goal for the next weeks is to write code to ingest data obtained from the CMFGEN parsers (the code we wrote during the first half of GSoC) into the SQL database.

I found an annoying bug which made it impossible to ingest data from GFALL without adding NIST data first. Debugging this error was very time consuming and we have not found a solution yet.

Check in: Week 5

epassaro15@gmail.com (epassaro) — Thu, 27 Jun 2019 20:24:45 +0000

1. What did you do this week?

I started writing docstrings and unit tests for the new classes and functions I've created.

2. What is coming up next?

I successfully set up the Travis continuous integration pipeline in the first weeks of GSoC, so I'm going to work a bit more on the output methods.

3. Did you get stuck anywhere?

It's the first time I've worked with unit tests, so it was difficult at the beginning, but everything is going just fine :)

Blog post: Week 4

epassaro15@gmail.com (epassaro) — Sat, 22 Jun 2019 01:19:43 +0000

Storing Pandas objects into HDF5 files 💾

After successfully parsing more than 1,000 plain text files, it's time to store the data in an appropriate way.

What is HDF5?

HDF stands for 'Hierarchical Data Format' and it was designed to store enormous amounts of data. It was originally developed at the National Center for Supercomputing Applications and is now supported by The HDF Group, a non-profit corporation.

Why use HDF5?

At its core, HDF5 is a binary file type specification.
It can store many datasets and user-defined metadata, offers optimized I/O, and lets you query its contents.
Many programming languages have tools to work with HDF.
HDF allows datasets to live in a nested tree structure. In effect, HDF5 is a file system within a file. The 'folders' inside this file system are called groups, and sometimes nodes or keys (or at least these terms are used interchangeably).

Toolbox

There are at least three Python tools that can handle HDF5 files: pytables, h5py and pandas.HDFStore. Also, there are a few tools to visualize them: HDFViewer (Java), HDFCompass (Python) and Vitables (Python). They can be found in the Ubuntu repositories, but they often don't work as expected.

Fortunately, Vitables is available as a conda-forge package and works flawlessly.

Example #1: Dump a DataFrame

    import pandas as pd

    data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
    df = pd.DataFrame(data)

    # mode='w' creates the file, overwriting any existing one
    with pd.HDFStore('test.h5', mode='w') as f:
        # store df as a queryable table under the key '/new_dataset'
        f.append('/new_dataset', df, format='table', data_columns=True)

Example #2: Include metadata

Maybe one of the most interesting aspects of HDF is the ability to store metadata*. This was a bit hard to find in the Pandas documentation.

    meta = {'date': '21/06/2019', 'comment': 'Watch Evangelion on Netflix'}

    # mode='a' keeps the dataset written in Example #1; mode='w' would erase it
    with pd.HDFStore('test.h5', mode='a') as f:
        f.get_storer('/new_dataset').attrs.metadata = meta

*FITS format can do this as well ;)
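
To check that both the table and the metadata survive, you can read them back. This is a self-contained sketch that writes to a temporary directory instead of `test.h5`; it assumes PyTables (the `tables` package) is installed, since pandas relies on it for `HDFStore`. The metadata values are made up.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
meta = {"date": "21/06/2019", "comment": "example metadata"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.h5")

    # Write the table first, then attach the metadata to its storer.
    with pd.HDFStore(path, mode="w") as store:
        store.append("/new_dataset", df, format="table", data_columns=True)
        store.get_storer("/new_dataset").attrs.metadata = meta

    # Reopen read-only and recover the keys, the data and the metadata.
    with pd.HDFStore(path, mode="r") as store:
        keys = store.keys()
        df_back = store["/new_dataset"]
        meta_back = store.get_storer("/new_dataset").attrs.metadata

print(keys)
print(meta_back)
```

The metadata lives in the attributes of the group that holds the table, so it travels with the dataset inside the same file.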

What's next?

Next week I'll be working on unit testing.

This entry can also be found at dev.to/epassaro

Check in: Week 3

epassaro15@gmail.com (epassaro) — Wed, 12 Jun 2019 23:39:03 +0000

1. What did you do this week?

In the third week of the coding period I wrote the photoionization cross section parser, as stipulated in my proposal.

2. What is coming up next?

Next week I will work on unit testing and documentation of all the parsers.

3. Did you get stuck anywhere?

No, I didn't.

Blog post: Week 2

epassaro15@gmail.com (epassaro) — Sun, 02 Jun 2019 20:59:37 +0000

Pandas + regex = ♥

I always avoid working with regular expressions, but sometimes they are the right tool for the job.

I had to write a parser for a variety of files which are almost identical in format. These files are the output of Fortran routines dating from 1995 to the present, and contain atomic measurements made by physicists. The subtle differences between them make it impossible to use whitespace as a separator.

Fortunately, Pandas allows you to use a regular expression as the 'sep' argument of the pandas.read_csv function.
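
Here is a minimal sketch of the idea. The sample text is made up (it is not a real CMFGEN file): the labels contain single spaces, so splitting on any whitespace would break them, while a regex that only matches two or more spaces works.

```python
import io

import pandas as pd

# Made-up sample: columns are separated by two or more spaces,
# but the first column itself contains a single space.
raw = io.StringIO(
    "1s2 1Se      0.000    1\n"
    "1s2s 3Se   159855.9   3\n"
    "1s2p 3Po   169086.8   9\n"
)

df = pd.read_csv(
    raw,
    sep=r"\s{2,}",       # split only on runs of two or more whitespace characters
    engine="python",     # regex separators need the Python parsing engine
    header=None,
    names=["label", "energy", "g"],
)
print(df)
```

Passing `engine="python"` explicitly avoids the warning pandas emits when it has to fall back from the C engine for a regex separator.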

Also, one of my mentors is really good at regular expressions, so after a few tries we had our perfect parser.

See an example

Now we're capable of extracting data from more than 300 files in a simple and homogeneous way!

In Week 2 I had to move from these Jupyter Notebooks to the actual codebase. This was a challenge for me because I'm not so confident in my object-oriented programming skills, but it worked out! I successfully wrote new classes for parsers which can read files and dump data in the HDF5 format.

See an example

Moving to Python 3, continuous integration and more:

When I decided to learn Python I went for 3.5, so I skipped Python 2. The only thing I knew about Python "legacy" was the use of the print statement without parentheses.

At the beginning of the coding period I was told to get Travis CI to work again. Unit testing and continuous integration were things I'd heard about but never had the chance to use. Porting our codebase to Python 3 was absolutely necessary in order to move on.

A few things I've learned in the process:

Look for the range(), zip(), and map() functions and wrap their results in list() wherever the code expects a list.
Sometimes it's good to pin package versions close to the ones that worked when the package was built.
iteritems() is a dictionary method that no longer exists in Python 3 (use items() instead), look for it!
Of course, use parentheses in print calls.
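
A short sketch of the `list()` tip, with made-up values. In Python 3, zip() and map() return one-shot iterators and range() returns a lazy sequence, so code written for Python 2 lists can break in subtle ways.

```python
numbers = range(3)
pairs = zip("ab", [1, 2])
squares = map(lambda x: x * x, numbers)

# Iterators have no len() and cannot be indexed or reused,
# so materialize them when the code expects a real list.
pairs_list = list(pairs)
squares_list = list(squares)

print(pairs_list[0], len(squares_list))

# A second pass over an exhausted iterator silently yields nothing.
leftover = list(pairs)
print(leftover)
```

The last line is the kind of silent failure that makes this worth searching for during a port.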

Fortunately, Travis CI is "easy" to configure, especially if you have experience with bash.

This entry can also be found at dev.to/epassaro

Check in: Week 1

epassaro15@gmail.com (epassaro) — Fri, 31 May 2019 00:49:49 +0000

1. What did you do this week?

In the first week of the coding period I wrote the energy levels and oscillator strengths parser, as stipulated in my proposal. I also had to port the Carsus package from Python 2 to Python 3 and build it with Travis CI.

2. What is coming up next?

Next week I will work on the collisional energies parser.

3. Did you get stuck anywhere?

No, I didn't.

Winter is coding

epassaro15@gmail.com (epassaro) — Tue, 21 May 2019 22:36:45 +0000

Hi! I'm Ezequiel from Argentina and during the next 12 weeks (southern hemisphere winter) I will be working with the TARDIS sub-organization in the project called "Expansion of the TARDIS Atomic Database" as part of the Google Summer of Code 2019 program.

TARDIS is a Monte Carlo radiative transfer code whose primary goal is the calculation of theoretical spectra for supernovae based on a number of input parameters, such as the supernova brightness and the abundances of the different chemical elements present in the ejecta. The main idea for this procedure is that by finding a close match between theoretical and observed spectra the parameters that actually describe the supernovae can be identified.

The objective of this proposal is to incorporate new atomic data into the TARDIS database. Several tasks are required to accomplish this: writing parsers for different file types, unit testing, full integration with the TARDIS codebase, and more. Finally, it will be crucial to determine how the new atomic data affects the synthetic spectra.

The result of this work will not only be of great value for TARDIS, but also for many researchers who require atomic measurements.

The coding period starts on May 27th, so now we're in the middle of something called the "bonding period", where organizations and students do some preliminary work. Stay tuned for more updates!