sudharsana-kjl's Blog

Blog #6

sudharsana-kjl
Published: 08/22/2019

In the past week, my mentor and I tried to fix the Dockerfile that sets up Hadoop in an Ubuntu container from scratch. Since that was becoming tedious, we tried setting up a mini Hadoop cluster instead.

Apache has a mini Hadoop cluster setup that gives a single-node cluster. I tried building this using a Maven Docker image. The documentation has very little information on where Hadoop actually gets downloaded and which ports it connects to by default. My mentor and I debugged the Dockerfile and tried to get this up and running, but there is still a problem with the ports and I'm working on it. We also figured out how to get files from HDFS, which can be either CSV or JSON files. I have implemented those changes as well.
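While chasing a port problem like this, a small check for whether anything is listening on a given host and port can help narrow things down. The snippet below is only a sketch using Python's standard `socket` module; the function name `port_is_open` is made up for illustration, and it says nothing about which ports Hadoop actually uses.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical debugging helper, not part of the project code.
    """
    try:
        # create_connection raises OSError if the connection is refused,
        # the host is unreachable, or the timeout expires.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("localhost", 9000)` from the host (the port number here is just a placeholder) would show whether the container's port mapping is reachable at all, before looking into the Hadoop configuration itself.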

Hopefully by next week I can finish this project.


Weekly Check-in #10

sudharsana-kjl
Published: 08/22/2019

In the past week, I was trying to set up Hadoop using a Dockerfile.

What did I do this week?

Setting up Hadoop in Docker with my limited knowledge of both is becoming more difficult, because there are few resources on how to set this up in Docker using a Dockerfile. Also, every time I have to build the container from scratch, downloading all the files again and setting everything up is a time-consuming process. I tried an approach this week that got most of the instructions in the Dockerfile working, yet there is still an issue with starting the containers. I have added the corresponding config files that Docker uses, along with a start-up shell script that is run while building the container to start Hadoop after installing it.

What is coming up next?

I need to get the Dockerfile working by this week so that I can move ahead, refine the Hadoop source classes, and add more tests if possible.

Did you get stuck anywhere?

Debugging the Dockerfile was a difficult task for me. My mentor was very understanding and helped me fix it.


Blog #5

sudharsana-kjl
Published: 08/22/2019

This week I was trying my hand at setting up Hadoop in Docker.

The next phase of our project involves making it compatible with input from a Hadoop data source. With my limited knowledge of Hadoop and Docker, I was trying to set it up. First I set it up on my local computer and got it working. I wrote the basic classes needed to establish a connection and successfully set one up.

I also added config() and args() methods that can be used to fetch the arguments, and their corresponding values, specific to the Hadoop source. In Hadoop, the challenging part is handling the files from HDFS. These files can be either CSV or JSON files, so I have to discuss with my mentor how I can handle this.
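One way to think about the CSV-or-JSON question is to pick a parser based on the file extension once the file contents are available as a text stream. The following is only a rough sketch of that idea, not the code from the project: `load_records` is a hypothetical name, and it reads from any file-like object rather than from HDFS itself.

```python
import csv
import io
import json
from pathlib import PurePosixPath
from typing import Dict, List, TextIO


def load_records(path: str, stream: TextIO) -> List[Dict]:
    """Parse a text stream as CSV or JSON, chosen by the path's extension.

    Hypothetical helper: `path` is only inspected for its suffix, and
    `stream` is any already-opened text file-like object.
    """
    suffix = PurePosixPath(path).suffix.lower()
    if suffix == ".csv":
        # Each CSV row becomes a dict keyed by the header line.
        return [dict(row) for row in csv.DictReader(stream)]
    if suffix == ".json":
        data = json.load(stream)
        # Normalize a single JSON object to a one-element list.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported file type: {path!r}")


# Usage with in-memory data standing in for files fetched from HDFS.
csv_rows = load_records("/data/sample.csv", io.StringIO("x,y\n1,2\n"))
json_rows = load_records("/data/sample.json", io.StringIO('[{"x": 1}]'))
```

Dispatching on the extension keeps the rest of the source code unaware of the file format; whether that is the right split for this project is something to settle with my mentor.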


Weekly Check-in #9

sudharsana-kjl
Published: 08/06/2019

In the past week, I was working on making certain changes to my previous PRs to get them ready for the upcoming release.

What did I do this week?

For the MySQL source, I was trying to set up the Travis build. This was my first time working with Travis CI for this project and it was a good learning experience. My mentor did the label for the CSV source merge test; certain tests are failing now, and I was trying to fix those as well. I also tried setting up Hadoop in Docker. Initially my approach was to build a Hadoop container and push it to Docker Hub so that anyone could fetch it from there. After discussing with my mentor, I'm going to set up a Dockerfile instead, so that people can see the commands being run and it will be easier to set up as well.

What is coming up next?

I'll try to set up the Hadoop connection in a week. I have to speed up things a bit so that I can focus on other things to be done once the connection is set up.

Did you get stuck anywhere?

I did get stuck on setting up Hadoop. After discussing with my mentor, I now have a clear idea of how to approach it.


Weekly Check-in #8

sudharsana-kjl
Published: 08/01/2019

In the past week, I was working on setting up Hadoop and trying to import data from it. I got my PRs reviewed by my mentor and am working on the changes he suggested.

What did I do this week?

I had initially set up Hadoop on my Ubuntu system, but setting it up the same way would be difficult in Travis CI, so I was exploring other options. The easy way to do this is through Docker, but there is no official Hadoop distribution in Docker. I was checking out Cloudera's QuickStart VM, but when I tried to set it up my laptop started to hang. I will continue to look into other options. Also, my mentor reviewed my HDFS source PR and guided me on how to proceed further.

What is coming up next?

I'll have to work on the Docker setup for the HDFS source. I'll probably have to write a script or a docker-compose file. The MySQL PR had an issue while my mentor was adding a merge test, so I will work on that as well. We'll be preparing for a release soon.

Did you get stuck anywhere?

I struggled a bit with the Hadoop setup. My mentor gave me some input on this, and hopefully I'll be able to create a Docker setup by next week.

 
