Posts

Using Screens when working with remote servers

This is a short post on using screen sessions when running long jobs on remote servers. A screen session helps when you don't want a running job to stop if you lose your connection to the server. I use this when I run long-running jobs in interactive mode on the supercomputers at my lab, but it is useful with any remote server, so I thought I should do a small write-up on it.

Installation

First, make sure that the remote (Linux-based) server has screen installed. Type the following command to check the screen version. If screen is not installed, you will have to install it or ask the system admin to install it for you.
$ screen -version

Screen installation command
$ sudo apt-get install screen

How it works

How screen works is simple to understand. After logging in to the remote server, you create a screen session and continue working inside it. You can run something in the screen session and leave the screen session …
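
The workflow described above can be sketched roughly as follows. This is a minimal sketch assuming GNU screen; the helper `run_detached`, the session name `longjob`, and the script name are hypothetical and only there for illustration.

```shell
# run_detached NAME COMMAND [ARGS...]
# Hypothetical helper: start COMMAND inside a new screen session that is
# already detached, so it keeps running after you log out.
run_detached() {
    name=$1
    shift
    if ! command -v screen >/dev/null 2>&1; then
        echo "screen is not installed on this machine" >&2
        return 1
    fi
    # -d -m starts the session detached; -S names it for reattaching later
    screen -dmS "$name" "$@"
}

# Example use (commented out so sourcing this snippet has no side effects):
# run_detached longjob ./my_long_job.sh   # hypothetical job script
# screen -ls                              # list the running sessions
# screen -r longjob                       # reattach to check on the job
```

For the interactive flow, `screen -S longjob` opens a named session, pressing Ctrl-a followed by d detaches from it while the job keeps running, and `screen -r longjob` reattaches after you log back in.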

Similarities and Differences Between Parallel Systems and Distributed Systems Part 2

This post is part 2 of a series of posts extracted from a technical report I wrote for a research study at Indiana University. The full paper can be found on the Digital Science Center publications page, or you can get the PDF link here. I am posting the content here so that more people can access it, since I think it might be helpful to some. I am breaking it down into several posts because of the length of the document. The content is based on my knowledge and on the technologies current at the time of writing the report, and I might not have got everything correct. Please point out in the comments anything you think I have got wrong; I will try to respond and update the document if needed. I will also include the list of references in each post for completeness.
Part 1: Similarities and Differences Between Parallel Systems and Distributed Systems Part 1
This post will give the introduction and compare and contrast the two domains with regard to the follo…

Similarities and Differences Between Parallel Systems and Distributed Systems Part 1

This post is part 1 of a series of posts extracted from a technical report I wrote for a research study at Indiana University. The full paper can be found on the Digital Science Center publications page, or you can get the PDF link here. I am posting the content here so that more people can access it, since I think it might be helpful to some. I am breaking it down into several posts because of the length of the document. The content is based on my knowledge and on the technologies current at the time of writing the report, and I might not have got everything correct. Please point out in the comments anything you think I have got wrong; I will try to respond and update the document if needed. I will also include the list of references in each post for completeness.

This post will give the introduction and compare and contrast the two domains with regard to the following:
Fault tolerance
Support of collectives
Dynamic resource utilization
Communication proto…

Shell script for Tree structured copying to copy large data files to large number of nodes with scp

Sometimes you need to copy a large file to a number of remote hosts. I recently had such a situation, where I had to copy a 56 GB data file to around 30 compute nodes in an HPC cluster. I did not have the option of copying it to the shared disk (since it was pretty much full), so I had to copy the file to the private scratch area of each node. Having the data in the private scratch area is also better for the application, since you get better read performance (at least on the system I was working on).

Copying to each node from my machine or from the head node would take a very long time because of network bandwidth limitations. So I came up with a small shell script that does the copy in a tree-like structure. Once the script is given the set of nodes, the data file, and the destination, it first copies the data to the first node in the file, say node1. Then it copies from both the head node and node1 to node2 and node3 respectively. Likewise…
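
To make the doubling pattern concrete, here is a dry-run sketch that only prints the copy commands round by round. It is not the script from the post: the function name `plan_tree_copy`, the label `head` for the machine that starts with the file, and the example paths are all assumptions made for illustration. A real script would run the copies of each round in parallel and wait for them to finish before starting the next round.

```shell
# plan_tree_copy FILE DEST NODE...
# Dry run: print, round by round, which host would copy FILE to which node.
# Every host that already holds the file serves one new node per round.
# (Plain sh has no local variables, so these names are global.)
plan_tree_copy() {
    file=$1
    dest=$2
    shift 2                  # the remaining arguments are the waiting nodes
    holders="head"           # hosts that already hold the file
    round=1
    while [ "$#" -gt 0 ]; do
        echo "# round $round"
        new_holders=""
        for src in $holders; do
            [ "$#" -gt 0 ] || break      # no node left to serve
            target=$1
            shift
            if [ "$src" = "head" ]; then
                echo "scp $file $target:$dest/"
            else
                # a node that already has the file pushes its own copy onward
                echo "ssh $src scp $dest/${file##*/} $target:$dest/"
            fi
            new_holders="$new_holders $target"
        done
        holders="$holders$new_holders"   # the served nodes become sources
        round=$((round + 1))
    done
}

# Example with hypothetical paths and node names
plan_tree_copy /data/big.dat /scratch node1 node2 node3 node4
```

Because the number of hosts holding the file doubles every round, 30 nodes are covered in 5 rounds in this sketch.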

Setting up Heron Cluster with Apache Aurora Locally

In this post we will look at how to set up the Heron stream processing engine on Apache Aurora on our local machine. Oh boy, this is going to be a long post :D. I am doing this on Ubuntu 14.04, and the steps should be similar on any Linux machine. Heron supports deployment on Apache Aurora out of the box. Apache Aurora will act as the Scheduler for Heron once the setup is complete. To get there, you first have to set up Apache ZooKeeper and allow Heron to communicate with it; Apache ZooKeeper will act as the State Manager of the Heron deployment. If you just want to set up a local cluster without the hassle of installing Aurora, take a look at my previous blog post - Getting started with Heron stream processing engine in Ubuntu 14.04

Setting Up Apache Aurora Cluster locally

The first thing we need to do is set up Apache Aurora locally. I will try to explain as much of the configuration as I can as we go. First, let's get an Apache Aurora cluster running on our loca…

Getting started with Heron stream processing engine in Ubuntu 14.04

I was trying to get started with Heron, a stream processing engine from Twitter, and faced some problems when doing the initial setup on Ubuntu. I am using Ubuntu 14.04, so these problems might not happen on other Ubuntu versions. The steps below simply follow the steps in the Heron documentation, but since I am working on Ubuntu, I will only show the steps for Ubuntu.

Step 1.a : Download installation script files
You can download the script files that match Ubuntu from https://github.com/twitter/heron/releases/tag/0.14.0

For the 0.14.0 release the files you need to download will be the following.

heron-client-install-0.14.0-ubuntu.sh
heron-tools-install-0.14.0-ubuntu.sh

Optional - you won't need the following for the steps in this blog post

heron-api-install-0.14.0-ubuntu.sh
heron-core-0.14.0-ubuntu.tar.gz
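
The installer names above follow a component-version-platform pattern, so a quick check that the two required scripts are in the current directory could look like the sketch below. The helper `heron_installer_name` is a hypothetical name introduced here, and the check only looks at local files; it does not download anything.

```shell
# heron_installer_name COMPONENT VERSION PLATFORM
# Hypothetical helper: print the installer script name for one component
# (for example client, tools or api), following the names listed above.
heron_installer_name() {
    printf 'heron-%s-install-%s-%s.sh\n' "$1" "$2" "$3"
}

VERSION=0.14.0
PLATFORM=ubuntu

# Check that the two scripts needed for this post have been downloaded
for component in client tools; do
    script=$(heron_installer_name "$component" "$VERSION" "$PLATFORM")
    if [ -f "$script" ]; then
        echo "found $script"
    else
        echo "missing $script (download it from the release page first)"
    fi
done
```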

Step 1.b: Execute the client and tools shell scripts
$ chmod +x heron-client-install-VERSION-PLATFORM.sh
$ ./heron-client-install-VERSION-PLATFORM.sh --user …