Shell script for tree-structured copying of large data files to a large number of nodes with scp

Sometimes you need to copy a large file to a number of remote hosts. I recently had such a situation, where I had to copy a 56 GB data file to around 30 compute nodes in an HPC cluster. I did not have the option of copying it to the shared disk (it was nearly full), so I had to copy the file to the private scratch area of each node. Having the data in the private scratch area is also better for the application, since you get better read performance (at least on the system I was working on).

Copying to each node in turn from my machine or from the head node would take a very long time because of network bandwidth limitations. So I came up with a small shell script that does the copy in a tree-like structure. The script is given the list of nodes, the data file, and the destination. First it copies the data from the head node to the first node in the list, say node1. Then it copies from both the head node and node1 to node2 and node3 respectively. Likewise, in the next step it copies the file from all 4 nodes (head node and 3 compute nodes) to 4 more nodes. This goes on until the file has been copied to all the nodes in the list, so the number of hosts holding the file doubles at every step.
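To make the copy pattern concrete, here is a small dry-run sketch that only prints which host would copy to which in each round, without running scp. The function name print_schedule and the host names are placeholders I made up for illustration; "head" stands for the machine the script runs on.

```shell
#!/bin/bash
# Dry run of the tree-structured schedule: prints who copies to whom in each
# round. No scp is executed. "head" and the node names are placeholders.

print_schedule() {
    local nodes=("$@")
    local total=${#nodes[@]}
    local have=0      # number of compute nodes that already hold the file
    local round=1
    local i

    while (( have < total )); do
        echo "round $round:"
        # The head node always feeds the next node that lacks the file.
        echo "  head -> ${nodes[have]}"
        # Every node that already has the file feeds one more node.
        for (( i = 0; i < have && have + 1 + i < total; i++ )); do
            echo "  ${nodes[i]} -> ${nodes[have + 1 + i]}"
        done
        # One node was served by head, i nodes by other compute nodes.
        have=$(( have + 1 + i ))
        round=$(( round + 1 ))
    done
}

print_schedule node1 node2 node3 node4 node5
```

For five nodes this prints three rounds: head -> node1, then head -> node2 and node1 -> node3, then head -> node4 and node1 -> node5. With 30 nodes the same pattern needs 5 rounds instead of 30 one-after-another copies.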

Inputs :

1 : text file listing the nodes, one node per line

2 : data file to be copied

3 : destination directory on each node
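Since a wrong argument only shows up once scp starts failing, it can help to check the three inputs first. This is a minimal sketch, assuming the arguments come in the order listed above; check_inputs is a hypothetical helper and not part of the script itself.

```shell
#!/bin/bash
# Hypothetical helper: validate the three inputs before starting any copy.
check_inputs() {
    if (( $# != 3 )); then
        echo "usage: <script> <nodes_file> <data_file> <destination_dir>" >&2
        return 1
    fi
    local nodes_file=$1 data_file=$2
    if [[ ! -r $nodes_file ]]; then
        echo "cannot read nodes file: $nodes_file" >&2
        return 1
    fi
    if [[ ! -f $data_file ]]; then
        echo "data file not found: $data_file" >&2
        return 1
    fi
    # An empty node list means nothing would be copied.
    if ! grep -q '[^[:space:]]' "$nodes_file"; then
        echo "nodes file is empty: $nodes_file" >&2
        return 1
    fi
}
```

It could be called as check_inputs "$@" || exit 1 at the top of a script. The destination is not checked here because it lives on the remote nodes.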

For this code you need to run the script from the folder that contains the data file. Changing that would be a simple improvement (which I am too lazy to do at the moment).


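#!/bin/bash

# Arguments: nodes file, file to be copied, destination directory on the nodes.
# The command should be executed from the directory the file is in, because the
# node-to-node copies read it from <destination>/<file> (needs to be improved).
# Note: the node-to-node copies only spread the load if scp connects from the
# source node directly to the target node. Newer OpenSSH versions route such
# copies through the local host by default; see the -R option in scp(1).

# Read the node list, one host name per line, skipping empty lines.
nodes=()
while read -r line || [ -n "$line" ]; do
        [ -n "$line" ] && nodes+=("$line")
done < "$1"
counter=${#nodes[@]}

counter2=0      # index of the next node that does not have the file yet
while (( counter2 < counter ))
do
        # The head node copies to the next node in the list.
        scp "$2" "${nodes[counter2]}:$3" &
        if (( counter2 > 0 ))
        then
                # Every node that already has the file copies to one more node.
                counter3=0
                while (( counter3 < counter2 ))
                do
                        if (( counter2 + counter3 + 1 < counter ))
                        then
                                scp "${nodes[counter3]}:$3/$2" "${nodes[counter2 + counter3 + 1]}:$3" &
                                counter3=$((counter3+1))
                        else
                                break
                        fi
                done
                counter2=$((counter2+counter3))
        fi
        # Wait for all copies of this round before starting the next round.
        wait
        counter2=$((counter2+1))
done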
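
The restriction to run from the data file's folder comes from the node-to-node copies, which read the file from the destination plus the file name exactly as given on the command line. One possible way around it, sketched here and not tested on a cluster, is to strip the directory part of the local path before building the remote path. remote_source is a hypothetical helper name, and the paths are placeholders.

```shell
#!/bin/bash
# Sketch: derive the path a node-to-node copy should read from.
# remote_source is a hypothetical helper, not part of the script above.
remote_source() {
    local data_file=$1 dest_dir=$2
    # scp into a directory keeps only the file name, so drop the local
    # directory part and any trailing slash on the destination.
    printf '%s/%s\n' "${dest_dir%/}" "${data_file##*/}"
}

remote_source /home/me/data/big.dat /scratch/job1
```

The result could be used in place of $3/$2 in the node-to-node scp line, assuming the destination is an existing directory on every node.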

Hope this helps someone who is having the same issue.
