Over the past few days I grew interested in Apache Spark and thought of playing around with it a little. If you haven't heard of it, go and take a look; it's a pretty cool project that claims to be around 40x faster than Hadoop in some situations. The increase in performance is gained by leveraging in-memory computing. I won't go into details about Apache Spark here; if you want a better look at Spark, just check out their web site - Apache Spark.
In this post we will go through the steps to set up an Apache Spark cluster on your local machine. We will set up one master node and two worker nodes. If you are completely new to Spark, I recommend going through First Steps with Spark - Screencast #1; it will get you started with Spark and show you how to install Scala and the other things you need.
We will be using the launch scripts provided by Spark to make our lives easier. First of all, there are a couple of configurations we need to set.
During my GSoC project for OpenNMS I did some work with JMS and wanted to write test classes to make sure that my code was working well. In this article I will try to explain how you can use ActiveMQ to write test cases for your JMS code. This is very handy when part of your code listens to a JMS queue.
Apache ActiveMQ is an extremely popular and very powerful open source messaging and Integration Patterns server. You can check it out here. The part we will be using to run our test classes is the embedded broker provided by ActiveMQ. It lets you create a temporary broker for test purposes, which in our case hosts a JMS queue. If you want to learn more about the embedded broker, check out this article. The easiest way to create an embedded broker is through the following line of code. Connecting through the vm:// transport automatically creates an embedded broker, and broker.persistent=false keeps it from persisting messages to disk.
ConnectionFactory connectionFactory = new ActiveMQConnectionFactory("vm://localhost?broker.persistent=false");
In this post we will look at how to create and run a word count program in Apache Hadoop. To make it easy for a beginner, we will cover most of the setup steps as well. Please note that this blog entry is for a Linux-based environment. I am running Ubuntu 14.04 LTS on my machine. For Windows users the steps might be a little different; information on running Hadoop on Windows is available at Build and Install Hadoop 2.x or newer on Windows.
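Before getting into the setup, it may help to see what "word count" means in plain Java. The sketch below is not the Hadoop program and uses no Hadoop API; the class name WordCountSketch and the method countWords are hypothetical names chosen for illustration. It only shows the logic a word count job distributes: split each line into words (the "map" side) and sum the occurrences per word (the "reduce" side).

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of word count logic. Hypothetical class, no Hadoop involved.
public class WordCountSketch {

    // Counts how often each whitespace-separated word occurs in the given lines.
    public static Map<String, Integer> countWords(String... lines) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps words sorted
        for (String line : lines) {
            // "Map" side: break a line into individual words.
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // a blank line yields one empty token; skip it
                }
                // "Reduce" side: add 1 to the running total for this word.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("hello hadoop", "hello world");
        // Print one "word<TAB>count" pair per line.
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

In a real Hadoop job the two sides run as separate mapper and reducer tasks spread across the cluster, but the per-word arithmetic is the same as in this single-process sketch.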
1. You need to have Java installed (preferably a newer Java version such as 1.7 or 1.8).
Download Oracle JDK 8 from http://www.oracle.com/technetwork/java/javase/downloads/index.html
Extract the archive to a folder named jdk1.8.0.
Set the following environment variables. (You can set the variables in the .bashrc file.)
JAVA_HOME=
export JAVA_HOME PATH
2. SSH. If you do not have ssh installed on your machine, use the following command to install ssh and rsync, which is also needed: