Access the Apache Spark Web UI when the cluster is running on a server with closed ports

When you have an Apache Spark cluster running on a server where ports are closed, you cannot simply access the Spark master web UI at localhost:8080. The solution is to use SSH tunnels, which is pretty straightforward.

Note: You can check out my blog post on how to set up a Spark standalone cluster locally (the steps are pretty much the same when you are setting it up on a server) - How to set up a Apache Spark cluster in your local machine

Scenario 1:

The first and most basic scenario is when you have direct SSH access to the server where the Apache Spark master is running. Then all you have to do is run the following command in a terminal window on your local machine (the laptop or desktop that you use) after you start the master on the server machine. Here user@server-host is a placeholder for your username and the server's address:

 $ ssh -L 8080:localhost:8080 user@server-host

Once you have run this command, you can access the Spark web UI by simply going to "http://localhost:8080/" in your web browser. Likewise, you might want to create SSH tunnels for other ports that are needed when using the Spark web UI, such as 4040, which serves the UI of a running application.
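If you need tunnels for several ports, a single ssh invocation can carry multiple -L forwards. A sketch, assuming the same placeholder address user@server-host:

 $ ssh -L 8080:localhost:8080 -L 4040:localhost:4040 user@server-host

This forwards both the master UI (8080) and the running application's UI (4040) over one SSH session.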

Scenario 2:

If you do not have direct access to the server that is running the Apache Spark master, you can create a multi-hop SSH tunnel. This might be the case if you are running the cluster on a supercomputer where you only have access to the login node, and the Spark master runs on a compute node that you can only reach through the login node. When this is the case, you can simply build the same SSH tunnel in two steps. First, after you start the Spark master on the compute node, run the following command on the login node to create an SSH tunnel for port 8080 (user@compute-node is a placeholder for the compute node's address):

 $ ssh -L 8080:localhost:8080 user@compute-node

Then run the following command from your local machine (the laptop or desktop that you use), where user@login-node stands for the login node:

 $ ssh -L 8080:localhost:8080 user@login-node

Then you can access the Web UI just as before by going to "http://localhost:8080/".

Update: Based on Saliya's comment, you can do the two steps in a single command. One common form, using placeholder host names, chains the two tunnels with -t; you just need to run this from your local machine:

 $ ssh -L 8080:localhost:8080 -t user@login-node ssh -L 8080:localhost:8080 user@compute-node
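If you use this setup regularly, the same multi-hop tunnel can be kept in your SSH configuration instead of a long command line. This is a sketch with hypothetical host names (login-node, compute-node) and assumes OpenSSH 7.3 or newer for the ProxyJump option:

 # ~/.ssh/config (host names below are placeholders)
 Host spark-master
     HostName compute-node
     User your-username
     ProxyJump your-username@login-node
     LocalForward 8080 localhost:8080
     LocalForward 4040 localhost:4040

After this, running "ssh spark-master" from your local machine opens both tunnels in one step.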
