Wednesday, December 18, 2013

Setting up a local 3 node Cassandra cluster on single machine

When trying to create a Cassandra cluster for a small test scenario, I came across the fact that it is not possible to create a cluster on a single machine just by changing ports. Since I didn't want to set it up on several machines, I did some digging and, with some help, found a workaround for this.

The workaround is to add virtual Ethernet devices to the machine and use these to configure the cluster. So first we need to create 3 IP aliases for our machine; you can use the following commands to get this done. This is for a Linux machine; check out the Stack Overflow thread here to see how it is done on Windows.

ifconfig eth0:1 192.168.0.2
ifconfig eth0:2 192.168.0.3
ifconfig eth0:3 192.168.0.4
You can check whether the aliases were created by executing 'ifconfig'.

Once you have this set up, all you have to do is adjust the Cassandra configuration. You need to change the values of the following entries in each node's cassandra.yaml file as follows.

Node 1:
- seeds: "192.168.0.2"
listen_address: 192.168.0.2
rpc_address: 192.168.0.2

Node 2:
- seeds: "192.168.0.2"
listen_address: 192.168.0.3
rpc_address: 192.168.0.3

Node 3:
- seeds: "192.168.0.2"
listen_address: 192.168.0.4
rpc_address: 192.168.0.4
Once you have these configurations in place, all you have to do is start the Cassandra server on each node, and they will form a 3 node cluster in a ring formation.
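To verify that the ring formed correctly you can point nodetool at one of the aliased addresses. This is just a quick sketch, assuming it is run from the bin directory of one of the Cassandra installations; on newer versions 'nodetool status' gives a similar summary.

./nodetool -h 192.168.0.2 ring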


Tuesday, November 5, 2013

Simple log analysis with Apache Spark

In this post we will learn to run some simple commands with Spark, a few basic transformations and actions, by doing some simple log analysis. I will be using the local Spark cluster that I set up on my laptop. If you haven't read my first post on how to set up a Spark cluster on your local machine, I recommend you read How to set up a Apache Spark cluster in your local machine. First we will connect to the cluster with the following command.

MASTER=spark://pulasthi-laptop:7077 ./spark-shell

"spark://pulasthi-laptop:7077" is the URL of the master that can be found in the Spark web ui. After connecting successfully you should be able to see an Scala console where you can execute commands. Also you should be able to see your application listed in the web-ui under running applications. To run this scenarios i am using a set of log files generated from various WSO2 products as sample data. Any set of log files or even just a set of text file should suffice for this purpose. I have around 800Mb of log files in a single folder. and i will be running the commands on this data set. 

Spark has a concept called RDDs, or Resilient Distributed Datasets. I am not going to go into detail about what RDDs are since that is a completely separate topic of its own. Simply understand that any transformation done to data in Spark results in an RDD, which can be persisted (if wanted) and reused. If you want to learn about RDDs in depth there is a great paper on them; just check out Resilient Distributed Datasets.

So first let's create our first RDD from the log files we have. A point to remember is that Spark evaluates lazily, so even though we create an RDD from the log files, the data will not be pulled into memory until we perform some action on it. We are using Spark's interactive Scala shell, so all the commands are Scala. A predefined variable "sc", the SparkContext, is available, and you can see the methods available on it using tab completion.

var logs = sc.textFile("/home/pulasthi/work/logs");

The command above will create an RDD from the text files in the given path. You will see something similar to the following printed in the terminal after executing the command.

logs: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

Now let's perform an action on the RDD that we just created. Let's just do a simple count.

logs.count

After some time you should get a count of the total number of lines in the logs. On the first run the data will be read directly from the file system, since Spark loads it lazily. We will run the same query twice and see what happens. The time spent on the operation is displayed in the console itself, and if you click the "Spark shell" link on the master's web UI you can follow the progress of each task.

From the screenshot above you can see the details of the two runs of the same task: it took 40.7s to complete on the first run and only 8.6s on the second. This huge difference is because on the first run the data was loaded from disk, while on the second run it was already in RAM.

Note: Much higher performance can be gained when running on an actual cluster; the numbers here are a bit high since I am running everything on my laptop.

Now let's filter the logs a bit. Let's say we only need to extract some specific data from the log files, and we know it is only contained in the lines logged as ERROR. Simply put, we want to analyse some of the errors. So instead of running queries on the logs RDD we will create an errors RDD that only contains the ERROR lines of the log. The benefit of doing this is that for future tasks only the smaller errors RDD needs to reside in memory. We will execute the following command to create the errors RDD.



 val errors = logs.filter(_.contains("ERROR"))

Again, since evaluation is lazy, the errors RDD will not be materialised yet. Now I want to extract the lines that contain the value "SequenceAdmin". The following command will do just that for us.

 val seqad = errors.filter(_.contains("SequenceAdmin"))
Now try out some actions on the newly created RDD "seqad", for example:
  • seqad.count - will give you the number of lines that contain "SequenceAdmin".
  • seqad.first - will print the first entry in the RDD.
  • seqad.take(10) - will give you the first 10 entries.
After the first action the "errors" RDD will be kept in memory unless it is evicted to make space for new RDDs, and subsequent tasks will be performed much faster than the first one.
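If you want to be sure the filtered data stays in memory rather than relying on it implicitly, you can ask Spark to cache the RDD explicitly. The snippet below is a small sketch using the standard RDD cache method, with the same names used above.

// keep the filtered lines in memory across actions
val errors = logs.filter(_.contains("ERROR"))
errors.cache()

// the first action materialises and caches the RDD; later actions reuse the in-memory copy
errors.count
errors.filter(_.contains("SequenceAdmin")).count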

Hope this will help someone who is new to Spark and trying to learn it. I hope to write further posts as I work more with Spark.


Saturday, November 2, 2013

How to set up a Apache Spark cluster in your local machine

Over the past few days I grew some interest in Apache Spark and thought of playing around with it a little bit. If you haven't heard about it, go and take a look; it is a pretty cool project that claims to be around 40x faster than Hadoop in some situations. The increase in performance is gained by leveraging in-memory computing technologies. I won't go into details about Apache Spark here; if you want a better look at Spark just check out their web site - Apache Spark.

In this post we will go through the steps to set up an Apache Spark cluster on your local machine. We will set up one master node and two worker nodes. If you are completely new to Spark I recommend you go through First Steps with Spark - Screencast #1; it will get you started with Spark and show you how to install Scala and the other things you need.

We will be using the launch scripts provided by Spark to make our lives easier. First of all there are a couple of configurations we need to set.

conf/slaves

When using the launch scripts, this file identifies the host names of the machines that the slave nodes will run on. All you have to do is provide the host names, one per line. Since we are setting up everything on our own machine we only need to add "localhost" to this file.
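For this local setup the whole file is therefore just this single line:

localhost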

conf/spark-env.sh

There is a set of variables that you can set to override the default values; this is done by putting values in the "spark-env.sh" file. A template is available at "conf/spark-env.sh.template"; you can use it to create the spark-env.sh file. The variables that can be set are described in the template itself. We will add the following lines to the file.

export SCALA_HOME=/home/pulasthi/work/spark/scala-2.9.3
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_DIR=/home/pulasthi/work/sparkdata

Here SPARK_WORKER_MEMORY specifies the amount of memory you want to allocate to a worker node; if this value is not given, the default is the total memory available minus 1GB. Since we are running everything on our local machine we wouldn't want the slaves to use up all our memory. I am running on a machine with 8GB of RAM, and since we are creating 2 slave nodes we will give each of them 2GB of RAM.

SPARK_WORKER_INSTANCES specifies the number of worker instances; here it is set to 2 since we will create 2 slave nodes.

SPARK_WORKER_DIR is the directory in which applications will run, and it holds both logs and scratch space. Make sure that the directory is writable by the application, that is, that permissions are set properly.

Once these configurations are ready we are good to go. Now let's start by running the master node: just execute the launch script for the master, "start-master.sh".


./bin/start-master.sh

Once the master is started you should be able to access the web ui at http://localhost:8080.

Now you can proceed to start the slaves. This can be done by running the "start-slaves.sh" launch script.
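As with the master, run the script from the Spark home directory.

./bin/start-slaves.sh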

Note: In order to start the slaves, the master needs to be able to access the slave machines through ssh. Since we are running everything on the same machine, that means your own machine must be accessible through ssh. To make sure you have ssh installed run "which sshd"; if you don't have it installed, install it with the following command.

 sudo apt-get install openssh-server

You will also need to specify a password for root, since this will be requested when starting the slaves. If you do not have a root password set, use the following command to set one.

sudo passwd
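If you would rather not be prompted for a password every time the launch scripts run, an alternative is to set up key-based ssh to localhost. A rough sketch, assuming you don't already have a key pair you care about:

ssh-keygen -t rsa          # accept the defaults
ssh-copy-id localhost      # adds your public key to ~/.ssh/authorized_keys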


With the slaves successfully started you now have a Spark cluster up and running. If everything went according to plan, the web UI for the master should show the two slave nodes.


Now let's connect to the cluster from the interactive shell by executing the following command.

MASTER=spark://IP:PORT ./spark-shell
You can find the IP and the PORT in the top left corner of the master's web UI. When successfully connected, the web UI will show that there is an active task.
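Once the shell is connected, a quick way to confirm that the cluster is doing the work is to run a trivial job; this is just a sanity check, any small job will do.

// distribute a small range across the workers and count it
sc.parallelize(1 to 100000).count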

I hope to write more posts regarding Spark in the future. If you want to learn a bit more about Spark there is some great documentation on the Spark site itself here. Go and check it out.

Tuesday, September 24, 2013

WSO2 Governance Registry - Lifecycle Management Part 2 - Transition Validators

This is the second post of the "WSO2 Governance Registry - Lifecycle Management" series. In the first post - Part 1 - Check Items - we gave a small introduction to lifecycle management in the WSO2 Governance Registry, looked at how check items can be used, and worked through a small sample.

In this post we will look at Transition Validators, as mentioned in the previous post. As mentioned in part 1, transition validations can be used within check items and can also be used separately (validators are called only during a state transition; checking a check item will not call the validator). We will take a look at the same configuration, this time with two transition validation elements.

<aspect name="SimpleLifeCycle" class="org.wso2.carbon.governance.registry.extensions.aspects.DefaultLifeCycle">
    <configuration type="literal">
        <lifecycle>
            <scxml xmlns="http://www.w3.org/2005/07/scxml"
                   version="1.0"
                   initialstate="Development">
                <state id="Development">
                    <datamodel>
                        <data name="checkItems">
                            <item name="Code Completed" forEvent="Promote">
                               <permissions>
                                    <permission roles="wso2.eng,admin"/>
                                </permissions>
                                <validations>
                                  <validation forEvent="" class="">
                                          <parameter name="" value=""/>
                                 </validation>
                                </validations>
                            </item>
                            <item name="WSDL, Schema Created" forEvent="">
                            </item>
                            <item name="QoS Created" forEvent="">
                            </item>
                        </data>
                         <data name="transitionValidation">
                            <validation forEvent="" class="">
                                <parameter name="" value=""/>
                            </validation>
                        </data>
                    </datamodel>
                    <transition event="Promote" target="Tested"/>                  
                </state>
                <state id="Tested">
                    <datamodel>
                        <data name="checkItems">
                            <item name="Effective Inspection Completed" forEvent="">
                            </item>
                            <item name="Test Cases Passed" forEvent="">
                            </item>
                            <item name="Smoke Test Passed" forEvent="">
                            </item>
                        </data>
                    </datamodel>
                    <transition event="Promote" target="Production"/>
                    <transition event="Demote" target="Development"/>
                </state>
                <state id="Production">  
                    <transition event="Demote" target="Tested"/>
                </state>                
            </scxml>
        </lifecycle>
    </configuration>
</aspect>

The first transition validation is within a check item (this part was commented out in the previous post), and the second one is a separate element; both forms are supported.

Writing Validators

A validator is a Java class that implements the "CustomValidations" interface. Several validators are already implemented, and it is also possible to write your own custom validator and add it. We will be looking at one of the validators that is shipped with the product; a custom validator would need to be written similarly. Please refer to the Adding an Extension documentation to see how a new extension can be added to the Governance Registry through the GUI.

The following is a validator that is shipped with the WSO2 Governance Registry.

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.wso2.carbon.governance.api.common.dataobjects.GovernanceArtifact;
import org.wso2.carbon.governance.api.exception.GovernanceException;
import org.wso2.carbon.governance.api.util.GovernanceUtils;
import org.wso2.carbon.governance.registry.extensions.interfaces.CustomValidations;
import org.wso2.carbon.registry.core.RegistryConstants;
import org.wso2.carbon.registry.core.exceptions.RegistryException;
import org.wso2.carbon.registry.core.jdbc.handlers.RequestContext;
import org.wso2.carbon.registry.core.session.UserRegistry;

import java.util.Map;

public class AttributeExistenceValidator implements CustomValidations {

    private static final Log log = LogFactory.getLog(AttributeExistenceValidator.class);
    private String[] attributes = new String[0];

    public void init(Map parameterMap) {
        if (parameterMap != null) {
            String temp = (String) parameterMap.get("attributes");
            if (temp != null) {
                attributes = temp.split(",");
            }
        }
    }

    public boolean validate(RequestContext context) {
        if (attributes.length == 0) {
            return true;
        }
        String resourcePath = context.getResourcePath().getPath();
        int index = resourcePath.indexOf(RegistryConstants.GOVERNANCE_REGISTRY_BASE_PATH);
        if (index < 0) {
            log.warn("Unable to use Validator For Resource Path: " + resourcePath);
            return false;
        }
        index += RegistryConstants.GOVERNANCE_REGISTRY_BASE_PATH.length();
        if (resourcePath.length() <= index) {
            log.warn("Unable to use Validator For Resource Path: " + resourcePath);
            return false;
        }
        resourcePath = resourcePath.substring(index);
        try {
            UserRegistry registry = ((UserRegistry) context.getSystemRegistry())
                    .getChrootedRegistry(RegistryConstants.GOVERNANCE_REGISTRY_BASE_PATH);
            GovernanceArtifact governanceArtifact =
                    GovernanceUtils.retrieveGovernanceArtifactByPath(registry, resourcePath);
            for (String attribute : attributes) {
                if (!validateAttribute(governanceArtifact, attribute)) {
                    return false;
                }
            }
        } catch (RegistryException e) {
            log.error("Unable to obtain registry instance", e);
        }
        return true;
    }

    protected boolean validateAttribute(GovernanceArtifact governanceArtifact, String attribute)
            throws GovernanceException {
        return (governanceArtifact.getAttribute(attribute) != null);
    }
}

The "init" method is were the parameter that are defined under the validator tag is initialized. The "validate" method is were your validation logic goes this is the method that is called to do the validation. What this validator does is check whether the attributes names given as a parameter actually exist in the given aspect. If the attribute does not exist the validation will fail.

Configuring the Validator


<validation forEvent="Promote" class="org.wso2.carbon.governance.registry.extensions.validators.AttributeExistenceValidator">
    <parameter name="attributes" value="overview_version,overview_description"/>
</validation>

The fully qualified class name needs to be provided as the class, and the "forEvent" attribute specifies the action on which the validation is triggered; here it is set to Promote. For a complete list of available validators please refer to the Supported Standard Validators documentation. Now you can add the validator configuration we commented out in the first post and check out the functionality of validators.

Please leave a comment if you need any more clarification. The next post of this series will cover transition permissions.

Wednesday, June 26, 2013

How to revert all the local changes in SVN

When working with SVN, now and then you will need to revert changes that were made by you or by applying a patch. Recently I wanted to revert all the changes done locally and found a nice command that gets this done :). Thanks go to the original poster here.

 svn st -q | awk '{print $2;}' | xargs svn revert 
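A quick breakdown of what each stage of the pipeline does (note that xargs splits on whitespace, so paths containing spaces would need extra care):

svn st -q            # list locally modified items, skipping unversioned files
awk '{print $2;}'    # keep only the path column of each line
xargs svn revert     # hand those paths to svn revert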

Hope this helps someone who is looking for the same functionality.

Monday, June 24, 2013

WSO2 Governance Registry - Lifecycle Management Part 1 - Check Items

Lifecycle Management (LCM) plays a major role in SOA governance. The default LCM supported by the WSO2 Governance Registry allows users to promote and demote the lifecycle states of a given resource. Furthermore, it can be configured to use checklists as well; check out the documentation here.

The lifecycle configuration template allows advanced users to extend its functionality through 6 data elements, which are listed below:
  • check items
  • transition validations
  • transition permissions
  • transition executions
  • transition UI
  • transition scripts
Below is the full template of the lifecycle configuration. In this article series we will take a look at each item and see how it can be used to customize lifecycle management in the WSO2 Governance Registry. In this article we will look at check items.

check items

Check items allow you to define a list, ideally a checklist, that can be used to control changes in lifecycle states and make sure specific requirements are met before the lifecycle moves to the next state. It is also possible to:
  • Define permissions for each check item
  • Define custom validations for each check item
To check this out we will create a sample lifecycle with a new set of check items. First we have to create a new lifecycle; the steps to do so can be found here - Adding Lifecycles. There will be a default lifecycle configuration when you create one using those steps; since it is a complex configuration we will replace it with the following configuration.

<aspect name="SimpleLifeCycle" class="org.wso2.carbon.governance.registry.extensions.aspects.DefaultLifeCycle">
    <configuration type="literal">
        <lifecycle>
            <scxml xmlns="http://www.w3.org/2005/07/scxml"
                   version="1.0"
                   initialstate="Development">
                <state id="Development">
                    <datamodel>
                        <data name="checkItems">
                            <item name="Code Completed" forEvent="Promote">
                               <permissions>
                                    <permission roles="wso2.eng,admin"/>
                                </permissions>
                                <!--<validations>
                                 <validation forEvent="" class="">
                                         <parameter name="" value=""/>
                                 </validation>
                                </validations>-->
                            </item>
                            <item name="WSDL, Schema Created" forEvent="">
                            </item>
                            <item name="QoS Created" forEvent="">
                            </item>
                        </data>
                        
                    </datamodel>
                    <transition event="Promote" target="Tested"/>                  
                </state>
                <state id="Tested">
                    <datamodel>
                        <data name="checkItems">
                            <item name="Effective Inspection Completed" forEvent="">
                            </item>
                            <item name="Test Cases Passed" forEvent="">
                            </item>
                            <item name="Smoke Test Passed" forEvent="">
                            </item>
                        </data>
                    </datamodel>
                    <transition event="Promote" target="Production"/>
                    <transition event="Demote" target="Development"/>
                </state>
                <state id="Production">  
                    <transition event="Demote" target="Tested"/>
                </state>                
            </scxml>
        </lifecycle>
    </configuration>
</aspect>

As you can see, several check items are listed under the "Development" and "Tested" states. The two main attributes of a check list item are name and forEvent.

name - The name of the check item; this is the text that will be displayed for the check item.
forEvent - The event associated with this check item. For example, if forEvent is set to "Promote", this check item must be checked in order to proceed with the promote operation for that state.

Custom permissions

As you can see in the "Development" state there is a sub element as follows
 
<permissions>
     <permission roles="eng,admin"/>
</permissions>

In this element it is possible to define the set of roles that are allowed to check this check item. In this sample only engineers and admins are allowed to check this item.

Custom validations
 
<validations>
     <validation forEvent="" class="">
          <parameter name="" value=""/>
     </validation>
</validations>

As seen in the commented-out section under the "Code Completed" check item, it is also possible to define custom validations. However, the validations will only be called during a state transition. We will look into custom validations under "transition validations" in the next post.

Now you can save the newly created lifecycle configuration, apply it to an artifact like an "api" or "service", and see its functionality.

We will look at Transition Validations and how to use them in the next post of this series.

Wednesday, June 5, 2013

How to define XSD to allow any XML element as child element

While working on a new feature for WSO2 GREG I came across a requirement to allow any arbitrary XML element to be passed as a child element, and I needed to define this in an XSD file. After a bit of searching and some help I was able to find the correct XSD structure for this and thought it might be helpful to share it.

<xs:element name="parameter" minOccurs="0" maxOccurs="unbounded">
    <xs:complexType>
        <xs:sequence>
            <xs:any minOccurs="0" maxOccurs="unbounded" processContents="skip"/>
        </xs:sequence>
        <xs:attribute name="name" type="xs:string" use="optional"/>
        <xs:attribute name="value" type="xs:string" use="optional"/>
    </xs:complexType>
</xs:element>

The given XSD will allow an XML such as the example given below

<parameter name="payload">
 <p:adderprocessrequest xmlns:p="http://wso2.org/bps/sample">
  <!--Exactly 1 occurrence -->
  <x xmlns="http://wso2.org/bps/sample">x</x>
  <!--Exactly 1 occurrence -->
  <y xmlns="http://wso2.org/bps/sample">y</y>
 </p:adderprocessrequest>
</parameter>
Hope this is helpful to anyone who needs to configure an XSD for a similar situation.
