Monday, September 10, 2012

BIRT, Cassandra and Hector

While BIRT offers many ways to connect to Cassandra, including the Cassandra JDBC driver, this post focuses on using a scripted data source to call the Hector Java client. A BIRT scripted data source allows external Java classes to be called to retrieve data for a BIRT report, and its event handlers can be written in Java or JavaScript. The examples below use JavaScript.

For this post we used the DataStax community edition, which is available here, and created a keyspace named users and a column family named User. The User column family contains three string columns for first name, last name and age. The script used to load the sample data is available in the example download.

Set Designer Classpath


First, set the designer classpath so that it can access the following jars:
  • hector-core-version.jar
  • hector-object-mapper-version.jar
  • slf4j-api-version.jar
  • libthrift-version.jar
  • apache-cassandra-thrift-version.jar
  • guava-rversion.jar
  • commons-lang-version.jar


All of these jars, with the exception of the two Hector jars, are available in the /install-directory/DataStax Community/apache-cassandra/lib directory. To get the Hector jars you can download and build the Hector source, or just download them from a Maven repository.

The Hector-object-mapper jar file can be downloaded from here.
The Hector core jar file can be downloaded from here.

One way to set up the classpath is to create a libs directory in your Report Project and copy all of the jars above into it.
Next, select Window->Preferences. Select the Report Design->Classpath preference and click the Configure project specific settings link.

Select the BIRT project that you will be using Hector with and click OK.

Select the Enable project specific settings checkbox and add the jars from the libs directory you created earlier.
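
Before adding the jars it can help to confirm that none are missing from the folder. The following is a minimal sketch and not part of the original example: a small standalone JavaScript helper that takes a list of file names and reports which of the jars listed above have no match. The names requiredJarPrefixes and findMissingJars are hypothetical, the version numbers in the sample list are made up, and the check assumes each jar follows the name-version.jar pattern shown in the list.

```javascript
// Prefixes of the jars listed above; the version suffix varies by install.
var requiredJarPrefixes = [
  "hector-core-",
  "hector-object-mapper-",
  "slf4j-api-",
  "libthrift-",
  "apache-cassandra-thrift-",
  "guava-",
  "commons-lang-"
];

// Returns the prefixes that have no matching .jar file in fileNames.
function findMissingJars(fileNames) {
  return requiredJarPrefixes.filter(function (prefix) {
    return !fileNames.some(function (name) {
      return name.indexOf(prefix) === 0 && /\.jar$/.test(name);
    });
  });
}

// Made-up directory listing for illustration; two jars are deliberately absent.
var found = [
  "hector-core-1.0-5.jar",
  "hector-object-mapper-3.0-04.jar",
  "slf4j-api-1.6.1.jar",
  "apache-cassandra-thrift-1.1.4.jar",
  "guava-r08.jar"
];

console.log(findMissingJars(found)); // the libthrift and commons-lang prefixes
```

When run under Node.js, the file names could come from a directory listing of the libs folder instead of the hard-coded array.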


Creating a Scripted Data Source using Hector

 
You can now create a report that calls the Hector APIs directly. To do this, first create a new report. Select the Data Explorer view, right-click the Data Sources node and click New Data Source. Select the Scripted Data Source option and click Finish.
 

Next, right-click the Data Sets node and choose the New Data Set option. Make sure to select the Scripted Data Source that you just created as the data source for this data set.

 

Click on the Next button and enter each column name and data type for the data set.


 
Click on the Finish button. You can now enter the script for the data set. To do this, first make sure the data set is selected in the Data Explorer view, then click the Script tab at the bottom of the report canvas.


The script editor lists many events that can be scripted, but this example needs only an open script and a fetch script. First, select open from the script drop-down list and enter a script similar to the following.

importPackage(Packages.java.util);
importPackage(Packages.me.prettyprint.cassandra.serializers);
importPackage(Packages.me.prettyprint.cassandra.service);
importPackage(Packages.me.prettyprint.hector.api);
importPackage(Packages.me.prettyprint.hector.api.beans);
importPackage(Packages.me.prettyprint.hector.api.factory);
importPackage(Packages.me.prettyprint.hector.api.query);

// Connect to the local node and open the "users" keyspace.
var cluster = HFactory.getOrCreateCluster("Test Cluster", new CassandraHostConfigurator("localhost:9160"));
var keyspace = HFactory.createKeyspace("users", cluster);

// Row key, column name and column value are all read as strings.
var rangeSlicesQuery = HFactory.createRangeSlicesQuery(keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get())
    .setColumnFamily("User")
    .setRange(null, null, false, 10)  // no start or end column, not reversed, at most 10 columns per row
    .setRowCount(100);                // at most 100 rows

var result = rangeSlicesQuery.execute();
myrows = result.get();

// No var here: the fetch script reads rowsIterator.
rowsIterator = myrows.iterator();

Hector also supports CQL, so you could use the following open script instead.

importPackage(Packages.java.util);
importPackage(Packages.me.prettyprint.cassandra.serializers);
importPackage(Packages.me.prettyprint.cassandra.service);
importPackage(Packages.me.prettyprint.hector.api);
importPackage(Packages.me.prettyprint.hector.api.beans);
importPackage(Packages.me.prettyprint.hector.api.factory);
importPackage(Packages.me.prettyprint.hector.api.query);
importPackage(Packages.me.prettyprint.cassandra.model);  // CqlQuery

// Connect to the local node and open the "users" keyspace.
var cluster = HFactory.getOrCreateCluster("Test Cluster", new CassandraHostConfigurator("localhost:9160"));
var keyspace = HFactory.createKeyspace("users", cluster);

// Row key, column name and column value are all read as strings.
var cqlQuery = new CqlQuery(keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
cqlQuery.setQuery("select * from User");
var resultCQL = cqlQuery.execute();

// No var here: the fetch script reads rowsIterator.
rowsIterator = resultCQL.get().iterator();

Next, add a fetch script like the following.

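if (rowsIterator.hasNext()) {
    var myrow = rowsIterator.next();
    var cols = myrow.getColumnSlice().getColumns();
    // Copy each Cassandra column into the data set column of the same name.
    for (var ii = 0; ii < cols.size(); ii++) {
        row[cols.get(ii).getName()] = cols.get(ii).getValue();
    }
    return true;   // a row was produced; fetch will be called again
} else {
    return false;  // no more rows
}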



The fetch script above assumes that the scripted data set columns have the same names as the columns in Cassandra. You should now be able to preview the data set. Double-click the data set in the Data Explorer view and select Preview.
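
To make the fetch contract concrete, here is a minimal sketch that runs outside BIRT in any JavaScript engine. BIRT calls the fetch script repeatedly, once per output row, until it returns false. The sketch replaces the Hector rows with plain arrays and wraps the fetch body in a function. Everything named here (sampleRows, makeIterator, columnMap, fetchRow) is a hypothetical stand-in and not a BIRT or Hector API, and the sample data is made up. The columnMap lookup shows one possible way to handle Cassandra column names that differ from the data set column names.

```javascript
// Stand-in for the Hector result: each row is a list of {name, value} columns.
// The second row has no "age" column, as can happen with sparse Cassandra rows.
var sampleRows = [
  [{ name: "first", value: "Ada" }, { name: "last", value: "Lovelace" }, { name: "age", value: "36" }],
  [{ name: "first", value: "Alan" }, { name: "last", value: "Turing" }]
];

// Minimal iterator with the same hasNext()/next() shape as java.util.Iterator.
function makeIterator(items) {
  var index = 0;
  return {
    hasNext: function () { return index < items.length; },
    next: function () { return items[index++]; }
  };
}

// Hypothetical mapping from Cassandra column names to data set column names.
// Names that are not listed are used unchanged.
var columnMap = { first: "firstName", last: "lastName" };

var rowsIterator = makeIterator(sampleRows);

// Same shape as the fetch script: fill the row and return true,
// or return false once the iterator is exhausted.
function fetchRow(row) {
  if (!rowsIterator.hasNext()) {
    return false;
  }
  var cols = rowsIterator.next();
  for (var i = 0; i < cols.length; i++) {
    var name = cols[i].name;
    var mapped = Object.prototype.hasOwnProperty.call(columnMap, name);
    row[mapped ? columnMap[name] : name] = cols[i].value;
  }
  return true;
}

// Emulate how the engine drives the script: call fetch until it returns false.
var output = [];
var row = {};
while (fetchRow(row)) {
  output.push(row);
  row = {};
}
console.log(JSON.stringify(output));
```

In a real fetch script the same lookup would sit inside the existing loop, with cols.get(ii).getName() as the key into the map.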

 
You can now use the data set within your report. 


Deploying a Report that Uses the Hector API


 
If you are using the BIRT Viewer and deploy a report that calls the Hector API, verify that all the jars discussed at the beginning of this post (Set Designer Classpath) are placed in the WEB-INF/lib directory of the Viewer. If you are running BIRT reports using the BIRT APIs, verify that the same jars are on the classpath.

More information on CQL and Hector is available here.  The example in this post is available on Birt-Exchange.


7 comments:

Unknown said...

Good one. But i am facing a problem now. I can't able to fetch multiple column, current its possible to read only first column. Do you have any idea?

karishma said...

"Can you clarify the difference between using Hector APIs and CQL in this context? Which is more efficient for real-world applications?"


yashikawebdesigninghouse said...

"What are the key advantages of using a Scripted Data Source over a Cassandra JDBC driver in BIRT? It would be great to see a comparison."


onlinepromotionhouse22@gmail.com said...

"Would it be possible to share a detailed explanation of the fetch script logic? Understanding its workflow could help beginners."


varshakush said...

"Could you elaborate on why you chose Hector Client over other clients for this tutorial? Are there specific benefits?"


shivaniwebdesigning said...

"How can we handle scenarios where the columns in Cassandra don’t match the scripted data set column names? A tip or example would be useful."


abhay said...

"Is there a way to automate the addition of JAR files to the project classpath? It seems like a tedious task for large-scale projects."
