Friday, December 07, 2012

BIRT and Google BigQuery



BIRT offers many ways data can be retrieved, including extending JDBC drivers, scripted data sources and implementing the Open Data Access (ODA) framework extensions.  Each of these methods has pros and cons associated with them. Generally the most re-usable method is to implement the ODA extension which allows adding a personalized GUI to the BIRT designer.  Many examples of this approach are available on Birt-Exchange.  The quickest method is to implement a scripted data source, which involves writing some Java or JavaScript to retrieve values.  This post illustrates retrieving a Google BigQuery using the scripted data source approach.   


For this report we will use the births by year query described here.   


The query is defined as "SELECT year, SUM(record_weight) as births FROM publicdata:samples.natality GROUP BY year"

The first thing we will need to do is to create a Java class that will call BigQuery.  The class for this example is shown below.  More details on the BigQuery Java API are available here.



package birt.big.query.sample;
import java.io.File;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.List;

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.QueryRequest;
import com.google.api.services.bigquery.model.QueryResponse;
import com.google.api.services.bigquery.model.TableCell;
import com.google.api.services.bigquery.model.TableRow;


public class GetBigQueryData {

        private static final String SCOPE = "https://www.googleapis.com/auth/bigquery";
        private static final HttpTransport TRANSPORT = new NetHttpTransport();
        private static final JsonFactory JSON_FACTORY = new JacksonFactory();
        private Bigquery bigquery;
        private static String QUERY = "SELECT year, SUM(record_weight) as births FROM publicdata:samples.natality GROUP BY year";

        public void setupConnection() throws GeneralSecurityException, IOException{

               GoogleCredential credential = new GoogleCredential.Builder().setTransport(TRANSPORT)
                              .setJsonFactory(JSON_FACTORY)
                               .setServiceAccountId("yournumber@developer.gserviceaccount.com")
                              .setServiceAccountScopes(SCOPE)
                              .setServiceAccountPrivateKeyFromP12File(
                                             new File("locationofyourkey.p12")).build();

               bigquery = new Bigquery.Builder(TRANSPORT, JSON_FACTORY, credential)
               .setApplicationName("BigQuery-Service-Accounts/0.1")
               .setHttpRequestInitializer(credential).build();
        }
        public List executeQuery( String query) throws IOException{


               QueryRequest queryInfo = new QueryRequest().setQuery(query);

               Bigquery.Jobs.Query queryRequest = bigquery.jobs().query("yourprojectid", queryInfo);
               QueryResponse queryResponse = queryRequest.execute();
               return queryResponse.getRows();
        }

        public static void main(String[] args) throws IOException, GeneralSecurityException {
               GetBigQueryData bq = new GetBigQueryData();
               bq.setupConnection();
               List rows = bq.executeQuery(QUERY);

               if( rows != null ){
                       for (TableRow row : rows) {
                             for (TableCell field: row.getF()) {
                              System.out.printf("%s--", field.getV());
                             }
                       }
               }

        }
}
 
This is a simple class based on the Google BigQuery examples.  Note that this project was configured to use a service account.  If you plan to use the above code make sure to do the same for your project and enter the correct service account id and private key.  You will also need to specify your project id when creating the query request.  This example has two methods that will be called from BIRT setupConnection and executeQuery.  The setupConnection method just builds our GoogleCredentials object and the executeQuery method executes the passed in query string and returns the query response. 

To make this class available to BIRT you can either add it to BIRT as a jar or create a Java project in the same workspace as the BIRT Report project.  For this post, a Java project was created (Named BirtGoogleBigQueryExample).  Using the BIRT Classpath preference, the Java project and the Google jars where added to the report project (Named BirtGoogleBigQueryReport).



Select Configure Project Specific Settings and click on the BIRT report project (BirtGoogleBigQueryReport in this example).
 
Next click on the Add Projects button and add the Java project (BirtGoogleBigQueryExample in this example).  Finally select the Add JARs button and add the Google BigQuery API jars to the Report Project classpath.

You can now create a Scripted Data Source in BIRT that calls this class.  First create a new report and in the data explorer view, right click on Data Sources and select New Data Source.


Select Scripted Data Source and Click Finish.  Next right click on the Data Sets node and select New Data Set.

Select the scripted data source you created and click on the next button.  Enter two columns, one for year and one for births.  Set their type to integer.
For this example only two events need to be implemented.  Both are on the scripted data set.  An open and a fetch event must be implemented.  To do this select the scripted data set in the data explorer view and click on the script tab in the report editor.  Enter the following script for the open event.

importPackage(Packages.birt.big.query.sample);
importPackage(Packages.com.google.api.services.bigquery.model);
importPackage(Packages.java.util);

var bq = new GetBigQueryData();
bq.setupConnection();
var mrows = bq.executeQuery("SELECT year, SUM(record_weight) as births FROM publicdata:samples.natality GROUP BY year");
iter = mrows.iterator();

This script imports the required packages and calls the setupConnection method from the class we created earlier.  The query is then executed by calling our executeQuery method.  An iterator is then created to use in the fetch method.  Select the fetch event and enter the following script.

if( iter.hasNext()){
 tr = iter.next();
 myrow = tr.getF();

 row["Year"] = parseFloat(myrow.get(0).getV());
 row["Births"] = parseInt(myrow.get(1).getV());

 return true;
}else{
 return false;
}

With BIRT scripted data sets the fetch method is continuously called until a false is returned.  In this example each of the BigQuery rows is iterated and the column values are set.  You can now use the data from the query in your report.  


When deploying the report make sure to have the BigQuery jars and the custom class in the classpath of your runtime.  In most cases you will just copy the jars to the WEB-INF/lib of the viewer.
 
The example Java Project and Report Project are available at Birt-Exchange.


 

9 comments:

Jason Weathersby said...

You may also want to look at the Starschema JDBC driver for BigQuery Located here:
http://code.google.com/p/starschema-bigquery-jdbc/

kajal said...

This guide is super helpful! The detailed explanation on integrating BigQuery with BIRT is exactly what I was looking for. Thanks for sharing the Java class example—it’s very clear!
Franchise India Expo
Franchise India Expo Mumbai

sheetal said...

I appreciate the step-by-step instructions for setting up the scripted data source in BIRT. It’s often hard to find such comprehensive guides online.
Franchise India Expo Bangaluru
Trough Screw Conveyor feeder delhi

abhay said...

The combination of Java and BIRT makes data handling so flexible. Your approach to using the Google BigQuery API is well-explained and easy to follow. Great work!
Silo Top Dust Filter in delhi
vibro sifter delhi

aman said...

Using scripted data sources in BIRT was always confusing for me. This blog clears up a lot of doubts, especially with the BigQuery integration example. Thanks for this!
Warehouse Storage rack manufacturer
mezzanine floor manufacturer

Bhanu said...

The way you explained the pros and cons of different methods to retrieve data in BIRT is commendable. I’ll definitely explore the ODA extension method as well.
mobile compactor Manufacturer
fifo flow rack

gautam said...

Excellent tutorial! I managed to replicate your steps for a different BigQuery dataset, and it worked seamlessly. Keep up the great work.
heavy duty rack manufacturer delhi
Multi tier rack manufacturer delhi

karishma said...

Could you elaborate more on handling errors during the BigQuery API calls? That would make this blog even more complete for beginners like me.
slotted Angle rack manufacturer delhii
pallet rack in delhi

Naina said...

Thank you for breaking down the open and fetch event scripting! This was the missing piece I needed to create dynamic reports in BIRT.
Dust collector manufacturer
Franchise for sale