- By Stany Simon
We already know what Kinesis is from our earlier discussions, but only in theory! Let us get some hands-on practice so that our picture of Kinesis becomes much clearer. What we will build today is a very basic application that tracks user activity on our dummy website: a click-path analysis application at its simplest. The aim is to give you a better understanding of how to integrate Kinesis into your own applications.
A Checklist of Requirements:
1. A valid AWS account with access to AWS Kinesis.
2. The secret key and access key for the account.
Ingredients for the application
1. A frontend, dummy website on which users will navigate, in JSP.
2. Backend application ( a Servlet ) which will act as the producer for the stream.
3. A Kinesis stream with a single shard.
4. PHP for analytics and a little bit of jQuery for jazzy little graphs!
The Initial Design of the Kinesis Application:
So let’s start with the design of the application.
As explained earlier, an application using Kinesis needs three things:
1. A Producer
2. A Consumer
3. And a Kinesis Stream
Understanding the Kinesis Stream
So the first thing we need is a stream. A stream can be created using the APIs or the AWS Management Console. In this example we will go with a stream created using the Management Console. Follow the steps given in the AWS Kinesis documentation (http://docs.aws.amazon.com/kinesis/latest/dev/step-one-create-stream.html). It’s pretty straightforward.
Here I have created a stream named “ClickStream”, configured with a single shard.
Now that the Stream is created we need an application that will put data into the stream, i.e. a Producer.
For this, we will create a dummy website with a little bit of jQuery to track what the user does on the website. The website will be the source of data for the producer, which in turn will be the source of data for the Kinesis stream (“ClickStream”).
As the user clicks around the dummy website ( 18.104.22.168:8080 ), details of each click are captured and sent to the producer. In this case, the “Producer” servlet acts as the producer.
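To make that hand-off concrete, here is a minimal sketch of what the receiving side might do with one tracked click. It uses only the standard library and parses a plain query string rather than a real servlet request. The parameter names (sessionId, page, element), the comma-separated payload format and the class name are all assumptions made for illustration; they are not taken from the downloadable code.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns the query string of one tracked click into a payload.
final class ClickEventSketch {

    // Splits "a=1&b=%2Fx" into decoded key/value pairs, keeping their order.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // ignore stray "&&"
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // Builds the text that would be handed on to the class that talks to Kinesis.
    // The field names and the comma-separated layout are assumptions.
    static String toPayload(Map<String, String> params) {
        return params.getOrDefault("sessionId", "unknown") + ","
                + params.getOrDefault("page", "") + ","
                + params.getOrDefault("element", "");
    }

    public static void main(String[] args) {
        Map<String, String> click =
                parseQuery("sessionId=abc123&page=%2Fproducts&element=buy-button");
        System.out.println(toPayload(click)); // prints "abc123,/products,buy-button"
    }
}
```

In a real servlet the container already decodes request parameters for you, so only the payload-building step would be needed.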
The code for the dummy website, along with the jQuery tracking script, is available here: https://s3-us-west-2.amazonaws.com/stanytest/KinesisBlog/dummy_website.zip
I won’t go into the details of the dummy website and its jQuery part; instead, we will concentrate on the Producer’s code.
The Producer Class, Put Constructor & More…
First of all, we will need the Kinesis SDK, which you can download from here ( https://s3-us-west-2.amazonaws.com/stanytest/KinesisSDK/AmazonKinesisSDK-preview.zip )
( Note: Import these jar files into the project )
Download the code for Producer from here [ https://s3-us-west-2.amazonaws.com/stanytest/KinesisBlog/KinesisBlog.zip ]
In the downloaded pack, there will be a folder named Kinesis Blog (basically a NetBeans project).
It contains a package named “producer”, in which two classes reside:
Producer class: This class works as the endpoint to which all the data from our dummy website is sent. Once it receives the data through a GET request, it passes the data on to the Put class.
The Producer class:
Put class: The Put class does the actual work of a producer, so let’s take a closer look at it.
Here in the constructor, we create an AmazonKinesisClient object, passing in our credentials as an AWSCredentials object. We then set the endpoint (currently Kinesis is available only in the us-east region).
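The constructor needs two inputs: the key pair and the endpoint. As a hedged sketch of how those could be kept out of the source code, the snippet below reads the keys from a properties file using only the standard library. The property names accessKey and secretKey, the class name and the load method are assumptions for illustration, and the endpoint constant is the us-east-1 Kinesis endpoint. The values loaded here would then be handed to the SDK's credentials and client objects described above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the values the Put constructor needs.
final class ProducerConfigSketch {
    // Kinesis endpoint for the US East (N. Virginia) region.
    static final String ENDPOINT = "https://kinesis.us-east-1.amazonaws.com";

    final String accessKey;
    final String secretKey;

    ProducerConfigSketch(String accessKey, String secretKey) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
    }

    // Reads "accessKey" and "secretKey" (assumed property names) and fails fast
    // if either one is missing, instead of failing later on the first put.
    static ProducerConfigSketch load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String accessKey = props.getProperty("accessKey", "").trim();
        String secretKey = props.getProperty("secretKey", "").trim();
        if (accessKey.isEmpty() || secretKey.isEmpty()) {
            throw new IllegalArgumentException("accessKey and secretKey must both be set");
        }
        return new ProducerConfigSketch(accessKey, secretKey);
    }

    public static void main(String[] args) {
        // A StringReader stands in for a FileReader on a real properties file.
        ProducerConfigSketch config =
                load(new StringReader("accessKey=EXAMPLEKEY\nsecretKey=examplesecret\n"));
        System.out.println("Loaded key " + config.accessKey + " for " + ENDPOINT);
    }
}
```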
Some of the SDK classes and functions we will be using here include:
1. PutRecordRequest class
Here we first specify our stream name, in our case “ClickStream”.
We then put the data we received from the dummy website into the stream by passing a PutRecordRequest object to the putRecord function. As defined by AWS,
“The PutRecord ( http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/model/PutRecordRequest.html ) operation puts a data record (or the data blob) into an Amazon Kinesis stream from a producer. This operation must be called to send data from the producer into the Amazon Kinesis stream for real time ingestion and subsequent processing. The PutRecord operation requires the name of the stream that captures, stores, and transports the data; a partition key; and the data blob itself.”
We specify the parameters described above using:
1. setStreamName(): Sets the name of the stream into which the data will be put.
2. setData(): Sets the blob of data that will be put into the stream. In our case, “fileContent” will be the blob of data.
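The data blob could be a segment from a log file, geographic/location data, website clickstream data, or any other data type (per the Amazon documentation). Note that the 256 limit in the Kinesis documentation applies to the partition key (at most 256 characters), not to the data blob; the blob limit is much larger (50 KB per record when Kinesis launched, and raised since).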
3. setPartitionKey(): Determines which shard in the stream the data record is assigned to (see the highway example in the earlier post on shards). Here we will set the partition key to the user’s unique session ID.
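How does a partition key pick a shard? Kinesis hashes the key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch below reproduces the idea with the standard library only. It assumes the shards split the hash space evenly, which need not hold for a stream that has been resharded, and the class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of partition key -> shard mapping.
final class PartitionKeySketch {
    // Hash keys range from 0 to 2^128 - 1.
    static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    // MD5 of the UTF-8 bytes of the key, read as an unsigned 128-bit integer.
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Assumption: shardCount shards, each owning an equal slice of the hash space.
    static int shardIndex(String partitionKey, int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be at least 1");
        }
        BigInteger rangeSize = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeSize).intValue();
        // The last shard absorbs the remainder left over by integer division.
        return Math.min(index, shardCount - 1);
    }

    public static void main(String[] args) {
        String sessionId = "session-42"; // made-up session ID
        // With a single shard, as in "ClickStream", every key lands in shard 0.
        System.out.println(shardIndex(sessionId, 1)); // prints 0
        // The same key always maps to the same shard, whatever the shard count.
        System.out.println(shardIndex(sessionId, 4) == shardIndex(sessionId, 4)); // prints true
    }
}
```

This is why the session ID is a sensible partition key: all clicks from one session hash to the same shard, so they stay together and in order even if more shards are added later.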
putRecord: This function takes a PutRecordRequest object as its parameter and places the record into the stream. It returns the ID of the shard where the data record was placed and the sequence number that was assigned to the data record.
PutRecordResult: Represents the output of a PutRecord operation. Using the getter functions of this class we can see the ID of the shard into which the record was placed and the sequence number of the record. The partition key is not part of the result; it is the one we set on the request.
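To see the request/result round trip without an AWS account, here is a toy in-memory stand-in. It is not the SDK: ToyRequest, ToyResult and ToyStreamSketch are hypothetical names that only mirror the shape described above (stream name, data blob and partition key go in; shard ID and sequence number come out). The counter-based sequence numbers are a simplification, since real sequence numbers are large values assigned by Kinesis.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy single-shard stream kept in memory; NOT the Kinesis SDK.
final class ToyStreamSketch {

    // Mirrors the three inputs of a put: stream name, data blob, partition key.
    static final class ToyRequest {
        final String streamName;
        final ByteBuffer data;
        final String partitionKey;

        ToyRequest(String streamName, ByteBuffer data, String partitionKey) {
            this.streamName = streamName;
            this.data = data;
            this.partitionKey = partitionKey;
        }
    }

    // Mirrors the two outputs of a put: shard ID and sequence number.
    static final class ToyResult {
        final String shardId;
        final String sequenceNumber;

        ToyResult(String shardId, String sequenceNumber) {
            this.shardId = shardId;
            this.sequenceNumber = sequenceNumber;
        }
    }

    private final String name;
    private final List<ToyRequest> records = new ArrayList<>();

    ToyStreamSketch(String name) {
        this.name = name;
    }

    // Stores the record and reports where it went. With one shard, every
    // record lands in the same shard and only the sequence number changes.
    ToyResult put(ToyRequest request) {
        if (!name.equals(request.streamName)) {
            throw new IllegalArgumentException("Unknown stream: " + request.streamName);
        }
        records.add(request);
        return new ToyResult("shardId-000000000000", String.valueOf(records.size()));
    }

    public static void main(String[] args) {
        ToyStreamSketch stream = new ToyStreamSketch("ClickStream");
        ByteBuffer blob = ByteBuffer.wrap(
                "abc123,/products,buy-button".getBytes(StandardCharsets.UTF_8));
        ToyResult result = stream.put(new ToyRequest("ClickStream", blob, "abc123"));
        System.out.println(result.shardId + " / " + result.sequenceNumber);
    }
}
```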
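If all goes well, we now have a basic Producer application ready! The flow of the application as completed so far: the user’s clicks on the dummy website are captured by the jQuery script and sent to the Producer servlet, which hands the data to the Put class, which in turn puts it into the “ClickStream” stream.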
Some Interesting Reads on Big Data:
1. Learn How Amazon Kinesis Solves the Big Data Puzzle
2. Manoeuvering Through the Big Data Highway (Shards) with Amazon Kinesis
3. Setting the Stage to Design a Kinesis Application on AWS Cloud – The High Level Architecture