“It is a capital mistake to theorize before one has data.” – Sherlock Holmes
Just a few days ago I overheard someone say, “You know, the world no longer revolves around the sun, but around DATA.” Well, I couldn’t disagree. Today, the whole world revolves around data. But having data alone is of no use; it is like having a gun with no bullets: you have it but can’t make use of it. This is where “Big Data Analysis”, the driving force of today’s business world, steps in. As Geoffrey Moore rightly said,
“Without big data analytics, companies are blind and deaf, wandering out onto the web like a deer on a freeway.”
What are industry leaders thinking?
The ‘Big’ in “Big Data Analysis” refers to the sheer size of the data (up to petabytes) that needs to be analyzed. However, the beauty of big data lies not just in collecting the data but in making sense of it. In a recent survey conducted by IBM across 900 business and IT executives from over 70 countries, the following trends emerged among the leaders. They:
- Are 166% more likely to make most decisions based on data.
- Are 2.2 times more likely to have a formal career path for analytics.
- Cite growth as the key source of value from analytics (75%).
- Measure the impact of analytics investments (80%).
- Have some form of shared analytics resources (85%).
This shows the significance of data in today’s dynamic business world and how directly its proper analysis is linked to the success of a business. However, this data is typically processed in batches (“Batch Processing”), i.e. data generated over an interval ranging from a few hours to weeks or even months is processed collectively. So if data that is “old” can make such a difference, imagine the impact of real-time data analysis: analyzing data as and when it arrives.
At the dawn of this Big Data onslaught, cloud services giant Amazon Web Services has released a new service, “Kinesis”, to tackle these Big Data challenges.
So what exactly is Amazon Kinesis?
Well, we have talked about big data, its significance, and batch processing. If data has so far been analyzed over weeks or months, would it make any difference if it could be processed in real time?
Before you answer instinctively, consider this. Your favorite mobile company is launching a new model and promoting it on the Internet. The company will try hard to track the related tweets and status updates and collate all these “responses” to understand the market’s reaction to the proposed product. Based on that, it can change the product or its marketing strategy. In other words, it will be better able to predict whether the product will be a success. But all of this is futile if the company receives the data only after the product has launched, or if it is forced to postpone the launch while waiting for the data to arrive!
Yes, you guessed it! With Kinesis, the company will be able to analyze the market’s response on the very same day its promotion launches! Imagine the difference that can make to the future of the product. In the near future, real-time data analysis will be the fine line between failure and success.
As described by Amazon, Kinesis is “a fully managed service for real-time processing of streaming data at massive scale”.
Nowadays data comes largely from various kinds of sensors and devices, with no single dominant source. Kinesis does not constrain you to any one source either. It can ingest data from a myriad of sources, ranging from cell phones to large servers; as described at AWS re:Invent 2013, “any device that is capable of making a ‘put’ call can be a source of data for Kinesis”. Kinesis is more about quick computations, and the quickly computed results can later be stored in other AWS services such as Redshift, DynamoDB, etc. for further analysis.
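To make the “put” call a little more concrete, here is a minimal sketch in Python. It assumes a client object that exposes a `put_record(StreamName=..., Data=..., PartitionKey=...)` method, which is the shape of the call offered by the boto3 Kinesis client. The stream name, device ID, payload, and the `FakeKinesis` stand-in are all made up for illustration, so that the snippet runs without an AWS account.

```python
import json


def send_event(client, stream_name, device_id, payload):
    """Serialize one event and push it into the stream with a single put call."""
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    # Records that share a partition key are routed to the same shard.
    return client.put_record(StreamName=stream_name, Data=data, PartitionKey=device_id)


class FakeKinesis:
    """Hypothetical in-memory stand-in so the sketch runs without AWS credentials."""

    def __init__(self):
        self.records = []

    def put_record(self, *, StreamName, Data, PartitionKey):
        self.records.append(
            {"StreamName": StreamName, "Data": Data, "PartitionKey": PartitionKey}
        )
        return {"SequenceNumber": str(len(self.records))}


fake = FakeKinesis()
response = send_event(fake, "product-buzz", "phone-42", {"text": "Loving the new model!"})
print(response["SequenceNumber"], fake.records[0]["PartitionKey"])
```

Against a real stream you would presumably pass `boto3.client("kinesis")` instead of the fake client; the rest of the function would stay the same.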
How does AWS Kinesis work?
Imagine the way lumberjacks used to transport lumber: logs were pushed into a fast-flowing stream, carried all the way downstream, and fetched at the end to be transported to factories for further processing. Similarly, AWS Kinesis can be thought of as a continuously flowing stream. Cell phones and servers (the lumberjacks) push server logs, tweets, and stock market data (the logs) into the stream, and an application built with the Kinesis Client Library (which we will discuss in upcoming posts) waits at the end of the stream, ready to process the data as and when it arrives.
As mentioned earlier, Kinesis is a managed service, so the user does not have to think about how to process data in parallel and can concentrate on the logic for processing it. Plug in your code for processing the data and let Kinesis handle the rest. A few scenarios in which Kinesis can be used:
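The stream analogy can be played out in a few lines of Python. This is only a toy model built on the standard library's `queue` and `threading` modules, not the Kinesis API or the Kinesis Client Library; the source names and sample records are invented for illustration. Producers drop records into the stream, and a consumer at the far end handles each record as soon as it arrives.

```python
import queue
import threading

SENTINEL = None  # marks the end of the toy stream


def producer(stream, source, items):
    """Push each item into the stream, tagged with where it came from."""
    for item in items:
        stream.put((source, item))


def consumer(stream, handle):
    """Wait at the end of the stream and handle records as they arrive."""
    while True:
        record = stream.get()
        if record is SENTINEL:
            break
        handle(record)


stream = queue.Queue()
processed = []
worker = threading.Thread(target=consumer, args=(stream, processed.append))
worker.start()

producer(stream, "phone", ["tweet: great launch", "tweet: too pricey"])
producer(stream, "server", ["GET /index.html 200"])
stream.put(SENTINEL)
worker.join()
print(processed)
```

The consumer never polls on a schedule; it simply blocks until the next record shows up, which is the essential difference from batch processing.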
- Server logs: can be ingested in real time, so dashboards can show vital server statistics as they change, which matters to many IT businesses.
- Click streams from websites: customers can be provided with real-time analytics of their websites.
- Advertisement campaigns: effectiveness can be measured in real time.
- Stock market firms: can use the incoming real-time data to track the trends of the industry.
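As a rough idea of what the “plug-in” processing logic for the server log scenario might look like, here is a small Python sketch that counts responses per minute for a dashboard. The log line format (timestamp, method, path, status) is an assumption made for this example, and the function is plain Python rather than anything specific to Kinesis.

```python
from collections import Counter, defaultdict


def summarize(log_lines):
    """Count responses per minute, split into ok (< 500) and error (>= 500)."""
    per_minute = defaultdict(Counter)
    for line in log_lines:
        timestamp, _method, _path, status = line.split()
        minute = timestamp[:16]  # keep "YYYY-MM-DDTHH:MM", drop the seconds
        bucket = "error" if int(status) >= 500 else "ok"
        per_minute[minute][bucket] += 1
    return {minute: dict(counts) for minute, counts in per_minute.items()}


sample = [
    "2013-11-20T10:15:03 GET /index.html 200",
    "2013-11-20T10:15:41 GET /cart 500",
    "2013-11-20T10:16:02 POST /checkout 200",
]
print(summarize(sample))
```

In a streaming setup the same function would be fed small groups of records as they arrive, instead of a full day of logs at once.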
We will be publishing posts that go into the details of how Kinesis works, including concepts like “Shards”, the Kinesis Client Library and its implementation, example Kinesis applications, and more. Subscribe to our blog and stay tuned for our Kinesis blog series!