Data Warehouse as a Service
Data warehouses have traditionally meant expensive dedicated hardware and hefty software licensing fees. You pay upfront for both the hardware and the software, on top of the costs of setting up and installing them. You also need DBA and networking teams in place to ensure smooth deployment and ongoing maintenance.
Small enterprises cannot afford data warehouses, and lose their competitive edge vis-a-vis larger organizations.
For larger organizations, the challenges are different. While enterprise data grows at an average of 50% year on year, data warehousing is not growing at the same pace. As a result, a lot of data is left out of the data warehousing and business intelligence process.
Enter Amazon Redshift.
Amazon Redshift is a cloud data warehousing service from Amazon Web Services. It is a fully managed, petabyte-scale data warehouse.
Amazon Redshift turns data warehousing economics upside down. The best thing about it is that you can provision a cluster within minutes, doing away with the routine heavy lifting of setting up hardware and installing software before you can start using a data warehouse.
With Redshift, you do away with all the upfront investment in hardware and software. It is a pay-as-you-go service, priced so that you can analyze all your data. It is extremely fast, and it is cheaper than most options available in the market today.
Key features of Amazon Redshift
Redshift reduces I/O Operations
Redshift provides columnar data storage. With columnar storage, all values for a particular column are stored contiguously on disk in sequential blocks.
Compared with traditional row-based storage, columnar storage reduces the number of I/O requests made to the disk, because a query reads only the columns it needs. It also reduces the amount of data loaded from disk, which leaves more memory available for query execution and improves processing speed.
As similar data is stored sequentially, Redshift compresses the data rather efficiently. Compression of data further reduces the amount of I/O required for queries.
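The effect of columnar storage and compression can be illustrated with a small Python sketch. This is a toy model, not how Redshift is implemented: the table, the column names and the helper functions are invented for illustration, and run-length encoding stands in as one simple example of a compression scheme.

```python
from itertools import groupby

# Toy table, invented for illustration: (order_id, region, amount).
rows = [(i, "EU" if i < 600 else "US", 10.0) for i in range(1000)]

# Column layout: the values of each column are kept together.
column_layout = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def values_read_row_store(table_rows):
    """A row store scans every field of every row, even for one column."""
    return sum(len(row) for row in table_rows)


def values_read_column_store(columns, needed):
    """A column store scans only the columns the query asks for."""
    return sum(len(columns[name]) for name in needed)


def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Values touched to compute SUM(amount) under each layout.
row_reads = values_read_row_store(rows)
col_reads = values_read_column_store(column_layout, ["amount"])

# Similar values sit next to each other, so the column compresses well.
encoded_region = run_length_encode(column_layout["region"])
```

In this toy table the column layout touches one third of the values that the row layout does for a single-column aggregate, and the 1,000-entry region column collapses to two (value, count) pairs.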
Redshift is implemented using a massively parallel processing architecture
Amazon Redshift has a massively parallel processing (MPP) architecture. MPP enables Redshift to distribute and parallelize queries across multiple nodes. Apart from queries, the MPP architecture also enables parallel operations for data loads, backups and restores.
The Redshift architecture is inherently parallel; end users need no additional tuning and incur no overhead to distribute the load.
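The idea behind MPP can be sketched in a few lines of Python. This is only a conceptual model under stated assumptions: worker threads stand in for cluster nodes, rows are assigned to nodes by hashing a key, and the function names are hypothetical. It does not describe Redshift's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, node_count, key_index=0):
    """Assign each row to a node by hashing its distribution key."""
    slices = [[] for _ in range(node_count)]
    for row in rows:
        slices[hash(row[key_index]) % node_count].append(row)
    return slices


def partial_sum(node_rows, value_index=1):
    """Work done independently on one node: sum its share of the rows."""
    return sum(row[value_index] for row in node_rows)


def parallel_total(rows, node_count=4):
    """Run the partial sums in parallel, then combine them into one result."""
    slices = distribute(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(partial_sum, slices))
    return sum(partials)


# Toy data, invented for illustration: (customer_id, amount).
rows = [(customer_id, customer_id * 2) for customer_id in range(100)]
total = parallel_total(rows)
```

Each simulated node works only on its own slice, and a final step combines the partial results, which is the same split-then-merge pattern the text describes for queries.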
Redshift has security built in
Amazon provides various security features for Redshift, just as it does for all other AWS services. Access control can be maintained at the account level using IAM roles. For database-level access control, you can define Redshift database groups and users, and restrict access to specific databases and tables.
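As a rough illustration of database-level access control, the Python sketch below builds the SQL statements that create a group, add users to it and grant read access on specific tables. The helper function and the group, user and table names are hypothetical; the sketch only assembles statement text and assumes the names are trusted identifiers, so it is not a complete or hardened provisioning script.

```python
def access_control_statements(group, users, tables):
    """Build SQL that creates a group, adds users, and grants read access.

    Names are interpolated as-is, so this assumes trusted identifiers.
    """
    statements = [f"CREATE GROUP {group};"]
    if users:
        statements.append(f"ALTER GROUP {group} ADD USER {', '.join(users)};")
    for table in tables:
        statements.append(f"GRANT SELECT ON TABLE {table} TO GROUP {group};")
    return statements


# Hypothetical names, for illustration only.
statements = access_control_statements(
    group="analysts",
    users=["alice", "bob"],
    tables=["sales", "customers"],
)
```

The resulting statements would be run by a database superuser through any SQL client connected to the cluster.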
Redshift can be launched in Amazon VPC. You can define VPC security groups to restrict inbound access to your clusters.
Redshift supports encryption of all data stored in the cluster, as well as SSL encryption for data in transit.
Redshift Node Types
Redshift provides a choice of two node types: an extra large (XL) node with 2 TB of attached storage, or an eight extra large (8XL) node with 16 TB of attached storage.
High Storage Extra Large (XL) DW Node:
- CPU: 2 virtual cores – Intel Xeon E5
- ECU: 4.4
- Memory: 15 GiB
- Storage: 3 HDD with 2TB of local attached storage
High Storage Eight Extra Large (8XL) DW Node:
- CPU: 16 virtual cores – Intel Xeon E5
- ECU: 35
- Memory: 120 GiB
- Storage: 24 HDD with 16TB of local attached storage
You can provision a Redshift cluster with anywhere from a single node to 100 nodes, depending on the processing and storage capacity required.
The upper limit depends on the node type: a cluster can have a maximum of 32 XL nodes (64 TB) or 100 8XL nodes (1.6 PB).
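The capacity figures above follow from simple arithmetic, which the Python sketch below reproduces. The per-node storage and node limits are taken from the text; the function names are hypothetical, and the sizing ignores compression and any storage overhead, so treat it as a back-of-the-envelope estimate only.

```python
import math

# Storage per node (TB) and maximum nodes per cluster, as stated in the text.
NODE_TYPES = {
    "XL": {"storage_tb": 2, "max_nodes": 32},
    "8XL": {"storage_tb": 16, "max_nodes": 100},
}


def max_cluster_storage_tb(node_type):
    """Largest cluster capacity for a node type, in TB."""
    spec = NODE_TYPES[node_type]
    return spec["storage_tb"] * spec["max_nodes"]


def nodes_needed(data_tb, node_type):
    """Smallest node count whose raw storage holds data_tb terabytes."""
    spec = NODE_TYPES[node_type]
    count = max(1, math.ceil(data_tb / spec["storage_tb"]))
    if count > spec["max_nodes"]:
        raise ValueError(
            f"{data_tb} TB exceeds the {node_type} cluster limit of "
            f"{max_cluster_storage_tb(node_type)} TB"
        )
    return count


xl_capacity = max_cluster_storage_tb("XL")    # 32 nodes x 2 TB
xxl_capacity = max_cluster_storage_tb("8XL")  # 100 nodes x 16 TB
```

For example, nodes_needed(10, "XL") gives 5 nodes, while asking for 100 TB on XL nodes raises an error because it exceeds the 64 TB limit for that node type.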
For details on Amazon Redshift Pricing, please visit: http://aws.amazon.com/redshift/pricing/
We recently conducted a webinar specifically on Redshift; you can watch the recording here.
If you are looking for guidance on implementation of Amazon Redshift, please send us an email at email@example.com.