Amazon Kinesis – Service Guide

Amazon Kinesis

Amazon Kinesis is a managed, scalable, cloud-based service for real-time processing of large amounts of streaming data. It is designed for real-time applications and lets developers ingest any amount of data from many sources, scaling up and down as needed, with consumer applications that can run on EC2 instances.

Properties of Amazon Kinesis

  • Managed
  • Scalable
  • Allows real-time video processing
  • Can handle large amounts of data per second
  • Designed for real-time applications
  • Used to capture, store, and process data from large, distributed streams such as social media feeds
  • Easy to use
  • High throughput
  • Integrates easily with other AWS services
  • Cost-efficient – It is cost-efficient for workloads of any scale. You pay as you go for the resources you use and pay hourly for the throughput you require.
  • Usage
    • Kinesis is used where data must move rapidly and be processed continuously:
      • Data log and data feed intake – There is no need to wait to batch up data; it can be pushed to a Kinesis stream as soon as it is produced. This also protects against data loss if the data producer fails.
      • Real-time graphs – Graphs and metrics can be extracted from a Kinesis stream to build report results without waiting for data batches.
      • Real-time data analytics – Streaming data can be analyzed in real time as it arrives.
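Kinesis distributes incoming records across shards by taking the MD5 hash of each record's partition key and locating it in a shard's hash-key range. The sketch below illustrates that routing in plain Python, assuming the stream's shards evenly split the 128-bit hash-key space (a simplification; real streams can have uneven ranges after resharding).

```python
import hashlib

def shard_for_partition_key(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index the way Kinesis does:
    hash the key with MD5, read it as a 128-bit integer, and find
    which shard's hash-key range contains it. Assumes the shards
    evenly split the 0 .. 2**128 - 1 range."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_per_shard = 2 ** 128 // num_shards
    return min(hash_value // range_per_shard, num_shards - 1)
```

Because the mapping is deterministic, all records with the same partition key land on the same shard, which is what preserves per-key ordering within a stream.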
  • Limitations
    • Records in a stream are accessible for up to 24 hours by default, extendable to 7 days by enabling extended data retention.
    • The maximum size of a data blob in one record is 1 MB.
    • One shard supports up to 1,000 PUT records per second and 1 MB per second of write throughput.
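The per-shard limits above drive capacity planning: a stream must have enough shards to satisfy both the byte-rate and the record-rate limit at the same time. A minimal sketch of that calculation, using the documented limits of 1 MB/second and 1,000 records/second per shard:

```python
import math

# Per-shard write limits from the Kinesis limits above.
WRITE_MB_PER_SHARD = 1       # 1 MB/second of data per shard
RECORDS_PER_SHARD = 1000     # 1,000 PUT records/second per shard

def shards_needed(mb_per_second: float, records_per_second: int) -> int:
    """Return the minimum number of shards that can absorb the given
    ingest rate; whichever limit is tighter determines the count."""
    by_bytes = mb_per_second / WRITE_MB_PER_SHARD
    by_records = records_per_second / RECORDS_PER_SHARD
    return max(1, math.ceil(max(by_bytes, by_records)))
```

For example, an application writing 2.5 MB/second at 1,500 records/second is byte-bound and needs 3 shards.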

Best Practices

  • Managing Applications
    • Set up monitoring with Amazon CloudWatch for the following metrics:
      • Input bytes and input records
      • Output bytes and output records
      • MillisBehindLatest
    • To avoid getting the ReadProvisionedThroughputException exception, limit the number of production applications reading from the same Kinesis data stream to two applications.
    • Limit the number of production applications reading from the same Amazon Kinesis Data Firehose delivery stream to one application
  • Scaling Applications
    • Use multiple streams and Kinesis Data Analytics for SQL applications if your application has scaling needs beyond 100 MB/second.
    • Use Amazon Kinesis Data Analytics for Java Applications if you want to use a single stream and application.
  • Defining Input Schema
    • Adequately test the inferred schema.
    • The Amazon Kinesis Data Analytics API does not support specifying the NOT NULL constraint on columns in the input configuration.
    • Relax data types inferred by the discovery process.
    • Use SQL functions in your application code to handle any unstructured data or columns.
    • Make sure that you completely handle streaming source data that contains nesting more than two levels deep.
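One common way to handle deeply nested source data (per the last point above) is to flatten records before they reach the stream, so schema discovery sees simple, flat columns. A hypothetical pre-processing helper, not part of any Kinesis API:

```python
def flatten(record: dict, parent_key: str = "", sep: str = "_") -> dict:
    """Flatten a nested record into a single level of fields so that
    schema discovery infers flat columns. Nested key names are joined
    with `sep`. Illustrative sketch only; the separator and naming
    scheme are assumptions, not a Kinesis convention."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat
```

For example, `{"user": {"geo": {"lat": 1.0}}, "event": "click"}` becomes `{"user_geo_lat": 1.0, "event": "click"}`, which stays within the two-levels-of-nesting guidance.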
  • Connecting to Outputs
    • Use the first destination to insert the results of your SQL queries.
    • Use the second destination to insert the entire error stream and send it to an S3 bucket.
  • Authoring Application Code
    • In your SQL statement, don’t specify a time-based window that is longer than one hour.
    • During development, keep the window size small in your SQL statements so that you can see the results faster.
    • Instead of writing a single complex SQL statement, consider breaking it into multiple statements.
    • When you’re using tumbling windows, AWS recommends that you use two windows, one for processing time and one for your logical time.
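The two-window recommendation above exists because windowing on processing time and on logical (event) time can assign the same records to different windows. The sketch below illustrates tumbling (fixed, non-overlapping) windows in plain Python; it is not the Kinesis Data Analytics SQL dialect, and the field names are assumptions for illustration.

```python
from collections import defaultdict

def tumbling_counts(events, window_seconds=60, time_field="event_time"):
    """Count records per tumbling window of `window_seconds`.
    Pass time_field="event_time" to window on logical time, or
    time_field="arrival_time" to window on processing time --
    running both mirrors the two-window practice above."""
    counts = defaultdict(int)
    for event in events:
        # Align each timestamp down to the start of its window.
        window_start = (event[time_field] // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)
```

A record produced at second 59 but arriving at second 61 falls in the first logical window but the second processing-time window, which is exactly the discrepancy the two windows let you observe.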
  • Testing Applications
    • Testing is important before deploying changed schemas or application code to production.

AWS certification examination & Practice Questions

The questions are collected from the Internet; the answers are based on my experience. Please apply your own judgment before selecting an answer.

Your current log analysis application takes more than 4 hours to generate a report of the top 10 users of your web application. You have been asked to implement a system that can report this information in real time, ensure that the report is always up to date, and handle increases in the number of requests to your web application. Choose the option that is cost-effective and can fulfill the requirements.

A. Publish your data to CloudWatch Logs, and configure your application to Auto Scale to handle the load on demand.
B. Publish your log data to an Amazon S3 bucket. Use AWS CloudFormation to create an AutoScaling group to scale your post-processing application which is configured to pull down your log files stored in Amazon S3.
C. Post your log data to an Amazon Kinesis data stream, and subscribe your log processing application so that it is configured to process your logging data.
D. Configure an Auto Scaling group to increase the size of your Amazon EMR cluster.

Your company has a large social network presence and as a result has access to large amounts of data via API feeds from those social networks. If you wanted to analyze and process it in real time, which AWS service would best suit your needs?

A. AWS Kinesis
C. EC2 instance with a Message queue installed
D. There is no service for this scenario

Reading reference for AWS Certifications

Amazon Kinesis Data Analytics for SQL Applications

If you find this article useful, please feel free to share and give a like. Your comment is my inspiration.
