Kinesis Deep Dive
- popular scenario #1: moving small, fast-moving data into a persistent layer
- popular scenario #2: streaming data, near-real-time (NRT) notification systems
Kinesis:
- managed service
- streaming data ingestion
- continuous processing
- small, fast-moving data is captured quickly, then consumed concurrently by multiple different consumers for different analytics purposes
- you can split / merge shards via the console (or programmatically via the API)
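A minimal sketch of the API route, using the AWS SDK for Java (v1); the stream name and shard IDs are illustrative, and the new starting hash key is simply the midpoint of the 128-bit hash key space:

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.MergeShardsRequest;
import com.amazonaws.services.kinesis.model.SplitShardRequest;

public class ShardAdmin {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        // Split a hot shard in two; the new starting hash key becomes the
        // lower bound of the second child shard (here: 2^127, the midpoint).
        kinesis.splitShard(new SplitShardRequest()
                .withStreamName("my-stream")              // illustrative name
                .withShardToSplit("shardId-000000000000")
                .withNewStartingHashKey("170141183460469231731687303715884105728"));

        // Merge two adjacent, under-used shards back into one.
        kinesis.mergeShards(new MergeShardsRequest()
                .withStreamName("my-stream")
                .withShardToMerge("shardId-000000000001")
                .withAdjacentShardToMerge("shardId-000000000002"));
    }
}
```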
best practices
partition key strategy
- avoid hot shards
- use a random partition key (when per-key ordering does not matter)
- use a high-cardinality key
- use a business key: per billing customer, per device ID, or per stock symbol
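To make the trade-off concrete, a short sketch of both strategies (AWS SDK for Java v1; stream name, device ID, and payload are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;

public class PartitionKeyDemo {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
        byte[] payload = "{\"temp\": 21.5}".getBytes(StandardCharsets.UTF_8);

        // Business key: every record for one device hashes to the same shard,
        // preserving per-device ordering; high cardinality keeps shards balanced.
        kinesis.putRecord(new PutRecordRequest()
                .withStreamName("my-stream")              // illustrative name
                .withPartitionKey("device-42")
                .withData(ByteBuffer.wrap(payload)));

        // Random key: maximal spread across shards when ordering is irrelevant.
        kinesis.putRecord(new PutRecordRequest()
                .withStreamName("my-stream")
                .withPartitionKey(UUID.randomUUID().toString())
                .withData(ByteBuffer.wrap(payload)));
    }
}
```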
provision shards
- provision enough shards
- leave some head-room so consumers can catch up after application failures
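A worked sizing example, using the standard per-shard limits (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads) and an illustrative load: ingesting 10,000 records/s at 2 KB each is 20 MB/s, so the minimum is max(20 MB/s ÷ 1 MB/s, 10,000 ÷ 1,000 records/s) = 20 shards; adding ~25% head-room for catch-up brings it to 25 shards.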
put data into Kinesis
- micro-batch records before putting (see the producer sketch after this list)
- consider an async producer via the AWS SDK
- Kinesis-Log4j-Appender
- on ProvisionedThroughputExceededException:
  - retry (with backoff)
  - re-shard
  - track & monitor
- command to scale up (using the awslabs Kinesis Scaling Utils):

    java -cp KinesisScalingUtils.jar-complete.jar -Dstream-name=myStream -Dscaling-action=scaleUp -Dcount=10 -Dregion=eu-west-1 ScalingClient
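Tying micro-batching and retry together, a sketch of a producer that retries only the entries Kinesis rejected (stream name, partition keys, and retry policy are illustrative; PutRecords accepts at most 500 entries per call and can succeed partially):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordsRequest;
import com.amazonaws.services.kinesis.model.PutRecordsRequestEntry;
import com.amazonaws.services.kinesis.model.PutRecordsResult;
import com.amazonaws.services.kinesis.model.PutRecordsResultEntry;

public class MicroBatchProducer {
    private static final int MAX_RETRIES = 3;

    public static void main(String[] args) throws InterruptedException {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        // Micro-batch: collect up to 500 entries (the PutRecords limit) per call.
        List<PutRecordsRequestEntry> batch = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            batch.add(new PutRecordsRequestEntry()
                    .withPartitionKey("device-" + (i % 100))
                    .withData(ByteBuffer.wrap(
                            ("event-" + i).getBytes(StandardCharsets.UTF_8))));
        }

        // PutRecords can fail per entry (typically throughput exceeded);
        // retry only the rejected entries, with exponential backoff.
        for (int attempt = 0; attempt < MAX_RETRIES && !batch.isEmpty(); attempt++) {
            PutRecordsResult result = kinesis.putRecords(new PutRecordsRequest()
                    .withStreamName("my-stream")          // illustrative name
                    .withRecords(batch));
            List<PutRecordsRequestEntry> failed = new ArrayList<>();
            List<PutRecordsResultEntry> entries = result.getRecords();
            for (int i = 0; i < entries.size(); i++) {
                if (entries.get(i).getErrorCode() != null) {
                    failed.add(batch.get(i));             // same order as the request
                }
            }
            batch = failed;
            if (!batch.isEmpty()) {
                Thread.sleep((1L << attempt) * 100);      // 100ms, 200ms, 400ms
            }
        }
        if (!batch.isEmpty()) {
            System.err.println(batch.size() + " records still failing; consider re-sharding");
        }
    }
}
```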
ingest data from Kinesis
- Amazon KCL (Kinesis Client Library)
- one record processor maps to one shard
- the Kinesis Connector Library feeds data into S3, DynamoDB, Redshift, Elasticsearch
- data is fed through the pipeline below:
  - ITransformer: transforms the data read from Kinesis
  - IFilter: keeps only the records of interest
  - IBuffer: batches the data before sending out (for S3 or Redshift, better to buffer to the MB level before emitting)
  - IEmitter: writes the buffered batch to the destination
  - the Redshift connector puts the data into S3 first, buffers it there, then loads it into Redshift
- the application consuming the data should be able to scale automatically
- use CloudWatch metrics to detect why a consumer is slow
  - GetRecords.Latency
- build a flush-to-S3 consumer to capture the original raw data (flush by record count, by bytes, or by time; see the sketch below)
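A minimal sketch of the buffering half of such a consumer (thresholds, bucket, and key layout are illustrative; the S3 upload is stubbed out, and in a real application this logic would sit inside a KCL record processor):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Records accumulate until one of three thresholds trips -- by number,
// by byte, or by time -- then the batch is flushed as one S3 object.
public class FlushToS3Buffer {
    private static final int MAX_RECORDS = 1_000;
    private static final long MAX_BYTES = 5 * 1024 * 1024;   // buffer to MB level
    private static final long MAX_AGE_MILLIS = 60_000;

    private final List<byte[]> records = new ArrayList<>();
    private long bufferedBytes = 0;
    private long oldestRecordAt = -1;

    public void add(byte[] record) {
        if (records.isEmpty()) {
            oldestRecordAt = System.currentTimeMillis();
        }
        records.add(record);
        bufferedBytes += record.length;
        // Note: the age check only fires on the next add(); a real consumer
        // would also flush from a timer or at checkpoint time.
        if (shouldFlush()) {
            flush();
        }
    }

    // Flush by number, by byte, or by time -- whichever trips first.
    private boolean shouldFlush() {
        return records.size() >= MAX_RECORDS
                || bufferedBytes >= MAX_BYTES
                || System.currentTimeMillis() - oldestRecordAt >= MAX_AGE_MILLIS;
    }

    private void flush() {
        String key = "raw/" + System.currentTimeMillis() + ".batch";
        // Stub: a real implementation would concatenate the records and
        // upload them with AmazonS3#putObject under this key.
        System.out.printf("flushing %d records (%d bytes) to s3://my-bucket/%s%n",
                records.size(), bufferedBytes, key);
        records.clear();
        bufferedBytes = 0;
        oldestRecordAt = -1;
    }

    public static void main(String[] args) {
        FlushToS3Buffer buffer = new FlushToS3Buffer();
        for (int i = 0; i < 2_500; i++) {
            buffer.add(("event-" + i).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```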