Hey Hey Hey
AWS - AWS EMR Best Practices

Posted on 2018-05-06 |

Introduction

Moving data to AWS --> Data Collection --> Data Aggregation --> Data Processing --> Cost and Performance Optimizations

1. Moving data to AWS

Means moving the bulk of existing data to AWS

  • Moving Direction
    • Local Storage <–> AWS S3
      • Local HDFS --> S3
        • using S3DistCp: an extension of DistCp with optimizations for AWS
        • using DistCp
      • Local Filesystem --> AWS S3
        • open-source tools that support multi-threading: JetS3t / GNU Parallel
        • Aspera Direct-to-S3: a file transfer protocol based on UDP, with optimizations for AWS
        • Device-based import/export
        • AWS Direct Connect
          • One-time direct connection: once the bulk data is transferred, stop the direct connection
          • Ongoing direct connection: always connected
      • S3 --> Local HDFS
        • using S3DistCP or DistCP
    • AWS S3 --> AWS EMR
    • AWS S3 --> HDFS
  • With good optimization: several terabytes per day
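The "several terabytes per day" figure is plain bandwidth arithmetic; a minimal sketch (the efficiency factor is an assumption for illustration, not an AWS figure):

```python
def tb_per_day(gbps: float, efficiency: float = 1.0) -> float:
    """Terabytes transferable per day at a given line rate.

    gbps: link speed in gigabits per second.
    efficiency: fraction of the line rate actually achieved (assumed).
    """
    bytes_per_day = gbps * 1e9 / 8 * 86400 * efficiency
    return bytes_per_day / 1e12

# A saturated 1 Gbps link moves about 10.8 TB per day;
# at 50% efficiency that drops to roughly 5.4 TB.
print(round(tb_per_day(1.0), 1))       # 10.8
print(round(tb_per_day(1.0, 0.5), 1))  # 5.4
```

So a single well-utilized 1 Gbps Direct Connect link is already in "several terabytes a day" territory.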

2. Data Collection

Means streaming data to AWS

  • Apache Flume: collected data can be sent to S3, HDFS, and more
  • Fluentd: collected data can be sent to S3, SQS, MongoDB, Redis, and more

3. Data Aggregation

Means aggregating the collected data into properly sized batches before sending it to the target storage (S3, HDFS, EMR).


AWS - Summarize

Posted on 2018-05-04 |

Service Scaling Summary

Service Name    Scaling/Failover Capability               Comments
AutoScaling     AZ: yes; Region: no                       horizontal scaling of EC2 in the same Auto Scaling group
ElastiCache     Multi-AZ failover
Route53         Global service
CloudFront      Global service
VPC             Spans multiple AZs; Region: no
RDS             Multi-AZ; multi-Region (read replica)

Calculation

EBS --> gp2-type SSD --> flexible IOPS
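The "flexible IOPS" of gp2 follows a simple rule: baseline of 3 IOPS per GiB with a floor of 100, up to a per-volume cap. A sketch (the cap has changed over time, roughly 10,000 around 2018 and 16,000 later, so it is a parameter here):

```python
def gp2_baseline_iops(size_gib: int, cap: int = 10000) -> int:
    """Baseline IOPS for a gp2 volume: 3 IOPS per GiB,
    floor of 100, capped per volume (cap value is era-dependent)."""
    return min(max(100, 3 * size_gib), cap)

print(gp2_baseline_iops(10))    # 100   (floor applies)
print(gp2_baseline_iops(500))   # 1500
print(gp2_baseline_iops(5000))  # 10000 (cap applies)
```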


AWS - Storage Gateway Best Practices

Posted on 2018-04-26 |

Introduction

Use Storage Gateway to help implement a hybrid architecture

  • Use the gateway to translate NFS to Amazon S3: it translates NAS protocol operations into S3 API calls
  • AWS EMR or Athena in the cloud can directly access the data backed up to S3

Architecture

  • Storage Gateway is a virtual appliance
  • Supports NFS v3.0 or 4.1
  • Local storage is used to provide a read/write cache
  • "Bucket Share": one share represents one S3-to-NFS mount point mapping (S3 bucket to bucket share is a one-to-one relationship)
  • One gateway supports at most 10 bucket shares

File to object mapping

  • Usually the Unix file attributes (owner, group, permissions) and timestamps are mapped into the S3 object's metadata
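The mapping can be sketched in plain Python: collect the Unix attributes from `stat` and lay them out as object metadata. The metadata key names below are illustrative assumptions, not the gateway's exact wire format:

```python
import os
import stat
import tempfile

def file_metadata_for_s3(path: str) -> dict:
    """Collect the Unix attributes that a file gateway would map
    into S3 object metadata. Key names here are illustrative."""
    st = os.stat(path)
    return {
        "file-owner": str(st.st_uid),
        "file-group": str(st.st_gid),
        "file-permissions": oct(stat.S_IMODE(st.st_mode)),
        "file-mtime": str(int(st.st_mtime)),
    }

with tempfile.NamedTemporaryFile() as f:
    meta = file_metadata_for_s3(f.name)
    print(sorted(meta))
```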

Read/Write Operations and the Local Cache

  • An LRU (least recently used) algorithm is used to evict data
  • Read operations (read-through cache): read from the cache first; on a miss, fetch over the network
  • Write operations (write-back cache): write to the cache first (parallel writes locally), then asynchronously write the changed parts back over the network
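The behaviour above can be sketched as a toy read-through / write-back cache with LRU eviction. This is only a sketch: a dict stands in for S3, sizes are ignored, and the "asynchronous" upload is done synchronously in `flush()`:

```python
from collections import OrderedDict

class GatewayCache:
    """Toy read-through / write-back cache with LRU eviction."""

    def __init__(self, capacity: int, backend: dict):
        self.capacity = capacity
        self.backend = backend      # stands in for S3
        self.cache = OrderedDict()  # ordered: front = least recently used
        self.dirty = set()          # written locally, not yet uploaded

    def read(self, key):
        if key in self.cache:            # cache hit
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.backend[key]        # miss: fetch over the "network"
        self._put(key, value)
        return value

    def write(self, key, value):
        self._put(key, value)            # write-back: local cache first
        self.dirty.add(key)

    def flush(self):
        """Upload changed entries back to the backend (done async in reality)."""
        for key in list(self.dirty):
            self.backend[key] = self.cache[key]
            self.dirty.discard(key)

    def _put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            old, old_val = self.cache.popitem(last=False)  # evict LRU entry
            if old in self.dirty:        # upload dirty data before eviction
                self.backend[old] = old_val
                self.dirty.discard(old)

# Usage: a 2-entry cache backed by a dict standing in for S3.
s3 = {"a": 1}
gw = GatewayCache(capacity=2, backend=s3)
print(gw.read("a"))        # 1  (miss, fetched from backend)
gw.write("b", 2)
gw.write("c", 3)           # evicts "a" (least recently used, clean)
gw.flush()
print(s3["b"], s3["c"])    # 2 3
```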

NFS Security in LAN


AWS - Data Security

Posted on 2018-04-23 |

Encryption and Key Management in AWS

Encryption and key management in AWS --2015
https://youtu.be/uhXalpNzPU4

Encryption Primer

  • Encrypt the data using a symmetric key and store the encrypted data
  • Use a master key to encrypt the symmetric key and store the encrypted key
  • Use a higher-level master key to encrypt that master key, and store the encrypted master key
  • …; these keys form a key hierarchy and are stored in an HSM
    • This reduces the blast radius of losing a single key
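The hierarchy can be illustrated end to end with a toy XOR "cipher". This is strictly for showing the key-wrapping structure; XOR is NOT real cryptography, and a real system would use AES via an HSM or KMS:

```python
import os

def xor(data: bytes, key: bytes) -> bytes:
    """Toy XOR 'cipher' for illustrating the key hierarchy only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Level 0: a per-object data key encrypts the data.
data = b"customer record"
data_key = os.urandom(16)
ciphertext = xor(data, data_key)

# Level 1: the master key wraps the data key; only the wrapped
# (encrypted) data key is stored next to the ciphertext.
master_key = os.urandom(16)
wrapped_data_key = xor(data_key, master_key)

# The master key itself would in turn be wrapped by a higher-level
# key inside the HSM; losing one data key exposes only one object.

# Decryption walks back down the hierarchy.
recovered_key = xor(wrapped_data_key, master_key)
print(xor(ciphertext, recovered_key) == data)  # True
```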

Client Side Encryption

  • The customer encrypts the data and manages the key
  • For client-side encrypted data destined for S3, you can use the AWS SDK to simplify the approach
    • Using the AWS SDK, your master key stays on premises, but the (encrypted) symmetric key and the encrypted data are saved in S3

Server Side Encryption

  • AWS encrypts the data and manages the key
    • Upload raw data via TLS to AWS (S3, Glacier, EBS, Redshift, RDS, etc.), then enable encryption
      • With an AWS-managed key, AWS generates a unique data key for each object, then manages the key using an internal S3 service
      • With a customer-provided key, AWS encrypts the data using the customer's key and throws the key away after encryption
        • When requesting the encrypted data, you must provide the key; AWS decrypts the data and returns it
  • For S3/EBS/RDS/Redshift server-side encryption, you have 2 options:
    • use the S3/EBS/RDS/Redshift service master key (whoever has access to the bucket will be able to decrypt the data)
    • use the AWS KMS service, so you can specify which master key to use when encrypting the data

Key management Options

  • Self-managed
  • AWS Key Management Service
    • Use the API to generate a key; both an encrypted and a plaintext copy of the key are returned
      • The plaintext key is used to encrypt the data; the encrypted key is stored locally
      • When decryption is needed, the client submits the locally stored encrypted key
      • The master key is always stored in KMS
      • Benefit: KMS has fine-grained access control, so encrypted data can only be decrypted by users who have access to the key
      • Better auditing
      • Plaintext keys never exist in any persistent storage; the AWS service operations team is fully separated from the KMS team; multi-party control
  • AWS Partner solutions
    • Browse AWS marketplace for security
  • AWS CloudHSM – HSM is a Hardware Security Module
    • A box used to store the keys
    • Only the user has access to the module
    • You can use the official CloudFormation template to provision it
    • Supports Oracle/SQL Server (running on EC2) encryption with HSM
    • Supports EBS storage encryption
    • Supports Redshift (the only one)
  • KMS vs HSM
    • KMS uses an HSM platform under the hood, but not a dedicated HSM
    • HSM is useful for complying with government standards

AWS - Direct Connect

Posted on 2018-04-23 |

104.mp4 105.mp4 – AWS direct connect

  • Dedicated network connection between an on-premises network and AWS
    • 1 Gbit or 10 Gbit fiber
    • 802.1Q VLANs

AWS - Architecture Design

Posted on 2018-04-22 |

100.mp4 101.mp4 – Architecture Design

Architecture Design

  • Security

  • Reliability

  • Performance

  • Cost Optimization

    • Costs
    • Suboptimal Resources
  • Operational Excellence

    • Max business value
    • Continuous improvement
  • Production Scale Testing

  • Data-Driven Architecture

  • CHAOS MONKEY

  • Forensic Clean ???

  • WAF : web application firewall

  • Penetration Testing : need to inform aws

Design Principles

  • Mechanical Sympathy
    • https://github.com/jjfidalgo/mechanical-sympathy
  • Storage : select from Block , File , Object

Cost Optimization

  • Analyze and attribute expenditure

  • AWS Trusted Advisor

  • Runbook (how to run daily operations) and playbook (how to handle specific situation)

aws.amazon.com/architecture


AWS - CloudTrail

Posted on 2018-04-22 |

Reference

https://youtu.be/vtMCjyE5nms

AWS CloudTrail OFF IR (Incident Response) Runbook

When someone wants to turn off CloudTrail, it is automatically turned back on, and an automated report and reminder are generated.

Automation Steps,

  1. Turn CloudTrail back on
    • using Python/Lambda to handle the turn-off event and turn it back on
  2. Gather data related to the “TURN OFF” incident
  3. Extract principal, date, time, source IP from event data
  4. Map principal who assumed the role
  5. Lookup human contact info
  6. Contact the human and provide guidance
  7. Generate event summary for report
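Steps 2-3 above can be sketched as a plain parsing function. The record shape follows CloudTrail's documented fields (`eventTime`, `sourceIPAddress`, `userIdentity`, `requestParameters`), but the sample values below are made up:

```python
def parse_stop_logging_event(record: dict) -> dict:
    """Pull the fields the runbook cares about out of a
    CloudTrail StopLogging event record."""
    return {
        "principal": record["userIdentity"].get("arn", "unknown"),
        "time": record["eventTime"],
        "source_ip": record["sourceIPAddress"],
        "trail": record["requestParameters"]["name"],
    }

# Sample record, shaped like a CloudTrail event; values are made up.
sample = {
    "eventName": "StopLogging",
    "eventTime": "2018-04-22T10:00:00Z",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"},
    "requestParameters": {"name": "my-trail"},
}
summary = parse_stop_logging_event(sample)
print(summary["principal"])  # arn:aws:iam::111122223333:user/alice
```

In the actual Lambda, step 1 would follow this with a boto3 call such as `client("cloudtrail").start_logging(Name=summary["trail"])`.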

Questionnaire before implementation

  1. What’s my expressed security objective in words
  2. Is it configuration or behavior related ?
  3. What data, where could help inform me ?
  4. Do I have requisite ownership or visibility ?
  5. What are my performance requirements ?
  6. What mechanisms support the above ?
  7. What is my expressed security objective in code?
  8. Am I done?
  9. Does a human need to look at this? When?

Demo - S3:PutBucketPolicy IR Runbook

When someone changes the S3 policy in a bad way, check the policy and restore it if needed. This runbook makes use of Step Functions to implement it.


AWS - Elastic Cache hands on

Posted on 2018-04-22 |

093.mp4 094.mp4 – Elastic Cache hands on

An important use scenario is maintaining application session state (session replication)

  • Create Security Group for Redis service as RedisSG.
    • allow inbound 6379 port from webserverSG
    • allow outbound 6379 to everywhere
  • Cache subnet group
    • similar to an RDS subnet group.
  • Using wizard to create Elastic Cache cluster
    • options: enable replication; enable Multi-AZ
    • instance type: cache.t2.micro
    • option: file location in an S3 bucket
    • select the newly created cache subnet group; select the newly created security group
    • optional: maintenance window, SNS
  • review the result ( 1 node )

AWS - Elastic Cache

Posted on 2018-04-22 |

Elastic Cache

  • Managed in-memory cache service
  • key value stores
  • sub-millisecond latency to data
  • Redis / Memcached data store options
  • Multi-AZ capability
  • increase application throughput: 20M reads/sec, 4.8M writes/sec
  • Scaling the DB layer is much more expensive than scaling the caching layer

Compare the two options

  • depending on project language & framework support
  • Redis's feature set is a superset of Memcached's

Memcached Vs Redis

  • Storing JSON
    • in Memcached, use a serialized string
    • in Redis, use a hash

Memcached (mem cache d) store Option

  • Free and opensource
  • Object max size 1MB
  • Total max size 7 TiB
  • No persistence; easy to add nodes

Redis Store option

  • Free and open Source

  • Object max size 512 MB

  • Total max size 3.5 TiB

  • persistence; read replica

  • Support Notification from Redis Pub/Sub channel

  • Supports more data structures, including bitmaps, HyperLogLogs, and geospatial commands (geo indexes with radius queries), plus those supported by Memcached

  • Support auto sorting of data

  • Support HA and Failover

    • Failover is automatic; it will choose the read replica with the lowest latency and switch the DNS automatically
  • API provided to query all read replica endpoints

  • Sharding: 16384 hash slots (automatic client-side sharding; developers must use a Redis Cluster client)

  • Standard Redis use case

  • Leaderboard

  • Counters ; like & dislike
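The 16384 slots mentioned above come from Redis Cluster's key-to-slot rule, `HASH_SLOT = CRC16(key) mod 16384`, where CRC16 is the CCITT/XMODEM variant. A minimal sketch (the real client also honors `{...}` hash tags, omitted here):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (CCITT), the checksum Redis Cluster uses for keys."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots."""
    return crc16_xmodem(key.encode()) % 16384

# Standard CRC-16/XMODEM check value for "123456789" is 0x31C3.
print(hex(crc16_xmodem(b"123456789")))  # 0x31c3
print(hash_slot("foo"))                 # same slot CLUSTER KEYSLOT foo reports
```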


AWS - Highly Available and Fault Tolerant Architecture

Posted on 2018-04-19 |

085.mp4 — HA and Fault Tolerant Architecture hands-on overview

086.mp4 – focusing on VPC

Advanced VPC Architecture

  • The Advanced VPC Architecture can use CloudFormer to duplicate into different regions
    • Route53 (Global) handles cross-Region requests to the IGW sitting in each region and to the CloudFront distribution
      • Route53 is Global service
    • CloudFront (Global) caching source linked to S3 bucket.
      • CloudFront is Global service
    • 2 VPC : sitting in different regions.
      • VPC can’t span region
    • 2 S3 : each region has 1 S3 bucket linked via a “Service Endpoint” to the VPC in the same region; one of the S3 buckets is used as the other's replica.
      • S3 can’t span region
      • S3 can connect to VPC via “service endpoint”
      • S3 can have replica in another region
    • 2 ELB : each region has one ELB, receiving requests from the IGW and balancing them to instances sitting across AZs.
      • ELB can span AZs and balance requests across AZs
      • ELB can’t span Regions
    • 4 Availability Zones : each region has 2 AZs. EC2 instances and Aurora DB services are split across the 2 Regions and the 2 AZs in each region.
    • 2 AutoScaling Groups : each region has 1 Auto Scaling group to contain the EC2 clusters. Each Auto Scaling group spans 2 AZs.
    • 4 Public Subnets : each region has 1 VPC and each VPC has 2 public subnets sitting in different AZs.
      • subnets can’t span AZs (a subnet has a property to specify its AZ id)
      • each of the public subnets contains part of the EC2 cluster (which belongs to the Auto Scaling group)
    • 2 NAT Services : each region has 1 NAT service sitting in one of the public subnets to provide NAT to instances in the private subnets.
    • 4 Private Subnets : each region has 2 private subnets sitting in different AZs. These subnets are used to contain data services.
      • subnets can’t span AZs.
    • 5 Aurora services : each AZ has at least one Aurora service.
      • The main Aurora and the standby Aurora sit in one Region but are split across different AZs.
      • One AZ has the main Aurora, and all the other AZs have read replicas.
      • Aurora support cross Region Read Replica
        • https://aws.amazon.com/blogs/aws/new-cross-region-read-replicas-for-amazon-aurora/
    • 2 DB Subnet Group : each region has one

Hands-on: creating one VPC in one region that meets the Advanced VPC architecture design

  • Use wizard to create the VPC and then review the config
    • choose the wizard to create a VPC with 1 public subnet and 1 private subnet
    • select CIDR for each subnet and select same AZ for both subnet
    • Specify NAT instance type and key pair
    • Specify S3 Service Endpoint access level (none / public only / private only / both; full access / custom )
  • understanding the Route Table being used / created in Public and Private Subnet
    • In the public subnet's route table, 0.0.0.0/0 points to the IGW; in the private subnet's, 0.0.0.0/0 points to the NAT instance's ENI
    • one route directs S3 requests to the specific VPC service endpoint
    • one local route for traffic inside the VPC
    • route table is explicitly associated to public subnet; route table is implicitly associated to private subnet
  • Check the default ACL: allow inbound/outbound everything
  • Check the NAT service being created via wizard
    • Virtualization uses paravirtual (newer EC2 instances mostly use HVM)
    • The NAT security group by default allows all
  • To fix above issues,
    • Change the network interface to “do not delete on termination”
    • terminate the current NAT instance and check the network interface’s status become “available”
    • create a new VPC security group to be used by NAT instance
      • allow inbound http(s) from private subnet
      • allow inbound ssh from current client ip
      • allow outbound http(s) to anywhere
      • allow all inbound traffic from current security group (!!! ???)
    • change the network interface’s security group to use the new security group we just created.
      • a VPC security group binds to the instance (attaching to the network interface is the same as attaching to the instance)
    • review the VPC security group vs ACL (access control list)
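The route-table behaviour described above, where the most specific (longest-prefix) matching route wins, can be sketched with the standard `ipaddress` module. The CIDR and the gateway/ENI IDs below are made-up examples:

```python
import ipaddress

# Routes as (destination CIDR, target), mirroring the tables above.
# Gateway and ENI identifiers are illustrative, not real resources.
public_routes = [
    ("10.0.0.0/16", "local"),        # traffic inside the VPC
    ("0.0.0.0/0", "igw-12345"),      # everything else goes to the IGW
]
private_routes = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "eni-nat-67890"),  # default route to the NAT instance's ENI
]

def route_for(ip: str, routes) -> str:
    """Return the target of the most specific route matching ip."""
    addr = ipaddress.ip_address(ip)
    best = None
    for cidr, target in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1]

print(route_for("10.0.1.5", public_routes))  # local
print(route_for("8.8.8.8", private_routes))  # eni-nat-67890
```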

087.mp4 – going on with re-create the NAT instance

  • when creating the new NAT instance, search community AMIs with the keyword “NAT HVM” to select an existing image
  • select the existing public subnet to contain this new NAT instance
  • Disable “Assign public IP” because we will attach an existing network interface to it
  • Network Interfaces section: attach the existing one to this new NAT instance
  • Select the newly created security group, the NAT security group
  • Review and launch the instance

088.mp4 – In the same region, create subnet in another AZ and create ACL for all subnets

  • Create the new private subnet sitting in Same VPC but different AZ. (design the size accordingly)

  • Create the new public subnet sitting in Same VPC but different AZ. (design the size accordingly)

  • Review the route table being created for both new Subnets

    • the route table attached by default to the public subnet is wrong; it is the route table used by the private subnet. Change it to the other one, which routes internet traffic to the IGW.
  • Create new ACL called “Public NACL” which sits inside existing VPC.

    • ACL rules have numbers and are applied in number sequence
    • allow inbound http(s) from the internet, inbound ssh from the client
    • allow inbound 1024-65535 (used by ELB health checks) from the internet
      • https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-security-groups.html
    • allow outbound http(s) to the internet, outbound ssh to all private subnets, outbound 1024-65535 to the internet, port 3306 (MySQL) to all private subnets
    • associate newly created ACL to 2 public subnets
  • Create new ACL called “private NACL” which sits inside existing VPC

    • Allow inbound MySQL (3306) from both public subnets; allow inbound ssh from both public subnets; inbound 32768-61000 from the internet (NAT return traffic)
    • Allow outbound http(s) to the internet; allow outbound 32768-61000 (MySQL responses) to the public subnets
    • Associate newly created ACL to 2 private subnets
  • the ACL inbound and outbound rules have a default deny rule at the end.

  • If ping doesn’t work, check the ICMP protocol at security group and ACL level

  • If ssh doesn’t work, check that the ACL outbound rules allow 32768-61000 to the internet: these ephemeral ports carry the ssh responses back to the client.
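The number-ordered evaluation with an implicit deny at the end can be sketched as follows. The rules are simplified to a single port-range dimension, ignoring protocol and source CIDR:

```python
def evaluate_nacl(rules, port: int) -> str:
    """Evaluate NACL rules in ascending rule-number order; the first
    rule whose port range matches decides, else the implicit deny."""
    for _, (low, high), action in sorted(rules):
        if low <= port <= high:
            return action
    return "DENY"  # the implicit '*' deny at the end of every NACL

# (rule number, (port low, port high), action): loosely the Public NACL above.
public_nacl = [
    (100, (80, 80), "ALLOW"),       # HTTP from the internet
    (110, (443, 443), "ALLOW"),     # HTTPS from the internet
    (120, (22, 22), "ALLOW"),       # SSH from the client
    (130, (1024, 65535), "ALLOW"),  # ephemeral / ELB health-check ports
]

print(evaluate_nacl(public_nacl, 443))  # ALLOW
print(evaluate_nacl(public_nacl, 23))   # DENY (telnet hits the implicit deny)
```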

Rachel Rui Liu

© 2021 Rachel Rui Liu
Powered by Hexo