Hey Hey Hey


AWS - S3

Posted on 2018-04-09 |

S3 Overview

S3 is a web store, not a file system!

S3 – writing concurrency

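  • “Eventual consistency”: overwrites and deletes of an existing object are synced across replicas eventually, so a read issued right after a concurrent write may still return the old version.
  • “Read-after-write consistency”: a newly created object can be read as soon as the write returns, without waiting for every replica to be synced.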

S3 Event Notification

  • Configure which events trigger a notification
  • Configure a filter: file prefix (path and name) and suffix
  • Notifications can integrate with Lambda, SQS, and SNS
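
The event, filter, and target settings above can be sketched as a plain configuration dictionary. This is a minimal sketch, assuming the dictionary shape accepted by boto3's put_bucket_notification_configuration; the Lambda ARN, prefix, and suffix are hypothetical values, and nothing here calls AWS.

```python
def build_notification_config(lambda_arn, prefix, suffix,
                              events=("s3:ObjectCreated:*",)):
    """Build an S3 notification configuration that targets one Lambda function.

    The result only describes the rule; it still has to be sent to S3.
    """
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                # Which events trigger the notification.
                "Events": list(events),
                # Only keys matching both the prefix and the suffix qualify.
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


# Hypothetical function ARN and key filter, for illustration only.
config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:resize-image",
    prefix="uploads/",
    suffix=".jpg",
)
```

SQS and SNS targets use sibling keys (QueueConfigurations and TopicConfigurations) with the same Events and Filter layout.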

LifeCycle Management

S3 --> S3 IA --> Glacier

  • S3 IA has the same 11 9’s durability as S3 Standard; its availability is lower (designed for 99.9%, versus 99.99% for S3 Standard)
  • You can move objects to IA with a lifecycle policy, or upload them directly with a parameter that sets the storage class to IA
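
The S3 --> S3 IA --> Glacier path can be written down as a lifecycle rule. This is a minimal sketch, assuming the dictionary shape accepted by boto3's put_bucket_lifecycle_configuration; the rule ID and the day counts are hypothetical choices, not values from the course.

```python
def build_lifecycle_config(ia_after_days=30, glacier_after_days=90, prefix=""):
    """Describe a rule that moves objects to IA first and to Glacier later."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("the Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "standard-to-ia-to-glacier",  # hypothetical rule name
                "Filter": {"Prefix": prefix},       # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


lifecycle = build_lifecycle_config()

# The direct-upload alternative passes the storage class at upload time,
# for example StorageClass="STANDARD_IA" on a boto3 put_object call.
```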

Cross region replication


AWS - Handson NodeJs

Posted on 2018-04-09 |

027.mp4, 028.mp4, 029.mp4 – Set up dev environments

  • create users and download their access credentials

  • create a group that stands for developers, attach a policy to the group, and add the users to the group

  • create a role that stands for the EC2 instance

  • create a security group: attach it to the VPC and define inbound and outbound rules (open HTTP/HTTPS/SSH)

  • launch an instance: select a community AMI; enable a public IP; attach the role created above; protect against accidental termination; under advanced details, put in a bash script; tag it; attach the security group; create and download a key pair to access the instance

  • install 2 useful plugins for Atom: remote-edit and git-plus; edit a file on the EC2 server, save, and refresh the page
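
The inbound rules from the security group step (open HTTP/HTTPS/SSH) can be sketched as data. This is a minimal sketch, assuming the IpPermissions shape used by boto3's authorize_security_group_ingress; opening every port to 0.0.0.0/0 is an assumption made for the demo, and SSH in particular is usually restricted to a known address range.

```python
def ingress_rule(port, cidr="0.0.0.0/0"):
    """One inbound TCP rule for a single port."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }


# SSH (22), HTTP (80), and HTTPS (443), as in the list above.
inbound_rules = [ingress_rule(port) for port in (22, 80, 443)]
```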


Python 101

Posted on 2018-03-18 |

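PyCharm Edu

Python does not use semicolons to end lines; a line break ends the statement.
String indexing accepts negative arguments to count from the end; so do arrays (lists)!
A string that contains line breaks can be wrapped in """triple double quotes""".
Arrays support : in an index to mean "up to" (slicing).
Tuples: the tuple type
[] array (list), () tuple, {} dictionary
A function's comment (docstring) goes on the line right after the function definition, inside """ """
__init__(self) is the initializer; self is required
The new import style: from <Python file name> import <function or class name>

  • No need to escape " inside ''
  • print("Hello, %s! I am %d years old" % (name, year))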
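
The notes above fit into one short script. This is a minimal sketch; the Greeter class and the name and year values are hypothetical and exist only to exercise each point.

```python
name, year = "Rachel", 30  # hypothetical values

text = "Hello"
last_char = text[-1]            # negative index counts from the end

numbers = [1, 2, 3, 4, 5]
last_two = numbers[-2:]         # negative indices work on lists too
first_three = numbers[:3]       # ":" means "up to" (index 3 excluded)

multi_line = """line one
line two"""                     # triple quotes keep the line break

point = (1, 2)                  # () tuple
ages = {"Rachel": 30}           # {} dictionary
quoted = 'She said "hi"'        # no need to escape " inside ''


class Greeter:
    def __init__(self, name):
        """Initializer; self is the required first parameter."""
        self.name = name

    def greet(self, year):
        """The docstring sits on the line right after the definition."""
        # Several values for % formatting must be passed as one tuple.
        return "Hello, %s! I am %d years old" % (self.name, year)


greeting = Greeter(name).greet(year)
print(greeting)
```

Saved as a file such as greeter.py (a hypothetical name), the class could be pulled into another script with from greeter import Greeter.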

AWS - Immersion Day

Posted on 2018-03-14 |

Agenda

http://aws.johnhildebrandt.info/

Introduction to AWS & EC2 Overview

https://stackoverflow.com/questions/29575877/aws-efs-vs-ebs-vs-s3-differences-when-to-use

An EBS volume is attached to the EC2 instance; EBS snapshots are always saved to S3 (mandatory). Plan the snapshot frequency (snapshot rotation).
When an EC2 instance restarts, its EBS volume is attached again.
EC2 security groups can be stacked to form another security group.
By default an EC2 instance has no public IP unless you choose that optional service.
The SSH private key is not stored in AWS.
Instance metadata: used to retrieve information about the instance (a "magic" IP hosts an HTTP service that returns the current instance's information).
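
The metadata service mentioned above listens on the link-local address 169.254.169.254. This is a minimal sketch using only the standard library; the helper names are hypothetical, the fetch only succeeds from inside an EC2 instance, and instances that enforce IMDSv2 require a session token first, which is left out here.

```python
import urllib.request

METADATA_BASE = "http://169.254.169.254/latest/meta-data/"


def metadata_url(path=""):
    """Build the URL for one metadata key, e.g. 'instance-id'."""
    return METADATA_BASE + path.lstrip("/")


def fetch_metadata(path="", timeout=2):
    """Read one metadata value; raises an error when run outside EC2."""
    with urllib.request.urlopen(metadata_url(path), timeout=timeout) as response:
        return response.read().decode("utf-8")


# On an instance: fetch_metadata("instance-id") returns that instance's ID.
```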

9:30am - 10:15am EC2 Immersion Lab
10:15am - 10:30am Break

Networking in AWS

Security Groups :
VIF ?
DX location

11:15am - 12:00pm VPC Immersion Lab
12:00pm - 1:00pm Innovation at scale video

Storage on AWS


AWS - EC2

Posted on 2018-02-28 |

Login EC2

  • For Ubuntu AMIs, log in as ubuntu@hostname instead of ec2-user@hostname

016.mp4 Elastic Compute Cloud

Region: within one region, the same price, the same latency, and the same regulations
Availability Zone: the same data center (or group of data centers)
Edge Location: CloudFront

Purchase modes:

  1. on-demand instances
  2. reserved instances
  3. spot instances (use spare capacity at off-peak times)

Q: How do I select the right instance type?

Amazon EC2 instances are grouped into 5 families: General Purpose, Compute Optimized, Memory Optimized, Storage Optimized and Accelerated Computing instances.

  • General Purpose Instances have memory to CPU ratios suitable for most general purpose applications and come with fixed performance (M5, M4) or burstable performance (T2);
  • Compute Optimized instances (C5, C4) have proportionally more CPU resources than memory (RAM) and are well suited for scale out compute-intensive applications and High Performance Computing (HPC) workloads;
  • Memory Optimized Instances (X1e, X1, R4) offer larger memory sizes for memory-intensive applications, including database and memory caching applications;
  • Accelerated Computing instances (P3, P2, G3, F1) take advantage of the parallel processing capabilities of NVIDIA Tesla GPUs for high performance computing and machine/deep learning; GPU Graphics instances (G3) offer high-performance 3D graphics capabilities for applications using OpenGL and DirectX; F1 instances deliver Xilinx FPGA-based reconfigurable computing;
  • Storage Optimized Instances (H1, I3, D2) provide very high, low-latency I/O capacity using SSD-based local instance storage for I/O-intensive applications; D2 and H1, the dense-storage and HDD-storage instances, provide high local storage density and sequential I/O performance for data warehousing, Hadoop and other data-intensive applications.
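
The family list above can be turned into a small lookup. This is a minimal sketch that covers only the instance generations named in the list; the function name is hypothetical and newer instance types are not included.

```python
FAMILY_PREFIXES = {
    "General Purpose": ("m5", "m4", "t2"),
    "Compute Optimized": ("c5", "c4"),
    "Memory Optimized": ("x1e", "x1", "r4"),
    "Accelerated Computing": ("p3", "p2", "g3", "f1"),
    "Storage Optimized": ("h1", "i3", "d2"),
}


def family_of(instance_type):
    """Map a type such as 'c5.large' to its family, or None if unknown."""
    prefix = instance_type.lower().split(".")[0]
    for family, prefixes in FAMILY_PREFIXES.items():
        if prefix in prefixes:
            return family
    return None


print(family_of("t2.micro"))
```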

019 hands-on: connect to an EC2 Windows instance


InfoQ Readings

Posted on 2018-02-07 |

Migrating Batch ETL to Stream Processing: A Netflix Case Study with Kafka and Flink
https://www.infoq.com/articles/netflix-migrating-stream-processing


AWS - IAM

Posted on 2018-02-07 |

IAM Overview

IAM: Identity and Access Management

010.mp4 overview

011.mp4

  • Understand the difference between AWS IAM and customer IAM
  • Understand the difference between an AWS account and AWS IAM users
  • IAM is a service.
  • IAM controls access through policies, which are organized into “statements”. Each statement includes: a resource (like a table), an action (like accessing a database), and an effect (like allow)
  • Security compliance: Payment Card Industry (PCI) Data Security Standard (DSS)
  • Auditing: using CloudTrail
  • Credential Report: a downloadable CSV file that opens in Excel (the report is regenerated at most once every 4 hours, so there is a delay)
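
The resource, action, and effect parts of a statement map directly onto the JSON policy document. This is a minimal sketch; the table ARN and the helper name are hypothetical, and the policy is only built and serialized, not attached to anything.

```python
import json


def build_policy(effect, actions, resources):
    """Build a one-statement IAM policy document."""
    if effect not in ("Allow", "Deny"):
        raise ValueError("effect must be 'Allow' or 'Deny'")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": effect,             # effect (like allow)
                "Action": list(actions),      # action (like reading a table)
                "Resource": list(resources),  # resource (like a table)
            }
        ],
    }


policy = build_policy(
    "Allow",
    ["dynamodb:GetItem", "dynamodb:Query"],
    ["arn:aws:dynamodb:us-east-1:123456789012:table/Books"],  # hypothetical table
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```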

User

How to identify a user:

  • ARN: Amazon Resource Name, in the format
    arn:aws:iam::[accountIDNum]:user/Bill
    • loads of examples:

https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html

  • ID: if the user is created from the command line, you can get the user ID
  • a plain, unique user name
  • never give root access
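
The ARN format from the list above can be built and taken apart with a few lines of string handling. This is a minimal sketch; the helper names and the account number are hypothetical, and the 12-digit check reflects the length of AWS account IDs.

```python
def iam_user_arn(account_id, user_name):
    """Format an IAM user ARN; IAM ARNs leave the region field empty."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("an AWS account ID is 12 digits")
    return f"arn:aws:iam::{account_id}:user/{user_name}"


def parse_arn(arn):
    """Split an ARN into its colon-separated fields."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError("not an ARN: " + arn)
    keys = ("partition", "service", "region", "account", "resource")
    return dict(zip(keys, parts[1:]))


arn = iam_user_arn("123456789012", "Bill")  # hypothetical account number
fields = parse_arn(arn)
```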

AWS - Overview

Posted on 2018-02-05 |

Terminology

  • 16 Regions: prices differ between Regions, and so do the services deployed
  • 42 Availability Zones: failures are fully isolated between zones
  • 50 Edge Locations: caching; acceleration

Services

FaaS (function as a service; serverless service)
Questions: which services are serverless and which are not? Given a scenario, which combination of services is needed?
https://aws.amazon.com/serverless/
AWS serverless services: Lambda, DynamoDB, API Gateway, S3, AWS Step Functions, SNS, SQS, Kinesis, Athena (interactive queries against big data), tools and services (city9)

Compute services

  • EC2
  • ECS (docker)
  • Elastic Beanstalk: deploys the environment automatically.
  • ELB (balancer)
  • Autoscaling
  • Lambda

Storage Services

  • S3
  • Glacier
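  • EBS (Elastic Block Store): attaches to EC2 instances
    EBS to EC2, n:1
  • EFS (Elastic File System) to EC2, n:n
  • Storage Gateway: connects and syncs S3 buckets and their objects with an enterprise's private data center.
  • Snowball device: faster than Storage Gateway.

S3 sits outside the VPC (virtual private cloud). The VPC needs an Endpoint to connect to S3 buckets and their objects (a mapping of the storage service); the S3 bucket then connects to Glacier (through a Glacier Vault), which is how archive rules are defined.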

database


AWS - Handson Static Website

Posted on 2018-02-05 |

Set up a bulletproof website with AWS

  • demo-005 part2: use the Route53 service to buy a domain
  • demo-006 part3:
    • create an S3 bucket and upload the files; host a static website (and visit it using the raw S3 URL);
    • create another bucket with the www name and redirect requests to the naked domain (domain.com).

Tip: set the correct MIME type for uploaded files
https://developer.mozilla.org/en-US
search for “complete list of MIME types”
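
Picking the MIME type can be automated from the file extension. This is a minimal sketch using Python's standard mimetypes module; the helper name and the fallback type are assumptions. With boto3, the result could be passed at upload time, for example through ExtraArgs={"ContentType": ...} on upload_file.

```python
import mimetypes


def content_type_for(filename, default="application/octet-stream"):
    """Guess the Content-Type from the extension, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default


for site_file in ("index.html", "style.css", "logo.png"):
    print(site_file, content_type_for(site_file))
```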

  • demo-007 part4: use the “Certificate Manager” service to create a certificate (give the domain as domain.com and *.domain.com)
  • demo-008 part5: create a distribution using the CloudFront service to help with security (DDoS), performance, and failover, and to attach the certificate
    • default TTL: the default is 24 hours, which means content is refreshed from S3 every 24 hours
    • alternative domain names: the domain name purchased
    • SSL certificate: custom SSL certificate (created in the previous demo)
    • default root object: like index.html
    • how to manually trigger a refresh: create an invalidation, and use the path “/*” to invalidate everything.
  • demo-009 part6: go back to “Route53” and configure the “Hosted Zones”
    • Create an “A - IPv4 address” typed “record set”, set the name to domain.com, and set the alias target to the CloudFront endpoint.
    • Create another “CNAME” typed “record set”, set the name to www.domain.com, and point it (non-alias) to the naked domain, which is routed to the CloudFront endpoint.
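
The invalidation and the two record sets above can be sketched as request payloads. This is a minimal sketch, assuming the InvalidationBatch shape of boto3's CloudFront create_invalidation and the ChangeBatch shape of Route53's change_resource_record_sets; the domain names, the distribution host name, and the TTL are hypothetical. Z2FDTNDATAQYW2 is the fixed hosted zone ID used for CloudFront alias targets.

```python
import time

CLOUDFRONT_HOSTED_ZONE_ID = "Z2FDTNDATAQYW2"


def invalidation_batch(paths=("/*",), caller_reference=None):
    """Payload that asks CloudFront to refresh the given paths."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # Any unique string; a timestamp is enough for manual use.
        "CallerReference": caller_reference or str(time.time()),
    }


def alias_a_record(domain, cloudfront_domain):
    """A record for the naked domain, aliased to the distribution."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": CLOUDFRONT_HOSTED_ZONE_ID,
                    "DNSName": cloudfront_domain,
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    }


def cname_record(name, target, ttl=300):
    """Non-alias CNAME that points www at the naked domain."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }]
    }


batch = invalidation_batch(caller_reference="manual-refresh-1")
naked = alias_a_record("domain.com", "d111111abcdef8.cloudfront.net")
www = cname_record("www.domain.com", "domain.com")
```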

background www and naked domain names

https://www.sitepoint.com/domain-www-or-no-www/
In short, www is a prefix that in the old days indicated the URL was hosted on the internet. It is no longer necessary; if you skip the www prefix, your host name is called “naked”. Either way, people might choose to support both the www and naked domains.


Machine Learning - Week 11

Posted on 2018-02-04 |

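Photo OCR (Optical Character Recognition)

Pipeline

An approach to solving complex problems.

Taking photo OCR as the example, the pipeline is:

image -> detect text regions -> character segmentation -> recognize individual characters

Using a sliding window: for example, to find pedestrians in an image, given an algorithm that can tell whether a 20*100 box contains a human shape, we can use boxes of the same aspect ratio at different sizes, from small to large. Each time we move the box by one sliding-window step, cut out the patch, resize it to the dimensions the algorithm requires, and classify it.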
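
The sliding-window scan described above can be sketched as a generator of patch coordinates. This is a minimal sketch; the function name, step size, and scale factors are hypothetical, and the classifier itself is left out: each yielded box would be cut from the image, resized to the classifier's input size, and then classified.

```python
def sliding_windows(image_w, image_h, win_w, win_h, step, scales=(1.0, 1.5, 2.0)):
    """Yield (left, top, width, height) boxes at several scales.

    Every box keeps the aspect ratio of the base window, and boxes that
    would extend past the image border are skipped.
    """
    for scale in scales:
        w, h = int(win_w * scale), int(win_h * scale)
        for top in range(0, image_h - h + 1, step):
            for left in range(0, image_w - w + 1, step):
                yield (left, top, w, h)


# A 40x100 image scanned with the 20*100 window from the text, step 10.
# At scale 2.0 the window no longer fits, so only scale 1.0 yields boxes.
boxes = list(sliding_windows(40, 100, 20, 100, step=10, scales=(1.0, 2.0)))
print(boxes)
```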

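A similar idea applies to OCR:

step1, cut out boxes to find candidate character regions
step2, use an “expansion” algorithm to enlarge the character regions, find the text regions, and discard distracting regions based on their aspect ratio.
step3, make the selected regions transparent and mask everything else, then run recognition on the selected regions.
step4, use a 1D sliding window to find individual characters
step5, recognize each character

How to get a large amount of training data

Taking OCR as the example,
real data: letter patches cut out of real images
another important source is synthetic data: artificially generated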

Rachel Rui Liu

© 2021 Rachel Rui Liu