AWS CloudSearch

Post author: Rachel Rui Liu
Post link: https://racheliurui.github.io/2018/07/02/markdown/AWS/AWS2018/Extra_AWS_CloudSearch/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 3.0 (https://creativecommons.org/licenses/by-nc-sa/3.0/) unless otherwise stated.

Overview

Key difference between a database query and a search engine:
a database query returns exact matches, while a search engine returns the best matches, ranked by relevance.
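A toy contrast between the two behaviours, as a sketch only. The documents and the scoring rule (count of shared query terms) are made up for illustration and are not how CloudSearch actually ranks results.

```python
import re

# Hypothetical documents, keyed by id
DOCS = {
    1: "cheap flights to paris",
    2: "paris travel guide",
    3: "flights and hotels",
}


def db_query(docs, title):
    """Exact match, like SQL `WHERE title = ?`: a row either matches or it doesn't."""
    return [doc_id for doc_id, text in docs.items() if text == title]


def search(docs, query):
    """Toy relevance ranking: score = number of query terms found in the document."""
    terms = set(re.findall(r"\w+", query.lower()))
    scored = []
    for doc_id, text in docs.items():
        score = len(terms & set(re.findall(r"\w+", text.lower())))
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for a stable order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


print(db_query(DOCS, "flights paris"))  # → []
print(search(DOCS, "flights paris"))    # → [1, 2, 3]
```

The exact query finds nothing because no title equals the input, while the ranked search still returns every partially relevant document, best match first.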

Create and configure a domain (using the CLI)
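A minimal sketch of the domain setup step, assuming a client object that behaves like `boto3.client("cloudsearch")` (whose `create_domain` and `define_index_field` calls mirror the `aws cloudsearch create-domain` and `aws cloudsearch define-index-field` CLI commands). The domain name `movies`, the field names, and the `RecordingClient` stand-in are hypothetical and only let the sketch run without AWS access.

```python
# Maps CloudSearch field types to the options key used in the IndexField structure
OPTION_KEYS = {
    "text": "TextOptions",
    "literal": "LiteralOptions",
    "int": "IntOptions",
    "double": "DoubleOptions",
    "date": "DateOptions",
}


def create_domain_with_fields(client, domain_name, fields):
    """Create a domain, then define one index field per entry in `fields`.

    `client` should behave like boto3.client("cloudsearch").
    `fields` maps field name -> (field type, options dict or None).
    """
    client.create_domain(DomainName=domain_name)
    for name, (field_type, options) in fields.items():
        index_field = {"IndexFieldName": name, "IndexFieldType": field_type}
        if options:
            index_field[OPTION_KEYS[field_type]] = options
        client.define_index_field(DomainName=domain_name, IndexField=index_field)


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def create_domain(self, **kwargs):
        self.calls.append(("create_domain", kwargs))

    def define_index_field(self, **kwargs):
        self.calls.append(("define_index_field", kwargs))


# Hypothetical schema: "description" is searchable but not returned or highlighted
# (less data stored per document); "category" is a literal field used for facets.
FIELDS = {
    "description": ("text", {"ReturnEnabled": False, "HighlightEnabled": False}),
    "category": ("literal", {"FacetEnabled": True, "ReturnEnabled": True}),
}

recorder = RecordingClient()
create_domain_with_fields(recorder, "movies", FIELDS)
```

Against a real account you would pass `boto3.client("cloudsearch")` instead of the recorder. If index options change on a domain that already holds data, the index is rebuilt with `aws cloudsearch index-documents`.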
Create document batches (export the data from the data store and convert it to Solr-supported types and formats)
- Use maximum-size batches (up to 5 MB each)
- Convert the data: remove invalid characters
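The two batching tips can be sketched as follows. Assumptions: documents arrive as `(id, fields)` pairs, batches use CloudSearch's JSON "add" operation format, and "invalid characters" are taken to be anything outside the XML 1.0 character ranges. Verify the exact rule against the CloudSearch developer guide.

```python
import json
import re

MAX_BATCH_BYTES = 5 * 1024 * 1024  # the 5 MB batch limit from the notes above

# Assumption: characters outside the XML 1.0 ranges are treated as invalid
INVALID_CHARS = re.compile("[^\t\n\r\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]")


def clean(text):
    """Strip characters the service would reject."""
    return INVALID_CHARS.sub("", text)


def to_add_op(doc_id, fields):
    """One 'add' operation in the JSON batch format, with string fields cleaned."""
    cleaned = {k: clean(v) if isinstance(v, str) else v for k, v in fields.items()}
    return {"type": "add", "id": doc_id, "fields": cleaned}


def make_batches(docs, max_bytes=MAX_BATCH_BYTES):
    """Pack (id, fields) pairs into JSON arrays of at most max_bytes each."""
    batches, current, size = [], [], 2  # 2 bytes for the enclosing "[" and "]"
    for doc_id, fields in docs:
        op = json.dumps(to_add_op(doc_id, fields)).encode("utf-8")
        extra = len(op) + (1 if current else 0)  # +1 for the "," separator
        if current and size + extra > max_bytes:
            # Current batch is full: close it and start a new one
            batches.append(b"[" + b",".join(current) + b"]")
            current, size = [], 2
            extra = len(op)
        if size + extra > max_bytes:
            raise ValueError(f"document {doc_id} alone exceeds {max_bytes} bytes")
        current.append(op)
        size += extra
    if current:
        batches.append(b"[" + b",".join(current) + b"]")
    return batches
```

Filling each batch as close to the limit as possible keeps the number of upload requests low, which is the point of the "use max sized batches" tip.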
Integrate with IAM (to control who can access which domain)
Integrate with CloudTrail (to audit API calls)
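For the IAM step, a sketch of a policy that lets a principal search one domain but not upload to it. The region, account ID, and domain name are placeholders; the actions `cloudsearch:search`, `cloudsearch:suggest`, and `cloudsearch:document` are the domain-level permissions, while the helper function itself is hypothetical.

```python
import json


def search_only_policy(region, account_id, domain_name):
    """Build an IAM policy allowing search and suggest requests on one domain."""
    domain_arn = f"arn:aws:cloudsearch:{region}:{account_id}:domain/{domain_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Adding "cloudsearch:document" would also allow document uploads
                "Action": ["cloudsearch:search", "cloudsearch:suggest"],
                "Resource": domain_arn,
            }
        ],
    }


print(json.dumps(search_only_policy("us-east-1", "123456789012", "movies"), indent=2))
```

Separate policies for read-only search clients and for the loading pipeline keep upload rights away from front-end code.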

Tip: scale up the instance type while bulk-loading data
- Test with different types of data (e.g., tweets vs. web data)
- Index field options (e.g., enabling facet, sort, or return on a field) affect index size
- Upload data with multiple threads (test the limits to avoid HTTP 500 errors)
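A sketch of multi-threaded uploading with retries. `upload` is a hypothetical callable that sends one batch and returns the HTTP status code; a real one might wrap the `cloudsearchdomain` client's `upload_documents(documents=..., contentType="application/json")`, translating boto3's exceptions into status codes. The thread count and backoff values are assumptions to tune while testing the limits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def upload_with_retry(upload, batch, max_attempts=5, base_delay=0.5):
    """Call upload(batch); on HTTP 5xx, back off exponentially and retry."""
    for attempt in range(max_attempts):
        status = upload(batch)
        if status < 500:
            return status
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    return status  # still failing after max_attempts


def upload_all(upload, batches, threads=4, **retry_kwargs):
    """Upload batches concurrently; returns the final status code per batch, in order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda b: upload_with_retry(upload, b, **retry_kwargs), batches))
```

Starting with a few threads and raising the count until 5xx responses appear is one way to find the domain's practical upload limit; the backoff keeps temporary overloads from failing the whole load.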
Set multiple partitions (when the index is too large for a single instance)
Pre-warm the domain before expected traffic spikes
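The load-in, partition, and pre-warm tips all map to a domain's scaling parameters. A sketch, assuming a client that behaves like `boto3.client("cloudsearch")` and its `update_scaling_parameters` call. The instance type names and counts are example values only; check the current list of supported instance types.

```python
def scaling_params(instance_type, replicas=None, partitions=None):
    """Build the ScalingParameters structure for UpdateScalingParameters."""
    params = {"DesiredInstanceType": instance_type}
    if replicas is not None:
        # Extra replicas add capacity for search traffic (pre-warming)
        params["DesiredReplicationCount"] = replicas
    if partitions is not None:
        # Assumption to verify: AWS has restricted this to the largest instance type
        params["DesiredPartitionCount"] = partitions
    return params


def apply_scaling(client, domain_name, params):
    """`client` should behave like boto3.client("cloudsearch")."""
    return client.update_scaling_parameters(DomainName=domain_name, ScalingParameters=params)


# Hypothetical plans for the three situations in the notes
BULK_LOAD = scaling_params("search.m3.2xlarge")
BEFORE_SPIKE = scaling_params("search.m3.large", replicas=3)
LARGE_INDEX = scaling_params("search.m3.2xlarge", partitions=2)
```

Scaling changes take time to apply, so pre-warming means calling `apply_scaling` well before the expected spike and scaling back down afterwards.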

Caching: add ElastiCache in front of CloudSearch
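A cache-aside sketch of that idea. An in-process dict with expiry times stands in for ElastiCache, and `search_fn` is a hypothetical function that queries CloudSearch. The TTL is an assumed value that trades freshness for fewer search requests.

```python
import time


class CachedSearch:
    """Cache-aside wrapper: answer repeated queries from the cache before calling search_fn."""

    def __init__(self, search_fn, ttl_seconds=60, clock=time.monotonic):
        self.search_fn = search_fn
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # query -> (expires_at, result); stands in for ElastiCache

    def search(self, query):
        now = self.clock()
        hit = self.cache.get(query)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh cache hit: no call to the search backend
        result = self.search_fn(query)
        self.cache[query] = (now + self.ttl, result)
        return result
```

In practice the cache key should include every search parameter (filters, sort, page), not only the query text, so that different requests never share a cached result.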
Consider multi-tenancy:
- Option 1:
  - feed all customers' data into the same domain
  - use a field that holds the customer ID
- Option 2:
  - use multiple domains (one per customer)
- How to choose:
  - if each customer needs a very different configuration, use multiple domains
  - if customers have similar configurations but there are many of them, setting up a domain per customer is not cost-effective, so share one domain
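For Option 1, every search has to be restricted to the caller's tenant. A sketch, assuming each document carries a literal field named `customer_id` (a hypothetical name) and the parameters are passed to the `cloudsearchdomain` client's `search` call. The filter uses structured query syntax, where quoted string values escape backslashes and single quotes with a backslash.

```python
def escape_structured(value):
    """Escape backslashes and single quotes for a quoted structured-query string."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def tenant_search_params(customer_id, user_query, size=10):
    """Search parameters that only return documents belonging to one customer."""
    return {
        "query": user_query,
        "queryParser": "simple",
        # The filter narrows results without affecting relevance scores
        "filterQuery": f"customer_id:'{escape_structured(customer_id)}'",
        "size": size,
    }
```

A call might look like `client.search(**tenant_search_params("acme", "red shoes"))`, with `client = boto3.client("cloudsearchdomain", endpoint_url=...)` pointed at the domain's search endpoint. The customer ID should come from the authenticated session, never from user input.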
Mine user behavior to improve results (load the search logs into EMR, analyze them, and feed the results back in as search parameters to improve relevance)
- This helps with document boosting, augmentation, and synonym creation
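A small-scale sketch of the log analysis. On real traffic this aggregation would run as an EMR job; here it is plain Python over an in-memory list, and the `(query, clicked_doc_id)` log format is an assumption. Click counts are a candidate boosting signal, and queries whose clicks land on the same documents are synonym candidates that still need human review.

```python
from collections import Counter, defaultdict
from itertools import combinations


def click_counts(click_log):
    """Clicks per document from (query, doc_id) records: a candidate boosting signal."""
    return Counter(doc_id for _query, doc_id in click_log)


def synonym_candidates(click_log, min_shared=2):
    """Query pairs whose clicks land on at least `min_shared` of the same documents."""
    docs_by_query = defaultdict(set)
    for query, doc_id in click_log:
        docs_by_query[query.lower()].add(doc_id)
    pairs = []
    for a, b in combinations(sorted(docs_by_query), 2):
        if len(docs_by_query[a] & docs_by_query[b]) >= min_shared:
            pairs.append((a, b))
    return pairs


LOG = [("sofa", "d1"), ("couch", "d1"), ("sofa", "d2"),
       ("couch", "d2"), ("lamp", "d3"), ("sofa", "d1")]
print(click_counts(LOG)["d1"])   # → 3
print(synonym_candidates(LOG))   # → [('couch', 'sofa')]
```

Writing a popularity score back into the index means re-uploading each document in full, because an "add" for an existing ID replaces the whole document.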

AWS CloudSearch DeepDive and Best Practices
https://youtu.be/OeHaj1a66I4