Blog P

//----

The EMR project uses:

EMR - used for Spark jobs

S3 - storage and data lake

Glue & Athena - data catalog and SQL engine

CI/CD - GitHub Actions
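Glue and Athena are only named here, so here is a minimal sketch of how an Athena query against a table in the Glue Data Catalog can be run from Python. It assumes boto3 is used. The function names are hypothetical, and the client is passed in so the polling logic can be tried without AWS. Athena runs queries asynchronously and writes result files to an S3 prefix, so the helper starts the query and polls until it reaches a final state.

```python
import time

# States after which an Athena query will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena, sql, database, output_location, poll_seconds=2.0):
    """Run one Athena query and return (query_execution_id, final_state).

    `athena` is a boto3 Athena client, e.g. boto3.client("athena").
    `database` is a Glue Data Catalog database; `output_location` is an
    s3:// prefix where Athena writes the result files.
    """
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]
    # Poll until the query has finished one way or another.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)


def fetch_rows(athena, query_id):
    """Return the first page of results as lists of strings (header row first).

    NULL cells have no "VarCharValue" key and come back as None.
    """
    result = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [col.get("VarCharValue") for col in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

For example, `run_athena_query(boto3.client("athena"), "SELECT * FROM job_output LIMIT 10", "emr_project_db", "s3://data-bucket/athena-results/")`, where `job_output`, `emr_project_db` and the results prefix are placeholder names. `fetch_rows` only reads the first page. Larger results need paging with `NextToken`.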

Data set:

Project:

1) An EMR Spark job creates the matrix and writes the output to S3.

2) We use Glue to track the job output, run queries on it in Athena, and then move the data to Redshift.

3) S3 holds both the data (CSV files in data-bucket) and the code (.py files in code-bucket).

4) A Step Functions state machine creates spark-emr-cluster, runs the job, and then terminates spark-emr-cluster.

5) CI/CD is handled with GitHub Actions.
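Step 4 can be written as an Amazon States Language definition. Here is a minimal sketch that assumes the definition is generated from Python. The `.sync` EMR service integrations (`createCluster.sync`, `addStep.sync`, `terminateCluster.sync`) make each state wait until that stage has finished. A `Catch` on the job step terminates the cluster even when the Spark job fails. The release label, instance types, IAM role names and `spark-submit` arguments are all assumptions. The job script is also assumed to take its input and output S3 paths as positional arguments.

```python
def build_state_machine(code_bucket, script_key, input_uri, output_uri, log_uri):
    """Return an ASL definition (as a dict) that creates an EMR cluster,
    runs one spark-submit step, and always terminates the cluster afterwards."""
    cluster_id = "$.cluster.ClusterId"  # written by the CreateCluster state
    terminate = {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
        "Parameters": {"ClusterId.$": cluster_id},
        "ResultPath": None,  # keep the state input, discard the task output
    }
    return {
        "Comment": "Create spark-emr-cluster, run the Spark job, terminate the cluster",
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    "Name": "spark-emr-cluster",
                    "ReleaseLabel": "emr-6.15.0",  # assumed EMR release
                    "Applications": [{"Name": "Spark"}],
                    "LogUri": log_uri,
                    "ServiceRole": "EMR_DefaultRole",  # assumed default roles
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "Instances": {
                        # Stay alive after creation so the step can be added.
                        "KeepJobFlowAliveWhenNoSteps": True,
                        "InstanceGroups": [
                            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
                        ],
                    },
                },
                "ResultPath": "$.cluster",
                "Next": "RunSparkJob",
            },
            "RunSparkJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": cluster_id,
                    "Step": {
                        "Name": "spark-job",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": ["spark-submit", "--deploy-mode", "cluster",
                                     f"s3://{code_bucket}/{script_key}", input_uri, output_uri],
                        },
                    },
                },
                "ResultPath": "$.step",
                # On any error, still tear the cluster down before failing.
                "Catch": [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error",
                           "Next": "TerminateClusterAfterFailure"}],
                "Next": "TerminateCluster",
            },
            "TerminateCluster": {**terminate, "End": True},
            "TerminateClusterAfterFailure": {**terminate, "Next": "JobFailed"},
            "JobFailed": {"Type": "Fail", "Error": "SparkJobFailed"},
        },
    }
```

The dict can be serialised with `json.dumps` and passed as the `definition` of a new state machine, for example through boto3's Step Functions client `create_state_machine(name=..., definition=..., roleArn=...)`. Terminating the cluster in both branches keeps a failed run from leaving spark-emr-cluster running.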

EMR job
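Here is a minimal PySpark sketch of such a job. It assumes the CSV input and output locations are passed as arguments by `spark-submit`. The notes don't show what the job computes for the matrix, so `build_output` is a hypothetical placeholder that counts rows per value of a column. Writing Parquet is also an assumption, and the paths and the `category` column are illustrative only.

```python
import argparse


def parse_args(argv=None):
    """Read the input and output S3 paths passed by spark-submit."""
    parser = argparse.ArgumentParser(description="EMR Spark job (sketch)")
    parser.add_argument("input_path", help="e.g. s3://data-bucket/input/ (CSV files)")
    parser.add_argument("output_path", help="e.g. s3://data-bucket/output/")
    parser.add_argument("--group-by", default="category",
                        help="hypothetical column to aggregate on")
    return parser.parse_args(argv)


def build_output(df, group_col):
    """Hypothetical transformation: number of rows per value of `group_col`."""
    return df.groupBy(group_col).count()


def main(argv=None):
    # Imported here so the argument parsing above works without Spark installed.
    from pyspark.sql import SparkSession

    args = parse_args(argv)
    spark = SparkSession.builder.appName("emr-spark-job").getOrCreate()
    try:
        df = spark.read.csv(args.input_path, header=True, inferSchema=True)
        result = build_output(df, args.group_by)
        # Overwrite so reruns of the pipeline replace the previous output.
        result.write.mode("overwrite").parquet(args.output_path)
    finally:
        spark.stop()
```

To run the file with `spark-submit`, end it with the usual `if __name__ == "__main__": main()` guard. Keeping `build_output` separate from the I/O makes it easy to swap in the real transformation.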

Data's Blog