Building your Data Project on AWS - Luke Anderson.pdf
saidbilgen
1.
© 2018 Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Building Your Data Lake on AWS
Luke Anderson
Business Development, AWS
2.
What to expect from the session
   1. Defining the Data Lake
   2. Reducing Costs
   3. Increasing Performance
   4. Planning for the Future
3.
Rethink how to become a data-driven business
• Business outcomes
• Experimentation
• Agile and timely
4.
Traditionally, analytics looked like this (duplication and sprawl)
[Diagram labels: Hadoop, Spark, NoSQL, storage arrays, databases, data warehouse, structured data, SQL, raw data, ETL, advanced analytics]
5.
Defining the AWS data lake
A data lake is an architecture with a virtually limitless centralized storage platform capable of categorization, processing, analysis, and consumption of heterogeneous data sets.
Key data lake attributes
• Decoupled storage and compute
• Rapid ingest and transformation
• Secure multi-tenancy
• Query in place
• Schema on read
6.
AWS Data Lake Components
Any analytic workload, any scale, at the lowest possible cost
Layers: Insights, Analytics, Data Lake, Data Movement
Services: QuickSight; SageMaker; Comprehend; Glue (ETL & Data Catalog); S3/Glacier (Storage); Redshift + Spectrum; EMR; Athena; Elasticsearch Service; Kinesis Data Analytics
Data movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams
Workload labels: DW, big data processing, interactive, real-time
7.
Reasons to choose Amazon S3 for data lake
• Unmatched durability, availability, and scalability
• Best security, compliance, and audit capability
• Object-level control at any scale
• Business insight into your data
• Twice as many partner integrations
• Most ways to bring data in
8.
Reducing Data Lake Costs
9.
Optimize costs with data tiering
Hot to cold: HDFS, Amazon S3 Standard, Amazon S3 Infrequent Access, Amazon Glacier
• Use EMR/Hadoop with local HDFS for hottest data sets
• Store cooler data in S3 and cold in Glacier to reduce costs
• Use S3 Analytics to optimize tiering strategy
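One way to express the S3-to-Glacier part of this tiering is an S3 lifecycle rule. The sketch below only builds the rule as a Python dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The function name, the `logs/` prefix, and the 30-day and 90-day thresholds are illustrative assumptions, not values from the slides.

```python
def build_tiering_rule(prefix, ia_after_days=30, glacier_after_days=90):
    """Build one S3 lifecycle rule that moves objects to cooler storage classes.

    The day thresholds are assumptions for illustration; S3 Analytics output
    would normally inform the real values.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            # Warm data: move from S3 Standard to Infrequent Access.
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            # Cold data: move on to Glacier.
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


# Hypothetical prefix; replace with a real one.
lifecycle = {"Rules": [build_tiering_rule("logs/")]}
```

With boto3 and credentials configured, the `lifecycle` dictionary could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.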
10.
Process data in place…
[Diagram: Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, AWS Glue, Amazon S3]
11.
Amazon EMR: Decouple compute & storage
• Highly distributed processing frameworks such as Hadoop/Spark
• Compress datasets
• Columnar file formats
• Aggregate small files
• S3DistCp “group-by” clause
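The "group-by" idea can be sketched locally: a regular expression with one capture group decides which small files belong in the same combined output. The snippet below is not S3DistCp itself, only a plain-Python illustration of that grouping logic. The key names and the hourly-log pattern are hypothetical, and keys that do not match are simply skipped in this sketch.

```python
import re
from collections import defaultdict


def group_keys(keys, pattern):
    """Group object keys by the first capture group of a regular expression."""
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for key in keys:
        match = regex.match(key)
        if match:  # keys that do not match are left out of every group
            groups[match.group(1)].append(key)
    return dict(groups)


# Hypothetical hourly log files; the capture group is the date.
keys = [
    "logs/2018-01-01-00.log",
    "logs/2018-01-01-01.log",
    "logs/2018-01-02-00.log",
    "logs/readme.txt",
]
grouped = group_keys(keys, r".*/(\d{4}-\d{2}-\d{2})-\d{2}\.log")
```

Each entry in `grouped` is a candidate for one larger daily file, which is the effect the "aggregate small files" bullet is after.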
12.
Amazon Redshift Spectrum: Exabyte-scale query in place
• Structured data w/ joins
• Multiple on-demand clusters to scale concurrency
• Columnar file formats
• Data partitioning
• Better query performance with predicate pushdown
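Data partitioning pays off because a query that filters on a partition column can skip whole prefixes instead of reading every object. The sketch below illustrates that with Hive-style `name=value` path segments. It is a simplified local model, not how Redshift Spectrum is implemented, and the table root, column names, and file names are hypothetical.

```python
def partition_prefix(table_root, **parts):
    """Build a Hive-style partition prefix such as root/year=2018/month=01/."""
    segments = [f"{name}={value}" for name, value in parts.items()]
    return "/".join([table_root.rstrip("/")] + segments) + "/"


def prune(keys, **wanted):
    """Keep only keys whose path contains every wanted name=value segment."""
    needles = [f"/{name}={value}/" for name, value in wanted.items()]
    return [key for key in keys if all(needle in key for needle in needles)]


# Hypothetical layout: one Parquet file per month.
keys = [
    partition_prefix("events", year=2017, month="12") + "part-0.parquet",
    partition_prefix("events", year=2018, month="01") + "part-0.parquet",
    partition_prefix("events", year=2018, month="02") + "part-0.parquet",
]
to_scan = prune(keys, year=2018, month="01")
```

Only one of the three objects remains in `to_scan`, so a filter on `year` and `month` would touch a third of the data in this toy layout.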
13.
Amazon Athena: Query without ETL
• Serverless service
• Schema on read
• Compress datasets
• Columnar file formats
• Optimize file sizes
• Optimize querying (Presto backend)
• Query data in Glacier (coming)
14.
Today: All of these tools… retrieve a lot of data they don’t need and do the heavy lifting
15.
Today: You need to… restore the entire object from Amazon Glacier to Amazon S3 and then use it.
[Diagram: Amazon Glacier, Amazon S3]
16.
Amazon S3 Select and Amazon Glacier Select
Select a subset of data from an object based on a SQL expression
17.
Motivation Behind S3 Select
GET all the data from S3 objects, and my application will filter the data that I need
Redshift Spectrum example:
• Beta customer: Run 50,000 queries
• Amount of data fetched from S3: 6 PB
• Amount of data used in Redshift: 650 TB
Data needed from S3: 10%
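The 10% figure follows from the two volumes on the slide. A quick check, assuming decimal units (1 PB = 1,000 TB), which the slide does not specify:

```python
TB_PER_PB = 1000  # assumption: decimal units; the slide does not say

fetched_tb = 6 * TB_PER_PB   # data fetched from S3
used_tb = 650                # data actually used in Redshift

fraction_used = used_tb / fetched_tb
unused_tb = fetched_tb - used_tb

print(f"{fraction_used:.1%} of the fetched data was used; {unused_tb} TB was not")
```

So roughly nine tenths of the transferred bytes were discarded after the fetch, which is the overhead that filtering inside S3 is meant to remove.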
18.
Amazon S3 Select
SELECT a filtered set of data from within an object using standard SQL statements
• First content-aware API within Amazon S3
• Unlike Amazon Athena and Spectrum, operates within the Amazon S3 system
• SQL statement operates on a per-object basis, not across a group of objects
• Works and scales like GET requests
• Accessible via SDK (Java, Python), AWS CLI, and Presto connector, with others to follow
• Who will use it?
   • Amazon Redshift Spectrum, Amazon Athena, Presto, and other custom query engines
   • Everyone doing log mining
19.
Amazon S3 Select
Input format: delimited text (CSV, TSV), JSON …
Output format: delimited text (CSV, TSV), JSON …
Compression: GZIP …

Clauses   Data types                Operators           Functions
Select    String                    Conditional         String
From      Integer, Float, Decimal   Math                Cast
Where     Timestamp                 Logical             Math
          Boolean                   String (Like, ||)   Aggregate
20.
Amazon S3 Select: Simple pattern matches
…get-object …object… | awk -F',' '{ if ($4 == "x") print $1 }'
…select-object …object… "SELECT o._1 FROM s3object o WHERE o._4 = 'x'…"
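Both commands express the same filter: return column 1 of every row whose column 4 equals "x". The positional names `_1` and `_4` in S3 Select play the role of `$1` and `$4` in awk. A minimal local sketch of that filter follows, with made-up sample rows and a hypothetical function name. The difference with S3 Select is where the filter runs: inside S3, so only matching values cross the network.

```python
import csv
import io


def select_first_where_fourth(csv_text, wanted):
    """Return column 1 of every row whose column 4 equals `wanted`."""
    rows = csv.reader(io.StringIO(csv_text))
    # Rows with fewer than four columns cannot match and are skipped.
    return [row[0] for row in rows if len(row) >= 4 and row[3] == wanted]


# Made-up sample object contents.
sample = "a,1,2,x\nb,3,4,y\nc,5,6,x\n"
matches = select_first_where_fourth(sample, "x")
```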
21.
Amazon S3 Select: Serverless applications
[Diagram: Amazon S3, Lambda trigger, AWS Lambda, S3 Select, Amazon SNS]
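The flow in this diagram could look roughly like the handler below: an S3 event triggers Lambda, the function runs an S3 Select query against the new object, and any matching rows are published to SNS. This is a sketch under assumptions. `make_handler`, the topic ARN, the bucket and key, and the query are all hypothetical, and the two `Fake…` classes stand in for `boto3.client("s3")` and `boto3.client("sns")` so the example runs without AWS access.

```python
from urllib.parse import unquote_plus


def make_handler(s3_client, sns_client, topic_arn, expression):
    """Return a Lambda-style handler bound to the given clients and query."""

    def handler(event, context):
        published = 0
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            response = s3_client.select_object_content(
                Bucket=bucket,
                Key=key,
                ExpressionType="SQL",
                Expression=expression,
                InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
                OutputSerialization={"CSV": {}},
            )
            # Only Records events in the stream carry result bytes.
            body = b"".join(
                e["Records"]["Payload"] for e in response["Payload"] if "Records" in e
            )
            if body:
                sns_client.publish(TopicArn=topic_arn, Message=body.decode("utf-8"))
                published += 1
        return {"published": published}

    return handler


class FakeS3:
    """Stand-in S3 client returning a canned S3 Select event stream."""

    def select_object_content(self, **kwargs):
        self.last_request = kwargs
        return {"Payload": [{"Records": {"Payload": b"ERROR,disk full\n"}}, {"End": {}}]}


class FakeSNS:
    """Stand-in SNS client that records published messages."""

    def __init__(self):
        self.messages = []

    def publish(self, TopicArn, Message):
        self.messages.append((TopicArn, Message))


fake_s3, fake_sns = FakeS3(), FakeSNS()
handler = make_handler(
    fake_s3,
    fake_sns,
    "arn:aws:sns:us-east-1:123456789012:example-topic",  # hypothetical ARN
    "SELECT * FROM s3object s WHERE s._1 = 'ERROR'",
)
event = {"Records": [{"s3": {"bucket": {"name": "example-bucket"},
                             "object": {"key": "logs/app+1.csv"}}}]}
result = handler(event, None)
```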
22.
Amazon S3 Select: Serverless MapReduce

Before: 200 seconds and 11.2 cents
    # Download and process all keys
    for key in src_keys:
        response = s3_client.get_object(Bucket=src_bucket, Key=key)
        contents = response['Body'].read()
        for line in contents.split('\n')[:-1]:
            line_count += 1
            try:
                data = line.split(',')
                srcIp = data[0][:8]
                ….

After: 95 seconds and 2.8 cents
    # Select IP address and keys
    for key in src_keys:
        response = s3_client.select_object_content(
            Bucket=src_bucket, Key=key,
            expression="SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object as obj")
        contents = response['Body'].read()
        for line in contents:
            line_count += 1
            try:
                ….

2X faster at 1/5 of the cost
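The "After" snippet on the slide is abbreviated. With boto3, `select_object_content` also takes `ExpressionType`, `InputSerialization`, and `OutputSerialization`, and the result comes back as an event stream under `response['Payload']` rather than a readable `Body`. A fuller sketch of one such call follows. `select_rows`, the bucket, and the key are hypothetical names, and `FakeS3Client` stands in for `boto3.client("s3")` so the example runs without AWS access.

```python
def select_rows(s3_client, bucket, key, expression):
    """Run an S3 Select query on one CSV object and return the result lines."""
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The result arrives as an event stream; only Records events carry data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # A row can be split across two events, so join the bytes before splitting.
    return b"".join(chunks).decode("utf-8").splitlines()


class FakeS3Client:
    """Stand-in client that returns a canned event stream."""

    def select_object_content(self, **kwargs):
        self.last_request = kwargs
        return {"Payload": [
            {"Records": {"Payload": b"10.0.0.1,GET\n10.0."}},
            {"Records": {"Payload": b"0.2,PUT\n"}},
            {"Stats": {}},
            {"End": {}},
        ]}


QUERY = "SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object AS obj"
client = FakeS3Client()
lines = select_rows(client, "example-bucket", "logs/part-0.csv", QUERY)
```

In the loop from the slide, `select_rows(s3_client, src_bucket, key, QUERY)` would take the place of the `get_object` call and the manual `split(',')` on each line.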
23.
Demo: S3 Select Timing
24.
Amazon S3 Select with Presto
• Works with your existing Hive Metastore
• Automatically converts predicates into S3 Select requests
[Diagram: Amazon S3, S3 Select]
25.
Amazon S3 Select: Accelerating big data
[Before and after charts]
5X faster with 1/40 of the CPU
26.
Using Amazon Glacier Select
27.
How Amazon Glacier Select Works
28.
Delivering Results Faster
29.
Optimizing data lake performance
• Aggregate small files: EMR S3DistCp, Amazon Kinesis Firehose
• S3 Select: big data cheaper, faster (up to 400% faster)
• Data formats: columnar formats
• EMRFS consistent view: Amazon S3, Amazon DynamoDB
30.
Amazon Kinesis: Real Time
Easily collect, process, and analyze video and data streams in real time
• Kinesis Video Streams (new): Capture, process, and store video streams for analytics
• Kinesis Data Streams: Build custom applications that analyze data streams
• Kinesis Data Firehose: Load data streams into AWS data stores
• Kinesis Data Analytics: Analyze data streams with SQL
31.
Data preparation accounts for ~80% of the work
[Chart categories: building training sets, cleaning and organizing data, collecting data sets, mining data for patterns, refining algorithms, other]
32.
AWS Glue: Serverless data catalog & ETL service
• Data Catalog: Discover data and extract schema
   • Automatically discovers data and stores schema
   • Data searchable and available for ETL
• ETL job authoring: Auto-generates customizable ETL code in Python and Spark
   • Generates customizable code
   • Schedules and runs your ETL jobs
   • Serverless
33.
Amazon SageMaker (GA)
The quickest and easiest way to get ML models from idea to production
• End-to-end machine learning platform
• Zero setup
• Flexible model training
• Pay by the second
34.
Planning for the Future
35.
Evolve As Needed!
[Architecture diagram]
Stages: Collect, Store, Analyze, Visualize, with ETL between stages
Data categories: transactional data, file data, stream data, search data
Sources: web apps, mobile apps (iOS, Android), logging (Logstash), IoT applications
Store types: SQL, NoSQL, cache, search, file storage, stream storage
Services and tools: Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, Amazon ES, Amazon S3, Amazon Glacier, Apache Kafka, Amazon Kinesis, Amazon Redshift, Impala, Pig, Amazon ML, AWS Lambda, Amazon Elastic MapReduce, Amazon QuickSight
Processing styles: batch, interactive, streaming, stream processing, ML
Temperature and speed labels: hot, warm, cold, fast, slow
Outputs: analysis & visualization, notebooks, predictions, apps & APIs, IDE
36.
AWS Training Offer
Make your data-driven decisions count, and make a career in Big Data on AWS.
Follow the Big Data Specialty learning path and become a specialist in Big Data:
• Implement core AWS Big Data services according to best practices
• Design and maintain Big Data
• Leverage tools to automate data analysis
Who should attend
• Enterprise solutions architects
• Data scientists
• Big Data solutions architects
• Data analysts
Certifications
• Certified Cloud Practitioner
• Associate-level Certification
• AWS Certified Big Data - Specialty
Training
• Free AWS digital training: Foundational knowledge
• Big Data on AWS: 3-day Classroom Training
• Free AWS digital training: Big Data Technology Fundamentals
Visit www.aws.training to find out more.
37.
Thank You For Attending AWS Data Driven Decisions Webinar Series.
We hope you found it interesting! A kind reminder to complete the survey. Let us know what you thought of today’s event and how we can improve the event experience for you in the future.
aws-apac-marketing@amazon.com
twitter.com/AWSCloud
facebook.com/AmazonWebServices
youtube.com/user/AmazonWebServices
slideshare.net/AmazonWebServices
twitch.tv/aws