PacktPublishing / Data-Engineering-with-AWS

Data Engineering with AWS, Published by Packt

ERROR: Hands-on – triggering an AWS Lambda function when a new file arrives in an S3 bucket

jgrove90 opened this issue

Hello,

I'm getting the following error when following the Chapter 3 hands-on directions:

```json
{
  "errorMessage": "'Records'",
  "errorType": "KeyError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 7, in lambda_handler\n    for record in event['Records']:\n"
  ]
}
```

Perhaps the API response has changed? It doesn't seem to be able to find `event['Records']`.
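For reference, this is roughly the shape of the S3 put event the trigger is supposed to deliver (a minimal sketch; real events carry more fields, and the bucket/key names here are made-up placeholders):

```python
# Minimal sketch of an S3 put event as delivered by the S3 trigger.
# Bucket and key names are hypothetical placeholders.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-landing-zone"},
                "object": {"key": "testdb/csvparquet/test.csv"},
            }
        }
    ]
}
```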

Hey @eagarg, could you please help us resolve this?

I just tried it and I don't get that error. Here are the first lines of the function:

```python
import boto3
import awswrangler as wr
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    # Get the source bucket and object name as passed to the Lambda function
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
```
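Note that if the function is invoked with an event that has no `Records` key (for example, the default "hello world" test event in the Lambda console), that loop raises exactly this `KeyError`. A guard like this (just a sketch, not the book's code) makes the failure mode obvious:

```python
def lambda_handler(event, context):
    # The S3 trigger always supplies 'Records'; a manual console test with
    # the default event does not, which produces KeyError: 'Records'.
    if 'Records' not in event:
        print("No 'Records' key in event - was this invoked by the S3 trigger?")
        return
    for record in event['Records']:
        ...  # proceed as in the function above
```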

But how much time does the function need? I set the timeout to 5 minutes and it's still not enough.

In my testing, I found that the Lambda function takes between 14 and 18 seconds to complete, so 5 minutes should be far more than enough. Would you please paste the CloudWatch log for your run here so we can identify where it may be getting stuck?

Sure, here's the log. I set the function timeout to 5 minutes. I'm new to AWS, so please tell me if you need anything else or if you're looking for another type of log.

```
INIT_START Runtime Version: python:3.8.v31 Runtime Version ARN: arn:aws:lambda:eu-north-1::runtime:05a1944859083528568a8398e5aec22be94bdaefef94faafa061158bf4530956
START RequestId: e0f9fa74-808c-455d-9e86-f9cd60c4fbd9 Version: $LATEST
key_list: ['testdb', 'csvparquet', 'test.csv']
Bucket: dave-landingzone
Key: testdb/csvparquet/test.csv
DB Name: testdb
Table Name: csvparquet
Input_Path: s3://dave-landingzone/testdb/csvparquet/test.csv
Output_Path: s3://dave-clean-zone/testdb/csvparquet
Database testdb already exists
2023-10-31T19:18:52.625Z e0f9fa74-808c-455d-9e86-f9cd60c4fbd9 Task timed out after 302.01 seconds
END RequestId: e0f9fa74-808c-455d-9e86-f9cd60c4fbd9
REPORT RequestId: e0f9fa74-808c-455d-9e86-f9cd60c4fbd9 Duration: 302005.82 ms Billed Duration: 300000 ms Memory Size: 128 MB Max Memory Used: 128 MB Init Duration: 3928.52 ms
INIT_START Runtime Version: python:3.8.v31 Runtime Version ARN: arn:aws:lambda:eu-north-1::runtime:05a1944859083528568a8398e5aec22be94bdaefef94faafa061158bf4530956
START RequestId: e0f9fa74-808c-455d-9e86-f9cd60c4fbd9 Version: $LATEST
key_list: ['testdb', 'csvparquet', 'test.csv']
Bucket: dave-landingzone
Key: testdb/csvparquet/test.csv
DB Name: testdb
Table Name: csvparquet
Input_Path: s3://dave-landingzone/testdb/csvparquet/test.csv
Output_Path: s3://dave-clean-zone/testdb/csvparquet
Database testdb already exists
2023-10-31T19:24:52.898Z e0f9fa74-808c-455d-9e86-f9cd60c4fbd9 Task timed out after 302.08 seconds
END RequestId: e0f9fa74-808c-455d-9e86-f9cd60c4fbd9
REPORT RequestId: e0f9fa74-808c-455d-9e86-f9cd60c4fbd9 Duration: 302081.81 ms Billed Duration: 300000 ms Memory Size: 128 MB Max Memory Used: 128 MB
INIT_START Runtime Version: python:3.8.v31 Runtime Version ARN: arn:aws:lambda:eu-north-1::runtime:05a1944859083528568a8398e5aec22be94bdaefef94faafa061158bf4530956
```

It looks like it is timing out when trying to write the file to S3 using the AWS Data Wrangler library. I would suggest double-checking that you are using the specified version of the Data Wrangler library (https://github.com/awslabs/aws-data-wrangler/releases/download/2.10.0/awswrangler-layer-2.10.0-py3.8.zip) within your Lambda layer just to make sure there is no compatibility issue there.
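For reference, the write step in that function boils down to a Data Wrangler call along these lines (a sketch based on the awswrangler 2.x API, using the paths from your log, not necessarily the book's exact code):

```python
import awswrangler as wr

# Read the CSV from the landing zone (input path taken from the log above)
input_df = wr.s3.read_csv(['s3://dave-landingzone/testdb/csvparquet/test.csv'])

# Write it back out as Parquet and register/update the Glue catalog table
wr.s3.to_parquet(
    df=input_df,
    path='s3://dave-clean-zone/testdb/csvparquet',
    dataset=True,         # write as a dataset so it can be cataloged
    database='testdb',    # Glue database name from the log
    table='csvparquet',   # Glue table name from the log
)
```

Based on your log, it is this `wr.s3.to_parquet` step that appears to be timing out.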

Also, when you look at your output path (in the clean-zone bucket), are there any Parquet files in that location?

Finally, make sure that the IAM policy you created is configured correctly to grant access to both your landing-zone and clean-zone buckets; roughly, it needs to allow something like the sketch below.
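This is only a rough sketch of the permissions involved (bucket names taken from your log; the chapter's actual policy may be broader, for example including Glue catalog permissions):

```python
# Rough sketch of an IAM policy document granting the function's role
# read access to the landing zone and write access to the clean zone.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::dave-landingzone",
                "arn:aws:s3:::dave-landingzone/*",
            ],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::dave-clean-zone",
                "arn:aws:s3:::dave-clean-zone/*",
            ],
        },
    ],
}
```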

Let me know if any of that works or if it still continues to time out.

I deleted the layer and re-uploaded it, and now it works. I don't know what went wrong before, but I probably made a mistake somewhere in the Lambda configuration. Many thanks!

Fantastic - thanks for confirming that it's working!