hohoaisan / serverless-scrapy-less-than-one-min

Serverless Scrapy Project Less Than 1 Minute

Sample Scrapy project integrated with AWS Step Functions to trigger all Lambdas at once, then save the results to an AWS S3 bucket.

References:

With a couple of modifications to make Scrapy work with AWS Lambda.

Prerequisites

  • Docker (on non-Linux environments, for building Python packages compatible with the AWS Lambda environment)
  • Python 3.9
  • Pipenv
  • Node.js 16
  • AWS CLI + AWS profile
  • Serverless CLI 3.22

Local development & testing

Packages

Python packages are managed by Pipenv. Use pipenv install to install the required packages and pipenv shell to start a Python development environment with those packages available.
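
For example, from the project root (standard Pipenv commands, no project-specific options assumed):

pipenv install
pipenv shell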

Scrapy

This repository is already a Scrapy project, so any Scrapy command can be used. For example, to run a spider that is already defined and write the scraped items to a JSON file:

scrapy crawl quotes -o test.json
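
For reference, the quotes spider is typically a standard Scrapy spider along these lines; the sketch below is illustrative only (the start URL and selectors are assumptions, not necessarily what this repository uses):

import scrapy

class QuotesSpider(scrapy.Spider):
    # The name used on the command line: scrapy crawl quotes
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Yield one item per quote block on the page
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }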

Lambda Functions

We can test a Lambda function by invoking it locally:

serverless invoke local -f scrape_quotes
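
Under the hood, such a handler generally wraps the Scrapy crawler in a plain Lambda entry point. The sketch below only illustrates that pattern; the bucket name and feed destination are placeholders, and the repository's actual handler may differ:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def scrape_quotes(event, context):
    # Load the project settings and write the feed output straight to S3
    # (the bucket name below is a placeholder).
    settings = get_project_settings()
    settings.set("FEEDS", {"s3://example-results-bucket/quotes.json": {"format": "json"}})

    process = CrawlerProcess(settings)
    process.crawl("quotes")  # spider name, as used by `scrapy crawl quotes`
    process.start()          # blocks until the crawl finishes

    return {"statusCode": 200}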

Deploy to AWS

Change the deployment stage in the serverless.yml file before deploying.
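
The stage typically sits under the provider block of serverless.yml; a minimal sketch with placeholder values (the real file in this repository contains more configuration):

provider:
  name: aws
  runtime: python3.9
  stage: dev        # change this value (e.g. to "prod") before deploying
  region: us-east-1 # placeholder region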

Deploy

With a configured AWS CLI profile, the serverless deployment can be done with:

serverless deploy
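
The stage and AWS profile can also be passed on the command line instead of editing serverless.yml, for example (the profile name is a placeholder):

serverless deploy --stage prod --aws-profile my-profile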

Destroy

The deployed serverless stack can be removed with:

serverless remove

Or delete the corresponding stack in CloudFormation.

All of the created buckets need to be empty before the resources can be removed; if removal fails, empty the buckets and run the removal again.
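
A bucket can be emptied with the AWS CLI before retrying the removal (the bucket name is a placeholder):

aws s3 rm s3://<results-bucket-name> --recursive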

About

License: MIT License

