brandonmburroughs / RecipesScraper

Scrape various recipes sites for names, ingredients and tags

RecipesScraper

This repo contains various spiders to scrape different recipe websites.

Spiders

Running a spider

Each spider reads its corresponding seed list (available in sitemap/seed_lists). For each URL, it either scrapes the recipe directly or follows the recipe links on that section page.
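The README does not spell out how a spider decides between those two cases. Here is a minimal standard-library sketch of that routing. It is not the repo's code. It assumes a seed list with one URL per line, and it uses a hypothetical `RECIPE_PATH_MARKER` modeled on the Epicurious URL in the Sample Output section. `load_seed_list`, `is_recipe_page` and `recipe_links` are illustrative names. A real Scrapy spider would use its response selectors and yield requests instead.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin

# Hypothetical marker for recipe pages, modeled on the Epicurious URL in the
# sample output; each real site would need its own rule.
RECIPE_PATH_MARKER = "/recipes/food/views/"


def load_seed_list(path):
    """Read a seed list (assumed: one URL per line; blank lines and '#' comments skipped)."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def is_recipe_page(url):
    """True if the URL looks like a single recipe rather than a section page."""
    return RECIPE_PATH_MARKER in url


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def recipe_links(section_url, html):
    """Return absolute recipe URLs found on a section page, in order, without duplicates."""
    collector = _LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(section_url, href)  # resolve relative links against the page URL
        if is_recipe_page(absolute) and absolute not in links:
            links.append(absolute)
    return links
```

For each seed URL, a spider built this way would scrape the page directly when `is_recipe_page(url)` is true. Otherwise it would fetch the page and queue every URL returned by `recipe_links(url, html)`.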

To run a spider, use the following command:

scrapy crawl <spider_name>

where <spider_name> is the name of the spider (allrecipes or jamieoliver for now). This creates a JSON file named <spider_name>_recipes.json in the output folder.
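The README does not say which component writes that file. One way to get the layout shown in the Sample Output section is a Scrapy item pipeline that buffers items and writes them once the spider closes. The sketch below assumes that approach. `JsonRecipesPipeline` and its `output_dir` parameter are hypothetical and not taken from the repo. A pipeline like this would be enabled through Scrapy's `ITEM_PIPELINES` setting.

```python
import datetime
import json
import os


class JsonRecipesPipeline:
    """Hypothetical Scrapy-style item pipeline: buffers every scraped recipe and
    writes them to <output_dir>/<spider_name>_recipes.json when the spider closes."""

    def __init__(self, output_dir="output"):
        self.output_dir = output_dir
        self.recipes = []

    def open_spider(self, spider):
        self.recipes = []

    def process_item(self, item, spider):
        # Scrapy items are dict-like; store a plain dict so json.dump can serialize it.
        self.recipes.append(dict(item))
        return item

    def close_spider(self, spider):
        os.makedirs(self.output_dir, exist_ok=True)
        path = os.path.join(self.output_dir, f"{spider.name}_recipes.json")
        payload = {
            # str(datetime) gives e.g. "2017-02-11 22:33:05.667233", matching the sample
            "date_scraped": str(datetime.datetime.now()),
            "recipes": self.recipes,
        }
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=4)
        return path
```

Buffering keeps the file a single valid JSON document with one `date_scraped` header. The trade-off is that every recipe stays in memory until the crawl ends.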

Because a crawl is a long-running process, it's a good idea to run it in the background in no-hangup mode. That way it keeps running after you close the terminal or log out:

nohup scrapy crawl <spider_name> &
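If you would rather start crawls from a Python script, the sketch below has a similar effect. It is an alternative to the shell command above, not part of the repo. `launch_in_background` is a hypothetical helper, and writing to `nohup.out` only mimics nohup's default log file. `start_new_session=True` runs the child in its own session (POSIX only), so it has no controlling terminal and does not receive the hangup signal when the terminal closes. This approximates `nohup ... &` but is not identical to it.

```python
import subprocess


def launch_in_background(cmd, log_path="nohup.out"):
    """Start cmd detached from the current terminal session, appending its
    stdout and stderr to log_path, and return the Popen handle immediately."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdout=log,
            stderr=subprocess.STDOUT,   # interleave errors into the same log, like nohup
            stdin=subprocess.DEVNULL,
            start_new_session=True,     # setsid() in the child: no controlling terminal
        )
    finally:
        # The child holds its own copy of the file descriptor, so the parent can close it.
        log.close()
```

For example, `launch_in_background(["scrapy", "crawl", "allrecipes"])` returns right away while the crawl continues.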

Sample Output

Here's sample output for a Black-Bean Quesadillas recipe scraped from Epicurious.

{
    "date_scraped": "2017-02-11 22:33:05.667233",
    "recipes": [
        {
            "url": "http://www.epicurious.com/recipes/food/views/black-bean-quesadillas-239962",
            "ingredients": [
                "1 (19-ounce) can black beans, rinsed and drained",
                "1 (8-ounce) bag mixed grated \"taco cheese\"",
                "1 1/4 cups chopped cilantro, divided",
                "1 cup chopped white onion, divided",
                "8 (10-inch) flour tortillas",
                "1 tablespoon vegetable oil, divided",
                "2 large tomatoes, quartered",
                "1 to 2 teaspoons hot sauce",
                "Equipment: a large (2-burner) ridged grill pan (preferably cast-iron)"
            ],
            "recipe_name": "Black-Bean Quesadillas",
            "description": "Have we found Tex-Mex heaven? This very easy vegetarian meal satisfies with the heft of beans and melted mixed cheese.",
            "tags": [
                "Bean",
                "Cheese",
                "Tomato",
                "Vegetarian",
                "Quick & Easy",
                "Gourmet",
                "Tex-Mex"
            ]
        }
    ]
}
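The output file can be loaded with the standard `json` module. The small sketch below assumes the file has the top-level `recipes` list shown above, with an optional `tags` list on each recipe. `load_recipes` and `tag_counts` are hypothetical helper names.

```python
import json
from collections import Counter


def load_recipes(path):
    """Load a <spider_name>_recipes.json file and return its list of recipes."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["recipes"]


def tag_counts(recipes):
    """Count how many recipes carry each tag."""
    counts = Counter()
    for recipe in recipes:
        # set() so a tag repeated within one recipe is counted only once
        counts.update(set(recipe.get("tags", [])))
    return counts
```

For example, `tag_counts(load_recipes("output/allrecipes_recipes.json")).most_common(10)` lists the ten most frequent tags. This assumes the output folder is named `output`.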


Languages

Language: Python 100.0%