scrapy / scrapyd

A service daemon to run Scrapy spiders

Home Page: https://scrapyd.readthedocs.io/en/stable/

Azure Feed exporter not working

MathiasIconB opened this issue · comments

I am having trouble getting the Azure feed exporter plugin (https://github.com/scrapy-plugins/scrapy-feedexporter-azure-storage) to work with scrapyd.

When I run this locally with scrapy, it dumps the items directly to blob storage as expected; however, it does not do this when running through a scrapyd cluster.

I am pretty sure I have the scrapy-feedexporter-azure-storage dependencies installed on the scrapyd cluster.

Here are the relevant parts of the settings.py file:

# settings.py
FEED_STORAGES = {"azure": "scrapy_azure_exporter.AzureFeedStorage"}
FEED_EXPORT_BATCH_ITEM_COUNT = 20
AZURE_CONNECTION_STRING = "XYZ"
FEEDS = {
    "azure://xyz.blob.core.windows.net/test/%(batch_time)s_%(batch_id)d.json": {
        "format": "json",
    },
}
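For context, Scrapy fills the %(batch_time)s and %(batch_id)d placeholders with printf-style interpolation, which is why the error fires when the URI lacks them. A minimal sketch (the timestamp formatting here is illustrative, not Scrapy's exact internals):

```python
# Sketch of how batch placeholders are interpolated into a feed URI.
from datetime import datetime

uri_template = "azure://xyz.blob.core.windows.net/test/%(batch_time)s_%(batch_id)d.json"
params = {
    # Scrapy uses an ISO-like timestamp with ':' replaced, so it is
    # safe in file names and URIs (format shown here is approximate).
    "batch_time": datetime(2023, 10, 26, 12, 29, 31).isoformat().replace(":", "-"),
    "batch_id": 1,
}
print(uri_template % params)
# azure://xyz.blob.core.windows.net/test/2023-10-26T12-29-31_1.json
```

If the URI has no placeholders to interpolate, batching would silently overwrite the same blob, which is why Scrapy refuses to start the feed export instead.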

When running this code on the scrapyd cluster I get the following error in the logs, which does not appear when running it locally:

2023-10-26 12:29:31 [scrapy.extensions.feedexport] ERROR: %(batch_time)s or %(batch_id)d must be in the feed URI (file:///var/lib/scrapyd/items/myspider/myitems/49cf024e73fb11ee8586c68804063c80.jl) if FEED_EXPORT_BATCH_ITEM_COUNT setting or FEEDS.batch_item_count is specified and greater than 0. For more info see: https://docs.scrapy.org/en/latest/topics/feed-exports.html#feed-export-batch-item-count
2023-10-26 12:29:32 [scrapy.middleware] INFO: Enabled extensions:

Any help on this would be massively appreciated!

Please check whether you are setting items_dir in your scrapyd configuration file. If you are, your FEEDS setting is being overwritten with a file:/// URI under that directory.

https://scrapyd.readthedocs.io/en/stable/config.html#items-dir
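For reference, a minimal scrapyd.conf sketch with items_dir left empty so scrapyd does not inject its own file:/// feed URI (the other paths are illustrative, not required values):

```ini
# scrapyd.conf (illustrative sketch)
[scrapyd]
# Leave items_dir empty (or omit the line entirely) so scrapyd does
# not override the project's FEEDS setting with a local file:/// URI.
items_dir =
eggs_dir  = /var/lib/scrapyd/eggs
logs_dir  = /var/lib/scrapyd/logs
```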

Amazing, that fixed it. Many thanks!