kezhenxu94 / house-renting

Possibly the best practice of Scrapy 🕷 and renting a house 🏡

The scrapyd and crawler containers started by docker-compose exit immediately

Vickey-Wu opened this issue

Describe the bug

A clear and concise description of what the bug is.
The scrapyd and crawler containers started by docker-compose exit immediately. The lianjia container also exits after a while, but that one presumably exits because it has finished crawling.

To Reproduce

docker-compose up -d
docker logs -f lianjia
docker logs -f scrapyd
docker logs -f crawler

Steps to reproduce the behavior:

root@ubuntu:/mnt/house-renting# docker ps -a
CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS                     PORTS                              NAMES
16fae3c93371        house-renting/crawler              "scrapy crawl 58"        2 hours ago         Up 2 hours                                                    58
473fa78fc6a6        house-renting/crawler              "scrapy crawl lianjia"   2 hours ago         Up 3 minutes                                                  lianjia
c1336d24f029        house-renting/crawler              "scrapy crawl douban"    2 hours ago         Up 2 hours                                                    douban
d81f4f5c9c5e        house-renting/scrapyd              "/bin/bash"              2 hours ago         Exited (0) 3 minutes ago                                      scrapyd
69660e516589        vickeywu/kibana-oss:6.3.2          "/docker-entrypoint.…"   2 hours ago         Up 2 hours                 0.0.0.0:5601->5601/tcp             kibana
d88e85587d63        house-renting/crawler              "/bin/bash"              2 hours ago         Exited (0) 3 minutes ago                                      crawler
8b1e03c93a95        redis                              "docker-entrypoint.s…"   2 hours ago         Up 2 hours                 0.0.0.0:6379->6379/tcp             redis
2be0615aab21        vickeywu/elasticsearch-oss:6.4.1   "/usr/local/bin/dock…"   2 hours ago         Up 2 hours                 0.0.0.0:9200->9200/tcp, 9300/tcp   elasticsearch
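
Worth noting in the listing above: the scrapyd and crawler containers were started with the command /bin/bash, and a detached container whose only command is an interactive shell exits with status 0 as soon as the shell hits end-of-input, which matches the Exited (0) status shown. A minimal sketch of that behaviour (image name taken from the listing above; the container names here are made up for illustration):

docker run -d --name shell-exits house-renting/crawler /bin/bash    # no stdin/TTY attached, bash exits at once -> Exited (0)
docker run -dit --name shell-stays house-renting/crawler /bin/bash  # -i keeps stdin open, -t allocates a TTY, bash keeps running
docker ps -a --filter name=shell-                                   # compare the two statuses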

lianjia logs

2019-04-08 06:19:02 [scrapy.core.engine] INFO: Closing spider (finished)
2019-04-08 06:19:02 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 7988,
 'downloader/request_count': 22,
 'downloader/request_method_count/GET': 22,
 'downloader/response_bytes': 404392,
 'downloader/response_count': 22,
 'downloader/response_status_count/200': 22,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 4, 8, 6, 19, 2, 77559),
 'item_dropped_count': 21,
 'item_dropped_reasons_count/DropItem': 21,
 'log_count/INFO': 33,
 'log_count/WARNING': 21,
 'memusage/max': 62763008,
 'memusage/startup': 56500224,
 'request_depth_max': 1,
 'response_received_count': 22,
 'scheduler/dequeued': 22,
 'scheduler/dequeued/memory': 22,
 'scheduler/enqueued': 22,
 'scheduler/enqueued/memory': 22,
 'start_time': datetime.datetime(2019, 4, 8, 6, 14, 47, 67989)}
2019-04-08 06:19:02 [scrapy.core.engine] INFO: Spider closed (finished)

For scrapyd and crawler, neither docker logs -f scrapyd nor docker logs -f crawler shows any log output.

Desktop (please complete the following information)

  • OS: Ubuntu 16.04
    If running via Docker:
  • Docker: 18.06.1-ce
  • Docker-compose: 1.23.2, build 1110ad0

@Vickey-Wu Just confirmed that the anti-scraping measures on 58 and Douban have probably been upgraded, so those spiders cannot fetch any data and exit immediately; at the moment only the Lianjia spider should be able to crawl data.

One more question: following the wiki, I changed both files under spider_settings to 深圳 (Shenzhen), but most of the crawled data is still for Guangzhou. I checked inside the corresponding crawler container, and the files under spider_settings there are indeed set to Shenzhen as well. What else needs to be changed?

  • Local project configuration:
root@ubuntu:/mnt/house-renting# grep -ri "深圳"
crawler/house_renting/spider_settings/a58.py:cities = (u'深圳',)
crawler/house_renting/spider_settings/a58.py:    u'深圳',
crawler/house_renting/spider_settings/a58.py:    u'深圳': 'http://sz.58.com/chuzu/',
crawler/house_renting/spider_settings/lianjia.py:cities = (u'深圳',)
crawler/house_renting/spider_settings/lianjia.py:    u'上海', u'深圳', u'苏州', u'石家庄', u'沈阳',
crawler/house_renting/spider_settings/lianjia.py:    u'上海': 'https://sh.lianjia.com/zufang/', u'深圳': 'https://sz.lianjia.com/zufang/',
  • Inside the 58 crawler container:
root@16fae3c93371:/house-renting/crawler/house_renting/spider_settings# cat a58.py |grep "cities"
# 只需要在这个列表中添加以下 available_cities 中的城市, 如果只需要扒取一个城市也需要使用一个括号包围, 如 (u'广州',)
cities = (u'深圳',)
available_cities = (
available_cities_map = {
  • Kibana screenshot:
    (screenshot attached)
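
The Kibana view alone cannot show whether the Guangzhou documents were indexed before the configuration change; if earlier crawls ran with the default 广州 settings, those documents would still be sitting in Elasticsearch. A rough way to see the per-city breakdown against the bundled Elasticsearch (port 9200 is published, per the docker ps listing above). The field name city.keyword is only an assumption about the index mapping; substitute whatever field the project actually stores the city in:

curl -s -H 'Content-Type: application/json' 'http://localhost:9200/_search?size=0' -d '{"aggs": {"by_city": {"terms": {"field": "city.keyword"}}}}'    # counts documents per city value (assumed field name)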

Did you rebuild the Docker image after making the changes?

Yes, it was built with this command:
docker-compose up --build -d
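
For completeness, a sketch of how to double-check that the recreated containers actually carry the new setting (container names and the in-container path are taken from the docker ps output and container session earlier in this thread; the lianjia.py path is assumed to sit next to a58.py, and it is also an assumption that spider_settings is copied into the image at build time, which is what would make a rebuild plus recreate necessary):

docker-compose up -d --force-recreate                 # recreate containers from the freshly built images
docker exec lianjia grep -n "^cities" /house-renting/crawler/house_renting/spider_settings/lianjia.py
docker exec 58 grep -n "^cities" /house-renting/crawler/house_renting/spider_settings/a58.py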