kezhenxu94 / house-renting

Possibly the best practice of Scrapy 🕷 and renting a house 🏡

Elasticsearch exits unexpectedly

g10guang opened this issue · comments

CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS                     PORTS                              NAMES
a2a636a67a74        elasticsearch           "/docker-entrypoint.…"   4 hours ago         Exited (1) 2 seconds ago                                      house-renting_elastic_run_1
9bece9b62acb        redis                   "docker-entrypoint.s…"   4 hours ago         Up 4 hours                 6379/tcp                           house-renting_redis_run_1
91a9dd06893b        kibana                  "/docker-entrypoint.…"   4 hours ago         Up 4 hours                 5601/tcp                           house-renting_kibana_run_1
65d5c5167e77        house-renting_lianjia   "scrapy crawl lianjia"   4 hours ago         Up 4 hours                                                    house-renting_lianjia_run_1

I have tried many times, but Elasticsearch still exits on its own.

docker logs [elasticsearch_container_id]

Output:

[2018-05-30T10:12:50,594][INFO ][o.e.n.Node               ] [] initializing ...
[2018-05-30T10:12:50,633][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data/elasticsearch]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:123) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:70) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:134) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) ~[elasticsearch-5.6.9.jar:5.6.9]
Caused by: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data/elasticsearch]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
	at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:261) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.node.Node.<init>(Node.java:265) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.node.Node.<init>(Node.java:245) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:233) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:233) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:342) ~[elasticsearch-5.6.9.jar:5.6.9]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:132) ~[elasticsearch-5.6.9.jar:5.6.9]
	... 6 more

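  • Check whether the directory mounted into the Docker container is writable from inside the container.

  • Check whether another ES instance is already running (outside Docker, that is, on your host machine).

Could you also post your operating system and Docker information?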
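The two checks above can be roughly scripted on the host. This is a minimal sketch, not part of the project: the helper names `is_writable` and `is_port_open` are made up for illustration, and the `./data/elasticsearch` path is only an assumption about where the data directory is mounted from. It tests the host-side directory; permissions inside the container can still differ if the container runs as another user.

```python
import os
import socket
import tempfile


def is_writable(path):
    """Return True if a file can actually be created inside `path`."""
    try:
        # Creating a real temporary file is more reliable than os.access().
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        # Covers a missing directory, permission errors, read-only mounts.
        return False


def is_port_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    data_dir = "./data/elasticsearch"  # assumed host-side mount path
    print("data dir exists:", os.path.isdir(data_dir))
    print("data dir writable:", is_writable(data_dir))
    # 9200 is the Elasticsearch HTTP port used later in this thread.
    print("port 9200 in use:", is_port_open("127.0.0.1", 9200))
```

If port 9200 is reported as in use while the container is stopped, another Elasticsearch (or some other service) is probably already running on the host.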

Sorry, first of all.

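It turns out my computer ran out of memory. I am running this locally and found that the whole stack needs a lot of memory.

I was also hoping to run it overnight on a cloud server with 2 GB of memory and look at the results.

If Redis is only used for deduplication, I think a Bloom filter could do the job instead.

Elasticsearch also seems to need a lot of memory to run.

It would be best to trim the stack down, or make some components pluggable at startup, such as Kibana, which is only used for display.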
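For readers unfamiliar with the Bloom filter idea mentioned above, here is a minimal in-memory sketch. It is not code from this project; the class name, the bit-array size, and the use of salted SHA-1 digests as hash functions are all illustrative assumptions. A Bloom filter can report false positives (a new URL wrongly treated as seen) but never false negatives.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # 1 << 20 bits = 128 KiB

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-1 with a seed byte.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Seen only if every derived bit is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen = BloomFilter()
    seen.add("https://example.com/listing/1")  # placeholder URL
    print("https://example.com/listing/1" in seen)
    print("https://example.com/listing/2" in seen)
```

As written, the filter lives only in process memory, so it would have to be saved to disk and reloaded to remember anything across crawler restarts.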

@g10guang Good idea. Heavy memory usage is hard to accept, so later I will make the non-essential parts optional.

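Redis is used for deduplication in the sense that when you restart the crawler, pages that were already scraped are not downloaded again. Because this is deduplication across restarts, it needs some kind of persistent cache. You can remove the REDIS_HOST setting in settings.py or set it to None, and then remove Redis from the depends_on of lianjia and the other crawlers in docker-compose.yml. Redis will then not be started.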
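As a rough illustration of the settings.py change described above: only the `REDIS_HOST` name comes from this thread. The `redis_enabled` helper is hypothetical and merely shows how code could treat a missing or `None` value as "Redis disabled"; it is not the project's actual logic.

```python
# settings.py (fragment): disable Redis-backed deduplication.
REDIS_HOST = None  # alternatively, delete this line entirely


def redis_enabled(settings):
    """Hypothetical helper: True only if REDIS_HOST is set to a non-empty value."""
    return bool(settings.get("REDIS_HOST"))


if __name__ == "__main__":
    print(redis_enabled({"REDIS_HOST": REDIS_HOST}))  # setting present but None
    print(redis_enabled({}))                          # setting removed
    print(redis_enabled({"REDIS_HOST": "redis"}))     # setting points at a host
```

With Redis disabled this way, deduplication across restarts no longer applies, so a restarted crawler may download pages it has already scraped.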

@kezhenxu94 I have never used Elasticsearch or Kibana. What fields are stored in Elasticsearch? I really don't know how Kibana is supposed to be used.

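Elasticsearch (ES) is used in this project for storage and search, mainly to provide search capability; otherwise I would have used MySQL. Fields in ES are dynamic and do not need to be defined in advance. For the specifics, see the fields defined in items.py; ES adds new fields as documents are stored. Kibana's Discover tab provides a graphical interface that makes searching convenient, and you can also call the RESTful API that ES exposes directly. Kibana's other visualization tabs can be used for basic analysis and for drawing charts.

I'll find some good articles on ES and Kibana when I have time and post them here for everyone ^_^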
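To make the "call the RESTful API directly" option more concrete, here is a small sketch that builds a full-text search request using only the Python standard library. The function names are illustrative, the `_all` index and the `127.0.0.1:9200` address are taken from this thread, and the use of a `query_string` query is an assumption about what kind of search is wanted.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, index, text, size=10):
    """Build (but do not send) a full-text search request for Elasticsearch."""
    url = "{}/{}/_search?{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(index, safe="*,"),
        urllib.parse.urlencode({"size": size}),
    )
    # query_string searches the given text across the indexed fields.
    body = {"query": {"query_string": {"query": text}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url, index, text, size=10):
    """Send the request and return the decoded JSON response."""
    req = build_search_request(base_url, index, text, size)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires a running Elasticsearch; the search text is a placeholder.
    result = search("http://127.0.0.1:9200", "_all", "subway", size=5)
    print(json.dumps(result, indent=2, ensure_ascii=False))
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before anything goes over the network.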

@kezhenxu94 I took a quick look at the query API. When I request this URL: http://127.0.0.1:9200/_all/_search/?size=1000&pretty=1 I get the following result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : ".kibana",
        "_type" : "config",
        "_id" : "5.6.9",
        "_score" : 1.0,
        "_source" : {
          "buildNum" : 15629,
          "defaultIndex" : "AWOw0PODc6ereKT4VEg3"
        }
      },
      {
        "_index" : ".kibana",
        "_type" : "index-pattern",
        "_id" : "AWOw0PODc6ereKT4VEg3",
        "_score" : 1.0,
        "_source" : {
          "title" : "*",
          "notExpandable" : true,
          "fields" : "[{\"name\":\"_id\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"_index\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":false},{\"name\":\"_score\",\"type\":\"number\",\"count\":0,\"scripted\":false,\"searchable\":false,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"_source\",\"type\":\"_source\",\"count\":0,\"scripted\":false,\"searchable\":false,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"_type\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":false},{\"name\":\"accessCount\",\"type\":\"number\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"accessDate\",\"type\":\"date\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"buildNum\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"columns\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"createDate\",\"type\":\"date\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"description\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"fieldFormatMap\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"fields\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"hits\",\"type\":\"number\",\"count\":0,\"scripted\":false,\"searchable\":true,\"
aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"intervalName\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"kibanaSavedObjectMeta.searchSourceJSON\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"notExpandable\",\"type\":\"boolean\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"optionsJSON\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"panelsJSON\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"refreshInterval.display\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"refreshInterval.pause\",\"type\":\"boolean\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"refreshInterval.section\",\"type\":\"number\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"refreshInterval.value\",\"type\":\"number\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"savedSearchId\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"sort\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"sourceFilters\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"timeFieldName\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggrega
table\":true,\"readFromDocValues\":true},{\"name\":\"timeFrom\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"timeRestore\",\"type\":\"boolean\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"timeTo\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"timelion_chart_height\",\"type\":\"number\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"timelion_columns\",\"type\":\"number\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"timelion_interval\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"timelion_other_interval\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"timelion_rows\",\"type\":\"number\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"timelion_sheet\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"title\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"uiStateJSON\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"url\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false},{\"name\":\"url.keyword\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"uuid\",\"ty
pe\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"version\",\"type\":\"number\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":true,\"readFromDocValues\":true},{\"name\":\"visState\",\"type\":\"string\",\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false}]"
        }
      }
    ]
  }
}

PS: The service has been running for nearly an hour, and there are quite a few images under the data/images/full folder. Has something gone wrong with Elasticsearch?

➜  house-renting git:(master) ✗ du -sh data
112M	data

The data folder is 112M.

I don't seem to see any data in Elasticsearch.
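One quick way to confirm this observation is to count the hits in a search response per index. The sketch below does that on a trimmed copy of the response shown above; the helper names are made up, and it assumes that indices whose names start with a dot (such as `.kibana`) are internal ones rather than crawler output.

```python
from collections import Counter


def hits_per_index(response):
    """Count returned documents per index in an Elasticsearch search response."""
    return Counter(hit["_index"] for hit in response["hits"]["hits"])


def has_crawled_data(response):
    """True if any hit comes from an index that is not dot-prefixed (assumed internal)."""
    return any(not name.startswith(".") for name in hits_per_index(response))


if __name__ == "__main__":
    # Trimmed version of the response pasted above: only Kibana's own documents.
    response = {
        "hits": {
            "total": 2,
            "hits": [
                {"_index": ".kibana", "_type": "config", "_id": "5.6.9"},
                {"_index": ".kibana", "_type": "index-pattern",
                 "_id": "AWOw0PODc6ereKT4VEg3"},
            ],
        }
    }
    print(dict(hits_per_index(response)))
    print("crawler data present:", has_crawled_data(response))
```

Both hits in the response above belong to `.kibana`, which matches the impression that nothing from the crawler has been stored.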

@g10guang It was a bug; I have just fixed it.

@kezhenxu94 Does that mean I need to run the crawl again?

@g10guang Yes, that's what it means :flushed: