rjurney / Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Home Page: http://bit.ly/agile_data_science

ch_02 elasticsearch EsHadoopInvalidRequest "final mapping would have more than 1 type"

ppkn opened this issue

When I try to write data to Elasticsearch from PySpark, I get this output: gist

This is on an EC2 instance created with ./ec2.sh

What could be going on here?

As of Elasticsearch 6.0, a newly created index can have only one type. When you posted to "localhost:9200/agile_data_science/test" you created an index named "agile_data_science" with a type of "test". Then, in the last line of pyspark_elasticsearch.py, the code attempts to create a new type called "executives" in the agile_data_science index, which is no longer allowed. There are a lot of possible solutions and I'll leave it to @rjurney to decide what the correct one is.

The easiest one for now is to replace "executives" on the last line of pyspark_elasticsearch.py with "test".
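
A minimal sketch of that workaround, assuming the script hands its target to ES-Hadoop through the `es.resource` setting in the form `index/type`. The helper name `es_write_conf` is hypothetical, and the actual contents of pyspark_elasticsearch.py may differ:

```python
def es_write_conf(index, doc_type):
    """Build an ES-Hadoop configuration dict that targets index/doc_type.

    Hypothetical helper for illustration only; it is not part of the
    book's code.
    """
    return {"es.resource": "{}/{}".format(index, doc_type)}


# Reuse the type that already exists in the index ("test") instead of
# introducing a second one ("executives"), which Elasticsearch 6.x rejects.
conf = es_write_conf("agile_data_science", "test")
print(conf["es.resource"])
```

A dict like this is what would be passed as the `conf` argument when writing an RDD out through ES-Hadoop. The other option is to delete the agile_data_science index first, so that "executives" becomes its only type.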

Shit. That is a damned stupid change. I'll have to downgrade ElasticSearch.

@dpipkin @bravefoot I downgraded Elasticsearch in both the Vagrant and EC2 scripts. Try again, sorry for the problems!

I noticed that bootstrap.sh on master still has Elasticsearch 6.1.2

echo "curl -sLko /tmp/elasticsearch-6.1.2.tar.gz https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.1.2.tar.gz"

tar -xvzf /tmp/elasticsearch-6.1.2.tar.gz -C elasticsearch --strip-components=1

I've added a public note to the Safari version of the book that describes this issue and the suggested workaround.

Now installing Elasticsearch 5.6.1 again to resolve this bug.
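
For anyone unsure which behavior their install has, here is a rough sketch of the version rule discussed in this thread. It assumes you already have the version string, for example from the `version.number` field that Elasticsearch returns at its root endpoint (localhost:9200). The function name `allows_multiple_types` is made up for illustration, and the check only covers newly created indices:

```python
def allows_multiple_types(version):
    """Return True if a new index on this Elasticsearch version may hold
    more than one type (true for 5.x and earlier, false from 6.0 on).

    Hypothetical helper; `version` is a string such as "5.6.1".
    """
    major = int(version.split(".")[0])
    return major < 6


# The two versions mentioned in this thread.
for v in ("5.6.1", "6.1.2"):
    print(v, allows_multiple_types(v))
```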