tballison / chorus

Towards an open source stack for e-commerce search

Chorus

Towards an open source tool stack for e-commerce search.

What Runs Where

Working with macOS? Pop open all the tuning-related web pages:

open http://localhost:4000 http://localhost:8983 http://localhost:9000 http://localhost:3000 http://localhost:7979
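The `open` command above is macOS-specific. As a rough sketch for other platforms, the helper below picks an opener based on the kernel name. The function names (`chorus_urls`, `opener_for`, `open_chorus_pages`) are hypothetical and not part of Chorus, and the sketch assumes `xdg-open` is installed on Linux desktops. Only the ports come from the command above.

```shell
#!/bin/sh
# Print the Chorus web pages, one URL per line (ports taken from the command above).
chorus_urls() {
  for port in 4000 8983 9000 3000 7979; do
    printf 'http://localhost:%s\n' "$port"
  done
}

# Hypothetical helper: map a kernel name to a URL opener command.
# macOS (Darwin) ships `open`; Linux desktops are assumed to provide `xdg-open`.
opener_for() {
  case "$1" in
    Darwin) echo open ;;
    Linux)  echo xdg-open ;;
    *)      return 1 ;;
  esac
}

# Open every Chorus page with the opener that matches this machine.
open_chorus_pages() {
  opener=$(opener_for "$(uname -s)") || {
    echo "No known URL opener for $(uname -s)" >&2
    return 1
  }
  chorus_urls | while read -r url; do
    "$opener" "$url"
  done
}
```

After sourcing the snippet, run `open_chorus_pages` once the stack is up. Nothing is opened until that function is called.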

Learning all about Chorus!

We are trying to strike a balance between making the setup process as easy and foolproof as possible and the need to not hide too much of the interactions between the projects that make up Chorus.

If you are impatient, we have a quick start script, ./quickstart.sh, that sets you up. However, I recommend you go through Kata 0: Setting up Chorus.

After that, you can learn how to use the tools in Chorus to improve search in First Kata: Lets Optimize a Query.

Useful Commands for Chorus

To start your environment and do each setup step manually, run:

docker-compose up --build -d

Otherwise, you can just run ./quickstart.sh. To include the observability features, run:

./quickstart.sh --with-observability

To see what is happening in the Chorus stack you can tail the logs for all the components via:

docker-compose logs -tf

If you want to narrow down to just one or two components of the Chorus stack, do:

docker-compose ps                       # list out the names of the components
docker-compose logs -tf solr1 solr2     # tail solr1 and solr2 only

To reset your environment (including any volumes created like the mysql db), just run:

docker-compose down -v

If Docker is giving you a hard time, then some options are:

docker system prune                     # removes orphaned images, networks, etc.
docker system prune -a --volumes        # removes all unused images and volumes; frees up Docker disk space if your disk is full.

You may also have to increase the resources given to Docker, up to 4 GB of RAM and 2 GB of swap space.
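To see whether Docker already has the memory mentioned above, one option is to read the `MemTotal` field from `docker info`. The following is a minimal sketch, not part of Chorus: the function names are hypothetical, the 4 GB threshold is taken from the text, and swap is not checked. Docker Desktop may report somewhat less than the configured size, so treat a near miss as a hint to look at the settings rather than a hard failure.

```shell
#!/bin/sh
# 4 GB expressed in bytes; the figure comes from the recommendation above.
MIN_MEM_BYTES=$((4 * 1024 * 1024 * 1024))

# Succeed when the given byte count is at least the 4 GB minimum.
has_enough_memory() {
  [ "$1" -ge "$MIN_MEM_BYTES" ]
}

# Ask the Docker daemon how much memory it can use, in bytes.
docker_memory_bytes() {
  docker info --format '{{.MemTotal}}'
}

# Hypothetical wrapper: report whether Docker meets the memory recommendation.
check_docker_memory() {
  mem=$(docker_memory_bytes) || return 1
  if has_enough_memory "$mem"; then
    echo "Docker memory OK: $mem bytes"
  else
    echo "Docker has only $mem bytes; consider raising it to 4 GB" >&2
    return 1
  fi
}
```

Run `check_docker_memory` while the Docker daemon is running. The comparison itself lives in `has_enough_memory`, so it can be exercised without Docker.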

Chorus Data Details

The Chorus project includes some public datasets. These datasets let the community learn, experiment, and collaborate in a safe manner and are a key part of demonstrating how to build measurable and tunable ecommerce search with open source components.

The product data is gratefully sourced from Icecat and is licensed under their Open Content License.

The version of the Icecat data that Chorus provides has the following changes:

  • Data converted to JSON format.
  • Products that don't have a 500x500 pixel image listed are removed.
  • Prices extracted for ~19,000 products from the https://www.upcitemdb.com/ service using EAN codes to match.

The ratings data (a.k.a. explicit judgements) allows you to measure the impact of your changes to relevance. We are profoundly grateful to the team at Supahands for voluntarily generating multiple ratings for the set of 125 representative ecommerce queries and sharing that data with the Chorus community.

To learn how you can work with Supahands to generate your own human judgements, see Kata 006: How to Use Explicit Judgements.

License: Apache License 2.0

