WendyBu / PubMedApp

PubMedApp

Yiwen Bu ybu34@gatech.edu GT ID: ybu34

Project Introduction

  1. The recommendation system aims to surface the health-related articles a reader is most likely to find interesting, based on the reader's browsing and interaction history: views, bookmarks, likes, comments, and follows. Each interaction type is given a different weight.

  2. Input data
    i. articlewID.csv
    It contains metadata for about 3000 articles. Columns: 'Unnamed: 0', 'index', 'oldIndex', 'journal', 'title', 'text', 'year', 'lang', 'type', 'region', 'PMID', 'url', 'key', 'contentId'

    ii. users_interactions.csv
    It contains the actions of about 1800 readers on the articles. Columns: 'timestamp', 'eventType', 'contentId', 'personId', 'sessionId', 'userAgent', 'userRegion', 'userCountry'

  3. Evaluation:
    Top-N accuracy metrics: Recall@5 of 0.21 means that about 21% of the interacted items in the test set were ranked among the top 5 items (from lists with 100 random items).
    Recall@10 of 0.33 means that about 33% of the interacted items in the test set were ranked among the top 10 items (from lists with 100 random items).

  4. Conclusion:
    The content-based model has the best performance, so it will be used to generate each reader's recommendation list. It achieves 58% Top-10 accuracy and 49% Top-5 accuracy.

  5. Testing:
    In the test session, given one specific user, the engine recommends the top 20 articles for that user.
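
The Top-N evaluation described in item 3 can be sketched as follows. This is a minimal illustration and not the code in recommender.py: the function names `hit_at_n` and `recall_at_n`, the `score_fn` callback, and the toy scores are all assumptions made for the example. For each held-out test item, the item is ranked together with a random sample of 100 articles the reader has not interacted with, and it counts as a hit if it lands in the top N.

```python
import random


def hit_at_n(score_fn, held_out_item, non_interacted_items, n, sample_size=100, rng=None):
    """Return True if the held-out item ranks in the top n among itself
    plus a random sample of articles the reader did not interact with."""
    rng = rng or random.Random(0)
    pool = sorted(non_interacted_items)  # sorted so sampling is reproducible
    sample = rng.sample(pool, min(sample_size, len(pool)))
    candidates = sample + [held_out_item]
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return held_out_item in ranked[:n]


def recall_at_n(score_fn, test_items, non_interacted_items, n, sample_size=100, seed=0):
    """Fraction of held-out test items that rank in the top n."""
    if not test_items:
        return 0.0
    rng = random.Random(seed)
    hits = sum(
        hit_at_n(score_fn, item, non_interacted_items, n, sample_size, rng)
        for item in test_items
    )
    return hits / len(test_items)


# Toy data (hypothetical): 200 non-interacted articles with low scores,
# one test item scored above all of them and one scored below all of them.
scores = {i: i / 1000 for i in range(200)}
scores[1000] = 1.0   # always ranked first
scores[1001] = -1.0  # always ranked last
recall = recall_at_n(scores.get, [1000, 1001], set(range(200)), n=5)
print(recall)  # 0.5
```

In a real run, `score_fn` would be replaced by the model's predicted relevance for the given reader, and the recall would be aggregated over all readers in the test set.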

Deliverable 3 Recommendation Engine (Backend)

This deliverable compares different models for the recommendation system:

  1. Popularity model: provides good recommendations for most people.
  2. Content-Based Filtering model: uses only the descriptions and attributes of the articles a reader has previously interacted with to model that reader's preferences.
  3. Collaborative Filtering model: makes automatic predictions (filtering) about the interests of a reader by collecting preference or taste information from many readers (collaborating).
  4. Hybrid model: combines collaborative filtering and content-based filtering.
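
As an illustration of the content-based approach, here is a minimal, dependency-free sketch. It is not the implementation in recommender.py. The event weights in `EVENT_WEIGHTS`, the lowercase event names, the helper names, and the toy articles are all hypothetical. The TF-IDF weighting is written by hand to keep the example self-contained; in practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would be the more typical choice. The idea: build a TF-IDF vector per article, form a reader profile as the event-weighted sum of the vectors of articles the reader interacted with, then rank unseen articles by cosine similarity to that profile.

```python
import math
from collections import Counter, defaultdict

# Hypothetical weights per interaction type; the actual values are an assumption.
EVENT_WEIGHTS = {"view": 1.0, "like": 2.0, "bookmark": 2.5, "follow": 3.0, "comment": 4.0}


def tfidf_vectors(docs):
    """Map each contentId to an L2-normalized sparse TF-IDF vector (term -> weight)."""
    tokenized = {cid: text.lower().split() for cid, text in docs.items()}
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    n_docs = len(docs)
    vectors = {}
    for cid, tokens in tokenized.items():
        counts = Counter(tokens)
        # Term frequency times a smoothed inverse document frequency.
        vec = {
            t: (c / len(tokens)) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[cid] = {t: w / norm for t, w in vec.items()}
    return vectors


def user_profile(interactions, vectors):
    """Event-weighted sum of the vectors of the articles a reader interacted with."""
    profile = defaultdict(float)
    for event_type, cid in interactions:
        weight = EVENT_WEIGHTS.get(event_type, 0.0)
        for term, w in vectors[cid].items():
            profile[term] += weight * w
    return dict(profile)


def recommend(interactions, vectors, top_n=20):
    """Rank articles the reader has not seen by cosine similarity to the profile."""
    profile = user_profile(interactions, vectors)
    p_norm = math.sqrt(sum(w * w for w in profile.values())) or 1.0
    seen = {cid for _, cid in interactions}
    scores = {
        cid: sum(profile.get(t, 0.0) * w for t, w in vec.items()) / p_norm
        for cid, vec in vectors.items()
        if cid not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy articles (hypothetical contentIds and text).
docs = {
    "a1": "diabetes insulin therapy trial",
    "a2": "insulin resistance in type 2 diabetes",
    "a3": "knee surgery recovery outcomes",
    "a4": "hip surgery rehabilitation",
}
vectors = tfidf_vectors(docs)
interactions = [("view", "a1"), ("like", "a1")]
recs = recommend(interactions, vectors, top_n=2)
print(recs[0])  # a2
```

Because the profile depends only on the articles a reader has touched, this approach can score newly added articles that no one has interacted with yet, which a purely collaborative model cannot do.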

File structures

  1. /query.py
    Data preparation. Generates the sample data from the PubMed database. It queries more than 3000 publications, including title, abstract, journal, year, language, region, PubMed ID, URL, etc.
    There is no need to run this file again unless you want to generate a new dataset.
    The output of this file is saved as data/articlewID.csv.
    To limit the file size, intermediate and temporary data have been deleted.

  2. /recommender.py
    Main file. All the learners and recommender models are included.
    It generates the performance metrics for each model and compares them.
    The performance plot is saved in output/ComparisonModels.png.
    This file runs end to end; it should take less than 5 minutes, and the results are printed to the console.

  3. /pubmedRec.py
    Placeholder for the Flask file. Not finished yet.

  4. /requirements.txt
    required packages

  5. /README.md
    Information

  6. /data/articlewID.csv
    Input data. Metadata for more than 3000 articles from PubMed, generated by the author.

  7. /data/users_interactions.csv
    Input data. Users' actions on each article, generated by Gabriel Moreira.

  8. /output/comparisonModel
    Model comparison figure.

To do in next deliverable

  1. Flask
  2. Frontend: GUI
  3. Take readers' input and add it to users_interactions.csv.

Reference

  1. Building a Movie Recommendation Engine in Python using Scikit-Learn https://medium.com/code-heroku/building-a-movie-recommendation-engine-in-python-using-scikit-learn-c7489d7cb145

  2. Evaluating recommender systems http://fastml.com/evaluating-recommender-systems/

  3. Building a Recommendation System with Python Machine Learning & AI Lynda.com course by Lillian Pierson

  4. Recommender system wikipedia/Recommender_system

  5. Articles sharing and reading from CI&T DeskDrop by Gabriel Moreira

  6. Recommender Systems in Python: Beginner Tutorial https://www.datacamp.com/community/tutorials/recommender-systems-python
