saharavishag / DigitalHumanProject

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

DigitalHumanProject

Summary

This project is related to the course "Subjects in Digital humanities". In this project I hoped to achive a knowledge about how Isreali politicians are being reviewed in the new apps.

In order to do that, we’ll need to understand the context of the articles about them.

We’ll understant the context by extracting adjectives from the articles.

Related sources

API - News Api

https://newsapi.org/

Yap - Yet Another (natural language) Parser

https://github.com/OnlpLab/yap

Artice - A Unified Morpho-Syntactic Scheme of Stanford Dependencies

http://www.tsarfaty.com/pdfs/acl13.pdf

The Process

Set up and activate

Prerequirements

  • python3
  • go
  • create apikey for NewsAPI
  • install and compile YAP - make sure you do it inside GOPATH

activate venv

cd <your project dir>
git clone <git url>
cd DigitalHumanProject

activate venv

python3.7 -m venv venv3
source venv3/bin/activate

install project requirements

pip3 install -r requirements.txt
export api_key=<YOUR_API_KEY>

(example: export api_key=40d61a5ed053486f8b3ef093551f4d40)

deactivate venv (if needed)

deactivate

Action & Explaination

Our target is to extract

For that, We'll apply 4 steps:

Phase #1 (manually)

Prepare you research:

In these project I focused on Israelies politicians,
but it can be applied to any context you wish to research.

Phase #2 (manually)

Get contenct into jsons with the following structure:

{"status": "ok", 
"totalResults": 16, 
"articles": [
    {"source": {"id": "ynet", "name": "Ynet"},
    "author": "...", 
    "description": "...", 
    "url": "'''", 
    "publishedAt": "2020-01-01T20:00:00Z", "content": "..."},
    ...
]
}
  • The files will be saved in content/json/name/
  • In our project the files contains articles with politicians names, but it can be applied to any name you want

Phase #3 (automatically)

Extracting tokens from the content.

Assuming the content is set in the right way,
this phase will extract the content and parse it into tokens.

An example for a token file:

חבר
הכנסת
מהליכוד
טען
באולפן
ynet
כי
למרות
שראש
הממשלה
נתניהו
יישב
על
ספסל
הנאשמים,
"הניסיון
שלו
כל
כך
משמעותי,
שאדם
עם
אפס
צרות
אחרות
לא
מסוגל
להיכנס
לנעליו".
הוא
גיבה
את
בנט
למרות
המתקפות
נגדו:
"שר
ביטחון
טוב".
על
גדעון
סער:
"נתניהו
מבין
היטב
את
מקומו
בהנהג…
 
.

  • The files will be saved in tokens/name/

Phase #4 (automatically)

Applying YAP utils on our tokens

In this step, we'll apply yap utils to extract "Part of speech" in Hebrew.

  • The files will be saved in finalresults/name/

Project specifications

Work with Makefile

make api-data
make extract-tokens
make apply-yap
make delete-results
make restart
make start

examples for api call

newsapi = NewsApiClient(api_key='40d61a5ed053486f8b3ef093551f4d40')

top_headlines = newsapi.get_top_headlines(q='bitcoin',
                                          sources='bbc-news,the-verge',
                                          category='business',
                                          language='en',
                                          country='us')

all_articles = newsapi.get_everything(q='bitcoin',
                                      sources='bbc-news,the-verge',
                                      domains='bbc.co.uk,techcrunch.com',
                                      from_param='2017-12-01',
                                      to='2017-12-12',
                                      language='en',
                                      sort_by='relevancy',
                                      page=2)

Result structure from the API

Json

Each item:

{ "source": {"id": "ynet", "name": "Ynet"}, "author": "...", "description": "..."", "url": "...", "urlToImage": "...", "publishedAt": "2020-01-25T16:51:00Z", "content": "..." }

About


Languages

Language:Python 53.2%Language:Shell 35.2%Language:Makefile 11.5%