vohai611 / diemthi-thpt-2021

High school graduate exam score



Haivo 7/30/2021


This repository provides a simple R script to retrieve the scores of the 2021 Vietnamese high school graduation exam. The data are saved in the data folder. This README shows how the process works.

Several websites provide a web interface for looking up the scores. In this script, I use https://diemthi.vnanet.vn and https://tienphong.vn/tra-cuu-diem-thi.tpo. Under the hood, both websites call an API to retrieve the data, and anyone can find and call this API directly through the Safari web browser's devtools (Network tab). The API from vnanet.vn is quite slow and returns only 1 result per request. tienphong.vn, on the other hand, returns up to 300 results per request, so I decided to use the latter to download the whole data set.

Both APIs take the student ID as input and return the full result of that student as output. The input has the form {province_code}{student_id}. The province_code ranges from 01 to 64, while the student_id runs from 1 up to the number of students who sat the exam in that province. For example, input = '01000001' is student 1 from the province with code 01 (Ha Noi).
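As an illustration of the input format described above, here is a minimal sketch that builds such an ID. The helper name make_sbd is hypothetical, and the 6-digit width of the student part is an assumption inferred from the example '01000001'.

```r
# Build a candidate number: 2-digit province code followed by a
# zero-padded student number. The 6-digit width is an assumption
# taken from the example '01000001'.
make_sbd <- function(province_code, student_id) {
  sprintf("%02d%06d", as.integer(province_code), as.integer(student_id))
}

make_sbd(1, 1)      # "01000001"
make_sbd(64, 1234)  # "64001234"
make_sbd(1, 1:3)    # vectorised: "01000001" "01000002" "01000003"
```

Because sprintf() is vectorised, the same helper can generate every candidate number of a province in one call.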


Option 1:


library(httr)
library(glue)
library(magrittr)  # provides %>%

# Query the vnanet.vn API for a single candidate number (sbd)
get_score <- function(sbd) {
  url <- glue("https://diemthi.vnanet.vn/Home/SearchBySobaodanh?code={ sbd }&nam=2021")
  GET(url) %>% 
    content(as = 'text', encoding = 'UTF-8') %>% 
    jsonlite::fromJSON() %>% 
    .[['result']]
}

get_score('01000001')
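| CityCode | CityArea | Code | Toan | NguVan | NgoaiNgu | VatLi | HoaHoc | SinhHoc | KHTN | DiaLi | LichSu | GDCD | KHXH | ResultGroup | Result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 01 | NA | 01000001 | 2.20 | 3.50 | | | | | | 5.50 | 2.50 | | | [{"g":"A07","p":10.20},{"g":"C00","p":11.50},{"g":"C03","p":8.20},{"g":"C04","p":11.20}] | |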
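The ResultGroup value in the sample output is itself a JSON array. A minimal sketch of how it could be unpacked with jsonlite is shown here; reading g as a subject-combination code and p as the combined score is an interpretation of the sample, not something the API documents.

```r
library(jsonlite)

# ResultGroup string copied from the sample output for '01000001'
result_group <- '[{"g":"A07","p":10.20},{"g":"C00","p":11.50},{"g":"C03","p":8.20},{"g":"C04","p":11.20}]'

# fromJSON() simplifies an array of flat objects into a data frame
# with one row per object and one column per key (g, p)
groups <- fromJSON(result_group)
groups

# Look up the value for one combination code
groups$p[groups$g == "C00"]  # 11.5
```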

Option 2:
With this API, a partial ID acts as a prefix: sbd = '0100001' returns the results of 10 candidates, from 01000010 to 01000019, in one request.

# uses httr, glue and magrittr (%>%); jsonlite and rvest are called via ::
get_score2 <- function(sbd) {
  # prepare URL
  url <- glue('https://tienphong.vn/api/diemthi/get/result?type=0&keyword={ sbd }&kythi=THPT&nam=2021&cumthi=0')
  # send the request and parse the JSON response
  a <- GET(url,
           add_headers('referer' = 'https://tienphong.vn/tra-cuu-diem-thi.tpo')
  ) %>% 
    content(as = 'text') %>% 
    jsonlite::fromJSON()
  # the results field holds HTML; convert it to plain text
  a$data$results %>% 
    rvest::read_html() %>% 
    rvest::html_text2()
}

# data are received in TSV form:
get_score2('0100001') %>% 
  data.table::fread()
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 1000019 7.0 8.50 8.8 NA NA NA 4.00 5.25 6.75 NA
2 1000018 8.8 8.25 NA 8.00 5.00 6.75 NA NA NA NA
3 1000017 7.8 8.00 9.6 7.50 8.00 7.25 NA NA NA NA
4 1000016 7.8 8.50 9.4 NA NA NA 6.50 7.50 8.00 NA
5 1000015 6.0 7.75 9.0 NA NA NA 4.00 7.75 7.00 NA
6 1000014 7.4 8.00 8.6 NA NA NA 6.00 6.25 7.50 NA
7 1000013 7.4 6.75 9.0 NA NA NA 3.75 8.50 6.50 NA
8 1000012 6.4 6.75 7.8 NA NA NA 5.50 7.00 7.50 NA
9 1000011 6.0 7.75 8.2 NA NA NA 3.00 7.25 8.50 NA
10 1000010 8.8 6.25 9.2 8.75 8.75 3.00 NA NA NA NA
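Since a 7-character prefix returns up to 10 candidates at once, covering a whole province comes down to listing the distinct prefixes. The sketch below shows one way to do that; make_prefixes and max_id are hypothetical names, and the 6-digit student part is assumed from the example ID '01000001'.

```r
# List the 7-character prefixes needed to cover candidate numbers
# 1..max_id in one province. Each prefix matches up to 10 numbers
# (those differing only in the last digit).
make_prefixes <- function(province_code, max_id) {
  ids <- sprintf("%02d%06d", as.integer(province_code), seq_len(max_id))
  unique(substr(ids, 1, 7))
}

make_prefixes(1, 25)
# "0100000" "0100001" "0100002"
```

Each prefix can then be passed to a request function such as get_score2(), which cuts the number of requests roughly tenfold compared with querying one ID at a time.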

In the script, I also use the furrr package (a front end to the future package) to send requests in parallel. Running in parallel was around 3 times faster than running sequentially: about 60 minutes for nearly 1 million results.
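A minimal sketch of what parallel requests with furrr might look like follows. It is not the repository's actual script: fetch_one is a hypothetical stand-in that only echoes its input so the example runs without network access, and the number of workers is an arbitrary choice.

```r
library(future)
library(furrr)

# Hypothetical stand-in for a real request function such as get_score2();
# it sleeps briefly to imitate waiting for the server.
fetch_one <- function(prefix) {
  Sys.sleep(0.1)
  paste0("result for ", prefix)
}

prefixes <- c("0100000", "0100001", "0100002", "0100003")

plan(multisession, workers = 2)  # run on two background R sessions
results <- future_map_chr(prefixes, fetch_one)
plan(sequential)                 # shut the workers down again

results
```

Replacing fetch_one with a real request function keeps the rest unchanged. How much speed-up is achieved in practice depends on the server and the number of workers.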

