okfn-brasil / serenata-de-amor

🕵 Artificial Intelligence for social control of public administration | **This repository does not receive frequent updates. Check out the README**

Home Page:https://serenata.ai/en

Year information seems to be wrong

fernandobarbalho opened this issue

I tried to retrieve some information about the last legislature and wrote a pilot script to check whether I was on the right track. Below is the code in R.

library(jsonlite)
library(purrr)
library(dplyr)

# Fetch the first 300 reimbursements (3 pages of 100) for each year
df_reembolso <- map_df(2015:2016, function(ano) {
  map_df(seq(from = 0, to = 200, by = 100), function(seq_n) {

    # Build the request URL from the year filter and the pagination parameters
    exp <- "https://jarbas.serenata.ai/api/chamber_of_deputies/reimbursement/?"
    year <- paste0("year=", ano)
    offset <- paste0("offset=", seq_n)
    limit <- "limit=100"

    json_exp <- paste0(exp, year, "&", offset, "&", limit)

    print(json_exp)

    # Request one page and keep only the fields of interest
    reembolso <- fromJSON(json_exp)
    tibble(congressista_id = reembolso$results$congressperson_id,
           congressista_nome = reembolso$results$congressperson_name,
           partido = reembolso$results$party,
           estado = reembolso$results$state,
           valor_liquido = as.numeric(reembolso$results$total_net_value),
           data_ocorrencia = reembolso$results$issue_date,
           ano = reembolso$results$year)

  })
})

With this code I intended to retrieve the first 300 rows from the years 2015 and 2016.
These are the GET requests generated by the code and executed by fromJSON:

[1] "https://jarbas.serenata.ai/api/chamber_of_deputies/reimbursement/?year=2015&offset=0&limit=100"
[1] "https://jarbas.serenata.ai/api/chamber_of_deputies/reimbursement/?year=2015&offset=100&limit=100"
[1] "https://jarbas.serenata.ai/api/chamber_of_deputies/reimbursement/?year=2015&offset=200&limit=100"
[1] "https://jarbas.serenata.ai/api/chamber_of_deputies/reimbursement/?year=2016&offset=0&limit=100"
[1] "https://jarbas.serenata.ai/api/chamber_of_deputies/reimbursement/?year=2016&offset=100&limit=100"
[1] "https://jarbas.serenata.ai/api/chamber_of_deputies/reimbursement/?year=2016&offset=200&limit=100"

The result set was not what I expected. As can be seen in the pictures attached to this issue, the dates do not match the years they are supposed to be associated with. Reimbursements from 2016 are associated with the year 2015, and those from 2017 are associated with the year 2016.

I would like to know whether this is a bug or whether I am missing something.

image
image

> Reimbursements from 2016 are associated with the year 2015, and those from 2017 are associated with the year 2016.

This is not a problem, at least not in the sense the issue title claims. The year information is not wrong: it is the official information provided by the Chamber of Deputies, which we do not change. For example, this is one of the reimbursements that might be in your set. On the official Chamber site it reads:

Data de emissão (issue date)
12/05/2017

Competência (reference month)
12/2016

What I mean is that this is caused by something else, not by wrong information in Serenata's database and API. So, first things first: always double-check data against official sources to be sure where an apparent issue is coming from.

You can certainly explore how this happens. LAI (the Access to Information Law) gives you the opportunity to ask the Chamber's administration directly, but reading the _ato de mesa_ that implements CEAP might help you understand a couple of scenarios in which year will not match the issue_date field.
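To see how often the two fields disagree in a result set, one option is to compare year with the calendar year taken from issue_date. The following is only a minimal sketch in base R: the rows are hypothetical (the first mirrors the example above, reading 12/05/2017 as 12 May 2017; the second is made up), and it assumes issue_date arrives as a string starting with YYYY-MM-DD, which should be checked against the real API output.

```r
# Hypothetical rows using the API field names from the issue's code.
# Row 1 mirrors the example above; row 2 is invented for contrast.
reimbursements <- data.frame(
  issue_date = c("2017-05-12", "2016-03-01"),
  year = c(2016L, 2016L),
  stringsAsFactors = FALSE
)

# Calendar year of the receipt (assumes a YYYY-MM-DD prefix)
reimbursements$issue_year <- as.integer(substr(reimbursements$issue_date, 1, 4))

# Flag rows where the receipt year differs from the reported year
reimbursements$mismatch <- reimbursements$issue_year != reimbursements$year

# Show only the rows that disagree
print(reimbursements[reimbursements$mismatch, ])
```

Rows flagged this way are not errors in themselves; they are the cases worth checking against the official Chamber records.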

Thanks for the answer. I misinterpreted the variable issue_date. I thought it was the date on which the expense occurred, but it seems to be the date on which the reimbursement occurred.

> I thought it was the date on which the expense occurred, but it seems to be the date on which the reimbursement occurred.

As far as I can remember (and it's a pity the Chamber took down the documentation of this dataset), issue_date is the date when the receipt was issued: not necessarily the date when the expense happened, nor the date when the reimbursement took place.