openwashdata / cbssuitabilityhaiti

Data for a sanitation zoning assessment prepared for the city of Cap Haitien, Haiti. The package combines two datasets used for an analysis of the suitability of container-based sanitation (CBS).

Home Page: https://openwashdata.github.io/cbssuitabilityhaiti/

Provide metadata for variables in okap and mwater data

larnsce opened this issue

Hi @mayalubecks

Thanks again for sharing your data with us. It looks really valuable and already comes in a nice and tidy structure that won't need much data wrangling. I will rename the variables (columns) slightly, but before I do that could you please provide a brief description for each variable in the okap and mwater data?

I have prepared a data dictionary as an XLSX file for you, which you can use.

Step 1

You can download the dictionary.xlsx file from the following link:

https://github.com/openwashdata/cbssuitabilityhaiti/raw/main/data-raw/dictionary.xlsx

Step 2

Open the file on your computer and fill in the column description with a brief description for each variable shown in the column variable_name. That would be one to two sentences with about 5 to 20 words. See an example here:

https://github.com/openwashdata/fsmglobal/blob/main/data-raw/dictionary.csv

Step 3

Upload the completed dictionary.xlsx file back into the data-raw folder using the same steps as in #1.

Thank you for working through this with us.

Hi @larnsce, I went through and added descriptions for the variables I know or used in the analysis. The Excel file is now uploaded (called dictionary_MLS). Please note that there are several variables I do not know, because this data came from elsewhere and I only got an explanation for the specific variables I needed for the analysis, so I wrote "Unknown" for them. Would it make sense to remove these columns of data?
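For anyone following along, a minimal sketch of how the "Unknown" entries could be listed automatically. It assumes the dictionary has been exported from XLSX to CSV; only the column names variable_name and description come from this thread, while the sample rows and the function name undocumented_variables are made up for illustration.

```python
import csv
import io

# Hypothetical dictionary content for illustration. Only the column names
# `variable_name` and `description` come from the thread; the rows are made up.
SAMPLE_DICTIONARY = """variable_name,description
example_var,Made-up description used only for this sketch.
qlty_water,Unknown
schooling,unknown
transport,
"""


def undocumented_variables(csv_text):
    """Return variable names whose description is empty or 'Unknown'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = []
    for row in reader:
        description = (row.get("description") or "").strip()
        # Treat blank cells and any capitalisation of "unknown" as undocumented.
        if description == "" or description.lower() == "unknown":
            missing.append(row["variable_name"])
    return missing


print(undocumented_variables(SAMPLE_DICTIONARY))
```

The resulting list could serve as a starting point for deciding which columns to ask about or to remove.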

Hi @mayalubecks, thank you for sharing the dictionary. Do you have no information on the "okap" dataset? I have found that it comes from this report, stored on: https://data.humdata.org/dataset/cap-haitien-haiti-sanitation-zoning-assessment

The dataset looks quite interesting and we would love to include it in the package.

I am now handing over this work to @sebastian-loos. He works on the openwashdata project with us and will build this package.

Hey @mayalubecks,

Since you mentioned that the data came from elsewhere and that you only got a description for some variables, I went ahead and tried to determine the variables of the okap dataset myself. I was not entirely successful, since the abbreviations are not clear to me either and the paper Lars found is in French.

For the following numerical variables in the okap dataset, I wasn't able to determine a description:

  • qlty_water (Values 1.00-5.00)
  • qty_water (Values 1.00-4.66)
  • health_car (Values 1.00-6.33)
  • schooling (Values 1.00-8.00)
  • transport (Values 4.00-7.00)
  • ranking (Values 3.2-4.2)

It looks like they have been collected together and only for the city of Cap Haitien.

Additionally, for the categorical variable standing I couldn't find a description.
It contains the values "tres bas", "bas", "moyen", "haut", "collectif", "rural".

Since we are also very interested in this dataset, it would be great if you could share any information and/or a contact to further investigate these variables.

Thank you very much for your collaboration!

Hey @mayalubecks

No worries, thank you for your reply!

It is indeed very unfortunate, but thank you very much for your efforts and for letting me know.

Unfortunately, my investigations were unsuccessful and I could not find a definition for any of the unknown variables. Although one could guess their meaning from the variable names, I decided to remove them from the published datasets, as they would not be meaningful without context or a description.
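As a rough illustration of that removal step, here is a sketch in plain Python. It assumes the removed columns are exactly the seven variables named earlier in this thread; the sample rows, the example_id column, and the helper drop_columns are hypothetical and are not the code used to build the package.

```python
# Variables named earlier in this thread as lacking a description.
# Assumption: these are exactly the columns that were removed.
UNDOCUMENTED = [
    "qlty_water", "qty_water", "health_car",
    "schooling", "transport", "ranking", "standing",
]


def drop_columns(rows, columns):
    """Return copies of the row dicts without the given column names."""
    drop = set(columns)
    return [
        {key: value for key, value in row.items() if key not in drop}
        for row in rows
    ]


# Made-up rows for illustration; `example_id` is a hypothetical column.
rows = [
    {"example_id": 1, "qlty_water": 1.0, "standing": "bas"},
    {"example_id": 2, "qlty_water": 5.0, "standing": "haut"},
]

cleaned = drop_columns(rows, UNDOCUMENTED)
print(cleaned)
```

The helper builds new dictionaries rather than modifying the input, so the raw rows stay available for comparison.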

I have now updated the dictionary for the data variables and will therefore close this issue.

Thanks for your collaboration on this!

Best,
Sebastian