Welcome folks! 🎉🎉
This repository contains the data from our research, COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances (to be announced)!
Our dataset comprises 559 instances that test common sense reasoning (CSR). The task focuses on Indonesian local nuances and culture and follows the COPA (Choice of Plausible Alternatives) format. Here are a few examples from our data:
Premise | Choice 1 | Choice 2 | Question Type | Label |
---|---|---|---|---|
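Penumpang angkutan umum ingin turun di jalan. *(A public transport passenger wants to get off along the road.)* | Penumpang teriak "kanan" *(The passenger shouts "right")* | Penumpang teriak "kiri" *(The passenger shouts "left")* | effect | Choice 2 |
Dia merasa masuk angin *(They feel "masuk angin", a folk term for a cold-like malaise)* | Dia membuka jendela untuk meperbaiki sirkulasi udara *(They open the window to improve air circulation)* | Dia meminta tolong untuk kerokan *(They ask someone for kerokan, a traditional skin-scraping remedy)* | effect | Choice 2 |
Kemarin malam, ia baru selesai jaga lilin. *(Last night, they had just finished "jaga lilin", literally "guarding the candle".)* | Ia adalah orang yang taat beribadah *(They are a devout worshipper)* | Ia percaya dengan ilmu hitam *(They believe in black magic)* | cause | Choice 2 |
Ia dibawa ke kantor polisi akibat mencuri televisi *(They were taken to the police station for stealing a television)* | Ia tertangkap basah membawa televisi *(They were caught red-handed, literally "caught wet", carrying a television)* | Ia membawa televisi dengan tangan merah *(They carried a television with red hands)* | cause | Choice 1 |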
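To make the COPA-style format a bit more concrete, here is a minimal Python sketch that renders one instance as a two-option question and scores predictions by accuracy. The `CopaInstance` class, its field names, the 0/1 label encoding, and the English question wording are all assumptions for illustration; they are not part of the released data or our evaluation code.

```python
from dataclasses import dataclass


@dataclass
class CopaInstance:
    """One COPA-style item (hypothetical field names, not the dataset's schema)."""
    premise: str
    choice1: str
    choice2: str
    question: str  # question type: "cause" or "effect"
    label: int     # assumed encoding: 0 = Choice 1 is correct, 1 = Choice 2


def to_prompt(item: CopaInstance) -> str:
    """Render an item as a two-option question whose wording depends on its type."""
    questions = {"cause": "What was the cause?", "effect": "What happened as a result?"}
    if item.question not in questions:
        raise ValueError(f"unknown question type: {item.question!r}")
    return (
        f"{item.premise}\n{questions[item.question]}\n"
        f"1. {item.choice1}\n2. {item.choice2}"
    )


def accuracy(items: list[CopaInstance], predictions: list[int]) -> float:
    """Fraction of items whose predicted index (0 or 1) equals the gold label."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(pred == item.label for item, pred in zip(items, predictions))
    return correct / len(items)


# First row of the example table: the passenger shouts "kiri" to get off.
example = CopaInstance(
    premise="Penumpang angkutan umum ingin turun di jalan.",
    choice1='Penumpang teriak "kanan"',
    choice2='Penumpang teriak "kiri"',
    question="effect",
    label=1,  # "Choice 2" in the table
)
print(to_prompt(example))
print(accuracy([example], [1]))  # 1.0
```

The question type only changes how the prompt is phrased; the model still picks between the same two choices, which is what makes "cause" and "effect" items comparable under a single accuracy score.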
Our data can be downloaded from Hugging Face, or you can clone this repository and use the contents of `/data`:
- `test_copal.csv`: contains COPAL-ID
- `test_copal_colloquial.csv`: contains the colloquial version of COPAL-ID
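If you cloned the repository, a minimal sketch like the one below can load both files with Python's standard `csv` module. The column names (`premise`, `choice1`, `choice2`, `question`, `label`) and the integer label encoding are assumptions borrowed from the SuperGLUE version of COPA, and `load_copal` is a hypothetical helper; please check the header row of the CSV files and adjust `EXPECTED_COLUMNS` if they differ.

```python
import csv
from pathlib import Path

# Assumed header, borrowed from the SuperGLUE version of COPA.
# Verify against the first row of the CSV files before relying on it.
EXPECTED_COLUMNS = ("premise", "choice1", "choice2", "question", "label")


def load_copal(path):
    """Read one COPAL-ID CSV file into a list of dicts, checking the assumed header."""
    # utf-8-sig also accepts a leading byte-order mark, if the file has one.
    with Path(path).open(newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [col for col in EXPECTED_COLUMNS if col not in header]
        if missing:
            raise ValueError(f"{path}: missing expected columns {missing}")
        rows = []
        for row in reader:
            # Assumed encoding: 0 = choice1 is correct, 1 = choice2 is correct.
            row["label"] = int(row["label"])
            rows.append(row)
    return rows


# Load whichever files are present under data/ in a local clone.
for name in ("test_copal.csv", "test_copal_colloquial.csv"):
    path = Path("data") / name
    if path.exists():
        print(name, len(load_copal(path)))
```

Failing loudly on a missing column is a deliberate choice here: if the real header differs from the assumed one, you find out immediately instead of silently evaluating on the wrong fields.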
More detailed information will be provided in the future!
To be announced. Please bear with us while we clean up our code 🙏🙏, especially the messy parts. For instance, we are cleaning up things that look like these: `# DONT CHANGE THIS CODE OR IT WILL BREAK`, `print('TESTTESTTEST')`, `print("CAT A MEONG MEOW")`, and `how_should_i_name_this_var=123`.
- Haryo Akbarianto Wibowo @ MBZUAI
- Erland Hilman Fuadi @ Independent Researcher
- Made Nindyatama Nityasya @ Independent Researcher
- Radityo Eko Prasojo @ Pitik
- Alham Fikri Aji @ MBZUAI