bigscience-workshop / data_tooling

Tools for managing datasets for governance and training.

Create dataset xnli

albertvillanova opened this issue

  • uid: xnli
  • type: primary
  • description:
  • languages:
    • language_names:
      • English
      • French
      • Spanish
      • Arabic
      • Vietnamese
      • Chinese
      • ar-MSA
      • German
      • Greek languages
      • Bulgarian
      • Russian
      • Turkish
      • Thai
      • Hindi
      • Swahili (macrolanguage)
      • Urdu
    • language_comments:
    • language_locations:
    • validated: False
  • custodian:
    • name:
    • in_catalogue:
    • type:
    • location:
    • contact_name: XNLI
    • contact_email:
    • contact_submitter: False
    • additional:
    • validated: False
  • availability:
    • procurement:
    • licensing:
      • has_licenses: Yes
      • license_text:
      • license_properties:
        • open license
        • public domain
      • license_list:
    • pii:
      • has_pii: Unclear
      • generic_pii_likely:
      • generic_pii_list:
      • numeric_pii_likely:
      • numeric_pii_list:
      • sensitive_pii_likely:
      • sensitive_pii_list:
      • no_pii_justification_class: general knowledge not written by or referring to private persons
      • no_pii_justification_text:
    • validated: False
  • source_category:
    • category_type: collection
    • category_web:
    • category_media:
    • validated: False
  • media:
    • category:
      • text
    • text_format:
      • .CSV
    • audiovisual_format:
    • image_format:
    • database_format:
      • .ZIP
    • text_is_transcribed: Yes - audiovisual
    • instance_type:
    • instance_count: 100K<n<1M
    • instance_size: 10<n<100
    • validated: False
  • fname: xnli.json
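
Since every section of the entry above still carries `validated: False`, a small check can list what remains to be reviewed. The following is a minimal sketch, not the catalogue's actual tooling: it assumes the entry is stored in `xnli.json` as a nested object whose top-level sections each hold a boolean `validated` field, and it uses an abbreviated, hand-written copy of the record. The helper name `unvalidated_sections` is hypothetical.

```python
import json

# Hypothetical, abbreviated reconstruction of the entry as a nested dict.
# Assumption: `validated` is stored as a boolean in each top-level section.
entry = {
    "uid": "xnli",
    "type": "primary",
    "languages": {"validated": False},
    "custodian": {"contact_name": "XNLI", "validated": False},
    "availability": {
        "licensing": {"has_licenses": "Yes"},
        "pii": {"has_pii": "Unclear"},
        "validated": False,
    },
    "source_category": {"category_type": "collection", "validated": False},
    "media": {
        "category": ["text"],
        "instance_count": "100K<n<1M",
        "validated": False,
    },
    "fname": "xnli.json",
}


def unvalidated_sections(record):
    """Return the top-level section names whose `validated` flag is False."""
    return [
        name
        for name, section in record.items()
        if isinstance(section, dict) and section.get("validated") is False
    ]


pending = unvalidated_sections(entry)
print(json.dumps(pending))
```

To run the same check against a real file, the literal dict could be replaced with `json.load(open(entry_path))`, where `entry_path` points at wherever the entry is actually stored.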