bigscience-workshop / data_tooling

Tools for managing datasets for governance and training.

Create dataset ekantipur_com

albertvillanova opened this issue

  • uid: ekantipur_com
  • type: primary
  • description:
    • name: ekantipur.com
    • description: Kantipur (Nepali: कान्तिपुर) is a Nepali-language daily newspaper.
    • homepage: https://ekantipur.com/
    • validated: True
  • languages:
    • language_names:
      • Indic
      • Nepali (macrolanguage)
    • language_comments:
    • language_locations:
      • Southern Asia
      • Nepal
    • validated: False
  • custodian:
  • availability:
    • procurement:
      • for_download: No - we would need to spontaneously reach out to the current owners/custodians
      • download_url:
      • download_email: csd@kmg.com.np
    • licensing:
    • pii:
      • has_pii: Yes
      • generic_pii_likely: very likely
      • generic_pii_list:
        • names
        • physical addresses
        • full-face photographs and comparable images
      • numeric_pii_likely: somewhat likely
      • numeric_pii_list:
        • telephone numbers
        • health plan beneficiary numbers
        • certificate/license numbers
        • vehicle identifiers and serial numbers
        • social security numbers
        • account numbers
      • sensitive_pii_likely: very likely
      • sensitive_pii_list:
        • political opinions
        • health-related data
        • racial or ethnic origin
        • data concerning a person's sex life or sexual orientation
        • religious or philosophical beliefs
      • no_pii_justification_class:
      • no_pii_justification_text:
    • validated: False
  • source_category:
    • category_type: website
    • category_web: news or magazine website
    • category_media:
    • validated: False
  • media:
    • category:
      • text
    • text_format:
      • .HTML
    • audiovisual_format:
    • image_format:
    • database_format:
    • text_is_transcribed: No
    • instance_type: article
    • instance_count: 10K<n<100K
    • instance_size: 100<n<10,000
    • validated: False
  • fname: ekantipur_com.json
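The entry is stored as ekantipur_com.json. The following is a minimal Python sketch of how a reviewer might read such a record to list the sections still marked unvalidated and the stated PII likelihoods. It assumes the JSON file uses the same nested keys as the list above. That layout is an assumption, since the actual file is not shown here. The embedded record is an abridged excerpt of the values above, and the helper names `unvalidated_sections` and `pii_likelihoods` are hypothetical.

```python
import json

# Abridged excerpt of the entry above. ASSUMPTION: ekantipur_com.json uses
# the same nested keys as the bullet list; the real file layout is not shown.
RECORD_JSON = """
{
  "uid": "ekantipur_com",
  "type": "primary",
  "description": {"name": "ekantipur.com", "homepage": "https://ekantipur.com/", "validated": true},
  "languages": {"language_names": ["Indic", "Nepali (macrolanguage)"], "validated": false},
  "availability": {
    "pii": {
      "has_pii": "Yes",
      "generic_pii_likely": "very likely",
      "numeric_pii_likely": "somewhat likely",
      "sensitive_pii_likely": "very likely"
    },
    "validated": false
  },
  "source_category": {"category_type": "website", "validated": false},
  "media": {"category": ["text"], "instance_type": "article", "validated": false}
}
"""


def unvalidated_sections(record: dict) -> list[str]:
    """Hypothetical helper: top-level sections whose 'validated' flag is not True."""
    return [
        key
        for key, value in record.items()
        if isinstance(value, dict) and value.get("validated") is not True
    ]


def pii_likelihoods(record: dict) -> dict[str, str]:
    """Hypothetical helper: collect the *_pii_likely fields under availability.pii."""
    pii = record.get("availability", {}).get("pii", {})
    return {k: v for k, v in pii.items() if k.endswith("_pii_likely")}


record = json.loads(RECORD_JSON)
print(unvalidated_sections(record))
print(pii_likelihoods(record))
```

On this excerpt, only the description section is marked validated. The languages, availability, source_category and media sections are reported as still needing review.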