wenh06 / numbda-webnews

license: apache-2.0
language: zh
metrics: accuracy
pipeline_tag: text-classification

Model Card for numbda-webnews

Chinese version

numbda-webnews is a news classification model fine-tuned from roberta-base-finetuned-ifeng-chinese on a new dataset of approximately 40k news articles crawled from news websites in China. This work is a sub-project of the AI-Testing project.

The dataset includes, but is not limited to, the following 14 categories:

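  • 资讯 (general news)
  • 财经 (finance)
  • 体育 (sports)
  • 时政 (current politics)
  • 娱乐 (entertainment)
  • 社会 (society)
  • 科技 (technology)
  • 汽车 (automobiles)
  • 健康 (health)
  • 萌宠 (pets)
  • 国际 (international)
  • 生活 (lifestyle)
  • 美食 (food)
  • 游戏 (gaming)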

These 14 categories account for 26k of the approximately 40k samples.

Model Details

Model Sources

Uses

Direct Use

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# "wenh06/numbda-webnews" can be replaced with a local path to the model directory
tokenizer = AutoTokenizer.from_pretrained("wenh06/numbda-webnews")
model = AutoModelForSequenceClassification.from_pretrained("wenh06/numbda-webnews")

# Use a distinct name so the imported `pipeline` factory is not shadowed
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
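
Since the evaluation reports top-3 and top-5 accuracy, it may be useful to retrieve several candidate labels per article rather than only the best one. The following is a minimal sketch, not part of the original model card: `predict_top_k` is a hypothetical helper name, and it assumes the object passed in behaves like a transformers text-classification pipeline, which accepts `top_k` and tokenizer arguments such as `truncation` at call time.

```python
from typing import Callable, List, Sequence


def predict_top_k(classifier: Callable, texts: Sequence[str], top_k: int = 3) -> List[List[str]]:
    """Return the top_k predicted labels, best first, for each input text.

    `classifier` is assumed to be the text-classification pipeline object
    built above (hypothetical helper, not part of the model's own API).
    """
    # For a list of texts with top_k set, the pipeline is expected to return
    # one list of {"label": ..., "score": ...} dicts per text.
    # truncation=True keeps long news articles within the model's input limit.
    results = classifier(list(texts), top_k=top_k, truncation=True)
    return [
        # Sort defensively by score so the best label always comes first.
        [p["label"] for p in sorted(preds, key=lambda p: p["score"], reverse=True)]
        for preds in results
    ]
```

To use it, pass the pipeline object created above together with a list of article texts, for example `predict_top_k(classifier, [article_text], top_k=5)`, where `article_text` is a placeholder for your own input.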

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

Training Details

Training Data

This model was fine-tuned on a new dataset of approximately 40k news articles crawled from news websites in China. The dataset will be released at a later date.

Evaluation

Evaluation results and software/hardware information can be found in Weights & Biases.

Metric Score
top1-accuracy 0.768
top3-accuracy 0.944
top5-accuracy 0.981
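
For readers unfamiliar with the metric, top-n accuracy is the fraction of samples whose true label appears among the model's n highest-scoring predictions. The sketch below illustrates that definition in plain Python; it is not the evaluation code used for this model, `top_n_accuracy` is a hypothetical helper, and the toy inputs are made up for illustration only.

```python
from typing import Sequence


def top_n_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    n: int,
) -> float:
    """Fraction of samples whose true label is among the first n ranked predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:n]
    )
    return hits / len(true_labels)


# Toy, made-up rankings (best label first); NOT outputs of numbda-webnews.
ranked = [
    ["体育", "娱乐", "社会"],
    ["财经", "科技", "时政"],
    ["社会", "时政", "国际"],
    ["游戏", "科技", "娱乐"],
]
truth = ["体育", "科技", "国际", "汽车"]

top1 = top_n_accuracy(ranked, truth, 1)  # 1 of 4 correct at rank 1 -> 0.25
top3 = top_n_accuracy(ranked, truth, 3)  # 3 of 4 found within top 3 -> 0.75
```

Top-n accuracy can only stay the same or increase as n grows, which is consistent with the reported scores rising from top-1 to top-5.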

Curves of Top n Accuracy

[Figures: evaluation curves for Top1, Top3, and Top5 accuracy (eval-top1-acc.svg, eval-top3-acc.svg, eval-top5-acc.svg)]
