feiosme / token-calc

Token Calc

Count the tokens in any JSON dataset. Arbitrary JSON layouts are handled dynamically, so all you need is your JSON file.

How to use

Quick start

pip install -r requirements.txt
python calc_token.py <your-file.json>
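
For orientation, the counting step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual contents of calc_token.py: the helper names `iter_strings`, `count_json_tokens`, `load_cl100k_encoder`, and `count_file_tokens` are hypothetical, and the sketch assumes that only string values are counted (keys, numbers, booleans, and null are skipped).

```python
import json
from typing import Any, Callable, Iterator, Sequence


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    # Numbers, booleans and null are skipped (an assumption of this sketch).


def count_json_tokens(node: Any, encode: Callable[[str], Sequence]) -> int:
    """Sum the token counts of all string values, using the given encoder."""
    return sum(len(encode(text)) for text in iter_strings(node))


def load_cl100k_encoder() -> Callable[[str], Sequence]:
    """Return tiktoken's cl100k_base encode function (imported lazily)."""
    import tiktoken  # needs the package and, on first use, network access

    return tiktoken.get_encoding("cl100k_base").encode


def count_file_tokens(path: str, encode: Callable[[str], Sequence]) -> int:
    """Load a JSON file and count the tokens in its string values."""
    with open(path, encoding="utf-8") as handle:
        return count_json_tokens(json.load(handle), encode)
```

With tiktoken installed, a call such as `count_file_tokens("your-file.json", load_cl100k_encoder())` would return the total. Passing the encoder in as an argument keeps the JSON traversal independent of the tokenizer, so the walker can be checked with a simple stand-in such as `str.split`.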

Local setup

Linux

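On a computer with internet access

  1. Create and activate a Python virtual environment
python -m venv env
source env/bin/activate
  2. Install the Python packages
pip install -r requirements.txt
  3. Run the script once so that tiktoken downloads its encoding file
python calc_token.py <your-file.json>
  4. Copy the tiktoken cache into the project directory

The tiktoken package downloads the cl100k_base encoding on first use and caches it under /tmp/data-gym-cache. By copying that cache along with the project, we can "trick" tiktoken into finding the file on a machine with no network access. See https://stackoverflow.com/questions/76106366/how-to-use-tiktoken-in-offline-mode-computer

cp -r /tmp/data-gym-cache .
  5. Pack the project into a tar.gz archive
cd ..
tar -czvf token-calc.tar.gz token-calc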
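
Before packing, it may help to confirm that the encoding file really is in the cache. The sketch below is based on how tiktoken's loader is understood to behave (cache directory taken from `TIKTOKEN_CACHE_DIR`, then `DATA_GYM_CACHE_DIR`, then a `data-gym-cache` folder in the system temp directory, with the file named after the SHA-1 of its download URL). Treat those details and the URL constant as assumptions that could change between tiktoken versions; the function names are hypothetical.

```python
import hashlib
import os
import tempfile

# URL tiktoken is assumed to fetch for the cl100k_base encoding.
CL100K_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def cache_dir() -> str:
    """Return the directory tiktoken is assumed to use for cached files."""
    for variable in ("TIKTOKEN_CACHE_DIR", "DATA_GYM_CACHE_DIR"):
        if variable in os.environ:
            return os.environ[variable]
    return os.path.join(tempfile.gettempdir(), "data-gym-cache")


def cache_path(url: str = CL100K_URL) -> str:
    """Return the expected cache file path: SHA-1 hex digest of the URL."""
    key = hashlib.sha1(url.encode()).hexdigest()
    return os.path.join(cache_dir(), key)


def is_cached(url: str = CL100K_URL) -> bool:
    """Report whether the expected cache file exists on disk."""
    return os.path.isfile(cache_path(url))
```

If `is_cached()` returns `False` after running calc_token.py once, the download probably did not complete, and the archive would not work offline.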

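On the offline computer

  1. Set up the environment
tar -zxvf token-calc.tar.gz
cd token-calc
# Recreate the interpreter symlinks (assumes Python 3.10 and a system interpreter at /usr/bin/python3)
cd env/bin && rm python* && ln -s python3 python && ln -s python3 python3.10 && ln -s /usr/bin/python3 python3
vi activate # Set "VIRTUAL_ENV" to the absolute path of the token-calc/env directory on this machine
cd ../..
source env/bin/activate
  2. Copy the tiktoken cache back to /tmp
cp -r data-gym-cache /tmp
  3. Run the script
python calc_token.py <your-file.json>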
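
As a possible alternative to copying the cache into /tmp (which may be cleared on reboot), tiktoken is understood to honor a `TIKTOKEN_CACHE_DIR` environment variable. The sketch below points that variable at the bundled directory. `use_bundled_cache` is a hypothetical helper, not part of calc_token.py, and it would have to run before the encoding is first loaded.

```python
import os


def use_bundled_cache(project_dir: str, name: str = "data-gym-cache") -> bool:
    """Point tiktoken at the cache directory shipped inside the project.

    Returns True if the directory exists and the variable was set,
    False otherwise (in which case the environment is left untouched).
    """
    cache = os.path.join(os.path.abspath(project_dir), name)
    if not os.path.isdir(cache):
        return False
    # Assumption: tiktoken reads this variable when it loads an encoding.
    os.environ["TIKTOKEN_CACHE_DIR"] = cache
    return True
```

The same effect can be had without code by exporting `TIKTOKEN_CACHE_DIR` in the shell before running the script.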

About

License: Apache License 2.0

