dluc / openai-tools

A collection of tools for working with OpenAI

GPT Tokenizer

.NET / C#

When using OpenAI GPT, you may need to know how many tokens your code is using for various purposes, such as estimating costs and improving results.

The GPT3Tokenizer C# class can help you count tokens in your prompts and in the responses received.

using System.Collections.Generic;
using AI.Dev.OpenAI.GPT;

string text = "January 1st, 2000";

// 5 tokens => [21339, 352, 301, 11, 4751]
List<int> tokens = GPT3Tokenizer.Encode(text);
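Since the stated use cases are estimating costs and keeping prompts under control, here is a minimal sketch of both, built only on the Encode call shown above. The names CountTokens and EstimateCost, the 4000-token budget, and the price per 1,000 tokens are hypothetical values picked for illustration; they are not part of the library or of OpenAI's pricing.

```csharp
using System;
using System.Collections.Generic;
using AI.Dev.OpenAI.GPT;

// Hypothetical values for illustration only; use your model's real limits and prices.
const int MaxPromptTokens = 4000;
const decimal PricePer1KTokens = 0.02m;

// Hypothetical helper: token count is the length of the list returned by Encode.
static int CountTokens(string text) => GPT3Tokenizer.Encode(text).Count;

// Hypothetical helper: linear cost estimate from a token count.
static decimal EstimateCost(int tokenCount, decimal pricePer1K) =>
    tokenCount / 1000m * pricePer1K;

string prompt = "January 1st, 2000";
int count = CountTokens(prompt);

Console.WriteLine($"Tokens: {count}");
Console.WriteLine($"Estimated cost: {EstimateCost(count, PricePer1KTokens)}");
Console.WriteLine(count <= MaxPromptTokens ? "Prompt fits the budget." : "Prompt is too long.");
```

The same CountTokens helper can be applied to a response string to account for completion tokens as well.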

The tokenizer uses a byte-pair encoding (BPE) algorithm to split words into subwords based on frequency and merge rules. It can handle out-of-vocabulary words, punctuation, and special tokens.
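To make the idea of merge rules concrete, here is a toy sketch of how BPE applies them to a single word. It is not the library's implementation: real GPT tokenizers work on bytes with a large learned merge table, while this example works on characters with three made-up rules. The function name ApplyBpe and the rules themselves are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy BPE: start from single characters and repeatedly apply the
// highest-priority merge rule that matches an adjacent pair.
static List<string> ApplyBpe(string word, IReadOnlyList<(string, string)> merges)
{
    // A lower index means the rule has higher priority.
    var rank = new Dictionary<(string, string), int>();
    for (int i = 0; i < merges.Count; i++) rank[merges[i]] = i;

    var symbols = word.Select(c => c.ToString()).ToList();

    while (symbols.Count > 1)
    {
        // Find the adjacent pair with the best (lowest) rank.
        int bestRank = int.MaxValue;
        (string, string) bestPair = default;
        for (int i = 0; i < symbols.Count - 1; i++)
        {
            var pair = (symbols[i], symbols[i + 1]);
            if (rank.TryGetValue(pair, out int r) && r < bestRank)
            {
                bestRank = r;
                bestPair = pair;
            }
        }
        if (bestRank == int.MaxValue) break; // no rule applies any more

        // Merge every occurrence of that pair, scanning left to right.
        var merged = new List<string>();
        for (int i = 0; i < symbols.Count; i++)
        {
            if (i < symbols.Count - 1 && (symbols[i], symbols[i + 1]) == bestPair)
            {
                merged.Add(symbols[i] + symbols[i + 1]);
                i++; // skip the right half of the merged pair
            }
            else
            {
                merged.Add(symbols[i]);
            }
        }
        symbols = merged;
    }
    return symbols;
}

// Hypothetical merge rules, in priority order.
var merges = new List<(string, string)> { ("l", "o"), ("lo", "w"), ("e", "r") };

Console.WriteLine(string.Join(" | ", ApplyBpe("lower", merges)));  // low | er
Console.WriteLine(string.Join(" | ", ApplyBpe("lowest", merges))); // low | e | s | t
```

The second word shows why BPE never fails on unseen input: pieces that match no rule simply stay as smaller units.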

The output of this library is compatible with the OpenAI GPT tokenizer, which you can also test here.

Installation

Install the AI.Dev.OpenAI.GPT NuGet package from nuget.org, e.g.:

dotnet add package AI.Dev.OpenAI.GPT --version 1.0.2

or

NuGet\Install-Package AI.Dev.OpenAI.GPT -Version 1.0.2

Python and Node.js

If you are looking for an equivalent solution in other languages:

Licensing

This library is licensed under CC0 and is in the public domain. You can use it for any application, modify the code, and redistribute any part of it.

I am not affiliated with OpenAI and this library is not endorsed by them. I just work with several AI solutions and I share this code hoping to make technology more accessible and easier to work with.

About

License: Creative Commons Zero v1.0 Universal


Languages

C# 97.9%, Shell 2.1%