GPT-4 has trouble calculating with numbers. We look at a technique to fix that.
Read the report - GPT-4 Equipped with Numeric Calculation
NOTE: The default model used when running this code is OpenAI's text-davinci-003 and not GPT-4. The code can be adjusted to run against any completion API by changing the settings in src/engine/options.ts and src/engine/settings.ts.
When writing this report, we evaluated the technique using the completion API of a private release of GPT-4. The public release of GPT-4 currently provides only a "chat" API. As a result, some aspects of the technique described may need to be reworked; however, we are providing the code and our investigation for reference. We believe the evaluation results remain valid, although some variation in numbers is to be expected.
sudo apt-get install npm
npm install
npm run build
Define an API key:
export OPENAI_API_KEY=...
Try your own question:
./gpte ask --question "What is sin(13.31) where the input is in degrees?"
./gpte ask --question "What is sin(13.31) where the input is in degrees?" --arith
./gpte ask --questionfile test/samples/number-years-grow-30.txt
./gpte ask --questionfile test/samples/number-years-grow-30.txt --arith
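For reference, the first question above can be checked by hand. The sketch below is only an illustration of the arithmetic involved: the helper name sinDegrees is hypothetical, and this is not the code that the tool or the model generates. It converts the angle from degrees to radians before calling Math.sin, which is the step a language model is prone to get wrong when asked for a direct numeric answer.

```typescript
// Hypothetical helper for illustration only; not part of this repository.
// Math.sin expects radians, so the angle is converted from degrees first.
function sinDegrees(degrees: number): number {
  const radians = (degrees * Math.PI) / 180;
  return Math.sin(radians);
}

// Print the value rounded to four decimal places (approximately 0.2302).
console.log(sinDegrees(13.31).toFixed(4));
```

Comparing this value with the output of the two ask commands gives a quick sanity check of whether --arith improves the numeric answer.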
These examples require GPT-4 token window size, code generation and reasoning:
./gpte ask --questionfile test/samples/msft-report-snippet.txt
./gpte ask --questionfile test/samples/msft-report-snippet.txt --arith
./gpte ask --questionfile test/samples/msft-goog-report-snippets-compared.txt
./gpte ask --questionfile test/samples/msft-goog-report-snippets-compared.txt --arith
./gpte ask --questionfile test/samples/gap-lulemon-financial-reports-compared.txt
./gpte ask --questionfile test/samples/gap-lulemon-financial-reports-compared.txt --arith
With an appropriate model, entire problem sets can be evaluated using ./gpte eval.
This project is licensed under the terms of the MIT open source license. Please refer to MIT for the full terms.
Maintainers: @dsyme, @wunderalbert, @johanrosenkilde
This project is a concluded technical investigation by GitHub Next. It is provided for reference.
This project includes a modified version of the ASDiv.xml dataset for numeric calculation problems, adapted primarily to clarify the precisions and formats required in answers; see the evaluation.