Are tokenizers supposed to work in the browser?
Vectorrent opened this issue
Question
I'd love to use some pretrained tokenizers right in my browser. I've tried several times to load one with this library, but it always fails with an error like this:
Uncaught (in promise) SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data
getModelJSON hub.js:584
loadTokenizer tokenizers.js:62
from_pretrained tokenizers.js:4398
gv9xs tok.js:3
gv9xs tok.js:9
newRequire dev.42f35062.js:71
<anonymous> dev.42f35062.js:122
<anonymous> dev.42f35062.js:145
hub.js:584:16
gv9xs tok.js:3
AsyncFunctionThrow self-hosted:856
(Async: async)
gv9xs tok.js:9
newRequire dev.42f35062.js:71
<anonymous> dev.42f35062.js:122
<anonymous> dev.42f35062.js:145
Is there anything I can do to make this work? My code is rather simple:
import { AutoTokenizer } from '@xenova/transformers'

;(async function () {
  const tokenizer = await AutoTokenizer.from_pretrained(
    'Xenova/bert-base-uncased'
  )
  console.log(tokenizer)
  const { input_ids } = await tokenizer('I love transformers!')
  console.log(input_ids)
})()
I serve this code via a Parcel development server, but it's never worked for me. Any advice would be greatly appreciated!
Hi there 👋 Yes, they do work in the browser. This is most likely a duplicate of #483, and you can solve it with:
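import { env } from '@xenova/transformers';
// Skip the local model lookup so files are fetched from the Hugging Face Hub.
// Set this before calling from_pretrained.
env.allowLocalModels = false;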
Please note that you must refresh your cache in order for this to work properly, by following these steps:
- Open devtools
- Go to "Application" tab
- Go to "Storage"
- Click "Clear site data"
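For context on the error itself: a likely explanation (an assumption, not confirmed in this thread) is that with local models enabled the library first requests the tokenizer files from the site's own origin, and a development server such as Parcel answers unknown paths with its HTML index page. JSON.parse then fails on the leading "<", which would match "unexpected character at line 1 column 1". Here is a minimal sketch of that check, using a hypothetical helper named classifyBody that is not part of the library:

```javascript
// Hypothetical helper (not part of @xenova/transformers): classify a response
// body to see whether a dev server answered with its HTML fallback page.
function classifyBody(text) {
  try {
    JSON.parse(text);
    return 'json';
  } catch (err) {
    // An HTML page starts with "<", which is never valid at the start of JSON.
    return text.trimStart().startsWith('<') ? 'html' : 'invalid';
  }
}

console.log(classifyBody('{"version": "1.0"}'));           // json
console.log(classifyBody('<!DOCTYPE html><html></html>')); // html
```

Passing it the text of whatever the dev server returns for the tokenizer file would show which case applies. Clearing the site data presumably matters because a previously cached HTML response could otherwise be reused.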
Bingo, that did the trick! Thank you so much for the quick response. This opens up a world of possibility for me!