booknlp / booknlp

BookNLP, a natural language processing pipeline for books

BookNLP crashes without internet access even when models are already downloaded

quadrismegistus opened this issue

I've been using BookNLP for the last couple weeks and love it; thanks for such a great package.

I realized working in the (wifi-less) subway today that even though I have the models downloaded, BookNLP crashes without internet access. That's unfortunate since there are of course many real-life situations in which internet access is impossible.

Here's the error (with internet turned off):

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

Here's the full stack trace:

File ~/github/lltk/lltk/model/booknlp.py:436, in get_booknlp(language, pipeline, model, cache, quiet, **kwargs)
    434 if not key in booknlpd:
    435     from booknlp.booknlp import BookNLP
--> 436     booknlpd[key]=BookNLP(
    437         language=language,
    438         model_params=dict(pipeline=pipeline,model=model)
    439     )
    440 return booknlpd[key]

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/booknlp/booknlp.py:14, in BookNLP.__init__(self, language, model_params)
     11 def __init__(self, language, model_params):
     13     if language == "en":
---> 14         self.booknlp=EnglishBookNLP(model_params)

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/booknlp/english/english_booknlp.py:148, in EnglishBookNLP.__init__(self, model_params)
    145 self.quoteTagger=QuoteTagger()
    147 if self.doEntities:
--> 148     self.entityTagger=LitBankEntityTagger(self.entityPath, tagsetPath)
    149     aliasPath = pkg_resources.resource_filename(__name__, "data/aliases.txt")
    150     self.name_resolver=NameCoref(aliasPath)

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/booknlp/english/entity_tagger.py:19, in LitBankEntityTagger.__init__(self, model_file, model_tagset)
     16 base_model=re.sub("google_bert", "google/bert", model_file.split("/")[-1])
     17 base_model=re.sub(".model", "", base_model)
---> 19 self.model = Tagger(freeze_bert=False, base_model=base_model, tagset_flat={"EVENT":1, "O":1}, supersense_tagset=self.supersense_tagset, tagset=self.tagset, device=device)
     21 self.model.to(device)
     22 self.model.load_state_dict(torch.load(model_file, map_location=device))

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/booknlp/english/tagger.py:58, in Tagger.__init__(self, freeze_bert, base_model, tagset, supersense_tagset, tagset_flat, hidden_dim, flat_hidden_dim, device)
     54 self.rev_supersense_tagset[len(supersense_tagset)+1]="O"
     56 self.num_labels_flat=len(tagset_flat)
---> 58 self.tokenizer = BertTokenizer.from_pretrained(modelName, do_lower_case=False, do_basic_tokenize=False)
     59 self.bert = BertModel.from_pretrained(modelName)
     61 self.tokenizer.add_tokens(["[CAP]"], special_tokens=True)

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1724, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1722 else:
   1723     try:
-> 1724         resolved_vocab_files[file_id] = cached_path(
   1725             file_path,
   1726             cache_dir=cache_dir,
   1727             force_download=force_download,
   1728             proxies=proxies,
   1729             resume_download=resume_download,
   1730             local_files_only=local_files_only,
   1731             use_auth_token=use_auth_token,
   1732             user_agent=user_agent,
   1733         )
   1735     except FileNotFoundError as error:
   1736         if local_files_only:

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/transformers/file_utils.py:1921, in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, use_auth_token, local_files_only)
   1917     local_files_only = True
   1919 if is_remote_url(url_or_filename):
   1920     # URL, so get it from the cache (downloading if necessary)
-> 1921     output_path = get_from_cache(
   1922         url_or_filename,
   1923         cache_dir=cache_dir,
   1924         force_download=force_download,
   1925         proxies=proxies,
   1926         resume_download=resume_download,
   1927         user_agent=user_agent,
   1928         use_auth_token=use_auth_token,
   1929         local_files_only=local_files_only,
   1930     )
   1931 elif os.path.exists(url_or_filename):
   1932     # File, and it exists.
   1933     output_path = url_or_filename

File ~/miniforge3/envs/booknlp/lib/python3.10/site-packages/transformers/file_utils.py:2177, in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, use_auth_token, local_files_only)
   2171                 raise FileNotFoundError(
   2172                     "Cannot find the requested files in the cached path and outgoing traffic has been"
   2173                     " disabled. To enable model look-ups and downloads online, set 'local_files_only'"
   2174                     " to False."
   2175                 )
   2176             else:
-> 2177                 raise ValueError(
   2178                     "Connection error, and we cannot find the requested files in the cached path."
   2179                     " Please try again or make sure your Internet connection is on."
   2180                 )
   2182 # From now on, etag is not None.
   2183 if os.path.exists(cache_path) and not force_download:

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

I turn wifi on and everything works normally.

Yes, thanks for bringing this up -- this is something I've been wanting to look into in the transformers library, which seems to make HTTP calls for BERT-based models even when the original BERT model doesn't need to be accessed. Let me look into it (but if anyone else has seen this, let me know!)

One quick solution is to use transformers' "offline mode" when executing your code, which involves setting the environment variable TRANSFORMERS_OFFLINE=1. In your case (within the lltk/model directory), from the command line, this would be:

TRANSFORMERS_OFFLINE=1 python booknlp.py
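
If the code runs from a notebook or another script rather than the command line, the same variable can probably be set from inside Python. This is only a sketch: `enable_transformers_offline` is a hypothetical helper name, and it assumes transformers reads `TRANSFORMERS_OFFLINE` when it is first imported, so the variable has to be set before BookNLP (and with it transformers) is imported.

```python
import os


def enable_transformers_offline(env=os.environ):
    """Set TRANSFORMERS_OFFLINE=1 in the given environment mapping.

    Hypothetical helper for illustration; it only sets the same
    variable as the command-line workaround above.
    """
    env["TRANSFORMERS_OFFLINE"] = "1"
    return env


# Call this before anything imports transformers (assumption: the
# variable is read at import time, so setting it later may not help).
enable_transformers_offline()

# Only now import and build BookNLP, as in the stack trace above:
# from booknlp.booknlp import BookNLP
# booknlp = BookNLP(language="en", model_params=dict(pipeline=pipeline, model=model))
```

The BookNLP lines are left commented out so the snippet runs without the package installed; `pipeline` and `model` stand in for whatever values you already pass.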

This doesn't address why transformers isn't able to read from the cache (where it stores model/tokenizer files) when there's no internet. It seems to read from the cache when there is internet access, without redownloading every time -- I'll dig into that more.