scrapy / scrapely

A pure-python HTML screen-scraping library

ValueError: Buffer dtype mismatch, expected 'int64_t' but got 'long' on Windows 10

juhacz opened this issue

I am trying to use scrapely on a Windows 10 computer. I tested it with both the 32-bit and 64-bit builds of Python 3.7.4. When I call scrape(), I get this error:

Traceback (most recent call last):
  File "D:/DEV/peojects_Python/test/test.py", line 28, in <module>
    print(s.scrape("https://xxxxxx"))
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\__init__.py", line 53, in scrape
    return self.scrape_page(page)
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\__init__.py", line 59, in scrape_page
    return self.ex.extract(page)[0]
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\__init__.py", line 119, in extract
    extracted = extraction_tree.extract(extraction_page)
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\regionextract.py", line 575, in extract
    items.extend(extractor.extract(page, start_index, end_index, self.template.ignored_regions))
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\regionextract.py", line 351, in extract
    _, _, attributes = self._doextract(page, extractors, start_index, end_index, **kwargs)
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\regionextract.py", line 396, in _doextract
    labelled, start_index, end_index_exclusive, self.best_match, **kwargs)
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\similarity.py", line 148, in similar_region
    data_length - range_end, data_length - range_start)
  File "D:\DEV\peojects_Python\test\venv\lib\site-packages\scrapely\extraction\similarity.py", line 85, in longest_unique_subsequence
    matches = naive_match_length(to_search, subsequence, range_start, range_end)
  File "scrapely/extraction/_similarity.pyx", line 155, in scrapely.extraction._similarity.naive_match_length
    cpdef naive_match_length(sequence, pattern, int start=0, int end=-1):
  File "scrapely/extraction/_similarity.pyx", line 158, in scrapely.extraction._similarity.naive_match_length
    return np_naive_match_length(sequence, pattern, start, end)
  File "scrapely/extraction/_similarity.pyx", line 87, in scrapely.extraction._similarity.np_naive_match_length
    cdef np_naive_match_length(np.ndarray[np.int64_t, ndim=1] sequence,
ValueError: Buffer dtype mismatch, expected 'int64_t' but got 'long'

I also ran the same code on a CentOS 7 VPS with Python 3.6, and everything works fine there. The problem occurs only on Windows.
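A likely explanation for the platform difference, offered as a sketch rather than a confirmed diagnosis: the Cython function in the traceback declares its buffer as np.int64_t, while on Windows the C long type is 32 bits wide, and NumPy releases before 2.0 use C long as the default integer dtype. The snippet below only inspects those sizes and shows an explicit cast to int64. The helper names describe_default_int and as_int64 are hypothetical and not part of scrapely, and where such a cast would belong inside scrapely is not established here.

```python
import ctypes

import numpy as np


def describe_default_int():
    """Report the bit widths relevant to the 'int64_t' vs 'long' mismatch."""
    return {
        # Width of C long on this platform: 32 on Windows, 64 on 64-bit Linux.
        "c_long_bits": ctypes.sizeof(ctypes.c_long) * 8,
        # Width of NumPy's default integer dtype for this NumPy build.
        "numpy_default_int_bits": np.dtype(np.int_).itemsize * 8,
    }


def as_int64(values):
    """Return a contiguous int64 array (hypothetical helper, not a scrapely API)."""
    return np.ascontiguousarray(values, dtype=np.int64)


if __name__ == "__main__":
    print(describe_default_int())
    seq = np.array([1, 2, 3, 2, 3])  # picks up the platform default integer dtype
    print(seq.dtype, "->", as_int64(seq).dtype)
```

If the reported default integer width is 32 on the Windows machine and 64 on the CentOS one, that would be consistent with the error appearing on Windows only.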

Got the same issue with the latest version on Windows 10:
In [7]: scrapely.__version__
Out[7]: '0.14.0'