julian-passebecq / gpt-crawl-multiurl

Multi-URL v0.1, 8 Feb 2024, made by julianp

Don't use config.ts to change the URLs to crawl; use config.ts only to change the JSON file name of your output document. Put the URLs to crawl in src/tocrawl.csv (the toscrawl.csv in the root folder is an old leftover).
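As an illustration of the CSV-driven setup, here is a minimal sketch of how a URL list such as src/tocrawl.csv could be read. It is not the repository's own code: the function name `load_urls` is hypothetical, and the layout (one URL per row, in the first column, with an optional header row) is an assumption.

```python
import csv
from pathlib import Path


def load_urls(csv_path):
    """Return the unique http(s) URLs found in the first column of a CSV file.

    Assumed layout (not confirmed by the repository): one URL per row,
    first column, with an optional header row.
    """
    urls = []
    seen = set()
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # blank line
            url = row[0].strip()
            # Skip header rows and anything that is not an http(s) URL.
            if not url.startswith(("http://", "https://")):
                continue
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls
```

Calling `load_urls("src/tocrawl.csv")` would return the URLs in file order, with duplicates dropped so the same page is not crawled twice.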

Optional: put all your URLs in allcrawl.csv and use crawlee.py to select which ones you want to put in tocrawl.csv. Why? Because I didn't modify config.ts, so the name given to the JSON is static. Use txt.py to convert your JSON to txt for custom GPT input; it might give better performance.
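The JSON-to-text step could look roughly like the sketch below. This is not the actual txt.py: the function name `json_to_text` is hypothetical, and the input shape (a JSON array of objects with `title`, `url` and `html` fields) is an assumption about the crawler's output.

```python
import json
from pathlib import Path


def json_to_text(json_path, txt_path):
    """Flatten a crawl output JSON file into plain text; return the page count.

    Assumes the JSON is an array of objects with "title", "url" and "html"
    keys (an assumption, not confirmed by the repository).
    """
    pages = json.loads(Path(json_path).read_text(encoding="utf-8"))
    blocks = []
    for page in pages:
        # Missing keys fall back to an empty string instead of raising.
        title = str(page.get("title", "")).strip()
        url = str(page.get("url", "")).strip()
        body = str(page.get("html", "")).strip()
        blocks.append(f"Title: {title}\nURL: {url}\n\n{body}")
    # A visible separator keeps page boundaries clear in the text file.
    separator = "\n\n" + "-" * 40 + "\n\n"
    Path(txt_path).write_text(separator.join(blocks) + "\n", encoding="utf-8")
    return len(pages)
```

Keeping the title and URL at the top of each block lets a custom GPT trace an answer back to its source page.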

About

License: ISC License


Languages

TypeScript 53.4%, Python 18.1%, Dockerfile 14.4%, JavaScript 10.4%, Shell 3.7%