360fish's starred repositories
GoogleScraper
A Python module to scrape several search engines (like Google, Yandex, Bing, DuckDuckGo, ...). Includes asynchronous networking support.
Monster-Crawler
A Tutorial Showing Scrapy Web Scraping and Data Visualization
lemon-agent
Plan-Validate-Solve (PVS) Agent for accurate, reliable, and reproducible workflow automation
Plan-and-Solve-Prompting
Code for our ACL 2023 Paper "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models".
factory-pattern-vectorstore-interface
A pattern to let you try several vector databases and change as little code as possible
Uscrapper
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
dataflowkit
Extract structured data from websites. Website scraping.
amazon-scraper
A simple web scraper to extract Product Data and Pricing from Amazon
crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.