This script is a command-line tool that searches for specific keywords across multiple web pages. It is built with Node.js and written as ES modules (ESM). It supports concurrent web requests and customizable search parameters, and it reports matches to the user in real time.
This project is ideal for anyone looking to efficiently search for content across various websites, whether for data gathering, content verification, or competitive analysis.
- Concurrent Web Scraping: Efficiently handles multiple URLs with configurable concurrency limits.
- Customizable Search Parameters: Specify keywords, number of pages to search, and content offsets to fine-tune your results.
- Real-time Notifications: Instant notifications via desktop alerts when keywords are found.
- Modern JavaScript (ESM): Uses ES modules and async/await syntax for readable asynchronous code.
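To make the idea of a configurable concurrency limit concrete, here is a minimal sketch of a worker-pool pattern. It is not taken from fetch.mjs; the name `mapWithLimit` and its parameters are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper (an assumption, not code from fetch.mjs): run `task`
// over `items` with at most `limit` tasks in flight at any time.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each worker repeatedly claims the next unclaimed index until none remain.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  // Never start more workers than there are items, and always at least one.
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results; // same order as `items`
}
```

With a helper like this, a call such as `mapWithLimit(urls, 3, (url) => fetch(url))` would keep at most three requests open at once while preserving the order of the results.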
Before you begin, ensure you have the following installed:
- Node.js (version 18 or higher)
- npm (Node Package Manager)
You can verify your Node.js and npm installation by running:
node -v
npm -v
- Clone the repository:
git clone https://github.com/src-dbgr/utils.git
cd utils
- Install dependencies:
Run the following command in your project directory to install the required npm packages:
npm install
This will install the necessary packages, including chalk, node-notifier, and node-fetch.
The script can be run from the command line and accepts several parameters to customize the search behavior.
node fetch.mjs --search=keyword1,keyword2 --urls=https://example.com,https://another.com
- --search (required): Comma-separated list of keywords to search for on the specified websites.
- --urls (optional): Comma-separated list of URLs to search. Defaults to https://www.example.com/ if not provided.
- --pages (optional): Number of pages to search per URL. Default is 1.
- --offsetneg (optional): Number of characters shown before the keyword in the search results. Default is 20.
- --offsetpos (optional): Number of characters shown after the keyword in the search results. Default is 100.
- --timeout (optional): Timeout for each request in milliseconds. Default is 1500 ms.
- --startpage (optional): Page number to start the search from. Default is 0.
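As an illustration of how options in the `--name=value` form could be turned into a settings object, here is a minimal sketch. The default values come from the option list above; the function name `parseOptions` and the parsing logic are assumptions and may differ from what fetch.mjs actually does.

```javascript
// Defaults mirror the documented options; the parsing logic itself is an
// illustrative assumption, not code taken from fetch.mjs.
const DEFAULTS = {
  urls: ["https://www.example.com/"],
  pages: 1,
  offsetneg: 20,
  offsetpos: 100,
  timeout: 1500,
  startpage: 0,
};

function parseOptions(argv) {
  const options = { ...DEFAULTS, urls: [...DEFAULTS.urls] };
  for (const arg of argv) {
    const match = /^--([a-z]+)=(.*)$/.exec(arg);
    if (!match) continue; // ignore anything that is not --name=value
    const [, name, value] = match;
    if (name === "search" || name === "urls") {
      // Comma-separated lists; drop empty entries such as a trailing comma.
      options[name] = value.split(",").map((s) => s.trim()).filter(Boolean);
    } else if (Object.hasOwn(DEFAULTS, name)) {
      // All remaining documented options are integers.
      const number = Number.parseInt(value, 10);
      if (Number.isNaN(number)) throw new Error(`--${name} expects a number`);
      options[name] = number;
    }
  }
  if (!options.search || options.search.length === 0) {
    throw new Error("--search is required");
  }
  return options;
}
```

In a script, this would typically be called as `parseOptions(process.argv.slice(2))`, since the first two entries of `process.argv` are the Node.js binary and the script path.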
To search for "Node.js" and "JavaScript" across the first 3 pages of the specified websites:
node fetch.mjs --search=Node.js,JavaScript --urls=https://www.example.com,https://www.some-random-page.org --pages=3
This command will:
- Search for "Node.js" and "JavaScript" on https://www.example.com and https://www.some-random-page.org.
- Display results in the console with the context around the found keywords.
- Trigger a desktop notification if any of the keywords are found.
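The "context around the found keywords" is governed by --offsetneg and --offsetpos. The following sketch shows one way such a snippet could be cut out of a page's text. The function name `findWithContext` is hypothetical, and the case-sensitive matching is an assumption; the actual script may match differently.

```javascript
// Hypothetical helper (not from fetch.mjs): return every occurrence of
// `keyword` in `text`, with up to `before` characters of leading context and
// up to `after` characters of trailing context. Matching is case-sensitive,
// which is an assumption made for this sketch.
function findWithContext(text, keyword, before = 20, after = 100) {
  const snippets = [];
  if (keyword.length === 0) return snippets; // an empty keyword would loop forever
  let index = text.indexOf(keyword);
  while (index !== -1) {
    // Clamp the window so it never runs past either end of the text.
    const start = Math.max(0, index - before);
    const end = Math.min(text.length, index + keyword.length + after);
    snippets.push(text.slice(start, end));
    // Continue searching after the current match.
    index = text.indexOf(keyword, index + keyword.length);
  }
  return snippets;
}
```

The default arguments correspond to the documented defaults of 20 characters before and 100 characters after the keyword.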
You can display help information by running:
node fetch.mjs --help
This will print out all available options and their usage.
The script includes robust error handling to manage network issues and invalid inputs. If an error occurs, it will be logged in the console with a descriptive message to assist in debugging.
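As a rough picture of what handling network issues and invalid inputs can look like, here is a minimal sketch that applies a request timeout and converts failures into descriptive messages. It is an assumption-laden illustration, not the script's implementation: `fetchPage` is a hypothetical name, and the sketch uses the global `fetch` that ships with Node.js 18+ rather than the node-fetch package. The `fetchImpl` parameter exists only so the function can be exercised without network access.

```javascript
// Hypothetical helper (not from fetch.mjs): fetch a page as text without
// throwing. Failures come back as { ok: false, error } so the caller can
// log them and continue with the next URL.
async function fetchPage(url, timeoutMs = 1500, fetchImpl = fetch) {
  const controller = new AbortController();
  // Abort the request if it takes longer than `timeoutMs`.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    new URL(url); // throws a TypeError for malformed URLs
    const response = await fetchImpl(url, { signal: controller.signal });
    if (!response.ok) {
      return { ok: false, error: `HTTP ${response.status} for ${url}` };
    }
    return { ok: true, body: await response.text() };
  } catch (err) {
    // Distinguish our own timeout from other failures (DNS, refused, bad URL).
    const reason = controller.signal.aborted
      ? `timed out after ${timeoutMs} ms`
      : err.message;
    return { ok: false, error: `${url}: ${reason}` };
  } finally {
    clearTimeout(timer); // do not keep the process alive for a stale timer
  }
}
```

A caller could then write `const page = await fetchPage(url, 1500); if (!page.ok) console.error(page.error);` and carry on with the remaining URLs.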
Contributions to enhance the functionality of this script are welcome. Feel free to fork the repository and submit a pull request with your improvements.
This project is licensed under the MIT License. See the LICENSE file for details.
If you have any questions or need further assistance, feel free to reach out via src.dbgr@gmail.com or through GitHub issues.
Note: This script was developed as part of a personal project because I needed to search for certain keywords on multiple web pages. It demonstrates modern JavaScript practices.