Crawl Over Ten Thousand Deliverable Emails & Phone Numbers Per Day ✔️
Request-Handling Pipeline:
insert a pending query -> the query gets scanned -> the query's status is updated -> check results during or after crawling
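The status updates in this pipeline can be modeled as a small state machine. The Java sketch below assumes four hypothetical statuses (`PENDING`, `RUNNING`, `DONE`, `FAILED`). The project's actual status values are not documented here, so treat the names and the allowed transitions as assumptions.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical status values: the real values stored in the query table are not documented.
enum QueryStatus {
    PENDING, RUNNING, DONE, FAILED;

    // Statuses a query may move to from this one.
    Set<QueryStatus> next() {
        switch (this) {
            case PENDING: return EnumSet.of(RUNNING);
            case RUNNING: return EnumSet.of(DONE, FAILED);
            default:      return EnumSet.noneOf(QueryStatus.class); // DONE and FAILED are terminal
        }
    }

    boolean canMoveTo(QueryStatus target) {
        return next().contains(target);
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        QueryStatus status = QueryStatus.PENDING;             // front end inserts the query
        QueryStatus[] steps = {QueryStatus.RUNNING, QueryStatus.DONE};
        for (QueryStatus step : steps) {
            if (!status.canMoveTo(step)) {
                throw new IllegalStateException(status + " -> " + step);
            }
            status = step;                                    // back end updates the status
        }
        System.out.println(status); // prints DONE
    }
}
```

Rejecting illegal transitions (for example `DONE -> RUNNING`) keeps a query from being crawled twice if two workers pick it up.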
Front-End:
- log in and schedule crawl queries
- check query status
- display results in tables
- export results to .csv files for download
- send bulk emails
Documentation:
| html | Description |
| --- | --- |
| index.html | the homepage for logging in |
| search.html | input a query into the database |
| result.html | show the result table fetched from the database |
| php | Description |
| --- | --- |
| login.php | verify the account against the database |
| searchQ.php | insert the query into the database and fetch existing queries |
| result.php | show the table of LinkedIn results fetched from the database |
| result_sg.php | show the table of SalesGenie results fetched from the database |
| refresh.php | asynchronously refresh the query |
| delete.php | asynchronously delete the query from the database |
| javascript | Description |
| --- | --- |
| table2CSV.js | convert an HTML table into a downloadable CSV file |
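table2CSV.js runs in the browser, but the quoting rules any such export has to follow are language-neutral. Here is a minimal Java sketch of those rules in the style of RFC 4180: a field containing a comma, a double quote, or a line break is wrapped in quotes, and embedded quotes are doubled. `CsvExport` and its methods are hypothetical names, not code from the project.

```java
import java.util.List;
import java.util.stream.Collectors;

class CsvExport {
    // Quote a cell only when it contains a comma, a double quote, CR or LF.
    static String escape(String cell) {
        if (cell == null) return "";
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        if (!needsQuotes) return cell;
        return "\"" + cell.replace("\"", "\"\"") + "\"";
    }

    // One CSV line per table row, each terminated with CRLF.
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(row.stream().map(CsvExport::escape).collect(Collectors.joining(",")));
            sb.append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("name", "email"),
                List.of("Doe, Jane", "jane@example.com"));
        System.out.print(toCsv(rows)); // second line is: "Doe, Jane",jane@example.com
    }
}
```

Names such as "Doe, Jane" are the usual reason naive comma-joining breaks a results export.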
Back-End:
- scan and process pending queries with multiple threads
- crawl target emails & phone numbers from LinkedIn, SalesGenie, and Google
- verify that the emails are valid and deliverable
- store results in MySQL
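A minimal sketch of processing pending queries on several threads, assuming each query can be handled independently. `PendingQueryProcessor` and `crawl` are hypothetical names; in the real back end the queries would be read from MySQL and the results written back there, which this sketch replaces with plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PendingQueryProcessor {
    // Runs one task per pending query on a fixed-size pool and returns results in input order.
    static List<String> processAll(List<String> pendingQueries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String query : pendingQueries) {
                tasks.add(() -> crawl(query));
            }
            List<String> results = new ArrayList<>();
            // invokeAll waits for every task; the futures follow the order of the task list.
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while crawling", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a crawl task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical placeholder for the crawl, verify and store steps listed above.
    static String crawl(String query) {
        return query + ": done";
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("q1", "q2", "q3"), 2));
    }
}
```

A fixed pool caps how many browser sessions run at once, which matters when each crawl drives a real browser.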
Documentation:
| Package crawler | Description |
| --- | --- |
| EmailCrawlerAPI | Entry point of the program; launches the Spring Boot application |
| EmailCrawlerConfig | Read the configuration file |
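The format of the file EmailCrawlerConfig reads is not documented. As an illustration only, the sketch below loads a `java.util.Properties` file with two hypothetical keys, `db.url` (required) and `crawler.threads` (optional, defaulting to 4).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch only: the key names below are hypothetical, not the project's real configuration.
class CrawlerConfigSketch {
    final String dbUrl;
    final int threadCount;

    private CrawlerConfigSketch(String dbUrl, int threadCount) {
        this.dbUrl = dbUrl;
        this.threadCount = threadCount;
    }

    static CrawlerConfigSketch load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String url = props.getProperty("db.url");
        if (url == null) {
            throw new IllegalArgumentException("missing required key db.url");
        }
        // Optional key with a default; trim because Properties keeps trailing spaces in values.
        int threads = Integer.parseInt(props.getProperty("crawler.threads", "4").trim());
        return new CrawlerConfigSketch(url, threads);
    }

    public static void main(String[] args) {
        String text = "db.url=jdbc:mysql://localhost:3306/crawler\ncrawler.threads=8\n";
        CrawlerConfigSketch cfg = load(new StringReader(text));
        System.out.println(cfg.dbUrl + " " + cfg.threadCount);
    }
}
```

Failing fast on a missing required key surfaces configuration mistakes at startup rather than at the first database call.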
| Package crawler.controller | Description |
| --- | --- |
| CrawlEmailController | Map the RESTful API |
| Package crawler.DAO | Description |
| --- | --- |
| MySQLConnector | Encapsulate the methods for connecting to MySQL |
| RecnctThread | Reconnect to MySQL to avoid connection timeouts |
| CompanyDAO | Insert and update data in the Company table |
| CustomerDAO | Insert and update data in the Customer table |
| EmailDAO | Insert and update data in the Email table |
| ResultDAO | Insert and update data in the Result table |
| ResultSgDAO | Insert and update data in the ResultSg table |
| SalesgenieDAO | Insert and update data in the SalesgenieDAO table |
| SearchQueryDAO | Insert and update data in the SearchQueryDAO table |
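MySQL closes connections that stay idle longer than its `wait_timeout` setting (8 hours by default), which is the kind of timeout RecnctThread guards against. The sketch below shows one way to do this with a periodic health check; it is not the project's implementation. `ConnectionKeeper` is a hypothetical name, and the check and reconnect steps are passed in as functions so the class runs without a database. With JDBC, the check could wrap `Connection.isValid(seconds)`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Keep-alive sketch in the spirit of RecnctThread; not the project's implementation.
class ConnectionKeeper implements AutoCloseable {
    private final BooleanSupplier isAlive;  // e.g. wraps JDBC Connection.isValid(seconds)
    private final Runnable reconnect;       // e.g. closes the stale Connection and opens a new one
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private int reconnects = 0;

    ConnectionKeeper(BooleanSupplier isAlive, Runnable reconnect) {
        this.isAlive = isAlive;
        this.reconnect = reconnect;
    }

    // One health check: reconnect only when the current link is no longer usable.
    synchronized void checkOnce() {
        if (!isAlive.getAsBoolean()) {
            reconnect.run();
            reconnects++;
        }
    }

    synchronized int reconnectCount() {
        return reconnects;
    }

    // Repeat the check on a fixed schedule, with a period well below the server's idle timeout.
    void start(long period, TimeUnit unit) {
        timer.scheduleAtFixedRate(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // An exception escaping here would cancel all future runs, so report and continue.
                System.err.println("reconnect failed: " + e);
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }

    public static void main(String[] args) {
        boolean[] alive = {false};
        try (ConnectionKeeper keeper = new ConnectionKeeper(() -> alive[0], () -> alive[0] = true)) {
            keeper.checkOnce();   // stale link, so it reconnects
            keeper.checkOnce();   // link is alive again, nothing to do
            System.out.println(keeper.reconnectCount()); // prints 1
        }
    }
}
```

In a long-running service, `start(30, TimeUnit.MINUTES)` would be called once after the first connection is opened.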
| Package crawler.model | Description |
| --- | --- |
| CrawlerQuery | Data model of a query |
| Customer | Data model of a customer |
| Email | Data model of an email |
| SalesGenieResult | Data model of a result from SalesGenie |
| Package crawler.service | Description |
| --- | --- |
| Callback | Interface for the callback invoked when a query has completed or failed |
| PollSearchQueryService | Check whether there are pending queries in the database; if so, send them down the processing pipeline |
| CrawlEmailService | The process of crawling emails from LinkedIn |
| CrawlSalesGenieService | The process of crawling SalesGenie |
| DriveBrowserService | Implementation of the general browser operations |
| DriveLinkedinService | Implementation of the browser operations for crawling LinkedIn |
| DriveSalesgenieService | Implementation of the browser operations for crawling SalesGenie |
| EmailVerifyService | Verify whether a given email address is deliverable |
| GeneratAccurateEmailsService | Generate a person's email addresses based on their name and companies |
| SendEmailService | Send emails from Java code; should not be used to send spam |
| LaunchWindowService | UI built with Java Swing (discarded) |
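GeneratAccurateEmailsService builds a person's email addresses from their name and company, but the exact patterns it uses are not documented. The sketch below assumes a handful of common corporate formats (`first.last`, `firstlast`, `flast`, `first`, `first_last`, `f.last`) and an already-known company domain; `EmailCandidates` is a hypothetical name. Candidates produced this way would still need a deliverability check such as the one EmailVerifyService performs.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

class EmailCandidates {
    // Lowercase, drop accents and any non-letters, e.g. "O'Brien" -> "obrien".
    static String clean(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
        return decomposed.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
    }

    // Hypothetical pattern list; order is a guess at likelihood, duplicates are dropped.
    static List<String> generate(String first, String last, String domain) {
        String f = clean(first);
        String l = clean(last);
        String d = domain.toLowerCase(Locale.ROOT);
        LinkedHashSet<String> out = new LinkedHashSet<>();
        if (f.isEmpty() || l.isEmpty()) {
            return new ArrayList<>(out);
        }
        out.add(f + "." + l + "@" + d);
        out.add(f + l + "@" + d);
        out.add(f.charAt(0) + l + "@" + d);
        out.add(f + "@" + d);
        out.add(f + "_" + l + "@" + d);
        out.add(f.charAt(0) + "." + l + "@" + d);
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        generate("Jane", "Doe", "Example.com").forEach(System.out::println);
    }
}
```

Using a `LinkedHashSet` keeps the most likely pattern first while removing duplicates, which can occur for very short names.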
| Package crawler.thread | Description |
| --- | --- |
| CrawlCompanyThread | Unit of work executed when crawling the emails of a person's company (not used) |
| CrawlCustomerThread | Unit of work executed when crawling the emails of a person |
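A per-person unit of work that reports back through a callback could look like the sketch below. The method names on `Callback` (`onComplete`, `onFailure`) and the `CrawlCustomerTask` class are assumptions, since the project's signatures are not documented, and `crawl` is a placeholder for the browser-driven crawl.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical method names: the project's Callback interface signature is not documented.
interface Callback {
    void onComplete(String queryId, List<String> emails);
    void onFailure(String queryId, Exception error);
}

// One unit of work per person, in the spirit of CrawlCustomerThread.
class CrawlCustomerTask implements Runnable {
    private final String queryId;
    private final String personName;
    private final Callback callback;

    CrawlCustomerTask(String queryId, String personName, Callback callback) {
        this.queryId = queryId;
        this.personName = personName;
        this.callback = callback;
    }

    @Override
    public void run() {
        List<String> emails;
        try {
            emails = crawl(personName);
        } catch (Exception e) {
            callback.onFailure(queryId, e);
            return;
        }
        // Outside the try block, so an exception thrown by onComplete is not reported as a crawl failure.
        callback.onComplete(queryId, emails);
    }

    // Placeholder for the real crawl; rejects blank names so the failure path can be exercised.
    private static List<String> crawl(String name) {
        if (name.trim().isEmpty()) {
            throw new IllegalArgumentException("empty name");
        }
        return List.of(name.trim().toLowerCase(Locale.ROOT).replace(' ', '.') + "@example.com");
    }
}

class CallbackDemo {
    public static void main(String[] args) throws InterruptedException {
        Callback printer = new Callback() {
            @Override
            public void onComplete(String id, List<String> emails) {
                System.out.println(id + " ok " + emails);
            }

            @Override
            public void onFailure(String id, Exception e) {
                System.out.println(id + " failed: " + e.getMessage());
            }
        };
        Thread worker = new Thread(new CrawlCustomerTask("q1", "Jane Doe", printer));
        worker.start();
        worker.join(); // prints: q1 ok [jane.doe@example.com]
    }
}
```

Because the task is a plain `Runnable`, it can also be handed to an `ExecutorService` instead of a dedicated thread.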