PHP crawler and spider. Works with UTF-8 and MySQL, supports random hosts and robots.txt, and has many more surprises.
- In the /sql folder you will find the 'schema_create.sql' file; run it in your MySQL database.
- In the /libs folder you will find 'config.php'; configure it as well.
- Give read + write + delete permissions to the /writable dir.
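The setup steps above can be sketched as two small shell helpers. This is only a sketch under assumptions: the function names `load_schema` and `make_writable` are hypothetical, as are the database user and database name in the usage line, and it assumes the crawler runs as the same user who owns the /writable dir. On Unix-like systems, deleting files requires write permission on the containing directory, so `u+rwX` covers read, write, and delete.

```shell
# Hypothetical helper: load a schema file into a MySQL database.
# $1 = database user, $2 = database name, $3 = path to the .sql file.
# -p makes the mysql client prompt for the password.
load_schema() {
    mysql -u "$1" -p "$2" < "$3"
}

# Hypothetical helper: give the owning user read and write access to a
# directory tree; the capital X adds traverse permission on directories.
make_writable() {
    chmod -R u+rwX "$1"
}
```

Run from the project root, usage might look like `load_schema crawler_user crawler_db sql/schema_create.sql` followed by `make_writable writable`, with the user and database names replaced by your own.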
There are two ways to run it.
To run it with multiple processes (if you configure this in crowle.php; the default is 4 processes):
sh do_it.sh
To run a single process:
php crowle.php
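The contents of do_it.sh are not shown here, so the following is not the project's script. It is a minimal sketch of one common way to start several copies of a command in parallel from the shell and wait for all of them to finish. The function name `run_workers` is hypothetical, and whether several crowle.php processes coordinate correctly depends on the project's own configuration.

```shell
# Hypothetical helper: start N background copies of a command, then wait.
# $1 = number of processes; the remaining arguments are the command to run.
run_workers() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &            # launch one copy in the background
        i=$((i + 1))
    done
    wait                  # block until every background copy has exited
}
```

With this helper defined, `run_workers 4 php crowle.php` would start four crawler processes, matching the default count mentioned above.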
To add URLs to crawl, create a PHP file and run it.
Method 1:
<?php
include 'libs/general.php';
$urlArray=array(
'http://some-url-1.com/'=>"some url 1 description",
'http://some-url-2.com/'=>"some url 2 description"
);
Providers::insert_url_list($urlArray);
?>
Method 2:
<?php
include 'libs/general.php';
$temp = Providers::get_or_create_url_by_url("http://some-url-1.com/");
$temp = Providers::get_or_create_url_by_url("http://some-url-2.com/");
?>
Contact: info@korotkin.co.il
Hope it works fine :-) Give me feedback!