platonai / pulsar-auto-mining

Extract almost every field from a set of webpages using unsupervised machine learning.

Pulsar Auto Web Mining Project

A demo project to show how we extract all possible fields from a set of webpages automatically using unsupervised machine learning.

Here is the complete result of an example auto-mining run.

Here is one of the 13 auto-generated tables:

1. Extracted 17 fields from page area #centerCol
 T2C2 T2C3 T2C4 T2C5 T2C6 T2C7 T2C8 T2C9 T2C10 T2C11 T2C12 T2C13 T2C14 T2C15 T2C16 T2C17 T2C18
1 Visit the ASUS Store4.3 out of 5 stars4,936 ratings|433 answe..questionsDetailsPrice:$221.61&FREE Returns$18.38 (8%)ASUSAbout this itemSee more ..t detailsCompare w..lar items detail>
2 Visit the Acer Store4.6 out of 5 stars33,817 ratings|766 answe..questionsDetailsPrice:$365.00&FREE Returns$34.99 (9%)AcerAbout this item  Compare w..lar items detail>
3 Visit the Acer Store4.6 out of 5 stars7,977 ratings|477 answe..questionsDetailsPrice:$248.00&FREE Returns$251.00 (50%)AcerAbout this itemSee more ..t detailsCompare w..lar items detail>
4 Visit the Acer Store4.6 out of 5 stars33,817 ratings|766 answe..questions Price:$438.00&FREE Returns AcerAbout this itemSee more ..t detailsCompare w..lar items detail>
5 Visit the..wed Store4.3 out of 5 stars2,184 ratings|211 answe..questionsDetailsPrice:$328.00&FREE Returns$16.00 (5%)AppleAbout this itemSee more ..t detailsCompare w..lar items detail>
6 Visit the..wed Store4.3 out of 5 stars474 ratings|107 answe..questionsDetailsPrice:$414.99&FREE Returns$33.15 (7%) About this itemSee more ..t detailsCompare w..lar items detail>
7 Visit the..ple Store4.8 out of 5 stars8,706 ratings|486 answe..questionsDetailsPrice:$949.99&FREE Returns$49.01 (5%) About this item  Show more detail>
8 Visit the..ple Store4.8 out of 5 stars8,706 ratings|486 answe..questionsDetailsPrice:$899.00&FREE Returns$100.00 (10%)AppleAbout this item  Show more detail>
9 Visit the..ple Store4.8 out of 5 stars3,711 ratings|219 answe..questionsDetailsPrice:$1,149.99&FREE Returns$149.01 (11%)AppleAbout this item  Show more detail>
10 Visit the..ple Store4.8 out of 5 stars8,706 ratings|486 answe..questionsDetailsPrice:$1,149.00&FREE Returns$100.00 (8%) About this item  Show more detail>
11 Visit the..wed Store4.2 out of 5 stars2,204 ratings|158 answe..questionsDetailsPrice:$83.00  $8.99 (10%)DellAbout this itemSee more ..t detailsCompare w..lar items detail>
12 Visit the HP Store4.5 out of 5 stars4,189 ratings|298 answe..questions Price:$263.00&FREE Returns HPAbout this item  Compare w..lar items detail>
13 Visit the HP Store4.5 out of 5 stars4,865 ratings|Climate P.. FriendlyDetailsPrice:$220.99&FREE Returns$119.00 (35%) About this item  Compare w..lar items detail>
14 Visit the..ovo Store4.5 out of 5 stars8,529 ratings|1000+ ans..questionsDetailsPrice:$215.00&FREE Returns$104.99 (33%) About this itemSee more ..t detailsCompare w..lar items detail>
15 Visit the..ovo Store4.3 out of 5 stars2,194 ratings|280 answe..questionsDetailsPrice:$204.97  $75.02 (27%)LenovoAbout this itemSee more ..t detailsCompare w..lar items detail>
16 Visit the..ovo Store4.5 out of 5 stars2,946 ratings|349 answe..questionsDetailsPrice:$355.99&FREE Returns$74.00 (17%) About this itemSee more ..t detailsCompare w..lar items detail>
17 Visit the..ovo Store3.9 out of 5 stars25 ratings|37 answer..questions Price:$669.99&FREE Returns 15.6About this itemSee more ..t detailsCompare w..lar items detail>
18 Visit the MSI Store4.4 out of 5 stars45 ratings|18 answer..questionsDetails  &FREE Returns$163.01 (18%)MSIAbout this itemSee more ..t detailsCompare w..lar items detail>
19 Visit the..zer Store4.5 out of 5 stars184 ratings|35 answer..questionsDetailsPrice:$1,799.99&FREE Returns$800.00 (31%) About this itemSee more ..t detailsCompare w..lar items detail>
20 Visit the..zer Store4.5 out of 5 stars1,541 ratings|158 answe..questionsDetailsPrice:$1,173.00  $326.99 (22%) About this itemSee more ..t detailsCompare w..lar items detail>
tp          20   20   14   20   20   20   17   19   19   17   17   16   12   20   13   13   16    0
fp           0    0    6    0    0    0    0    0    0    0    0    1    0    0    0    0    4    0
fn          10    0    0    0    0    6    1    0    0    0    0    0    4    0    0    1    5    0
tn           0    0    0    0    0    0    3    1    1    3    3    3    0    0    7    7    0    0
precision 1.00 1.00 0.70 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.94 1.00 1.00 1.00 1.00 0.80 0.00
recall    0.67 1.00 1.00 1.00 1.00 0.77 0.94 1.00 1.00 1.00 1.00 1.00 0.75 1.00 1.00 0.93 0.76 0.00
f1        0.80 1.00 0.82 1.00 1.00 0.87 0.97 1.00 1.00 1.00 1.00 0.97 0.86 1.00 1.00 0.96 0.78 0.00
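
The precision, recall and f1 rows above appear to follow the standard definitions computed from the tp, fp and fn counts. The sketch below is not part of the project; it is a minimal Python check under two assumptions: that each metric row holds 18 values in column order, and that a column with no positives scores 0.00. The names `prf`, `TP`, `FP` and `FN` are hypothetical, and the position numbers it prints are not mapped to the T2C labels.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) for one column; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Counts read from the tp, fp and fn rows of the table (assumed split into 18 values).
TP = [20, 20, 14, 20, 20, 20, 17, 19, 19, 17, 17, 16, 12, 20, 13, 13, 16, 0]
FP = [0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 4, 0]
FN = [10, 0, 0, 0, 0, 6, 1, 0, 0, 0, 0, 0, 4, 0, 0, 1, 5, 0]

scores = [prf(tp, fp, fn) for tp, fp, fn in zip(TP, FP, FN)]

# Print one line per column position, rounded to two decimals like the table.
for position, (p, r, f) in enumerate(scores, start=1):
    print(f"{position:2d}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}")
```

For example, the first position has tp = 20, fp = 0 and fn = 10, which gives precision 20/20 = 1.00, recall 20/30 = 0.67 and f1 = 0.80, matching the first value of each metric row.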

Languages

HTML 99.9%, Kotlin 0.1%, Java 0.0%