Mixnode / mixnode-warcreader-php

Read Web ARChive (WARC) files in PHP.

Home Page:https://www.mixnode.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Mixnode WARC Reader for PHP

This library allows developers to read Web ARChive (WARC) files in PHP.

Installation Guide

We recommend Composer for installing this package:

curl -sS https://getcomposer.org/installer | php

Once done, run the Composer command to install Mixnode WARC Reader for PHP:

php composer.phar require mixnode/mixnode-warcreader-php

After installing, you need to require Composer's autoloader in your code:

require 'vendor/autoload.php';

You can then later update Mixnode WARC Reader using composer:

composer.phar update

A Simple Example

<?php
require 'vendor/autoload.php';

// Initialize a WarcReader object 
// The WarcReader constructure accepts paths to both raw WARC files and GZipped WARC files
$warc_reader = new Mixnode\WarcReader("test.warc.gz");

// Using nextRecord, iterate through the WARC file and output each record.
while(($record = $warc_reader->nextRecord()) != FALSE){
	// A WARC record is broken into two parts: header and content.
	// header contains metadata about content, while content is the actual resource captured.
	print_r($record['header']);
	print_r($record['content']);
	echo "------------------------------------\n";
}

About

Read Web ARChive (WARC) files in PHP.

https://www.mixnode.com

License:Other


Languages

Language:PHP 100.0%