t1gor / DocumentsParser

DOCx parser class in PHP

DOCx files parser

Parses a DOCX file and returns an HTML string. Any help or criticism is appreciated.

Supported elements:

  • paragraphs (w:p),
  • images (pic:pic),
  • links (w:hyperlink),
  • tables (w:tbl),
  • bookmarks (w:bookmarkStart),
  • lists.
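To illustrate what the element names above refer to: a DOCX file is a ZIP archive whose main body is stored in word/document.xml, and names like w:p are elements in the WordprocessingML namespace. The following is a minimal sketch of locating paragraphs, not the parser's actual implementation; the helper names extractParagraphTexts and readDocumentXml are hypothetical and exist only for this example.

```php
<?php
// Hypothetical helper, not part of DocumentsParser: returns the plain text
// of every w:p (paragraph) element found in a WordprocessingML document.
function extractParagraphTexts(string $documentXml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);

    // The "w" prefix must be bound to the WordprocessingML namespace
    // before it can be used in XPath expressions.
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $paragraphs = array();
    foreach ($xpath->query('//w:p') as $paragraph) {
        // textContent concatenates all descendant text nodes
        $paragraphs[] = $paragraph->textContent;
    }

    return $paragraphs;
}

// Hypothetical helper: reads word/document.xml out of a .docx archive.
// Returns null if the archive or the entry cannot be read.
function readDocumentXml(string $docxPath): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        return null;
    }

    $xml = $zip->getFromName('word/document.xml');
    $zip->close();

    return $xml === false ? null : $xml;
}
```

Images, links and tables can be located the same way with their own element names, although each needs extra handling (for example, resolving relationship IDs for images and links) that this sketch leaves out.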

TODO:

  • shapes support
  • table cell styles
  • add links filter
  • images optimization (remove extra equal images)
  • add i18n

Known issues:

  • high memory consumption
  • large files can take more than 30 seconds to parse, which exceeds PHP's default execution time limit (max_execution_time)
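Until these issues are addressed, one possible workaround is to raise PHP's limits for the script that runs the parser. This is only a sketch: the values 512M and 120 seconds are assumptions to be tuned per environment, and some hosts do not allow these settings to be changed at runtime.

```php
<?php
// Assumed values; adjust to the size of the documents you parse.
ini_set('memory_limit', '512M'); // raise the per-script memory ceiling
set_time_limit(120);             // allow up to 120 seconds of execution
```

Place these calls before the parser is created so they are in effect for the whole run.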

Usage example:

<?php
    // load lib
    require_once('DocumentsParser.php');

    // init parser
    $parserSettings = array(
        'filesDestinationFolder' => 'images',
    );

    $defaultStyles = array();

    $parser = new DocumentsParser($parserSettings, $defaultStyles);

    // parse DOCx
    $html = $parser->parseFile('test_document.docx');

    // save content to file
    file_put_contents('test_document.html', $html);
?>
