agitator / hachoir-parser

hachoir-parser : package of most common file format parsers written for Hachoir framework

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

hachoir-parser is a package of most common file format parsers written for
Hachoir framework. Not all parsers are complete, some are very good and other
are poor: only parser first level of the tree for example.

A perfect parser have no "raw" field: with a perfect parser you are able to
know *each* bit meaning. Some good (but not perfect ;-)) parsers:

 * Matroska video
 * Microsoft RIFF (AVI video, WAV audio, CDA file)
 * PNG picture
 * TAR and ZIP archive

GnomeKeyring parser requires Python Crypto module:
http://www.amk.ca/python/code/crypto.html

Website: http://bitbucket.org/haypo/hachoir/wiki/hachoir-parser

hachoir-parser 1.3.4 (2010-07-26)
=================================

 * update matroska parser to support WebM videos

hachoir-parser 1.3.3 (2010-04-15)
=================================

 * fix setup.py: don't use with statement to stay compatible with python 2.4

hachoir-parser 1.3.2 (2010-03-01)
=================================

 * Include the README file in the tarball
 * setup.py reads the README file instead of using README.py to break the
   build dependency on hachoir-core

hachoir-parser 1.3.1 (2010-01-28)
=================================

 * Create MANIFEST.in to include extra files: README.py, README.header,
   tests/run_testcase.py, etc.
 * Create an INSTALL file

hachoir-parser 1.3 (2010-01-20)
===============================

 * New parsers:

   - BLP: Blizzard Image
   - PRC: Palm resource

 * HachoirParserList() is no more a singleton:
   use HachoirParserList.getInstance() to get a singleton
 * Add tags optional argument to createParser(), it can be used for example to
   force a parser
 * Fix ParserList.print_(): first argument is now the title and not 'out'.
   If out is not specified, use sys.stdout.
 * MP3: support encapsulated objects (GEOB in ID3)
 * Create a dictionary: Windows codepage => charset name (CODEPAGE_CHARSET)
 * ASN.1: support boolean and enum types; fix bit string parser
 * MKV: use textHandler()
 * AVI: create index parser, use file size header to detect padding at the end
 * ISO9660: strip nul bytes in application name
 * JPEG: add ICC profile chunk name
 * PNG: fix transparency parser (tRNS)
 * BPLIST: support empty value for markers 4, 5 and 6
 * Microsoft Office summary: support more codepages (CP874, Windows 1250..1257)
 * tcpdump: support ICMPv6 and IPv6
 * Java: add bytecode parser, support JDK 1.6
 * Python: parse lnotab content, fill a string table for the references
 * MPEG Video: parse much more chunks
 * MOV: Parse file type header, create the right MIME type


hachoir-parser 1.2.1 (2008-10-16)
=================================

 * Improve OLE2 and MS Office parsers:
   - support small blocks
   - fix the charset of the summary properties
   - summary property integers are unsigned
   - use TimedeltaWin64 for the TotalEditingTime field
   - create minimum Word document parser
 * Python parser: support magic numbers of Python 3000
   with the keyword only arguments
 * Create Apple/NeXT Binary Property List (BPLIST) parser
 * MPEG audio: reject file with no valid frame nor ID3 header
 * Skip subfiles in JPEG files
 * Create Apple/NeXT Binary Property List (BPLIST) parser by Robert Xiao

hachoir-parser 1.2 (2008-09-03)
===============================

 * Create FLAC parser, written by Esteban Loiseau
 * Create Action Script parser used in Flash parser,
   written by Sebastien Ponce
 * Create Gnome Keyring parser: able to parse the stored passwords using
   Python Crypto if the main password is written in the code :-)
 * GIF: support text extension field; parse image content
   (LZW compressed data)
 * Fix charset of IPTC string (guess it, it's not always ISO-8859-1)
 * TIFF: Sebastien Ponce improved the parser: parse image data, add many
   tags, etc.
 * MS Office: guess the charset for summary strings since it could be
   ISO-8859-1 or UTF-8

hachoir-parser 1.1 (2008-04-01)
===============================

Main changes: add "EFI Platform Initialization Firmware
Volume" (PIFV) and "Microsoft Windows Help" (HLP) parsers. Details:

 * MPEG audio:

   - add createContentSize() to support hachoir-subfile
   - support file starting with ID3v1
   - if file doesn't contain any frame, use ID3v1 or ID3v2 to create the
     description

 * EXIF:

   - use "count" field value
   - create RationalInt32 and RationalUInt32
   - fix for empty value
   - add GPS tags

 * JPEG:

   - support Ducky (APP12) chunk
   - support Comment chunk
   - improve validate(): make sure that first 3 chunk types are known

 * RPM: use bzip2 or gzip handler to decompress content
 * S3M: fix some parser bugs
 * OLE2: reject negative block index (or special block index)
 * ip2name(): catch KeybordInterrupt and don't resolve next addresses
 * ELF: support big endian
 * PE: createContentSize() works on PE program, improve resource section
   detection
 * AMF: stop mixed array parser on empty key

hachoir-parser 1.0 (2007-07-11)
===============================

Changes:

 * OLE2: Support file bigger than 6 MB (support many DIFAT blocks)
 * OLE2: Add createContentSize() to guess content size
 * LNK: Improve parser (now able to parse the whole file)
 * EXE PE: Add more subsystem names
 * PYC: Support Python 2.5c2
 * Fix many spelling mistakes

Minor changes:

 * PYC: Fix long integer parser (negative number), add (disabled) code
   to disassemble bytecode, use self.code_info to avoid replacing self.info
 * OLE2: Add ".msi" file extension
 * OLE2: Fix to support documents generated on Mac
 * EXIF: set max IFD entry count to 1000 (instead of 200)
 * EXIF: don't limit BYTE/UNDEFINED IFD entry count
 * EXIF: add "User comment" tag
 * GIF: fix image and screen description
 * bzip2: catch decompressor error to be able to read trailing data
 * Fix file extensions of AIFF
 * Windows GUID use new TimestampUUID60 field type
 * RIFF: convert class constant names to upper case
 * Fix RIFF: don't replace self.info method
 * ISO9660: Write parser for terminator content

Parser list
===========

Archive
-------

* 7zip: Compressed archive in 7z format
* ace: ACE archive
* bzip2: bzip2 archive
* cab: Microsoft Cabinet archive
* gzip: gzip archive
* mar: Microsoft Archive
* rar: Roshal archive (RAR)
* rpm: RPM package
* tar: TAR archive
* unix_archive: Unix archive
* zip: ZIP archive

Audio
-----

* aiff: Audio Interchange File Format (AIFF)
* fasttracker2: FastTracker2 module
* flac: FLAC audio
* itunesdb: iPod iTunesDB file
* midi: MIDI audio
* mod: Uncompressed amiga module
* mpeg_audio: MPEG audio version 1, 2, 2.5
* ptm: PolyTracker module (v1.17)
* real_audio: Real audio (.ra)
* s3m: ScreamTracker3 module
* sun_next_snd: Sun/NeXT audio

Container
---------

* asn1: Abstract Syntax Notation One (ASN.1)
* matroska: Matroska multimedia container
* ogg: Ogg multimedia container
* ogg_stream: Ogg logical stream
* real_media: RealMedia (rm) Container File
* riff: Microsoft RIFF container
* swf: Macromedia Flash data

File System
-----------

* ext2: EXT2/EXT3 file system
* fat12: FAT12 filesystem
* fat16: FAT16 filesystem
* fat32: FAT32 filesystem
* iso9660: ISO 9660 file system
* linux_swap: Linux swap file
* msdos_harddrive: MS-DOS hard drive with Master Boot Record (MBR)
* ntfs: NTFS file system
* reiserfs: ReiserFS file system

Game
----

* blp1: Blizzard Image Format, version 1
* blp2: Blizzard Image Format, version 2
* lucasarts_font: LucasArts Font
* spiderman_video: The Amazing Spider-Man vs. The Kingpin (Sega CD) FMV video
* zsnes: ZSNES Save State File (only version 143)

Image
-----

* bmp: Microsoft bitmap (BMP) picture
* gif: GIF picture
* ico: Microsoft Windows icon or cursor
* jpeg: JPEG picture
* pcx: PC Paintbrush (PCX) picture
* png: Portable Network Graphics (PNG) picture
* psd: Photoshop (PSD) picture
* targa: Truevision Targa Graphic (TGA)
* tiff: TIFF picture
* wmf: Microsoft Windows Metafile (WMF)
* xcf: Gimp (XCF) picture

Misc
----

* 3do: renderdroid 3d model.
* 3ds: 3D Studio Max model
* bplist: Apple/NeXT Binary Property List
* chm: Microsoft's HTML Help (.chm)
* gnomekeyring: Gnome keyring
* hlp: Microsoft Windows Help (HLP)
* lnk: Windows Shortcut (.lnk)
* ole2: Microsoft Office document
* pcf: X11 Portable Compiled Font (pcf)
* pdf: Portable Document Format (PDF) document
* tcpdump: Tcpdump file (network)
* torrent: Torrent metainfo file
* ttf: TrueType font

Program
-------

* elf: ELF Unix/BSD program/library
* exe: Microsoft Windows Portable Executable
* java_class: Compiled Java class
* pifv: EFI Platform Initialization Firmware Volume
* prc: Palm Resource File
* python: Compiled Python script (.pyc/.pyo files)

Video
-----

* asf: Advanced Streaming Format (ASF), used for WMV (video) and WMA (audio)
* flv: Macromedia Flash video
* mov: Apple QuickTime movie
* mpeg_ts: MPEG-2 Transport Stream
* mpeg_video: MPEG video, version 1 or 2

Total: 78 parsers

About

hachoir-parser : package of most common file format parsers written for Hachoir framework

License:GNU General Public License v2.0


Languages

Language:Python 100.0%