neilharvey / FileSignatures

A small library for detecting the type of a file based on header signature (also known as magic number).

For *.pptx DetermineFileFormat always returns null

v-pimenau opened this issue · comments

When I try to get the format of a pptx file, DetermineFileFormat always returns null.

It should work - there is a minimal sample in the tests which passes.

PowerPoint detection works by searching the pptx archive for a file named presentation.xml, which should be present. Perhaps some writers create non-standard versions (I've seen this before with Word documents). Are you able to provide a minimal file which cannot be detected?

Hi. Sorry for the delay. It happens for every newly created *.pptx file.

I tried creating a fresh pptx via PowerPoint (Office 365 Version 2310) and it seems to be working as expected. Would you be able to upload a blank PPTX that you've created so I can have a look at it?

If that's not possible, then you can investigate yourself as follows:

  1. Unzip the .pptx file into a directory (it's a zip archive under the hood)
  2. Open [Content_Types].xml from the root directory. This file will contain definitions of all the different content types used in the package.
  3. Within that file look for a section similar to this:
    <Override PartName="/ppt/presentation.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml"/>

This defines the main presentation XML file and is what we are searching for to identify the PPTX. We look for 'presentation.xml' by default, with a slight fuzz factor so that small variations are also matched.
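The manual steps above can also be scripted. The following is a minimal sketch in Python (not the library's .NET code, and not how FileSignatures itself performs the check): it opens the archive, parses [Content_Types].xml, and lists the PartName of every Override entry declaring the main presentation content type. The function name `find_presentation_parts` is hypothetical, and the sample archive is built in memory purely for demonstration.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by [Content_Types].xml in Open Packaging Conventions packages.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
# Content type of the main presentation part, as quoted in the step above.
PRESENTATION_MAIN = (
    "application/vnd.openxmlformats-officedocument."
    "presentationml.presentation.main+xml"
)


def find_presentation_parts(pptx):
    """Return the PartName of each Override declaring the main presentation part.

    `pptx` may be a file path or a binary file-like object.
    Hypothetical helper for manual investigation only.
    """
    with zipfile.ZipFile(pptx) as archive:
        root = ET.fromstring(archive.read("[Content_Types].xml"))
    return [
        override.get("PartName")
        for override in root.iter(CT_NS + "Override")
        if override.get("ContentType") == PRESENTATION_MAIN
    ]


if __name__ == "__main__":
    # Build a tiny in-memory archive containing only [Content_Types].xml.
    content_types = (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Override PartName="/ppt/presentation.xml" '
        'ContentType="' + PRESENTATION_MAIN + '"/>'
        "</Types>"
    )
    sample = io.BytesIO()
    with zipfile.ZipFile(sample, "w") as archive:
        archive.writestr("[Content_Types].xml", content_types)
    sample.seek(0)
    print(find_presentation_parts(sample))
```

To inspect a real file, pass its path instead of the in-memory sample. An empty list would suggest the writer used a different content type for the main part, and a `zipfile.BadZipFile` error would mean the file is not a valid zip archive at all.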

What do your PPTX files contain as the PartName for the presentation file?

Hi. Sorry for the delay. This issue appears when I create a new empty pptx document.

Hey, thanks for sending the sample pptx - but when I try to download it, it appears to be zero bytes in size.
Could you try reuploading it / another sample?

Hey. It should be zero bytes in size, because this issue appears when the pptx file is empty.

Ah, this library works by reading the header bytes of a file to determine the format - so if the file has a zero length there isn't anything we can do, sorry. I had assumed you meant a blank document - which would work because it would contain the minimal zip/xml entries for a valid PowerPoint file.
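For callers who may encounter zero-length files, a pre-check before signature detection can distinguish "empty" from "unrecognised". This is a rough sketch in Python rather than the library's .NET API. The function name `precheck` and its return labels are hypothetical; the only fact relied on is that a non-empty zip archive starts with the local file header signature `PK\x03\x04`.

```python
import os

# First four bytes of a zip archive that contains at least one entry.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def precheck(path):
    """Classify a file before header-based detection.

    Returns "empty" for a zero-length file (no header bytes to inspect),
    "zip" if it starts with the zip local file header signature,
    and "other" for anything else. Hypothetical helper.
    """
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as handle:
        header = handle.read(len(ZIP_LOCAL_HEADER))
    return "zip" if header == ZIP_LOCAL_HEADER else "other"
```

A result of "empty" means there is nothing for any header-based detector to read, so a null result in that case is expected rather than a detection failure.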

Ok, got it. Thank you for your support. I will close this issue.