ruby-docx / docx

A Ruby library/gem for interacting with .docx files

Unable to read Binary String

Jasmeet2011 opened this issue

## Describe the bug

I am reading a .docx file stored in a BLOB field in a MySQL database. The value from the MySQL table arrives as a binary string, taken from the Logstash "Event". I am able to write the binary string to a file and then read that file with Docx. However, if I pass the data directly to Docx, it raises an error.

## To Reproduce

Steps to reproduce the behavior, or a short code snippet that reproduces the bug.

Example:

```ruby
require 'docx'
require 'stringio'

# THIS WORKS: write the binary string to a .docx file, then read that file
File.binwrite('c:/path/filename.docx', event.get('Blob field'))
doc = Docx::Document.new('c:/path/filename.docx')

# ERROR: THIS DOES NOT WORK (passing the binary string directly)
doc = Docx::Document.new(event.get('Blob field'))

# TRIED WRAPPING THE DATA IN A StringIO, BUT THAT DID NOT WORK EITHER
file_to_read = StringIO.new(event.get('Blob field'))
doc = Docx::Document.new(file_to_read)
```
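A workaround that keeps the approach that already works (write the bytes, then open the file by path) but avoids a fixed file name is to go through a temporary file. This is a minimal sketch assuming only Ruby's standard `tempfile` library; `with_temp_docx` is a hypothetical helper, not part of the docx gem, and the data still touches the disk briefly.

```ruby
require 'tempfile'

# Hypothetical helper (not part of the docx gem): write binary data to a
# temporary .docx file, yield its path, and delete the file afterwards.
def with_temp_docx(bytes)
  file = Tempfile.new(['blob', '.docx'])
  begin
    file.binmode   # write raw bytes; no newline or encoding conversion (matters on Windows)
    file.write(bytes)
    file.flush     # push buffered bytes to disk before another reader opens the path
    yield file.path
  ensure
    file.close
    file.unlink    # remove the temporary file even if the block raised
  end
end
```

Usage, assuming the gem is loaded: `with_temp_docx(event.get('Blob field')) { |path| Docx::Document.new(path).paragraphs.each { |p| puts p } }`. Doing all the work inside the block is the safer choice, because the temporary file is deleted as soon as the block returns and the gem may need to reread the archive later (for example when saving).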

## Expected behavior

Is there a way to pass a StringIO directly to Docx, or some other way to avoid writing the file to disk and then reading it back?
Sorry for the wrong label.
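A library that wants to accept either a path or in-memory data has to tell the two apart. One simple rule: a real file name can never contain a NUL byte, whereas .docx data normally contains one within its first few bytes (for example `PK\x03\x04\x14\x00...`). The sketch below is hypothetical (`docx_source_kind` is not a docx gem method) and only classifies its argument; it opens nothing.

```ruby
NUL = "\0".b.freeze

# Hypothetical helper: classify an argument as an IO, in-memory bytes, or a file path.
def docx_source_kind(input)
  # StringIO, File and similar objects can be read directly.
  return :io if input.respond_to?(:read)
  raise TypeError, "unsupported input: #{input.class}" unless input.is_a?(String)

  # A file name cannot contain NUL; binary .docx data normally does.
  input.b.include?(NUL) ? :buffer : :path
end
```

In a gem, the `:io` and `:buffer` cases would go to an in-memory ZIP reader (rubyzip provides `Zip::File.open_buffer`, for instance) instead of being treated as a file name.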

## Environment
- Ruby version: [e.g. 2.7.1]
- `docx` gem version: [e.g. 0.5.0]
- OS: Windows

What exactly does `event.get('Blob field')` return?

Thanks for the response. According to the Logstash documentation:
Syntax: `event.get(field)`
Returns: Value for this field or nil if the field does not exist. Returned values could be a string, numeric or timestamp scalar value.

In my case, the field is a BLOB stored in a MySQL table. According to the MySQL definition of BLOB:

> BLOB values are treated as binary strings (byte strings). They have the binary character set and collation, and comparison and sorting are based on the numeric values of the bytes in column values.

So `event.get('Blob field')` should return a binary string.
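Since `event.get` can return `nil`, a number, a timestamp, or a string, it may help to check the value before treating it as file data. A minimal sketch, assuming only that the BLOB arrives as a `String` or a `String` subclass (Sequel's `Sequel::SQL::Blob` is one); `blob_bytes` is a hypothetical name.

```ruby
# Hypothetical helper: turn an event field value into a plain binary String.
def blob_bytes(value)
  return nil if value.nil?   # the field does not exist on this event

  # Numeric or timestamp values cannot be a .docx file.
  unless value.is_a?(String)
    raise TypeError, "expected binary string data, got #{value.class}"
  end

  # Copy into a plain String tagged as binary (ASCII-8BIT) so that
  # later code treats it as raw bytes rather than text.
  String.new(value, encoding: Encoding::BINARY)
end
```

In the Logstash `ruby` filter this might be called as `bytes = blob_bytes(event.get('resume'))`.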

@Jasmeet2011 can you provide a sample of your "binary string"?

I can read the binary string and write it out as a Word document, and I can send you that document as read from the Event API. The binary string, when written to a file with
`File.binwrite('new.docx', event.get('resume')) # 'resume' is the field containing the BLOB`
can be read using Docx. I don't know of any other way to copy the binary string; please suggest one.
new.docx
Copy of the file
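For a bug report, a short summary of the bytes is often enough: the total size plus the first and last few bytes printed with `inspect`, similar to what Sequel shows when it inspects a BLOB. This is a hypothetical helper in core Ruby, not something provided by Logstash or the docx gem.

```ruby
# Hypothetical helper: summarize binary data in one line for a bug report.
def blob_preview(bytes, n = 10)
  data = bytes.b                                        # view the value as raw bytes
  head = data.byteslice(0, n)                           # first n bytes
  tail = data.byteslice([data.bytesize - n, 0].max, n)  # last n bytes (or all, if shorter)
  "bytes=#{data.bytesize} start=#{head.inspect} end=#{tail.inspect}"
end
```

`puts blob_preview(event.get('resume'))` prints a single line that can be pasted into the issue.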

So I managed to view part of the BLOB content:
#<Sequel::SQL::Blob:0x840 bytes=7093 start="PK\x03\x04\x14\x00\b\b\b\x00" end="<\x02\x00\x00c\x19\x00\x00\x00\x
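`PK\x03\x04` is the ZIP local file header signature, and a .docx file is a ZIP archive, so this start suggests the BLOB holds an intact .docx rather than text that was mangled in transit. A quick check, as a hypothetical core-Ruby helper that looks at the signature only and does not validate the rest of the archive:

```ruby
# Signature that a (non-empty) ZIP archive, and therefore a .docx file, normally starts with.
ZIP_LOCAL_HEADER = "PK\x03\x04".b.freeze

# Hypothetical helper: does this data begin like a ZIP/.docx file?
def zip_signature?(bytes)
  bytes.b.start_with?(ZIP_LOCAL_HEADER)
end
```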

> ERROR: THIS DOES NOT WORK
> `doc = Docx::Document.new(event.get('Blob field'))`

@Jasmeet2011 could you give us the error messages and backtraces appearing at this line?

Sure, I will get back to you.

This does not work:
`doc = Docx::Document.new(event.get('resume'))`

This is the error I receive:

```
][ERROR][logstash.filters.ruby    ] Ruby exception occurred: string contains null byte

C:/Users/sun/Downloads/elk/logstash-6.8.0/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
    "first_name" => "Janine ",
           "dob" => 1980-01-03T18:30:00.000Z,
          "tags" => [
        [0] "_rubyexception"
    ],
         "email" => "janine.l@gmail.com\r",
      "@version" => "1",
    "@timestamp" => 2020-06-27T05:33:01.018Z,
            "id" => 4,
     "last_name" => "Labrune",
         "phone" => "(406) 785-5588",
          "type" => "docx",
        "resume" => #<Sequel::SQL::Blob:0x80a bytes=5568 start="PK\x03\x04\x14\x00\b\b\b\x00" end="<\x02\x00\x00n\x13\x00\x00\x00\x00">
}
```
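Ruby raises an `ArgumentError` mentioning a null byte whenever a string containing NUL is used as a file name (the exact wording varies between Ruby implementations and versions). That fits this report: the docx gem version used here appears to treat its argument as a path, and the BLOB contains NUL bytes almost immediately (`PK\x03\x04\x14\x00...` in the log above). The core-Ruby sketch below reproduces the error without Logstash or the gem; `path_error_for` is a hypothetical name, and the sample bytes are copied from the start of the logged BLOB.

```ruby
# The first bytes of the BLOB in the log: a ZIP/.docx header containing NUL bytes.
BLOB_HEAD = "PK\x03\x04\x14\x00\b\b\b\x00".b.freeze

# Hypothetical helper: returns Ruby's error message when `data` is used as a
# file name, or nil if the file opened. Other errors (e.g. ENOENT) still raise.
def path_error_for(data)
  File.open(data, 'rb', &:read)
  nil
rescue ArgumentError => e
  e.message
end

puts path_error_for(BLOB_HEAD)
```

The check happens before any disk access, which is why writing the same bytes to a file and passing the file's path works while passing the bytes themselves does not.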