ruby-docx / docx

a ruby library/gem for interacting with .docx files

How to get XML from docx file

theasteve opened this issue

I'm trying to convert a docx file into a PDF. My plan was to convert the docx file into HTML and then convert the HTML into a PDF. However, the result of this process wasn't what I expected.
testing.pdf

This is what it looks like after the process described above. Here is a link to the original docx file:
https://www.dropbox.com/s/f1klwguv4r9iyje/testing.docx?dl=0

I think Word documents use XML, so converting the file from docx to XML and then to PDF might improve how the document is rendered. (You might have better direction on this.)

So far I have doc = Docx::Document.open('testing.docx'). When I try to get the XML from the document, I get nil:

[61] pry(#<PDFProducer>)> doc.xml
=> nil

Can one get the XML from a Word document? Or am I wrong in my assumption that Word documents use XML?
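
For context on the question above: a .docx file is a ZIP archive whose parts are XML files, with the main body normally stored at word/document.xml. The following is a minimal sketch, not the docx gem's own API. The helper names zip_container? and document_xml are hypothetical, and document_xml assumes the rubyzip gem is installed (it is loaded lazily so the signature check works without it).

```ruby
# Returns true when the file begins with the ZIP local-file-header
# signature ("PK\x03\x04"), as .docx files normally do.
def zip_container?(path)
  File.binread(path, 4) == "PK\x03\x04".b
end

# Returns the main document XML as a String, or nil if the part is missing.
# Assumption: the rubyzip gem is available, and the body lives at the
# conventional part name 'word/document.xml'.
def document_xml(path)
  require 'zip' # rubyzip, loaded lazily
  Zip::File.open(path) do |zip|
    entry = zip.find_entry('word/document.xml')
    entry && entry.get_input_stream.read
  end
end
```

Calling document_xml('testing.docx') should return the raw WordprocessingML as a string, which could then be parsed with an XML library such as Nokogiri.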

https://stackoverflow.com/questions/56450113/font-size-convert-docx-into-pdf-in-ruby-using-wickedpdf-and-docx

require 'docx'

# Convert the document to HTML and write it to disk.
doc = Docx::Document.open('testing.docx')
File.open('testing.html', 'wb') do |f|
  f << doc.to_html
end

@unixmonkey I just saw your answer after updating my post. Should I close it and open a new one, and restore the original question so that your answer still fits? Yes, your answer is correct; I came across it earlier.