Converted from PDF to Word

I am working with a document that was converted from PDF format.

The document looks like a table, but each piece of text is in a separate text box. Is there a way to convert this to a Word table and keep the same formatting? I know how to remove the text boxes, and I've also tried converting them to frames, but no matter what I do, I can't retain the formatting.

I don't know of any way to do this once the document is in Word. However, there are several ways to extract the content from the PDF, and perhaps you stumbled onto the wrong method for converting to Word.

What steps did you take to convert from PDF? What tools were used? Are you able to post the original PDF so we can test why your conversion method was flawed?

Andrew Lockton
Melbourne Australia

Did you "save" the PDF file as a Word file? Once you save it in Word format, you should have no trouble with the formatting.

Frankly, it would be quicker to retype the document than to reformat it to remove the text boxes while retaining the formatting. Converting what is essentially an image of the document into an editable document that keeps its formatting is a complex process. The results also depend on how the PDF was originally created: a PDF created from a Word document will be easier to process than one created from a scanned document.

Quite a few methods produce a document made up of a series of editable text boxes. Others produce documents formatted with frames, which are a bit easier to work with, and a few will produce a document formatted with styles. If your OCR software cannot create styles, you would be better off forgetting the formatting, extracting to plain text, and then adding back suitable styles.
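If the text has already landed in Word text boxes, one way to get it back out as plain text is a macro along the following lines. This is a minimal sketch, not a method from this thread; it assumes the content sits in msoTextBox shapes (not frames) anchored in the main text, and the procedure names are hypothetical. Boxes are copied in Shapes index order, which may not match the order in which they appear on the page.

```vba
' Sketch only: copy the text of each text box into a new document as plain,
' unformatted paragraphs so that styles can be applied afterwards.
' Assumptions: the content is in msoTextBox shapes anchored in the main text
' (not frames); boxes are copied in Shapes index order, which may differ from
' the order in which they appear on the page.
Option Explicit

' Drops one trailing paragraph mark, if present.
Private Function StripTrailingCr(ByVal s As String) As String
    If Right$(s, 1) = vbCr Then s = Left$(s, Len(s) - 1)
    StripTrailingCr = s
End Function

Sub TextBoxesToPlainText()
    Dim oSource As Document, oTarget As Document
    Dim oShape As Shape

    Set oSource = ActiveDocument
    Set oTarget = Documents.Add        ' new blank document (based on Normal.dotm)

    For Each oShape In oSource.Shapes
        If oShape.Type = msoTextBox Then
            If oShape.TextFrame.HasText Then
                ' TextRange.Text returns plain text only; the box formatting is left behind
                oTarget.Content.InsertAfter _
                    StripTrailingCr(oShape.TextFrame.TextRange.Text) & vbCr
            End If
        End If
    Next oShape
End Sub
```

Each box becomes one paragraph in the new document, so applying styles afterwards is a matter of selecting paragraphs rather than unpicking text boxes.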

Graham Mayor (Microsoft Word MVP 2002-2019)
For more Word tips and downloads visit my web site
https://www.gmayor.com/Word_pages.htm

Seconding Graham's remarks...

Converted documents (from any source, using any conversion method I know of) are, at a minimum, difficult to edit. They will usually look and print like the original document; that seems to be the best conversion software can do.

They will often have a number of unneeded section breaks as well.

If I want to do much editing in a converted document or use it as a template for other documents, I will either recreate it from scratch or copy it as plain text and add the formatting.

One other wrinkle with OCR is that, although it is very good, it is still far from perfect, so extensive proof-reading is required. An error rate of 1% or 2% isn't bad, but it still means a couple of words per page can be wrong.
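As a rough illustration, treating the rate as a per-word error rate and assuming a page of about 300 words (an assumed figure, not one from the thread):

```latex
\[
300 \times 0.01 = 3
\qquad\text{and}\qquad
300 \times 0.02 = 6 \ \text{incorrect words per page}
\]
```

So even a good OCR pass can leave a handful of wrong words on every page, which is why each page needs proof-reading.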

Volunteering to "pay forward" the help I've received in the Microsoft user community.


Charles Kenyon
Sun Prairie, Wisconsin
wordfaq[at]addbalance[dot]com

Legal site: https://addbalance.com

This document is an invoice created by an outside vendor. They recently switched to PDF (previously it was an Excel file). We take the data from this invoice and upload it into a custom application we have.

I have Adobe Acrobat X and used File > Save As to save it as a Word file.

What I need to do is take key pieces of data from this PDF and upload them to another application.

The Word macro I wrote to extract the data from the text boxes works, except that a couple of items always seem to end up out of order in the result.

My macro selects them in the order they are numbered (Text Box 1, 2, etc.), not the order in which they are displayed. Is there a way to select them in the order they appear in the file (left to right)?

Seriously, I would talk to the vendor and see if you can get it in Excel format, perhaps in addition to the PDF.

Charles Kenyon

Provided they are text boxes (not frames), are always in the same positions, and always have the same numbering sequence from one document to the next, it shouldn't matter what the positions are or what the text box numbers are. You just have to extract the data and place it where you require.

You could perhaps base your code on something like the following, which gives you the text box numbers and their current content. Once you know which text box contains which content, you can use the value of oRng to collect the data from the appropriate box and put it where you want.

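Sub Macro1()
' Report the index number and current text of each text box in the document
Dim oDoc As Document
Dim oShape As Shape
Dim oRng As Range
Dim i As Long
    Set oDoc = ActiveDocument
    ' Loop through every shape anchored in the main text of the document
    For i = 1 To oDoc.Content.ShapeRange.Count
        Set oShape = oDoc.Content.ShapeRange(i)
        ' Only text boxes hold the text we want; skip pictures, lines, etc.
        If oShape.Type = msoTextBox Then
            Set oRng = oShape.TextFrame.TextRange
            ' i is the number to use later to pick this box's data
            MsgBox "Text Box " & i & ": = " & oRng.Text
        End If
    Next i
End Sub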
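On the original question of reading the boxes in the order they appear rather than in index order, one option is to sort them by position before processing. The following is a minimal sketch, not code from this thread. It assumes the boxes are msoTextBox shapes whose Top and Left values are measured from the same reference (worth checking each box's RelativeVerticalPosition and RelativeHorizontalPosition), and that boxes on one visual row differ in Top by no more than a few points; ROW_TOLERANCE, ComesBefore and ListTextBoxesInReadingOrder are hypothetical names, and the tolerance of 3 points is a guess to adjust.

```vba
' Sketch only: list the text boxes in reading order (page, then top to
' bottom, then left to right) instead of Shapes index order.
' Assumptions (not from the thread): the boxes are msoTextBox shapes, their
' Top/Left values share the same reference point, and boxes on one row
' differ in Top by no more than ROW_TOLERANCE points (a guessed value).
Option Explicit

Private Const ROW_TOLERANCE As Single = 3

' Returns True if box A should be read before box B.
Private Function ComesBefore(ByVal pageA As Long, ByVal topA As Single, ByVal leftA As Single, _
                             ByVal pageB As Long, ByVal topB As Single, ByVal leftB As Single) As Boolean
    If pageA <> pageB Then
        ComesBefore = (pageA < pageB)          ' earlier page first
    ElseIf Abs(topA - topB) <= ROW_TOLERANCE Then
        ComesBefore = (leftA < leftB)          ' same row: leftmost first
    Else
        ComesBefore = (topA < topB)            ' different rows: higher row first
    End If
End Function

Sub ListTextBoxesInReadingOrder()
    Dim oShapes As ShapeRange
    Dim idx() As Long, pg() As Long, tops() As Single, lefts() As Single
    Dim n As Long, i As Long, j As Long, cur As Long

    Set oShapes = ActiveDocument.Content.ShapeRange
    If oShapes.Count = 0 Then Exit Sub
    ReDim idx(1 To oShapes.Count)
    ReDim pg(1 To oShapes.Count)
    ReDim tops(1 To oShapes.Count)
    ReDim lefts(1 To oShapes.Count)

    ' Record the page and position of every text box, keyed by its shape index.
    For i = 1 To oShapes.Count
        If oShapes(i).Type = msoTextBox Then
            n = n + 1
            idx(n) = i
            pg(i) = oShapes(i).Anchor.Information(wdActiveEndPageNumber)
            tops(i) = oShapes(i).Top
            lefts(i) = oShapes(i).Left
        End If
    Next i

    ' Insertion sort of the collected indexes into reading order.
    For i = 2 To n
        cur = idx(i)
        j = i - 1
        Do While j >= 1
            If Not ComesBefore(pg(cur), tops(cur), lefts(cur), _
                               pg(idx(j)), tops(idx(j)), lefts(idx(j))) Then Exit Do
            idx(j + 1) = idx(j)
            j = j - 1
        Loop
        idx(j + 1) = cur
    Next i

    ' Print each box's original number and text to the Immediate window.
    For i = 1 To n
        Debug.Print "Text Box " & idx(i) & ": " & oShapes(idx(i)).TextFrame.TextRange.Text
    Next i
End Sub
```

Run it with the Immediate window open (Ctrl+G in the VBA editor). Because the original box numbers are printed next to the text, the output also shows which number holds which field if you prefer the fixed-number approach. The tolerance is there because converted boxes in one visual row rarely share exactly the same Top value: too small a value splits a row, too large a value merges two rows.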

Graham Mayor (Microsoft Word MVP 2002-2019)

Last updated October 1, 2021. Views: 400