Is it possible to do OCR on a Tiff image using the OneNote interop API?

I've been using the MS Office Document Imaging Tools (MODI) API to do OCR, and it has worked well, but we are moving to Office 2010, where MODI no longer exists.

I see that OneNote has good OCR functionality (insert a Tiff image into a notebook page, right-click, choose "Copy Text From Picture", and it OCRs the image). Is it possible to do this programmatically from the OneNote interop API, as can be done now with MODI?


Tom Regan
 

Last updated March 25, 2018
Answer

Sadly, MODI was removed from Office 2010. That said, it is possible to get OCR results from OneNote programmatically, but there's a catch (or two, as I'll explain below). You can use the COM API (see "What's New for Developers in OneNote 2007") to extract content from OneNote, including OCR results. OneNote content is returned via COM as XML, so you want the 'OCRData' children of the 'Image' elements (full schema at "2007 Office System: XML Schema Reference"). You can also insert images via the COM API, and OCR will run on them.
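
To make the XML part concrete, here is a minimal Python sketch of pulling the OCR text out of page XML once you have it as a string (for example from the COM API's GetPageContent call, which is not shown here). The namespace URI is the OneNote 2007 one, and the 'OCRText' child of 'OCRData' is my assumption about where the recognized text lives, so check both against the schema reference. The sample page is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# OneNote 2007 page XML namespace (assumption: adjust if your pages differ).
ONE_NS = "http://schemas.microsoft.com/office/onenote/2007/onenote"


def extract_ocr_text(page_xml: str) -> List[str]:
    """Return the OCR text found under each Image element of a page."""
    root = ET.fromstring(page_xml)
    results = []
    for image in root.iter("{%s}Image" % ONE_NS):
        for ocr in image.findall("{%s}OCRData" % ONE_NS):
            # Assumed layout: the recognized text sits in an OCRText child.
            text_el = ocr.find("{%s}OCRText" % ONE_NS)
            if text_el is not None and text_el.text:
                results.append(text_el.text.strip())
    return results


# Hypothetical, hand-written page XML used only to exercise the function.
SAMPLE_PAGE = """
<one:Page xmlns:one="http://schemas.microsoft.com/office/onenote/2007/onenote">
  <one:Image>
    <one:OCRData lang="en-US">
      <one:OCRText><![CDATA[Invoice 1234]]></one:OCRText>
    </one:OCRData>
  </one:Image>
  <one:Image />
</one:Page>
"""

if __name__ == "__main__":
    print(extract_ocr_text(SAMPLE_PAGE))
```

An image that has not been OCR'd yet simply has no 'OCRData' child, so it contributes nothing to the result.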

Catch #1: The linked documentation is for OneNote 2007.  It will still work in OneNote 2010, but only if your notebook is in the 2007 file format.  Hopefully the 2010 schema will be posted soon...

Catch #2: Although you can insert images via COM, OCR runs asynchronously in the background, so there's no guarantee OCR will be complete when you ask for it. A Sleep(5000) should be sufficient in most cases, but it's a lame workaround...
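
A slightly less lame alternative to a fixed Sleep(5000) is to poll until an 'OCRData' element shows up or a timeout expires. The sketch below is only one way to do it: fetch_page_xml is a hypothetical callable that you supply (for example a wrapper around the COM call that returns the page XML), and the timeout and interval values are arbitrary defaults, not anything OneNote documents.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def has_ocr_data(page_xml: str) -> bool:
    """True if any element named OCRData appears in the page XML."""
    root = ET.fromstring(page_xml)
    return any(
        el.tag == "OCRData" or el.tag.endswith("}OCRData") for el in root.iter()
    )


def wait_for_ocr(
    fetch_page_xml: Callable[[], str],  # hypothetical: returns current page XML
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll the page until OCR results appear; return the XML or None on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        page_xml = fetch_page_xml()
        if has_ocr_data(page_xml):
            return page_xml
        if time.monotonic() >= deadline:
            return None
        sleep(interval_s)
```

Note that this only detects that some OCR result exists on the page. If a page holds several images, you would want to check the specific 'Image' element you inserted, and an image with no recognizable text may never get an 'OCRData' child, so keep the timeout.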
