How can I convert scanned handwritten tables to Excel spreadsheets?

Until now, my grandparents handwrote their financial records, but their non-cursive handwriting is neater and more intelligible than the pictures beneath. After they scan each page, can Excel 2019 automatically and forthwith convert the scanned image to an Excel spreadsheet? Even if OCR recognizes the text and numbers, arranging each text and number will consume too much time. Here's the second picture's source. This 2016 Reddit post yields nothing helpful.

asked Apr 2, 2019 at 23:45 user269574

I've used an OCR package (SharpDesk; not sure if it is still available) that came with a Sharp copier/scanner/printer. It did a pretty good job of converting a scan of columnar data to a format that Excel could ingest. Even so, there were errors, so we had to have employees review everything. But it was still faster than hand-entering the data. However, those were typed/printed images, not handwritten ones like your example. I suspect the accuracy with a handwritten source will be so low that it won't be worthwhile.

Commented Apr 2, 2019 at 23:52

Which version of Excel are you using? I recently saw news from the @msexcel Twitter handle that, with the latest version, a handwritten table can be converted into a spreadsheet.

Commented Apr 3, 2019 at 4:43

@RajeshS Excel 2019 – user269574 Commented Apr 3, 2019 at 4:49

I'm sure Excel 2019 has a feature that solves this. Check the Twitter handle, and also this video: youtube.com/watch?reload=9&v=JNfDR-Nx4Qc

Commented Apr 3, 2019 at 4:54

@Greek-Area51Proposal, read this article: microsoft.com/en-us/microsoft-365/blog/2019/02/28/… Commented Apr 3, 2019 at 9:19

2 Answers

With any computer to which you would have access, you can't do anything useful to go from handwritten records to Excel.

There are at least three difficult tasks:

Distinguishing "content" from non-content.
Recognizing the layout and translating that to cell locations.
Recognizing the handwritten characters and translating them to text.

Consumer software and online services are available and do a reasonable job of converting machine-printed text that is in clean table format to a spreadsheet file. But even the best can be far from perfect. That's just the task of assigning text to the right cell based on its position.
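To make "assigning text to the right cell based on its position" concrete, here is a minimal Python sketch. It assumes some OCR engine has already produced word boxes, represented here as hypothetical dicts with `text`, `x`, and `y` keys (that format, the function names, and the tolerance values are all illustrative assumptions, not any product's API). It only shows the layout step on clean, well-aligned input; it does nothing about recognizing handwriting.

```python
import csv
import io


def cluster_1d(values, tolerance):
    """Group coordinates into bands and return each band's mean.

    A value joins the current band if it lies within `tolerance`
    of the previous value in sorted order; otherwise it starts a new band.
    """
    bands = []
    for v in sorted(values):
        if bands and v - bands[-1][-1] <= tolerance:
            bands[-1].append(v)
        else:
            bands.append([v])
    return [sum(band) / len(band) for band in bands]


def nearest(centres, v):
    """Index of the band centre closest to v."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - v))


def words_to_grid(words, row_tol=10, col_tol=30):
    """Place each word in a (row, column) cell based on its position."""
    rows = cluster_1d([w["y"] for w in words], row_tol)
    cols = cluster_1d([w["x"] for w in words], col_tol)
    grid = [["" for _ in cols] for _ in rows]
    # Visit words top-to-bottom, left-to-right so words sharing a cell
    # are joined in reading order.
    for w in sorted(words, key=lambda w: (w["y"], w["x"])):
        r, c = nearest(rows, w["y"]), nearest(cols, w["x"])
        grid[r][c] = (grid[r][c] + " " + w["text"]).strip()
    return grid


def grid_to_csv(grid):
    """Serialize the grid as CSV text, which Excel can open."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()


# Hypothetical OCR output: slightly jittered positions, as in a real scan.
sample_words = [
    {"text": "Date", "x": 12, "y": 10},
    {"text": "Amount", "x": 205, "y": 12},
    {"text": "3/14", "x": 10, "y": 41},
    {"text": "42.50", "x": 210, "y": 39},
    {"text": "3/15", "x": 14, "y": 70},
    {"text": "17.00", "x": 208, "y": 72},
]

if __name__ == "__main__":
    print(grid_to_csv(words_to_grid(sample_words)))
```

Even this toy version depends on hand-picked tolerances, and skewed scans, merged cells, or entries that ignore the grid would break it, which is part of why real tools are far from perfect.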

When you look at those images, your brain is very good at sorting out what is "preprinted form", what is content, what is noise, and what is human markings that aren't relevant. You can recognize how things are aligned, and what goes with what based on context. To the computer, everything that isn't the background color is "something". Figuring out what of that is important to you, and what could potentially be some kind of character to be translated is extremely difficult. And if the content overlaps preprinted lines, that introduces breaks and missing data that the computer can't easily handle.

Take your images, for example. The first image is a lost cause. Much of it ignores the lines and layout. You would have the additional task of separating and removing the preprinted grid from the content. In the second image, the content is mostly within the bounds of the grid, but there are lots of stray markings (slashes, underlines, etc.) that would require cleanup.

The toughest part, though, is recognizing handwriting and converting that to computer text. For image 1, even humans would have trouble figuring out what some of that is, and it would involve a lot of guessing based on context and familiarity with the words. In image 2, most of the numbers aren't too bad, but the text would be a problem.

If your grandparents' records are non-cursive, and neat, legible, consistent, and similar to machine printing, OCR might do a "reasonable" job on it. But you would still have a lot of cleanup.

For perspective, the US Postal Service has some of the most advanced handwriting recognition, which it uses to read addresses on mailpieces so they can be sorted with automated equipment. The only way they are able to do it is because the addresses are in a prescribed structure and format, and they know every possible address ahead of time. The objective is more to match the handwritten addresses to viable candidates than to get every character right.

There is a ton of redundancy. If you can only decipher half of the characters, there still may be only one or a few possible matches. Even with that, a substantial portion requires human intervention. When it's done and the mail gets to the carrier for delivery, the carrier knows the addresses and names on their route, and they check it all to ensure that the addresses weren't misinterpreted.
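The value of that redundancy can be illustrated with a small Python sketch. This is not how the Postal Service's system works; it is only a toy using the standard library's `difflib`, with a made-up address list and a hypothetical `best_candidates` helper, to show that a partly unreadable string can still be matched when every valid answer is known in advance.

```python
import difflib

# Hypothetical master list: every address the sorter could legitimately see.
KNOWN_ADDRESSES = [
    "12 OAK STREET",
    "45 MAPLE AVENUE",
    "78 PINE ROAD",
]


def best_candidates(ocr_text, known, n=3, cutoff=0.5):
    """Return up to n known entries that resemble the OCR text, best first.

    Entries scoring below `cutoff` (a 0..1 similarity ratio) are dropped,
    so an empty list means "send this one to a human".
    """
    return difflib.get_close_matches(ocr_text.upper(), known, n=n, cutoff=cutoff)


if __name__ == "__main__":
    # "?" marks characters the recognizer could not read.
    print(best_candidates("45 MAP?E AV?NU?", KNOWN_ADDRESSES))
```

Free-form financial records have no such master list to match against, so every misread character stays misread.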

That's the level of handwriting OCR with state-of-the-art technology and an extremely controlled range of possibilities to compare against. Your task needs to translate every character. You don't have a master list of all the words that could legitimately be in those records (other than a dictionary of the entire language). OCR would require so much cleanup that it would be faster to simply read the records and type them into Excel. That's not an unusual task, and professional data entry people can do it pretty quickly and inexpensively.