You want to run your business, but letters, bills, receipts, and other paper documents keep piling up. Some must be kept for years as tax records. What can we do with this growing mountain of paper, besides buy another filing cabinet? Here’s the answer…
Don’t File It, Scan It!
We’re buried in paper, despite our best attempts to stay organized. Fred Sanford’s “put it back in the mailbox” solution only works for a limited time. Fortunately, there’s a better one. Digital scanners create space-saving electronic replicas of any document. And with the right software, information can be extracted from digital scans and saved in Microsoft Office and other useful formats that allow you to recall a document with a keyword search.
The Neat Company specializes in just that kind of solution. Originally named NeatReceipts, the firm bundles digital scanners with its proprietary software that extracts data from scanned documents and turns it into meaningful digital reports. Business cards scanned with the Neat system, for instance, are automatically converted into Outlook contact records. Receipts are parsed into Excel worksheet cells, generating expense reports for tax purposes. Paper documents can be turned into editable, searchable text.
EdocScan is a software package that will save scanned items in a searchable database. It can scan or import previously scanned invoices, bank statements, and other documents. And when needed, you can export the information to an Excel spreadsheet for analysis or tax preparation.
Optical Character Recognition (OCR) is at the heart of these scanning technologies. Raw digital scans are simply images composed of pixels, not text characters. OCR recognizes the patterns of pixels that form characters and translates them into text that can be manipulated and edited.
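To make the pixels-to-text idea concrete, here is a toy sketch in Python. It is not how any of the products mentioned here work internally; commercial OCR engines are far more sophisticated. The tiny 3x5 bitmaps, the four-letter alphabet, and the function names are all made up for illustration. The sketch simply compares each scanned glyph to a set of known patterns and picks the closest one, which is enough to show how a slightly noisy image can still come out as the right character.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each glyph is a 3x5 bitmap; '#' is a dark pixel, '.' is a light one.
# The templates and names below are hypothetical.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}


def pixel_distance(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize_glyph(bitmap):
    """Return the template character whose pixels best match the bitmap."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(TEMPLATES[ch], bitmap))


def recognize_line(glyphs):
    """Translate a sequence of glyph bitmaps into a text string."""
    return "".join(recognize_glyph(g) for g in glyphs)


# A "scanned" word; the first glyph has one pixel missing, as a smudge might cause.
scanned = [
    ["###", ".#.", ".#.", "...", ".#."],  # noisy T
    ["###", "#.#", "#.#", "#.#", "###"],  # O
    ["#..", "#..", "#..", "#..", "###"],  # L
    ["#..", "#..", "#..", "#..", "###"],  # L
]

print(recognize_line(scanned))  # prints TOLL
```

Once the glyphs have become the string TOLL, it can be edited, copied, or searched like any other text.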
Many scanners today tout their ability to create PDF files. But not all take the extra step of using OCR to create an invisible overlay of machine-readable text, which makes the PDF searchable. A searchable PDF is one whose text you can search when you have the PDF file open. It’s also useful to file search utilities that can look for words inside a file. Have you ever looked through a stack of paper documents, searching for one page that contained a specific word or phrase? For example, let’s say you have a bunch of printed bank or credit card statements, it’s tax time, and you’re looking for a specific transaction. If you scan those documents into searchable PDFs, the search becomes almost trivial.
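Here is a rough sketch of why that search becomes trivial once the text exists. It assumes the text layer has already been pulled out of each PDF by some other tool; the file names, the statement contents, and the two helper functions are invented for the example. The code builds a simple word index and then looks up which files contain every word you ask for.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of file names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted file names that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical text layers from three scanned statements.
documents = {
    "visa-2012-03.pdf": "03/14 ACME OFFICE SUPPLY 84.10",
    "visa-2012-04.pdf": "04/02 CITY PARKING 12.00",
    "bank-2012-03.pdf": "03/15 Deposit 500.00 ACME refund",
}

index = build_index(documents)
print(search(index, "acme"))         # ['bank-2012-03.pdf', 'visa-2012-03.pdf']
print(search(index, "acme supply"))  # ['visa-2012-03.pdf']
```

Desktop search utilities do something similar behind the scenes, though the details vary from product to product.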
Popular Scanners and OCR Software
In addition to the Neat product line, the scanners in the Fujitsu ScanSnap family all feature scan-to-PDF and OCR capabilities. Prices range from about $189 to $495, and the ScanSnap scanners come in a spectrum of forms from ultra-portable to high-speed office workhorse. My accountant uses a ScanSnap to create PDF copies of all my tax documents, instead of making paper copies that would need to be filed in a cabinet. The Kodak Scanmate i1120 scanner can scan a business card or a document up to 34 inches long. The WorldocScan 400 by PenPower is a mobile scanner that sells for around $100. Canon’s CanoScan LiDE210 Color Image Scanner ($79.99) also makes searchable PDFs.
If you already have a scanner, you need only some software to make searchable PDFs. Scan2PDF (http://scan2pdf.org) is a German software package that works with home or commercial scanners, and even with digital cameras, to produce searchable PDF files. It runs under all versions of Windows. Scan2PDF is free to try for 30 days, and costs about $40 to register.
Perhaps even easier, you can try a few online options that require no hardware or software. I recently discovered that Google Drive will automatically convert PDFs (up to 2MB in size) into searchable text, if you configure the upload settings in a certain way. OnlineOCR.net is a free web-based service that lets you upload PDF, JPG and other graphical formats (up to 4MB), and convert them into plain text, Word and other formats. It supports 30+ languages, and will even accept ZIP files containing multiple input documents.
No OCR software is perfect. The layout, color, contrast, font style, and many other aspects of the source document affect the quality of your results. If OCR software misreads the critical keywords on your scanned receipt, you may never find that receipt in a search. Since the searchable text in a PDF file is invisible, you can’t even proofread it. That’s why I often scan into a Word document, where the OCR’d text is visible and can be edited to clean up any errors.
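A small Python sketch shows how one misread character can hide a document from an exact search. The sample text is invented, with the digit 1 standing in for a letter the way OCR sometimes gets it wrong. The second half uses the standard library's difflib module to look for near matches instead. That is only one possible workaround, not something the scanning products above are known to do, and the 0.8 similarity cutoff is an arbitrary choice.

```python
import difflib


def find_near_matches(keyword, ocr_text, cutoff=0.8):
    """Return words in the OCR text that closely resemble the keyword."""
    words = ocr_text.lower().split()
    return difflib.get_close_matches(keyword.lower(), words, n=5, cutoff=cutoff)


# Hypothetical OCR output in which some letters were read as the digit 1.
ocr_text = "Stap1es Office Supp1y rece1pt total 84.10"

# An exact search misses the word entirely.
print("receipt" in ocr_text.lower())           # False

# A tolerant search still finds the misread word.
print(find_near_matches("receipt", ocr_text))  # ['rece1pt']
```

Tolerant matching can also return false hits, so proofreading visible text, as described above, remains the more reliable fix.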
Scanning paper documents to PDF or other digital formats is a great way to save space, preserve the documents’ readability, and make them more useful. Well-designed file-naming and folder systems are all the organization that most users will need for their digital scans.
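As one example of what a file-naming and folder system might look like, the sketch below builds a path in the form year/category/date_description.pdf. The scheme itself, the category names, and the scan_path function are just one illustrative choice, not a standard. Putting the date first in ISO format (year-month-day) means files sort chronologically in any file browser.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def scan_path(doc_date, category, description, extension="pdf"):
    """Build a year/category/date_description path for a scanned document."""
    # Reduce the description to lowercase words joined by hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    filename = f"{doc_date.isoformat()}_{slug}.{extension}"
    return PurePosixPath(str(doc_date.year), category, filename)


print(scan_path(date(2012, 3, 14), "receipts", "Acme Office Supply"))
# prints 2012/receipts/2012-03-14_acme-office-supply.pdf
```

With names like these, even a plain folder listing tells you what each scan is and when it dates from.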
Do you have something to say about digital scanners, optical character recognition, or searchable PDFs? Post your comment or question below…