Solutions for managing knowledge as content & print
Key info for users and decision makers for Xerox DocuShare and ABBYY recognition products.
DocuShare 7's new Content Rules Manager
Content Rules step you through a set of dialogs allowing you to establish workflows essential to good business process management (BPM). Content Rules are included in DocuShare Enterprise and available as an add-on to DocuShare Standard. DocuShare 7 helps you manage content rules more easily by providing you with a new centralized Content Rules Manager feature. All content rules throughout your DocuShare server can be accessed by logging in as a DocuShare site administrator. Content administrators don't have access to this new resource.
DocuShare 7 trial
Whether you're interested in just archiving documents or helping everyone in your company work together better, consider downloading a 30-day trial. Click here: DocuShare 7 trial
When not to use full-text search
I have this discussion with prospects and customers fairly frequently. Perhaps you'd appreciate my perspective on this matter.
It's generally assumed that it's always best to have full-text search capability against the content of your uploaded documents, but often it's not. DocuShare's full-text search engine uses a database that grows with the amount of content and use, as they all do. Contracts, reports, and documentation for products, service, parts, policies, and other unstructured documents should be uploaded in a searchable format so that you can find whatever words and phrases you need to arrive at an answer to your inquiry.
Many documents, however, consist of forms where half or more of the text is part of the form itself. These are referred to as structured documents. Why index the entire form for anything other than the title and purpose of the form? That's one or two short, discrete units of data. Sure, you'll want to search for the responses: name, job title, date the form was filled out, and the specific responses, IF you or others ever intend to search for that information. If you start with paper, you'd use scanning and capture software to extract that data, along with a template-driven means of uploading the documents to your shared repository, which uses an underlying database where such data can be stored and queried. That's pretty easy to set up and use with products like ABBYY FlexiCapture.
Semi-structured documents have a common set of data in them, but the format and layout of the page generally differ from one vendor or customer to the next. Colleges and universities should spend an hour teaching people about the various capture technologies so graduates would know what works and what doesn't when dealing with invoices, orders, and the like. Using OCR-based capture on invoices can prove pretty challenging when a company omits its name and places a logo on the page instead. There are ways around this limitation involving database lookups using order numbers and addresses, but this is why workarounds exist in the first place.
Back to the main topic, with a focus on invoices: there's no value in making the entire invoice searchable. Using full-text search to find an order number like "83289" may present search results including addresses, ZIP codes, amounts, license numbers, various IDs, and other misleading numbers. Why have nine bad hits returned with the one invoice you're looking for? Even though DocuShare has the ability to search against the content of the document and all properties with data, why not focus on the specific property and data? The Quick Search function allows you to easily build a refined search interface for querying one or several fields to find the one to several documents you care about without the additional data debris drifting into view.
FlexiCapture combined with FlexiLayout Studio makes capturing important data possible and often pretty easy. FlexiLayouts are essentially versatile templates where you indicate what information you want to capture and let the software find that information within the document. That's why it works well with invoices, orders, and other semi-structured documents. When saving the PDF of the invoice in DocuShare, we emit it in an image-only format so the indexing engine has less work to do and your search results are fast and accurate.
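To make the order-number example concrete, here is a minimal Python sketch of the difference between searching the full text and searching one property. It does not use DocuShare or its Quick Search; the Invoice class, the field names, and the sample records are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    # Hypothetical captured properties; not DocuShare's actual schema.
    order_number: str
    zip_code: str
    amount: str
    body_text: str  # everything a full-text index would see


def full_text_search(invoices, term):
    """Match the term anywhere in the page text, like a full-text index."""
    return [inv for inv in invoices if term in inv.body_text]


def property_search(invoices, order_number):
    """Match only the order number property."""
    return [inv for inv in invoices if inv.order_number == order_number]


# Made-up sample data: the same digits show up as an order number,
# a ZIP code, and an amount on three different invoices.
invoices = [
    Invoice("83289", "75007", "412.50",
            "Order 83289 ship-to ZIP 75007 total 412.50"),
    Invoice("10442", "83289", "97.00",
            "Order 10442 ship-to ZIP 83289 total 97.00"),
    Invoice("55120", "60601", "83289.00",
            "Order 55120 ship-to ZIP 60601 total 83289.00"),
]

full_hits = full_text_search(invoices, "83289")
prop_hits = property_search(invoices, "83289")
print(len(full_hits), "full-text hits,", len(prop_hits), "property hit")
```

With this sample data the full-text search returns all three invoices, while the property search returns only the one whose order number is 83289.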
What should we do with these old documents?
At some point you're going to want to either delete or archive older documents and other content. Xerox has two key approaches to this. One is to install and connect to a DocuShare Archive Server, which moves expired content to another server that you can still search and retrieve from. The other is to use the DocuShare Lifecycle Manager to delete older content. The Lifecycle Manager can treat various object types differently, so documents can be retained indefinitely while invoices and orders are purged after 7 years. Everything is very configurable. Combined with Content Rules, it will delete the documents for you. You may want to use some safeguards, such as placing a Hold on certain documents and other content like calendar events. If so, contact us and we'll work that out with you. It may be possible to use Lifecycle Manager and Content Rules together with the Export Document function on DocuShare Enterprise systems.
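The retention idea above (keep documents indefinitely, purge invoices and orders after 7 years, and skip anything on Hold) can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how Lifecycle Manager is configured or implemented; the RETENTION_YEARS table and the is_expired function are hypothetical names.

```python
from datetime import date

# Hypothetical retention table: years to keep each object type.
# None means keep indefinitely.
RETENTION_YEARS = {"document": None, "invoice": 7, "order": 7}


def is_expired(object_type, created, today, on_hold=False):
    """Return True if the object is past its retention period."""
    if on_hold:
        return False  # a Hold always protects the object
    years = RETENTION_YEARS.get(object_type)
    if years is None:
        return False  # indefinite retention (also used for unknown types)
    try:
        cutoff = created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        cutoff = created.replace(year=created.year + years, day=28)
    return today >= cutoff


today = date(2018, 1, 1)
print(is_expired("invoice", date(2010, 1, 15), today))                # True
print(is_expired("invoice", date(2010, 1, 15), today, on_hold=True))  # False
print(is_expired("document", date(1999, 1, 1), today))                # False
```

Treating unknown object types as "keep indefinitely" is a deliberately cautious choice for the sketch: nothing is flagged for deletion unless a retention period was set for it.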
Over the years people have wanted to archive to folders on their file system or external storage media. There's not a simple way to do that, but we may be able to use our DSWalk utility for that. Again, we should talk to see if that's a good fit. The DSExport utility outputs everything to a Documents folder with either an XML or CSV file for the metadata and includes pointers to the documents. The documents are stored with handles and you can't just click on them to open them. Everything's in one folder. It's useful for transferring content to another system, but not good for archiving.
Fujitsu has a trade-in special available through the end of September in case you're thinking about upgrading from an older scanner whose input capacity is too small or that's just too slow overall. The offer is good for either competitive scanners or older Fujitsu scanners. Small scanners are getting faster all the time. Two great scanners to consider for an upgrade are the fi-7160 for small batches and the fi-6400 for medium-size batches. If you're interested, write to me and indicate what you'd like to trade in, and I'll inquire about the value of that trade-in and get back to you. Fujitsu will pick up your current scanner when they deliver the new one.
Capturing data from checks
Question: What's the best way to read the MICR (Magnetic Ink Character Recognition) line at the bottom of a check?
Answer: Use a check scanner. (Duh!)
What about reading other data from the check, like the payee, date, bank name, account number, and the amount, when you have a personal check? Well, you'll need OCR/ICR* for that. What check scanner does both? None, really, but you can use data capture software to extract this additional information. When I called around to find a supplier that would do this, I found the pricing started at about $20,000. Yes, they're either out of their minds or hoping you are. They wanted to set us up as if we were a bank with multiple locations and tellers. We needed a simple and inexpensive solution that would combine both MICR and OCR/ICR, and we found one. All we needed was a utility to read the MICR line and output the image along with a data file containing the contents of the MICR line as a pair of similarly named files. Incidentally, some OCR software can read MICR characters, but when people's descenders (think: g, j, p, q, and y) swoop down over the MICR characters, it interferes with optical recognition. The check scanner picks up on the magnetic shapes of the characters and doesn't care about a little ink from your pen getting in the way. It turns out that Canon MICR scanners come with the simple utility we needed to do the job. Cost? Free. The models that include this software utility are the CR-50 and CR-80. We sell these and can help you set them up with ABBYY FlexiCapture to capture all the important data from your checks and feed that into a database with an image of the check. Is that something you'd like some expert help with? Call us at 972-492-4428.
* OCR = Optical Character Recognition for typed or printed letters and numbers and other symbols like currency and punctuation.
ICR = Intelligent Character Recognition that extends to reading checkboxes, bubbles, barcodes, and handprint.
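For anyone curious what "a pair of similarly named files" looks like on the receiving end, here is a rough Python sketch that matches each check image to its MICR data file by base name. The file extensions, the folder layout, and the pair_check_files function are assumptions for illustration; the actual output of the Canon utility may differ, and this is not part of FlexiCapture.

```python
from pathlib import Path

# Assumed extensions; adjust to whatever the scanner utility really writes.
IMAGE_EXTS = {".tif", ".tiff", ".jpg", ".png"}
DATA_EXT = ".txt"


def pair_check_files(folder):
    """Pair images and MICR data files that share a base name.

    Returns (pairs, unmatched): pairs maps the base name to
    (image_path, micr_text); unmatched lists base names that have
    only an image or only a data file.
    """
    images, data = {}, {}
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        ext = path.suffix.lower()
        if ext in IMAGE_EXTS:
            images[path.stem] = path
        elif ext == DATA_EXT:
            data[path.stem] = path
    pairs = {
        stem: (images[stem], data[stem].read_text().strip())
        for stem in sorted(images.keys() & data.keys())
    }
    unmatched = sorted(images.keys() ^ data.keys())
    return pairs, unmatched
```

Calling pair_check_files on a scan folder gives you the MICR text next to each image path, ready to hand off to capture software or a database load, plus a list of any leftovers that need a second look.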
Scanner of the Month
When you get right down to it, who cares what the scanner of the month is? The question is, does it address your needs? Many people like combo scanners that can quickly work with documents and books or magazines.
For those requiring such versatility, the new Panasonic KV-S7097 is certainly worthy of their consideration. Have a look.
As always, we have a great price on these units. See it at our online store along with bigger and faster models.
If you're looking for a new scanner, be sure to shop with us for really good prices and the best side-by-side comparison page anywhere. You'll find this and our unique application enablers for DocuShare and ABBYY products on our CriteriaFirstWare site.