Solutions for managing knowledge as content & print

Key info for users and decision makers for Xerox DocuShare and ABBYY recognition products.

 

End of Life Announcement for the DocuShare Windows Client

Xerox Content Management has announced the end-of-sale and end-of-life for Xerox DocuShare Client, effective on September 17, 2017.

Self-serve information for the application is available on the DocuShare Customer Support Site, including this support article on how to continue to use existing customizations with the replacement product, Xerox DocuShare Drive.

Xerox DocuShare Drive

Direct customer support, maintenance builds, software fixes and upgrades will no longer be available for the Xerox DocuShare Windows Client as of September 17, 2017 in favor of our more recent offerings:

  • Xerox DocuShare Drive Outlook Integration – offers the ability to easily drag and drop email messages from Outlook to DocuShare collections for viewing and managing. The application supports managing DocuShare mail messages, documents, collections, and workspaces within Outlook.

  • DocuShare Integration for Microsoft Office – a lighter-weight alternative for simpler document access needs that does not require DocuShare Drive.

DocuShare Drive can be downloaded from Xerox.com. System requirements are provided on the DocuShare Drive web page.
 
DocuShare 7 Update 1 and DocuShare Integration for Microsoft Office software can be downloaded from Xerox.com.

System requirements:

Microsoft Windows 7 with SP1 (32- and 64-bit), Microsoft Windows 8.1 (64-bit) or Microsoft Windows 10 (64-bit)
Microsoft Office 2013, Microsoft Office 2016 or Microsoft Office 365 (fully installed)

 

Midmarket Organizations Ready To Embrace Enterprise-Level AP Invoice Automation Strategies

Recently published in ECM Connection by Daneen Storc

Daneen Storc is a Senior Product Marketing Manager at ABBYY and is responsible for data capture and automation products and solutions with emphasis on invoice processing and accounts payable.  In this brief article she says that the gains made by large enterprise customers in processing invoices using data capture and automation technologies are now available and priced appropriately for smaller organizations.

Midmarket Organizations Ready To Embrace Enterprise-Level AP Invoice Automation Strategies

 

DocuShare Archive Server or Lifecycle Manager?

Should you use the DocuShare Archive Server or the newer Lifecycle Manager?  It depends.  If you want to retain documents for a long period of time in a manner that lets DocuShare users easily search for and find them, while keeping your primary DocuShare server trim and easy to maintain, then the DocuShare Archive Server is your best choice.  If you've ever had to reindex a large DocuShare server with one million or more documents, you'll remember that it can take a long time: days for about a million documents and perhaps several weeks for ten million.  Much of that depends upon whether you're indexing all possible document types, including searchable PDFs and MS Office documents such as spreadsheets and presentations.  Discussions with one of our customers with about 1.5M documents led us to conclude that turning off indexing for a lot of large Excel documents was in their best interests.  They generally search by customer and title anyway.  Some customers with really large repositories don't submit searchable PDF documents, but rather upload them with searchable metadata.  If the documents are forms or invoices, all you really care about is the searchable metadata anyway.  Why bloat your IDOL index with labels and standard verbiage you won't be searching for?

By setting the expiration_date property to a date in the future, you can instruct DocuShare to move your documents to the archive server on that date.  We can facilitate that by using DAVupDoc or DocuPage Pro or other solutions we build for customers.  Once your documents are on the DocuShare Archive Server, you can go to Advanced Search and select whether to search your primary server by default or to search in the archives.  Properly configured, all your metadata will be passed to the archive server and help you to find your documents.  Full text search is also available on the archive server and can be used alone or in combination with a metadata search.

Your archive server can be less robust than your primary server as it will be used less.  If you're upgrading your primary server, you can repurpose the current one as the archive server.  VMs work, too.  There's one feature missing from the DocuShare Archive Server: the ability to delete files you no longer want.  Our solution is to use our DocuShare Archive Assistant add-on and to create another property, which we generally call deletion_date, so that you can sweep through, select the files to delete, and process them in batch mode.  We also like to add two more properties: hold and hold_reason.  Documents flagged this way are not deleted until hold is reset to false.  The one complaint I've heard about the DocuShare Archive Server is that it doesn't recreate the collection or directory structure you see in the primary server.  Given that such structures are sometimes reorganized, recreating them would be a bad idea.  If you can search for content and metadata, it won't be a problem to find your documents in the archive server.
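
To make the deletion sweep concrete, here is a minimal sketch of the selection logic in Python.  It is only an illustration, not the DocuShare Archive Assistant itself: the ArchivedDoc record, the select_for_deletion function, and the sample handles are hypothetical, and only the property names deletion_date, hold, and hold_reason come from the description above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ArchivedDoc:
    # Hypothetical in-memory record; only the property names
    # deletion_date, hold, and hold_reason come from the article.
    handle: str
    deletion_date: Optional[date] = None
    hold: bool = False
    hold_reason: str = ""


def select_for_deletion(docs: List[ArchivedDoc], today: date) -> List[ArchivedDoc]:
    """Return documents whose deletion_date has passed and that are not on hold."""
    return [
        d for d in docs
        if d.deletion_date is not None
        and d.deletion_date <= today
        and not d.hold
    ]


docs = [
    ArchivedDoc("Document-101", date(2017, 1, 31)),
    ArchivedDoc("Document-102", date(2017, 1, 31), hold=True, hold_reason="audit"),
    ArchivedDoc("Document-103", date(2030, 1, 1)),
    ArchivedDoc("Document-104"),  # no deletion_date set, so never selected
]

batch = select_for_deletion(docs, today=date(2017, 9, 17))
print([d.handle for d in batch])
```

In this sketch a document with no deletion_date is never selected, and a hold always wins over a past deletion_date, which matches the intent of keeping held files until the hold is reset to false.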

Lifecycle Manager was introduced by Xerox for customers who don't want to set up two servers.  It's primarily used to delete old content and bypass an archival process.  Much of this came about due to the increasingly scalable nature of DocuShare over the years.  Xerox felt that you could keep all your files in one place and delete them when you no longer want them.  The efficacy of this has been helped by the inclusion of the ability to back up the full text IDOL indexes.  We recommend you keep a revolving backup of your indexes so that if you find you do need to reindex your documents, you can begin by restoring the previous day's backup and then reindexing only the content added since then.  We would also recommend that procedure for the DocuShare Archive Server.  Generally speaking, Lifecycle Manager is used in conjunction with Content Rules so that you can automatically delete old content based upon policies you set up in Lifecycle Manager.  These can be quite flexible, allowing you to delete content based on the original file date or the last modified date.  Without Content Rules, Lifecycle Manager merely flags the content as ready to be deleted and emails you a report from which you can click on each file, one by one, to delete it.  That can be pretty tedious.  If you like the idea of combining archival and lifecycle management, note that Lifecycle Manager doesn't work on the DocuShare Archive Server.  Our DocuShare Archive Assistant can be used on both servers.
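
The restore-then-reindex procedure comes down to one comparison: anything added after the backup was taken still needs indexing.  The Python sketch below illustrates that step only.  The docs_to_reindex function, the handles, and the timestamps are hypothetical; how you obtain each document's added date from your own server is outside this sketch.

```python
from datetime import datetime
from typing import Dict, List


def docs_to_reindex(added_at: Dict[str, datetime], backup_taken_at: datetime) -> List[str]:
    """Return handles of documents added after the restored index backup was taken."""
    return sorted(h for h, ts in added_at.items() if ts > backup_taken_at)


# Hypothetical handles and timestamps, for illustration only.
added_at = {
    "Document-201": datetime(2017, 9, 14, 16, 30),
    "Document-202": datetime(2017, 9, 15, 9, 5),
    "Document-203": datetime(2017, 9, 15, 11, 45),
}

# The previous night's backup that was just restored.
backup_taken_at = datetime(2017, 9, 15, 1, 0)

print(docs_to_reindex(added_at, backup_taken_at))
```

Only the two documents added after the 1:00 a.m. backup are listed, which is the point of the revolving backup: the reindex covers a day of new content rather than the whole repository.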

For additional questions about implementation, cost, and functionality, please contact us and we'll discuss and demonstrate the alternatives for you.  You probably shouldn't just pile on the old content forever.

 

Do you need structured or semi-structured capture?

Customers are often confused by these terms.  While "structured" and "semi-structured" may not be the best terms for such documents, they are the ones most commonly used in the data capture marketplace.  I hope this overview clears up the differences for you.

Structured documents are usually characterized as forms or surveys in which data appears in the same place every time for a particular form.  They may be structured even though there are varying versions or renditions of the form.  If you changed your form for 2017 so that some or all of the data appears in different places than on the 2016 form, it can still be considered structured.  You'd accommodate that in FlexiCapture by duplicating your 2016 document definition (template with rules) and making the necessary changes for the 2017 form.  FlexiCapture will be able to tell them apart if there are significant differences between the two.  If they're very similar, you could print the version of the form somewhere on the form, usually at the bottom, as the form ID.  You can then go to the Static Elements layer of the document definition and have it identify the specific form by its form ID.  Barcodes can also be used to carry the form ID if the scans are of poor enough quality that reading the text of the form ID isn't reliable.  You can also use an additional square or corner anchor somewhere on the page to distinguish one version from the other.  Changing the position of a solid square anchor along the bottom line should be sufficient.  You can accommodate a lot of different forms in this manner, as FlexiCapture will analyze the page and match it to the appropriate document definition for capturing data.
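
The form ID idea can be pictured as a simple lookup: read the ID printed on the page, then pick the matching definition.  The Python sketch below shows that idea only and is not how FlexiCapture is implemented or configured.  The "Form ID:" label, the ID format, and the definition names are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical form IDs mapped to hypothetical document definition names.
DEFINITIONS = {
    "INTAKE-2016": "Intake form, 2016 layout",
    "INTAKE-2017": "Intake form, 2017 layout",
}

# Assumes the form ID is printed as "Form ID: NAME-YYYY" somewhere on the page.
FORM_ID_PATTERN = re.compile(r"Form ID:\s*([A-Z]+-\d{4})")


def pick_definition(page_text: str) -> Optional[str]:
    """Return the definition name for the form ID found in the page text, if any."""
    match = FORM_ID_PATTERN.search(page_text)
    if match is None:
        return None
    return DEFINITIONS.get(match.group(1))


page = "Name: ________   Date: ________\nForm ID: INTAKE-2017"
print(pick_definition(page))
```

A page with no readable form ID returns None here, which corresponds to the case in which a barcode or a repositioned anchor is the more reliable way to tell the versions apart.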

Sometimes the structured (often referred to as fixed-form or zonal) approach still won't work well.  That can occur if the form is filled in using HTML, Word, Excel, or another form-filling medium where adding more text to a fill-in area pushes the remainder of the form down the page.  That's when the need for semi-structured capture comes into play.

ABBYY FlexiCapture accommodates semi-structured documents by using the FlexiLayout Studio.  Instead of drawing boxes where you expect certain data to be, you set up rules to search for and capture the data.  This means that when the person filling in your form gets unusually verbose and makes a box taller, pushing subsequent sections down the page or onto the next page, the capture results aren't affected.  More typical examples are invoices, orders, statements, and reports, where everyone who sends you data uses their own unique layout.  It may be possible in some cases to create a separate structured document definition for each layout if you have a small number of respondents or suppliers sending you data that doesn't flow onto subsequent pages, but that could turn out to be a lot of trouble if there is too much variation among them.  That's why invoices in particular are processed using semi-structured methodologies, as is accomplished using FlexiLayouts.

With a FlexiLayout you would indicate that you're looking for an invoice number on the page.  You'd first say you're expecting to find "Invoice Number", "Invoice #", "Inv #", "Inv No.", or any other variation of that label.  When the FlexiLayout finds any one of these terms, it notes its location on the page.  The next step is to say where you expect the data to be located in relation to the label you've found.  We may generally assume the invoice number is located to the right of the label, but that may not always be the case.  You can define a search region around the label so that if the invoice number sometimes appears to the right of, below, above, or even to the left of the label, it can still be captured.  You would do the same for the other fields on the page until you've captured everything you need to extract and store in a file, spreadsheet, or database.  You can also indicate that your label and data will always be on page one, or that they're generally on page two or three or the last page, as is often the case with a total amount.
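
The label-then-region approach described above can be sketched in a few lines of Python.  This is a conceptual illustration under stated assumptions, not FlexiLayout Studio's language or API: the Fragment record, the coordinates, the label and value patterns, and the region sizes are all hypothetical, and this version only looks to the right of and below the label.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    # Hypothetical OCR text fragment with the position of its top-left corner.
    text: str
    x: float
    y: float


# Label variations from the article; the exact pattern is an assumption.
LABEL = re.compile(r"^(invoice\s*(number|#|no\.?)|inv\s*(#|no\.?))$", re.IGNORECASE)
# Assumed value format: 4+ characters of A-Z, 0-9, or hyphen, with at least one digit.
VALUE = re.compile(r"^(?=.*\d)[A-Z0-9-]{4,}$")


def find_invoice_number(fragments: List[Fragment]) -> Optional[str]:
    """Find a label, then return the nearest value to its right or below it."""
    for label in fragments:
        if not LABEL.match(label.text.strip()):
            continue
        candidates = []
        for f in fragments:
            if f is label or not VALUE.match(f.text.strip()):
                continue
            dx, dy = f.x - label.x, f.y - label.y
            right_of = 0 < dx <= 300 and abs(dy) <= 5   # same line, to the right
            below = 0 < dy <= 60 and abs(dx) <= 50      # next line or two, aligned
            if right_of or below:
                candidates.append((abs(dx) + abs(dy), f.text.strip()))
        if candidates:
            return min(candidates)[1]  # closest candidate wins
    return None


page_a = [
    Fragment("ACME Corp", 50, 20),
    Fragment("Inv No.", 400, 40),
    Fragment("INV-20417", 480, 41),
    Fragment("Total", 400, 700),
    Fragment("1,250.00", 480, 700),
]
page_b = [Fragment("Invoice #", 100, 100), Fragment("7731-A", 105, 130)]

print(find_invoice_number(page_a))
print(find_invoice_number(page_b))
```

The two sample pages place the number to the right of the label and below it, respectively, and the same rule finds both.  That is the advantage over a fixed zone: the rule follows the label wherever the supplier chose to put it.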

While we're on this topic, let's discuss unstructured documents for a minute.  These are documents where data isn't typically labeled very well, so it's important to use the right tools and methods to extract the information you're seeking.  They are often contracts or technical documents where the information you need is somewhere in the document, but you're not sure where to find it, so more sophisticated methods are needed.  Sometimes this can be done using FlexiCapture with FlexiLayouts and searching for data of a known format, such as SSNs, phone numbers, amounts, dates, etc.  Sometimes, that's not enough.  Here's another ABBYY product for unstructured documents.
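
Searching for data of a known format is essentially pattern matching.  As a rough illustration in Python, using ordinary regular expressions rather than anything specific to FlexiCapture, the sketch below pulls SSNs, phone numbers, amounts, and dates out of running text.  The patterns assume US-style formats and are deliberately simple; real documents would need more variations.

```python
import re
from typing import Dict, List

# Simplified, US-style patterns chosen for this example only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[ .-]\d{3}[ .-]\d{4}\b"),
    "amount": re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def find_known_formats(text: str) -> Dict[str, List[str]]:
    """Return every match for each known format found anywhere in the text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = "Tenant SSN 123-45-6789, phone (555) 123-4567. Rent of $1,250.00 is due 10/01/2017."
found = find_known_formats(sample)
for name, values in found.items():
    print(name, values)
```

This works when the format alone identifies the data.  It cannot tell you whose phone number it found or which of several dates is the lease start, which is where the approach runs out and language-aware tools come in.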

ABBYY Compreno

Using advanced natural language processing (NLP) technology, ABBYY Compreno turns unstructured content such as contracts, leases and reports into powerful business intelligence assets.  Our unique technology “reads” documents to identify entities, facts, and the relationships between them.  Complex case documents are automatically analyzed, accelerating and optimizing decision-making by providing critical data points to knowledge professionals.  Compreno solutions also understand and accurately classify high volumes of unstructured information without the need for tedious, time-consuming manual reviews.  Integrating ABBYY’s leading OCR technology, Compreno handles image and text documents with ease to drive integrated text analytic solutions.  Businesses can solve their “dark data” problems by allowing Compreno to shed light on their information, leaving knowledge professionals free to focus on using that information to make better-informed strategic decisions.

 

Rather than draw your attention to the scanner of the month, just contact us and let us know how you intend to use the scanner and we'll help you pick the right one and take advantage of any promotions or sales currently in force.  As always, we have great prices on our scanners. See them at our online store for models suitable for your needs.

If you're looking for a new scanner, be sure to shop with us for really good prices and the best side-by-side comparison page anywhere.  You'll find this and our unique application enablers for DocuShare and ABBYY products on our CriteriaFirstWare site.

If this is a good source of information, then forward our newsletter or links to others who could benefit.  We'd like to add them to our list, so write to us, ask to be added, and tell us what you'd like to know more about.