Are All Documents Created Equally?

Every once in a while, an airline reservation system makes headlines by failing for a day, inconveniencing thousands of travelers and costing the airline millions. Banking websites go down too, as do email systems and many other services that rely on software and the internet.

We know it’s not intentional. Whatever change caused the failure was most likely made carefully and thoughtfully. So what happened? Since the causes of these events are generally kept secret, we can only speculate. But those of us in the software industry know the likely cause: not every test scenario was tried, and something unanticipated, or something considered irrelevant, rose up and bit them. It’s a complex, complex world out there.

I still recall a study, done probably 20 years ago, that evaluated software control of nuclear reactors and determined that no amount of testing would be enough to assure 100% safety. The authors concluded that hardware control was better because it had a finite number of control paths, all of which could be tested.

Why am I getting into this? Because even in the document processing industry, what you don’t know can bite you. My partner taught me years ago that programming to an image or document specification is only about 50% of the effort in providing robust support. The remaining 50% goes into handling the corrupt, faulty or borderline documents that aren’t covered by the spec. The mature players in any industry have typically learned this lesson the hard way, through trial and error, and now know how to make it work.

Did you know that Adobe Acrobat repairs corrupt documents as it opens them so they can be handled properly? If there’s an error message at all, it generally flashes by too quickly to notice, so most users never know their documents are malformed. Even though Adobe wrote the spec, they realized that being sticklers for its exact interpretation was a losing proposition, both for the reputation and acceptance of the Acrobat standard and for their continued business. The same is true for Snowbound. Over the 14 or so years we’ve been developing our Adobe Acrobat PDF support, we have learned to handle thousands of improper documents. Sometimes we even handle documents that Adobe can’t or won’t handle itself.

So some words of advice:

  • Beware of the new player on the block. Reading some standard documents correctly does not guarantee that all documents will work, and you would hate to find this out in the middle of a production run of millions of documents.
  • New document creators come along frequently and they often don’t create documents correctly. The PDF spec is over 1,000 pages. (Often the same developer who wrote the document output code also wrote the reader that opens it, and may not have bothered to check whether Adobe or others could handle the output.)
  • Related to the above, just because you are processing many documents and they’re working well doesn’t mean that a new supply of documents will work. You always have to be prepared for surprises, which means performing test runs on every new document set.
  • What should you do if you’re taking on a new vendor (even one like Snowbound with years of experience) and you have millions of documents in your repository? Frankly, do a lot of testing! Maybe you can’t test them all, but gather a representative variety of documents and test those in your pilot run.
  • Keep your old document reading software around and working. You may need it in an emergency if you learn that an old supplier wasn’t adhering to the spec and the newer products won’t read those old documents. Additionally, as the digital age matures, products are discontinued and companies are bought or merged. (Check how many document companies Oracle or IBM have bought over the past five years.) Or perhaps the servers that ran those older programs die. There’s an increasing risk that older document formats become unreadable because the products that used to read them are no longer running. Ultimately, if you need to keep those documents available, you may need to convert them to a “safe” format like PDF, or at least keep testing your current product against those older documents.
  • Lastly, make sure your current vendors are actively improving their products. New document generators are continuously coming to market and, competitive pressures being what they are, testing may be sacrificed. So all reputable players in the document handling market need to keep enhancing and improving their products. That’s why we at Snowbound still develop and enhance our own document technology wherever we can and extensively test any technology we license.
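
The pilot run suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Snowbound or any other vendor tests: `sample_by_group`, `pilot_run`, and the `open_document` callable are hypothetical names, and `open_document` stands in for whatever call your own reader exposes to load a file. The idea is to group the repository by something meaningful (file type, supplier, creation tool), draw a few documents from every group so that rare varieties are represented, and record every document the reader rejects.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List, Tuple


def sample_by_group(
    paths: Iterable[str],
    group_of: Callable[[str], Hashable],
    per_group: int,
    seed: int = 0,
) -> List[str]:
    """Pick up to `per_group` documents from each group (file type, supplier, ...)."""
    groups = defaultdict(list)
    for path in paths:
        groups[group_of(path)].append(path)

    rng = random.Random(seed)  # fixed seed so the same pilot set can be re-run later
    sample: List[str] = []
    for key in sorted(groups, key=str):
        members = sorted(groups[key])
        rng.shuffle(members)
        sample.extend(members[:per_group])
    return sample


def pilot_run(
    paths: Iterable[str],
    open_document: Callable[[str], object],
) -> List[Tuple[str, str]]:
    """Try to open each document; return (path, error description) for every failure.

    `open_document` is a placeholder (an assumption) for your reader's load call.
    It is expected to raise an exception when it cannot handle a document.
    """
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            open_document(path)
        except Exception as exc:  # deliberately broad: any failure is worth logging
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return failures
```

Grouping by file extension is the simplest choice, for example `sample_by_group(paths, lambda p: p.rsplit(".", 1)[-1].lower(), per_group=50)`, but grouping by supplier or by the tool that created the file is likely to surface more of the surprises described above. Keeping the failure list from each run also gives you a regression set to re-test whenever the reader is upgraded.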

So let’s toast to proper testing and being ready for the surprises that inevitably come.
