I have written briefly about this topic in earlier blog posts, but I want to elaborate on it here. Here is an article of fundamental importance that I have kept as reference material over the years: http://www.networkworld.com/news/2007/012307-wasted-searches.html
I think Susan Feldman at IDC is one of the leading thinkers in this area, and I completely agree with her views. Simply put: content without context is incomplete, just as search without metadata is incomplete. Here is another article from back in 1999: http://www.21cfrpart11.com/files/library/miscellaneous/metadata_cio_council0299.pdf
The key highlights are as follows:
- Metadata is one of the biggest critical success factors to sharing information.
- Metadata can make your information sharing and storage efforts great successes, or great failures. Metadata can get you in trouble with the law, or keep you out of such trouble.
- The alternative to metadata management is information chaos.
Even the government is starting to understand the importance of metadata for information sharing: http://civsourceonline.com/2010/05/06/new-report-suggests-using-a-metadata-process-to-improve-gov-info-sharing-accuracy/
I have not worked with a single company that has addressed the problems above, and that means there is still information chaos within every single company. I know this is a strong statement, but I am willing to stand by it!
There are some great search tools out there. But search tools can only find what is ‘indexable’ (and a bit more, when combined with text mining and semantic approaches), and this is still not enough. I strongly believe that what is needed is to track all metadata at the object level across the enterprise, and to combine it with search results in a faceted result set. Why is this important? Because Enterprise Search is not Web Search! We are not looking for Web pages ranked by how many hyperlinks point to a site. We are looking for documents, and we often need to find every single one of them, for compliance or other reasons. The only way to do this is a faceted result set, which allows us to drill down precisely into the results. And metadata is the ‘sorting mechanism’ that allows us to do that. Now: what kind of metadata do we need exactly? We need the following: taxonomy-driven metadata, folksonomy-driven metadata, user-defined metadata (on an individual level), and semantic (or meaning-based) metadata.
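To make the idea of a faceted result set a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how any particular search product works: the `Document` class, the facet names (`taxonomy`, `folksonomy`, `user`, `semantic`) and the sample documents are all hypothetical, and the naive substring match merely stands in for a real search engine.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Document:
    """One tracked object plus its object-level metadata (hypothetical model)."""
    doc_id: str
    text: str
    # facet name -> set of values; the facet names are illustrative assumptions
    metadata: Dict[str, Set[str]] = field(default_factory=dict)


def search(docs: List[Document], query: str) -> List[Document]:
    """Naive substring match, standing in for a real search engine."""
    q = query.lower()
    return [d for d in docs if q in d.text.lower()]


def facet_counts(results: List[Document]) -> Dict[str, Counter]:
    """Count how many results carry each value of each metadata facet."""
    counts: Dict[str, Counter] = {}
    for doc in results:
        for facet, values in doc.metadata.items():
            counts.setdefault(facet, Counter()).update(values)
    return counts


def drill_down(results: List[Document], facet: str, value: str) -> List[Document]:
    """Keep only the results that have `value` under `facet`."""
    return [d for d in results if value in d.metadata.get(facet, set())]


# Made-up sample data covering the four kinds of metadata listed above.
docs = [
    Document("d1", "Supplier contract for 2009", {
        "taxonomy": {"Legal/Contracts"},
        "folksonomy": {"supplier", "renewal"},
        "user": {"reviewed"},
        "semantic": {"obligation"},
    }),
    Document("d2", "Contract negotiation notes", {
        "taxonomy": {"Legal/Contracts"},
        "folksonomy": {"supplier"},
        "semantic": {"negotiation"},
    }),
    Document("d3", "Employment contract template", {
        "taxonomy": {"HR/Templates"},
        "folksonomy": {"template"},
    }),
]

results = search(docs, "contract")      # every document that matches the query
facets = facet_counts(results)          # what the user can drill into
narrowed = drill_down(results, "taxonomy", "Legal/Contracts")
narrowed = drill_down(narrowed, "folksonomy", "renewal")
print([d.doc_id for d in narrowed])
```

The point of the sketch is that the search step returns everything that matches, and the metadata then does the sorting: each drill-down narrows the set without dropping any document that legitimately belongs in it. Here the query matches all three documents, and the two drill-downs leave only `d1`.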
The above may sound scary and complicated, no doubt. But the good news is that a whole generation of new technologies is coming along to solve this problem. First of all, the Office 2007 System in and of itself is a revolutionary product. For the first time, what we have is an ‘encapsulated nugget of information’: the metadata ‘travels with the document’, because there is a separate ‘document part’ for metadata within the document. This combines content and context, and solves the problem all legacy ECM systems have: when a document is checked out, it no longer knows anything about itself. The document has been removed from the system, but its metadata still resides in a database table in the ECM system. This leads to a huge compliance-related risk that companies are not equipped to handle.
I will admit that only a small fraction of corporate content resides in Office 2007 today. However, we have a great set of tools to manage metadata on the back end, on an enterprise level. As I have written about earlier, the NextPage Information Tracking Platform is able to track any content across the enterprise via its unique ‘digital threading technology’. When all this comes together in an integrated fashion, we can finally start addressing the information chaos that has been reigning across the enterprise. And, as I also stated earlier, it is not so much about technology (which is the enabler, of course) as about people, processes and change management. All this has to be seamless and easy to use, and the complexity has to be hidden from the end user.
But I think we are finally getting to that point. And once we do, the whole process of e-Discovery will become a far less onerous problem than it is today! I know of several large companies that are spending between $10 million and $70 million just to address their e-Discovery requirements. That is almost too hard to believe, but true.
Of course, when we think about the amount of money involved in class action lawsuits, then we can understand their motivation. It still boggles the mind, though.