The Evolution of Content Management and the Emergence of Intelligent Content

I am starting to gather my thoughts about the upcoming Intelligent Content 2010 Conference, where I have the honor of being Keynote Speaker, next to a luminary such as Bob Glushko.  I am in illustrious company, so I had better deliver a great session.  Nothing like putting some pressure on myself 🙂

I tend to think in terms of Enterprise Content being evolutionary in nature.  I was thinking of an illustrative slide showing a good image of evolution.  Thankfully, there is Bing Search with some great images I will be able to use (I especially like the one which shows people evolving into ‘PC Apes’).  To me, the Evolution of Content into ‘Intelligent Content’ bears some similarity to the evolutionary process that led to Homo sapiens, the ‘knowing man’ (and I do not intend to get into a discussion about Creationism vs. the Theory of Evolution here, even though DITA refers to Darwin…).

This brings me to another important point: what exactly is ‘Intelligent Content’?  It is a fairly new term that thought leaders in the ECM space such as Ann Rockley started using a few years ago.  I found a great interview with her titled What Constitutes “Intelligent Content”? which describes Intelligent Content in this manner: Intelligent content is structurally rich and semantically aware, and is therefore automatically discoverable, reusable, reconfigurable, and adaptable.  I also found an excellent article by Joe Gollner about The Emergence of Intelligent Content, with the subtitle ‘The evolution of open content technologies and their significance’, where he focuses mostly on the evolution of Structured Content.

Indeed, what we need much more of in the ECM space is Intelligent Content, and what we need less of is ‘dumb content’.  Interestingly enough, my friend Gerald Kukko and I talked many years ago about the need for a ‘self-contained Nugget of Information’.  These self-contained nuggets would contain content and its associated semantically rich metadata, and could be assembled according to certain assembly rules.  Finally, the OpenXML file format gives us such a self-contained nugget, but that alone does not make it Intelligent Content.  We also need to enhance the metadata with semantics, and add the assembly rules.
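To make the ‘nugget’ idea a little more concrete, here is a minimal sketch in Python.  It is only an illustration under my own assumptions: the `Nugget` class, the `assemble` function, the shape of the rule dictionary and the sample metadata keys are all hypothetical, and none of this reflects how OpenXML or any particular product represents content.

```python
from dataclasses import dataclass, field


@dataclass
class Nugget:
    """A self-contained unit: content plus the metadata that describes it."""
    nugget_id: str
    content: str
    metadata: dict = field(default_factory=dict)


def assemble(nuggets, rule):
    """Select and order nuggets according to a simple assembly rule.

    `rule` is a hypothetical structure: the metadata values a nugget must
    have to be included, plus the metadata key used to order the result.
    """
    selected = [
        n for n in nuggets
        if all(n.metadata.get(key) == value for key, value in rule["require"].items())
    ]
    selected.sort(key=lambda n: n.metadata[rule["order_by"]])
    return "\n\n".join(n.content for n in selected)


# Illustrative data only.
nuggets = [
    Nugget("n2", "Store below 25 C.", {"product": "X", "audience": "patient", "seq": 2}),
    Nugget("n1", "Take one tablet daily.", {"product": "X", "audience": "patient", "seq": 1}),
    Nugget("n3", "Pharmacokinetic data.", {"product": "X", "audience": "physician", "seq": 1}),
]
rule = {"require": {"product": "X", "audience": "patient"}, "order_by": "seq"}

leaflet = assemble(nuggets, rule)
print(leaflet)
```

The point of the sketch is that the same pool of nuggets can yield a different document simply by changing the rule (for example, requiring `audience` to be `physician`), without touching the content itself.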

Another way to approach Intelligent Content is to draw a parallel to Product Lifecycle Management (PLM) in Manufacturing.  PLM is a well-researched and well-documented topic.  It is about how to track and plan all the parts of a complicated assembly, such as a machine.  Interestingly enough, all the PLM experts I have talked to failed to recognize that there is also a need to track all the documentation associated with this machine, along with its lifecycle, especially if there are many modular and re-usable parts.  And even the people who are familiar with DITA for Technical Publications usually miss the bigger picture of the Lifecycle of Content supporting PLM processes.  I recently had an interesting conversation about this with my friend Jim Averback, and we came to the conclusion that when it comes to managing documents, most of these smart people are spending time and effort to fix existing, fundamentally broken processes.  This is especially true in the Life Sciences industry, where there is absolutely no notion of structured content authoring and content re-use, even though this is one of the industries that would benefit most from this approach.  With all the sophistication of PLM systems and concurrent engineering, the documentation processes are still run as if they were medieval guilds.  This is pretty incredible, but we have seen evidence in many places that proves that this statement is accurate.

I have blogged about our own Intelligent Content Framework (ICF) initiative before.  I do believe that this represents one of the most advanced states of the Evolution of Content.  It builds on DITA, and takes it further.  Here are some of the key tenets that I am sure some of the ‘DITA purists’ might not agree with.  Perhaps I have the benefit of being an engineer, and to me this is all pretty much just engineering.

1.) The power in DITA above and beyond using it for technical publications and other complex publishing applications is that it provides the Information Model that is missing from today’s ECM systems.  The current approach of ‘folders within folders within folders’ and Virtual Documents is not an Information Model.
2.) DITA can be applied to individual documents, and not only topics, and can be applied to complex specialized document structures like eCTD.
3.) DITA alone is not sufficient, because its metadata model is limited (but extensible), and focused on publishing applications.  So we are enhancing DITA with rich metadata and an enterprise metadata model via MetaPoint.
4.) Word and Office OpenXML can be the platform for a powerful native DITA Editor that anyone can use, as long as we hide the DITA and other XML complexity behind the scenes.  All the user has to focus on is the science and the writing, and not the formatting and other requirements, and the application has to look exactly like any other Word application.
5.) In most cases, you do not really need specialized XML databases to store DITA Topics, and to be able to query these.  For ICF, we can do this very well by using a standard ECM system like SharePoint to store the topics and their related metadata, and enhance it with faceted search.
6.) Collaboration is very important.  Modern ECM systems need integrated tools for collaboration (workflow, tasks, email notification etc.).  Again, all this is built seamlessly into SharePoint, making it an ideal platform for ICF.
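For readers who have not worked with faceted search, the idea behind tenet 5 can be sketched in a few lines of Python.  This is a toy, in-memory illustration and not SharePoint’s or ICF’s actual API: the `facet_counts` and `refine` helpers, the metadata fields and the sample topics are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical topics, each stored alongside its metadata.
topics = [
    {"title": "Dosage", "doc_type": "label", "region": "US", "status": "approved"},
    {"title": "Storage", "doc_type": "label", "region": "EU", "status": "draft"},
    {"title": "Study design", "doc_type": "protocol", "region": "US", "status": "approved"},
]


def facet_counts(items, facet):
    """Count how many items carry each value of one metadata field."""
    return Counter(item[facet] for item in items)


def refine(items, **selected):
    """Keep only the items whose metadata matches every selected facet value."""
    return [
        item for item in items
        if all(item.get(key) == value for key, value in selected.items())
    ]


# Narrow the result set by one facet, then show what remains in another.
approved = refine(topics, status="approved")
counts = facet_counts(approved, "doc_type")
print([t["title"] for t in approved])
print(dict(counts))
```

Nothing here requires an XML database: the query runs entirely against the metadata stored with each topic, which is the gist of the argument above.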

As stated in an earlier Blog post, ICF also needs to be complemented by content modeling, content design, content strategy, content reuse strategy, taxonomy, workflow and so forth: we call this Intelligent Content Design (ICD).  This is the focus of the effort Jim is working on now.  He is building a tool called DITA-Talk to support ICD.  What is also very exciting to me is that he is leveraging additional elements of the integrated Microsoft stack.  DITA-Talk is being built on WPF and WCF.  I recently saw a preview of where he is going with it, and it is one of the most exciting applications I have come across in the world of Enterprise Content Management.  Imagine being able to visually design a Document Process, and the end result would be an automatically generated DITA Map, along with some content that it automatically pulled into Topics from back-end systems (including database tables) and existing topics.  We need to move the world of Enterprise Content Management into an evolved state – and we are finally doing that!
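I cannot show DITA-Talk itself here, but the general idea of turning a designed process into a DITA Map can be sketched roughly as follows.  This is my own simplified illustration, not Jim’s implementation: the `build_dita_map` function, the step dictionaries and the file names are hypothetical.  The `map`, `title` and `topicref` elements and the `href` and `navtitle` attributes are standard DITA; the DOCTYPE declaration that a real map file would normally carry is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_dita_map(title, steps):
    """Return a DITA map (as an XML string) with one topicref per process step."""
    root = ET.Element("map")
    ET.SubElement(root, "title").text = title
    for step in steps:
        # Each step of the designed process points at the topic that documents it.
        ET.SubElement(root, "topicref", href=step["href"], navtitle=step["name"])
    return ET.tostring(root, encoding="unicode")


# Hypothetical process steps, in the order they were designed.
steps = [
    {"name": "Draft protocol", "href": "draft_protocol.dita"},
    {"name": "Quality review", "href": "quality_review.dita"},
    {"name": "Approval", "href": "approval.dita"},
]

map_xml = build_dita_map("Document Process", steps)
print(map_xml)
```

In a real tool, the steps would come from the visual design surface and the topics would be populated from back-end systems; here they are simply hard-coded.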

Of course, Evolution does not mean that previous species die out right away – for a while the Old and the New will co-exist.  This is why I believe that in the near future, a modern Content Architecture for a Life Sciences company will look something like the diagram below, with Topic-based Authoring and ICF an integral part of the overall architecture:

Compliant ECM Architecture of the Future