SharePoint 2010 – the new Information Operating System for the Enterprise

I spent last week at the SharePoint Conference 2009 in Las Vegas.  It was absolutely amazing!  More than twice the size of the last conference, and completely sold out.  Over 7,400 attendees from all over the world, which in itself is amazing, considering today’s economic climate.

I talked to a lot of customers and partners, and they all spoke in superlatives.  Here are some quotes:




“The new Information Operating System for the Enterprise” 

I agree with all of these statements!

For a good sense of what is coming, check out the SharePoint Team Blog and Joel Oleson’s Blog.

Here is a good overview:

Additional technical information can be found here:

Over the next few weeks, more content will be posted here:

When it comes to ECM, I am particularly excited about two things: Word Automation Services, and the rest of the major ECM-related upgrades.  Here is another good high-level summary, focused on ECM:

We are entering a new era in Enterprise Information Management, and this is possibly the most exciting period in the history of SharePoint.  I am glad to be part of it!


The Evolution of Content Management and the Emergence of Intelligent Content

I am starting to gather my thoughts about the upcoming Intelligent Content 2010 Conference, where I have the honor of being Keynote Speaker, next to a luminary such as Bob Glushko.  I am in illustrious company, so I had better deliver a great session.  Nothing like putting some pressure on myself 🙂

I tend to think in terms of Enterprise Content being evolutionary in nature.  I was thinking of an illustrative slide showing a good image of evolution.  Thankfully, there is Bing Search with some great images I will be able to use (I especially like the one which shows people evolving into ‘PC Apes’).  To me, the Evolution of Content into ‘Intelligent Content’ bears some similarity to the evolutionary process that led to Homo Sapiens, the ‘knowing man’ (and I do not intend to get into a discussion about Creationism vs. the Theory of Evolution here, even though DITA refers to Darwin…).

This brings me to another important point: what exactly is ‘Intelligent Content’?  It is a fairly new term that thought leaders in the ECM space such as Ann Rockley started using a few years ago.  I found a great interview with her titled What Constitutes “Intelligent Content”? which describes Intelligent Content in this manner: Intelligent content is structurally rich and semantically aware, and is therefore automatically discoverable, reusable, reconfigurable, and adaptable.  I also found an excellent article by Joe Gollner about The Emergence of Intelligent Content, with the subtitle ‘The evolution of open content technologies and their significance’, where he focuses mostly on the evolution of Structured Content.

Indeed, what we need much more of in the ECM space is Intelligent Content, and what we need less of is ‘dumb content’.  Interestingly enough, my friend Gerald Kukko and I talked many years ago about the need for a ‘self-contained Nugget of Information’.  These self-contained nuggets would contain content and its associated semantically rich metadata, and could be assembled according to certain assembly rules.  Finally, the OpenXML file format gives us such a self-contained nugget, but that alone does not make it Intelligent Content.  We also need to enhance the metadata with semantics, and add the assembly rules.
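
To make the ‘nugget’ idea a little more concrete, here is a minimal sketch in Python.  It is purely illustrative: the `Nugget` class, the `assemble` function and the shape of the rule dictionary are my own assumptions for this example, and have nothing to do with how OpenXML or any real product stores content.  It only shows the three ingredients mentioned above working together: content, metadata, and an assembly rule.

```python
from dataclasses import dataclass, field


@dataclass
class Nugget:
    """A hypothetical self-contained unit: content plus its metadata."""
    nugget_id: str
    content: str
    metadata: dict = field(default_factory=dict)


def assemble(nuggets, rule):
    """Apply a (hypothetical) assembly rule to a pool of nuggets.

    The rule keeps nuggets whose metadata matches every key/value pair
    under 'require', sorts them by the metadata key named in 'order_by',
    and joins their content into one document.
    """
    required = rule.get("require", {})
    selected = [
        n for n in nuggets
        if all(n.metadata.get(k) == v for k, v in required.items())
    ]
    selected.sort(key=lambda n: n.metadata.get(rule["order_by"], 0))
    return "\n\n".join(n.content for n in selected)


nuggets = [
    Nugget("n2", "Dosage: take once daily.", {"product": "X", "section": 2}),
    Nugget("n1", "Product X overview.", {"product": "X", "section": 1}),
    Nugget("n3", "Product Y overview.", {"product": "Y", "section": 1}),
]
rule = {"require": {"product": "X"}, "order_by": "section"}
document = assemble(nuggets, rule)
print(document)
```

The point of the sketch is that the document is never written as a whole: it falls out of the metadata and the rule, so the same nuggets can be reassembled differently by changing only the rule.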

Another way to approach Intelligent Content is to draw a parallel to Product Lifecycle Management (PLM) in Manufacturing.  PLM is a well-researched and well-documented topic.  It is about how to track and plan all the parts of a complicated assembly, such as a machine.  Interestingly enough, all the PLM experts I have talked to failed to recognize that there is also a need to track all the documentation associated with this machine, and its lifecycle, especially if there are many modular and re-usable parts.  And even the people who are familiar with DITA for Technical Publications usually miss the bigger picture of the Lifecycle of Content supporting PLM processes.  I recently had an interesting conversation about this with my friend Jim Averback, and we came to the conclusion that when it comes to managing documents, most of these smart people are spending time and effort to fix existing, fundamentally broken processes – this is especially true in the Life Sciences industry, where there is absolutely no notion of structured content authoring and content re-use, even though this is one of the industries that would benefit most from this approach.  With all the sophistication of PLM systems and concurrent engineering, the documentation processes are still run as if they were medieval guilds.  This is pretty incredible, but we have seen evidence in many places that proves that this statement is accurate.

I have blogged about our own Intelligent Content Framework (ICF) initiative before.  I do believe that this represents one of the most advanced states of the Evolution of Content.  It builds on DITA, and takes it further.  Here are some of the key tenets that I am sure some of the ‘DITA purists’ might not agree with.  Perhaps I have the benefit of being an engineer, and to me this is all pretty much just engineering.

1.) The power in DITA above and beyond using it for technical publications and other complex publishing applications is that it provides the Information Model that is missing from today’s ECM systems.  The current approach of ‘folders within folders within folders’ and Virtual Documents is not an Information Model.

2.) DITA can be applied to individual documents, and not only topics, and can be applied to complex specialized document structures like eCTD.

3.) DITA alone is not sufficient, because its metadata model is limited (but extensible), and focused on publishing applications.  So we are enhancing DITA with rich metadata and an enterprise metadata model via MetaPoint.

4.) Word and Office OpenXML can be the platform for a powerful native DITA Editor that anyone can use, as long as we hide the DITA and other XML complexity behind the scenes.  All the user has to focus on is the science and the writing, and not the formatting and other requirements, and the application has to look exactly like any other Word application.

5.) In most cases, you do not really need specialized XML databases to store DITA Topics, and to be able to query these.  For ICF, we can do this very well by using a standard ECM system like SharePoint to store the topics and their related metadata, and enhance it with faceted search.

6.) Collaboration is very important.  Modern ECM systems need integrated tools for collaboration (workflow, tasks, email notification etc.).  Again, all this is built seamlessly into SharePoint, making it an ideal platform for ICF.
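
As a rough illustration of the point about storing topics with their metadata and querying them via faceted search, here is a small Python sketch.  It does not use SharePoint or any search product; the topic records, the facet names (`product`, `doc_type`, `region`) and the two helper functions are all assumptions made up for this example.  It only shows the basic mechanics: filter by the facets already selected, then count the remaining values of another facet.

```python
from collections import Counter

# Hypothetical topic records: a title plus flat metadata fields.
topics = [
    {"title": "Dosage and administration", "product": "X", "doc_type": "label", "region": "US"},
    {"title": "Adverse reactions", "product": "X", "doc_type": "label", "region": "EU"},
    {"title": "Study objectives", "product": "X", "doc_type": "protocol", "region": "US"},
    {"title": "Storage conditions", "product": "Y", "doc_type": "label", "region": "US"},
]


def filter_topics(items, **selected):
    """Keep only topics whose metadata matches every selected facet value."""
    return [t for t in items if all(t.get(k) == v for k, v in selected.items())]


def facet_counts(items, facet):
    """Count how many topics carry each value of the given facet."""
    return Counter(t[facet] for t in items if facet in t)


hits = filter_topics(topics, product="X")
counts = facet_counts(hits, "doc_type")
print([t["title"] for t in hits])
print(dict(counts))
```

Nothing here requires an XML database: as long as each topic is stored with consistent metadata, narrowing and counting are simple operations over that metadata.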

As stated in an earlier Blog post, ICF also needs to be complemented by content modeling, content design, content strategy, content reuse strategy, taxonomy, workflow and so forth: we call this Intelligent Content Design (ICD).  This is the focus of the effort Jim is working on now.  He is building a tool called DITA-Talk to support ICD.  What is also very exciting to me is that he is leveraging additional elements of the integrated Microsoft stack.  DITA-Talk is being built on WPF and WCF.  I recently saw a preview of where he is going with it, and it is one of the most exciting applications I have come across in the world of Enterprise Content Management.  Imagine being able to visually design a Document Process, and the end result would be an automatically generated DITA Map, along with some content that it automatically pulled into Topics from back-end systems (including database tables) and existing topics.  We need to move the world of Enterprise Content Management into an evolved state – and we are finally doing that!

Of course, Evolution does not mean that previous species die out right away – for a while the Old and the New will co-exist.  This is why I believe that in the near future, a modern Content Architecture for a Life Sciences company will look something like the diagram below, with Topic-based Authoring and ICF an integral part of the overall architecture:

Compliant ECM Architecture of the Future

Infonomics Magazine – Paperless Clinical Trials

Infonomics magazine just published a very useful article by Ken Lownie of NextDocs, titled ‘In Search of "Paperless" Clinical trials’:  

I am reminded of a related article I read a few years ago called ‘Tortured by Paper’.  It is exciting to see that SharePoint is emerging as a platform of choice for managing Clinical Trials.  See my related Blog Post as well about Clinical Trials and the Microsoft platform.

Update on the Intelligent Content Framework

We are continuing to make good progress with ICF.  We have recently talked to many of the leading Life Sciences companies, and there is general consensus that there is a great need for Structured Content Authoring (SCA) approaches.  Many companies have already been burned by failed implementations of SCA technologies, and are only slowly willing to dip their toes in the water again.  I remain convinced that ICF will bring a much-needed breakthrough.

There is also great progress on the OASIS front.  A DITA Pharmaceutical Content Subcommittee has been formed and there is great interest in developing a pharma-specific standard that can be used for SCA.  More and more participants from Big Pharma are signing up, and momentum is building.  I think there is also good potential to liaise with some of the work going on within DIA and within HL7.

I am extremely excited that I have been invited as a Keynote Speaker to the Intelligent Content 2010 Conference.  It is a real privilege and honor to be listed on the same page as a luminary such as Bob Glushko.  His book on Document Engineering contributed greatly to refining my thinking about Intelligent Content.

Exciting times ahead – there is a desperate need for new approaches to ECM in Regulated Industries, and I believe ICF will deliver what is needed.

Social Networking meets Clinical Trials

I have been looking at Social Computing applications for business for a while.  Frankly, it took me a little time to get my head around all this.  I just don’t see how there would be a single Killer App to solve all issues.  However, there is no doubt that Social Computing applications are becoming more mainstream.  I wanted to capture some of the more interesting applications I have come across recently.  What is also really cool about the second one is how ‘it also allows would-be subjects with a personal health record (PHR) in Microsoft HealthVault or another system to import the information in lieu of filling out an online pre-screening form to be matched for trials.’

Health 2.0: Patients as Partners

Social networks like PatientsLikeMe let people take charge of their own care—changing the nature of drug research and the practice of medicine:

New Patients Can Tweet for Trials

Clinical Trials and the Microsoft platform

My friend and colleague Les Jordan has recently added a new posting to his MSDN Blog called CTMS & EDC: A system to do both – and more.  Lately, I have been thinking a lot about Clinical Trials as well, so I wanted to add my own perspective to his excellent posting.

I think Clinical Trials are another area within the Life Sciences industry that is incredibly balkanized.  Most of the large pharmaceutical companies I have worked with have between 20 and 25 clinical systems that need to work together.  The effort and expense to make all this function is simply mind-numbing!  A partner just sent me some presentations from a recent CTMS conference, and I just could not believe the incredible pain and expense companies have to go through to make these systems work together.  They have no idea how much easier their lives could be by considering some alternative approaches.  And it is scary to think they actually believe they are doing cutting edge stuff.  Well, in a way they are doing cutting edge – by pouring money and resources into legacy systems which they believe to be ‘best-of-breed’.  They are literally spending millions (or even tens of millions) annually to keep these systems running and working together.  However, the reality is that nobody can afford these approaches any more, and things will have to change drastically in the Clinical Trials space, just like in all the other facets of the life sciences industry.

If I had to start from scratch, I would build a system on the following elements of the Microsoft stack: MOSS, SQL Server, Project Server, and Dynamics CRM.  These are just four ‘moving parts’, but they are designed to work together, and work with a single integrated development environment in Visual Studio.

When I think about it, it is mostly about collaboration and business process management around data and documents.  Let’s consider a simple overview of Clinical Trials, without going into too much detail, and consider what systems need to be in place.  It is a complex, convoluted process, and I would like to show that most can be done with just four ‘moving parts’.

  • First, the Clinical Trial needs to be designed.  There are a number of specialized tools that are used for this, but no doubt collaboration plays a key role.  How can this collaboration take place: MOSS, of course! 
  • Then, a Clinical Protocol needs to be written.  This is usually a complex and highly structured document which lends itself well to automation.  We are currently working on using the Intelligent Content Framework and DITA-Maps (don’t be scared of the specialized term: think of a DITA-Map as a Table of contents and a set of rules to build a document) to make this process as automated as possible.  The same approach also applies to the Clinical Investigator Brochure.
  • The Clinical Protocol needs to be reviewed and approved by an Institutional Review Board or an Ethics Committee.  Again, this is a collaborative process.  The best tools for this are MOSS, in combination with Unified Communications.
  • Once the trial has been approved, the sponsor (i.e. the pharmaceutical company, or the CRO who they have contracted – as an aside, MOSS is a great platform for managing contracts between the sponsor and the CRO) has to select the Clinical Sites (these are mainly hospitals or research centers) where the trial will be conducted.  There is a lot of business process management and collaboration that has to take place during the so-called Site Initiation Process, and other documentation needs to be managed, such as Patient Consent Forms, Investigator Brochure etc.  Most of this is very document and forms-centric.  Our partner NextDocs has built a great module to do this – on MOSS!  There is also the need to manage all Training Records for Clinical Investigators and Research Associates, and to manage the published Clinical Investigator Brochures.  NextDocs has this Clinical module built out, too.  In fact, I am even aware of a free SharePoint Services 3.0 Application Template for Clinical Trial Initiation and Management.
  • In order to conduct the Clinical Trial, the sponsor or the CRO usually puts up a Clinical Investigator Portal, where investigators log in to get all the information about the particular trial and protocol, and can upload all their pertinent information, such as CV, completed Statement of Investigator (also known as 1572 form), etc.  MOSS is ideal for this application.  Of course, Dynamics CRM can also work in conjunction with this to serve as an Investigator Database (which needs some CRM capabilities).
  • Sponsors have to recruit patients who are interested and willing, and qualified to participate in the trial: they often put up a Patient Recruitment Portal for this application.  Again, MOSS is ideal for this application.  Our partner Quilogy has built some very impressive applications in this space (both for Patient and for Investigator Recruitment).  And of course, Dynamics CRM can also supplement the Patient Recruitment Portal application as well.  Dynamics CRM is also an ideal platform for Investigator Grant Management, another area that is ripe for a major technology upgrade – most companies are still using antiquated technologies for this application.
  • Sponsors also have to track payments and disbursements to investigators, have to track clinical supplies, etc.  This is also often referred to as a CTMS application.  There is an excellent solution for this by our partner TranSenda.  We have been collaborating closely with them on their Office-Smart Clinical Trial Manager, and they have posted an excellent White Paper about SharePoint in Clinical Trials.  We are also doing some very innovative and forward-thinking work with them around managing Clinical Trial process data, which is not an area that is covered by the Clinical Data Interchange Standards Consortium (CDISC).  See here for a great article that covers the problem space that we are addressing with Cortex™.  The part about the Clinical Trials Interchange Platform (CTIP) is especially relevant!  The Veterans Health Administration have also built their own CTMS system on MOSS technologies.  They have the same requirements as the biopharma industry, but a fraction of the budget, so they have to be innovative and forward-thinking.  However, shouldn’t that be the business paradigm for everyone these days?  I have uploaded some materials of their solution here.  To quote Dave Rose, Chief Architect of VA’s CTMS/EDC solution set: “The system leverages current investments, scales and is absorptive of future technologies.  Imagine managing your clinical trials on the same platform you use to manage the enterprise….”  In fact, there is now also a commercial version of the application available.  And of course Dynamics CRM is an excellent development platform for this, too.  I posted some White Papers on the latter here.
  • Now we come to the conduct of the trial.  There are many Project-related activities related to this, and of course Project Server can be the tool of choice.  The beauty of this application is that it is built on the same foundation as MOSS, so that Project data can easily be surfaced in the Portal, and even managed from there.
  • Once trials are up and running, there is a ton of related data being generated, which is made up of patient data and process data.  Increasingly, companies are moving to Electronic Data Capture to achieve this.  This data is then fed to Clinical Databases (SQL Server is used by several partners of ours for this purpose).  Once again, InfoPath and Forms Services, part of the MOSS stack, are ideal for this.  Here is a great application from Qdabra that can seamlessly connect forms to databases.  It has great potential for EDC applications.  The problem I see all too often is that there is a lot of legacy technology that has been built, and vendors are very reluctant to cannibalize their old legacy EDC tools.  It is such a shame, because InfoPath and Forms Services are built on XML, and are ideal for this ‘smart form’ application.  However, there are already several applications out there that leverage this capability.  There are our friends at the VA, there is Tenalea, and there is also InferMed.
  • I do not see paper going away completely for a while as a means to collect data via Case Report Forms.  Imaging and Capture applications are also needed to get Data into the system.  Among the solutions I have seen out there, I am a particular fan of KnowledgeLake, as a capture system for CRF data.
  • During the conduct of Clinical Trials, sponsors also need to track and manage Adverse Events data, which is fed into an Adverse Events Report System, which is essentially a specialized database.  The biggest problem many companies face is how to capture and process this data prior to feeding it into the Adverse Events Report System.  Once again, InfoPath and Forms Services, in combination with a capture and imaging system, driven by business process management in the back end, are an ideal solution.
  • All the EDC and scanned Case Report Forms have to be published into a Casebook, which is then collected as part of a Trial Master File.  MOSS is an ideal platform as an Electronic Trial Master File.  I just read an interesting article about this recently.  However, the NextDocs Clinical module does this, too.  There is also the issue of making sure all documentation is properly tagged and uploaded into the system to form the Trial Master File.  We have a great Case Study on the subject, where I closely collaborated with the customer and the partner who built the solution – on MOSS!
  • There is a lot of interest in Secure Document Exchange for Clinical Trials these days.  It is mindboggling to consider how much paper is still being processed, with huge bills being paid to shipping and logistics companies to manage all this paper, and the interpretation of ‘electronic’ in many cases is still data burnt on a CD or DVD, and then shipped via a courier service.  In fact, the excellent article from 2002 titled ‘Tortured by Paper’ is still very much current.  Not too much progress in 7 years…..  However, as it happens, MOSS is a solution for this problem, too – and the NextDocs Clinical module is built with this purpose in mind.  I have no doubt that they will be very successful with this application.  Certainly, paper will not disappear entirely for a long time – but there are some excellent Capture solutions built for MOSS that can help reduce or eliminate the ‘torture by paper’ part: KnowledgeLake, BlueThread and Clearview are the ones I usually recommend.
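
Going back to the Clinical Protocol step in the list above, where I described a DITA-Map as a Table of contents plus a set of rules to build a document, a small Python sketch may help.  Please treat it as a simplification: the map below uses only `map`, `title` and nested `topicref` elements with `href` and `audience` attributes, the topic text lives in an in-memory dictionary instead of real `.dita` files, and the file names and the `build_outline` function are invented for this example.  It is not how ICF or any DITA toolkit is implemented.

```python
import xml.etree.ElementTree as ET

# A simplified, illustrative map: nesting gives the table of contents,
# the 'audience' attribute carries a simple inclusion rule.
MAP_XML = """
<map>
  <title>Clinical Protocol (illustrative)</title>
  <topicref href="objectives.dita">
    <topicref href="endpoints.dita"/>
  </topicref>
  <topicref href="internal-notes.dita" audience="internal"/>
  <topicref href="schedule.dita"/>
</map>
"""

# Stand-in for topic files; the names are hypothetical.
TOPICS = {
    "objectives.dita": "Study objectives",
    "endpoints.dita": "Primary and secondary endpoints",
    "internal-notes.dita": "Internal review notes",
    "schedule.dita": "Schedule of assessments",
}


def build_outline(map_xml, topics, exclude_audience=None):
    """Walk the map in document order and return an indented outline,
    skipping any topicref (and its children) tagged with the excluded
    audience."""
    root = ET.fromstring(map_xml)
    lines = []

    def walk(node, depth):
        for ref in node.findall("topicref"):
            if exclude_audience and ref.get("audience") == exclude_audience:
                continue
            lines.append("  " * depth + topics[ref.get("href")])
            walk(ref, depth + 1)

    walk(root, 0)
    return lines


outline = build_outline(MAP_XML, TOPICS, exclude_audience="internal")
print("\n".join(outline))
```

The same map with a different rule (for example, no excluded audience) yields a different deliverable from the same topics, which is exactly what makes the approach attractive for documents like the Protocol and the Investigator Brochure.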

By no means is the above a complete listing of all the intricacies and details of Clinical Trials, and I have stayed away from specialty areas such as randomization, IVR, biostatistics, pharmacokinetics, etc.  However, I wanted to provide a high-level overview of some of the main clinical systems and processes, and also emphasize again that it is mostly about collaboration and business process management around data and documents.  The latest Microsoft software stack is ideal for this, and it is time people realized that they can simply no longer afford the balkanized approaches of taking all these so-called ‘best-of-breed’ solutions and trying to make it all work together smoothly (even if some of them still believe that Web Services will be a panacea to fix all of this).  Best-of-breed can be a misnomer, too – because the Microsoft software platform is mature and powerful enough to be able to build any of the systems mentioned above, and to deliver a superior solution at far lower cost.  Why not build a best-of-breed suite on a platform that was designed to work together from the ground up?

Update February 2, 2010: we just released a brand new Case Study that I sponsored, and that I am really proud of.  What a great success story for SharePoint being used at a CRO, helping them lower costs and run their business more efficiently!

Mr. Metadata’s Musings on Legacy Migration, e-Discovery and ECM 2.0

I had written earlier about what makes for an Intelligent Content Solution: you need Intelligent Content Design to supplement the Intelligent Content Framework.

Recently, we also started talking to some of our large customers about legacy migration approaches.  The fact is that despite the phenomenal success of MOSS, over 80% of unstructured corporate content still resides within file systems today.  Often, it is just one big mess, and leads to a tremendous loss of corporate knowledge, while creating a huge litigation liability.  There is simply no one-size-fits-all approach to solving this problem.  Applying Search technologies alone does not solve the issue.  There are a few companies I have worked with over the years that specialize in this field, such as Delve Information Group and the Gimmal Group.  I worked with a large Pharma customer a few years ago who had a team of people working on a project just to sort out the ownership of legacy content on their file systems.  It took them two years just to create a database that stored information on who owned what document.  But this information could not be turned into actionable intelligence for file system migration.

When migrating legacy content, the following considerations need to be taken into account:

1.) Who is the owner of the legacy content?  If the person is no longer with the company, can the information be deleted, or archived?  A good tool to help with this Information Governance issue for file systems is Varonis.  This could be a key aid in migrating content from file systems to MOSS, and maintaining the ownership governance.  Just relying on a migration tool like Metalogix or MetaVis only solves a fraction of the problem.

2.) Another area of consideration is the multiple redundant copies of legacy files.  According to Cohasset Associates, each content artifact has up to 18 identical copies scattered all over the place.  A key question when trying to manage the Corporate Truth is which document is the original, and which are copies thereof?  This issue alone lends itself to several approaches to e-Discovery and ‘document forensics’.  Many search engines which are combined with hashing capability can actually be adapted to find duplicate documents, and the server will store the date the document was uploaded.  So that is a start.  But that is not sufficient.  The best solution by far to address this problem is the new Information Tracking Platform by NextPage.  Their ‘digital threading’ technology is exactly what is needed, and I consider it revolutionary.  There is also the issue of document forensics.  This is the key consideration: if I take a document, and modify just one word in it and save it – does this make the new document completely unique (a hashing-only approach would create an entirely new hash for the document) or is this a ‘closely related’ document?  The so-called vectoring capabilities of FAST ESP can help with this problem of ‘near duplicate’ content.  There is also a tool from Equivio that can be used.  This leads to some interesting possibilities when used in combination with NextPage.  This technology is actually extremely important in the case of e-Discovery, i.e. the ability to track parent-child relationships of related content, which is also a key element of an emerging area called document forensics.  There are some excellent SaaS or on-premise tools available to support the e-Discovery process, such as Digital Reef or Stratify.  Recently, we have engaged in a project where we brought together Navigant Consulting, Digital Reef and NextPage to deliver a comprehensive and integrated solution to e-Discovery.  I am very excited about the capabilities that these partners offer together.  Some of my colleagues have also been working closely with WorkProducts, who have a very interesting approach called Evidence Lifecycle Management (ELM).  There are also some interesting packaged legacy cleanup and migration tools available from Vamosa and Active Navigation.  Both vendors are emerging leaders in the Enterprise Information Governance space.
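
The difference between exact duplicates and ‘near duplicates’ described above is easy to demonstrate in a few lines of Python.  To be clear, this is a generic textbook technique (a SHA-256 hash for exact matches, and Jaccard similarity over word shingles for near matches) that I am using only to illustrate the problem; it is not a description of how FAST ESP, Equivio or NextPage work internally, and the sample sentences are made up.

```python
import hashlib


def content_hash(text):
    """SHA-256 digest of the text; any change yields a different digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text, size=3):
    """Set of overlapping word sequences ('shingles') of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


original = ("The clinical protocol must be reviewed and approved "
            "by the ethics committee before enrollment")
edited = original.replace("reviewed", "checked")  # one word changed
unrelated = "Quarterly sales figures for the northern region exceeded the forecast"

# Hashing alone treats the one-word edit as a completely different document.
print(content_hash(original) == content_hash(edited))
# Shingle similarity still recognizes it as closely related.
print(jaccard(original, edited))
print(jaccard(original, unrelated))
```

A hash comparison answers only ‘identical or not’, while a similarity score gives a graded answer that can be thresholded to flag ‘closely related’ documents for review.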

There are also several partners which take an archiving approach to e-Discovery.  See here:  However, in most cases, I prefer the federated Information Tracking approach that NextPage offers, because it is simply not realistic to archive all enterprise content: how about the content located on Desktops, Thumb Drives, etc.?  And these solutions also lack the specialized capabilities that most customers need, so a solution like Digital Reef is still needed on top.

3.) Suppose we are able to perform all this cleanup and preparation work prior to being ready to move legacy content to MOSS.  Now we are still confronted with the issue of metadata.  Given that file systems have no concept of metadata, the process of metadata enrichment is extremely important.  There is some basic metadata that most search engines can extract from within documents, such as date, author name etc.  However, this information needs to be associated with the document as metadata, so it is more readily available.  However, I also have a strong belief that content without context is incomplete: there is an excellent article that I read a few years ago that describes the problem.  Legacy content needs to be metadata enriched before migration.  FAST ESP is an ideal tool for this metadata enrichment process, and for automatically building taxonomies.  We are also working on the new Microsoft Semantic Engine – the demo can be watched on-demand here:

4.) The final step is moving the content to MOSS, and applying all this metadata to make it useful and findable.  This is where the new MetaPoint Server by SchemaLogic comes into play.  FAST ESP and MetaPoint as an integrated solution working with MOSS are a key part of solving this problem space.  I have recently started thinking about what an integration of MetaPoint and NextPage would look like, delivered as a Service via the new Microsoft Azure Services Platform.  The possibilities are truly exciting!

A recent update to the above is that SchemaLogic and Vamosa have formed a technology partnership around Enterprise Content Governance.  I think this is exactly the kind of solution that companies need to address their needs around content quality, to support legacy migration, e-Discovery and Information Governance.

So now after all this discussion about Legacy Migration and metadata enrichment, let’s get back to the Intelligent Content Framework.  How does it all belong together?  The simple truth is that with Intelligent Content Design to begin with, there will be no need for legacy migration going forward.  I had already talked about Intelligent Content solutions needing the Intelligent Content Framework in combination with an Intelligent Content Design approach.  It occurred to me that if we add FAST ESP to the mix, we have now also introduced semantics, and the concept of the Semantic Web into the world of Enterprise Content Management.  This is why I am calling it ECM 2.0 – it is completely analogous to Web 2.0.  This is really exciting to me, all the more so because the tools to make ECM 2.0 happen are available here and now – and all built seamlessly to work with MOSS and Office 2007.  And of course, for the ultimate in legacy migration, we can set up services to pull in legacy content, analyze and ‘X-ray’ it and enrich it with metadata, break it into re-usable topics, and pull it all into the Intelligent Content Framework.

I do not mean to trivialize the effort required to get us there.  But we can get there – and we will!  Fact of the matter is that current approaches to ECM in Big Pharma are broken – the ‘Digital Scriptorium’ model of manually creating content on the Desktop, managing it in an ECM system, and cutting and pasting with no control of source and target is no longer viable.  It actually never was viable, but there was nothing better available for a long time, and companies could afford to throw money and bodies at the problem.  Those days are gone, and they are not coming back.  Flexible and innovative business models and approaches are the only alternative!

Update June 8, 2011: It was just announced that Microsoft is acquiring Prodiance Corporation, a leader in Enterprise Risk Management.  This is definitely very exciting news, and a huge step in the area of Compliance and support for e-Discovery.  We also recently released a very relevant article on TechNet: Microsoft IT Uses File Classification Infrastructure to Help Secure Personally Identifiable Information.  We are definitely ramping up the Compliance capabilities of our stack!