FUNDAMENTALS OF CONTROLLED DOCUMENT MANAGEMENT

WHAT IS A CONTROLLED DOCUMENT

Controlled documents have characteristics that differentiate them from other types of documents. The management of these documents is governed by strict rules, often dictated by regulatory or compliance obligations. Examples of controlled documents include standards, procedures and policies.

The following are typical key characteristics of controlled documents.

Review and approval. All controlled documents are subjected to review and approval. A document must be formally approved before it can be released and used.
Formal issuance and withdrawal. Controlled documents are formally issued for use, and withdrawn from use when no longer valid.
Currency status and distribution control. All controlled documents must have a currency status, such as current, superseded, obsolete or cancelled. Controlled documents that are no longer current must be withdrawn from use so that they are not inadvertently used.
Issue history. Different issues or releases of a controlled document must be uniquely identified (e.g. with a revision or version number).
Directive authority. Controlled documents, such as standard operating procedures, typically carry directive authority. That is, the document gives instructions that must be followed.
Periodic review. Controlled documents that carry directive authority, such as standards, procedures and policies, must be reviewed on a periodic schedule and revised and re-issued as appropriate. Engineering drawings are also a class of controlled document, but they are generally not reviewed on a time-based schedule. Instead they are revised in response to other triggers.
Unique identification. Controlled documents must be uniquely identifiable. Generally this is via a document ID, being a string of characters, a sequence number or a combination thereof.

Effectively managing controlled documents requires a system specifically designed to handle these unique characteristics.

ESSENTIAL REQUIREMENTS OF A CONTROLLED DOCUMENT MANAGEMENT SYSTEM

Structure and Relationships

The characteristics of controlled documents dictate that they must be managed in a system which has structure. Furthermore, the management of relationships between different entities must be embodied within the structure. For example, the different releases of a particular document are all related to the same document ID. At the same time, the document ID must be unique.

To effectively manage these relationships a robust data management platform is essential. Relational databases are the most appropriate platform for this purpose. A relational database natively enforces referential integrity and manages relationships. To effectively manage controlled documents in a relational database the database structure must be designed to mirror the real-world structure of how controlled documents must be managed.

Document Identification

A fundamental rule of controlled document management systems (CDMS) is that each document must be uniquely identified. The CDMS must enforce the uniqueness of each document ID. The most effective way to implement this requirement is via a single register of document IDs. Hence the first requirement of a CDMS is that it must contain a single register of document IDs, with an associated constraint that all entries must be unique.
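The single register and its uniqueness constraint can be sketched as follows. This is a minimal illustration only, using Python's built-in sqlite3 module as a stand-in for whatever relational database a CDMS is built on. The table name document_register, its columns and the function register_document are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One register for every document type. The UNIQUE constraint makes the
# database itself, not the application, the guard against duplicate IDs.
conn.execute(
    "CREATE TABLE document_register ("
    " document_id TEXT NOT NULL UNIQUE,"  # the ID displayed on the published document
    " title       TEXT)"
)

def register_document(document_id, title=None):
    """Add an ID to the register. Return False if the ID already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO document_register (document_id, title) VALUES (?, ?)",
                (document_id, title),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first_attempt = register_document("SOP-001", "Isolation procedure")
second_attempt = register_document("SOP-001", "A different document")
```

In this sketch the second call is rejected by the database constraint, regardless of what the application code intended.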

The document ID that must be unique is the one displayed on the published document. Document IDs are cross-referenced in the text of documents, and they must be easy to communicate verbally and to transcribe, i.e. they must be human-friendly. For this reason GUIDs (globally unique identifiers) are unsuitable for use as document IDs, even though they can be guaranteed to be unique without needing to be managed in a single register. The identifiers of national and international standards (e.g. ISO 9001, IEC 60167, IEEE 1584, ANSI Z535.1) are good examples of document IDs.

Conceptually a document ID is an abstract entity separate from the files which represent the actual document object. With a dedicated document ID register the function of reserving document IDs, for example, is straightforward. There is no need to supply or create dummy files in order to reserve document IDs.
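To illustrate reserving document IDs without dummy files, the following sketch keeps the register and the files in separate tables, so that a register entry can exist with no file attached. It assumes Python's built-in sqlite3 module. The table names, the function names reserve_ids and reserved_but_unused, and the sample IDs are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE document_register (
    document_id TEXT PRIMARY KEY
);
CREATE TABLE document_file (
    file_id     INTEGER PRIMARY KEY,
    document_id TEXT NOT NULL REFERENCES document_register(document_id),
    filename    TEXT NOT NULL
);
""")

def reserve_ids(document_ids):
    """Reserve IDs by adding register rows only. No file is required."""
    with conn:
        conn.executemany(
            "INSERT INTO document_register (document_id) VALUES (?)",
            [(doc_id,) for doc_id in document_ids],
        )

def reserved_but_unused():
    """Return register entries that have no file attached yet."""
    rows = conn.execute(
        "SELECT r.document_id FROM document_register r"
        " LEFT JOIN document_file f ON f.document_id = r.document_id"
        " WHERE f.file_id IS NULL ORDER BY r.document_id"
    ).fetchall()
    return [row[0] for row in rows]

reserve_ids(["PRO-0101", "PRO-0102"])
with conn:
    conn.execute(
        "INSERT INTO document_file (document_id, filename) VALUES (?, ?)",
        ("PRO-0101", "draft.docx"),
    )
unused = reserved_but_unused()
```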

For various reasons (such as inflexibility, explained in more detail elsewhere on the web) embedding metadata into document IDs is poor practice. The advent of relational databases many decades ago obviated the need to use a structured document ID as the primary means of searching, sorting and grouping documents. The only structure which may be applicable is one which indicates the namespace or originating register (e.g. ISO, IEC, IEEE, ANSI), ensuring its unique context.

Although metadata should not be embedded within a document ID, a CDMS should not impose constraints on the format of the document ID beyond enforcing uniqueness. Most organizations will use some type of document ID format, and the organization should be able to decide on that format independently of the CDMS. The CDMS should also maintain only a single document ID register and not maintain a parallel or proxy document ID register. To guarantee that no duplicate document identifiers can be created, the CDMS must not maintain separate registers for different document types, because this would allow the same identifier to be configured in more than one register.

Relationship Between Documents and Files

While enforcing the uniqueness of the document ID, the CDMS must also cater for the structure in which a single document ID is related to a series of releases. Within each release there will typically be two renditions: a source file (e.g. .docx or .dwg) and a published file (.pdf). In principle, the published rendition of a single release may comprise multiple files, such as a main document and a separate appendix file in a different format. The document ID is not a metadata attribute of the file. Rather, the file is related to the document ID: the document ID is the parent and the file is the child. This logical separation of the document ID from the physical files is critical, allowing for flexible management of various renditions and formats over time. The CDMS must be flexible enough to handle these requirements and mirror the real-world nature of the data. Relational databases are specifically designed to efficiently manage these types of relationships.
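One possible way to model this parent and child structure is three related tables: document, release and file. The sketch below assumes Python's built-in sqlite3 module. The table and column names, the files_for function and the sample document are hypothetical, and a real CDMS schema would carry considerably more detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE document (
    document_id TEXT PRIMARY KEY
);
CREATE TABLE release (
    release_pk  INTEGER PRIMARY KEY,
    document_id TEXT NOT NULL REFERENCES document(document_id),
    release_no  TEXT NOT NULL,           -- free-form: number, letter or date
    UNIQUE (document_id, release_no)
);
CREATE TABLE release_file (
    file_pk    INTEGER PRIMARY KEY,
    release_pk INTEGER NOT NULL REFERENCES release(release_pk),
    rendition  TEXT NOT NULL CHECK (rendition IN ('source', 'published')),
    filename   TEXT NOT NULL
);
""")

with conn:
    conn.execute("INSERT INTO document (document_id) VALUES ('STD-0042')")
    cur = conn.execute(
        "INSERT INTO release (document_id, release_no) VALUES ('STD-0042', '1')"
    )
    release_pk = cur.lastrowid
    # One source file, and a published rendition made up of two files.
    conn.executemany(
        "INSERT INTO release_file (release_pk, rendition, filename) VALUES (?, ?, ?)",
        [
            (release_pk, "source", "standard.docx"),
            (release_pk, "published", "standard.pdf"),
            (release_pk, "published", "appendix-a.xlsx"),
        ],
    )

def files_for(document_id, release_no, rendition):
    """List the filenames of one rendition of one release."""
    rows = conn.execute(
        "SELECT f.filename FROM release_file f"
        " JOIN release r ON r.release_pk = f.release_pk"
        " WHERE r.document_id = ? AND r.release_no = ? AND f.rendition = ?"
        " ORDER BY f.file_pk",
        (document_id, release_no, rendition),
    ).fetchall()
    return [row[0] for row in rows]

published = files_for("STD-0042", "1", "published")
```

Because the file rows reference the release, and the release references the document ID, a file cannot exist in this sketch without a parent document ID.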

User Interface and Control

Document search results must be presented to end users in such a manner that users are not likely to inadvertently use a document which is not the latest release of a current document. Documents which have been cancelled, or releases which have been superseded, should not be presented without additional deliberate actions by the user or perhaps specific authorization. The most efficient way to handle these requirements is via metadata.
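The default filtering behaviour described above might look like the following sketch in Python. The Release class, the CATALOGUE sample data, the status values and the include_non_current flag are assumptions made for illustration. In a real system the flag would be tied to a deliberate user action or an authorization check.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Release:
    document_id: str
    title: str
    release_no: str
    doc_status: str      # 'current' or 'cancelled'
    release_status: str  # 'latest' or 'superseded'

# Illustrative sample data only.
CATALOGUE = [
    Release("SOP-001", "Isolation procedure", "1", "current", "superseded"),
    Release("SOP-001", "Isolation procedure", "2", "current", "latest"),
    Release("SOP-007", "Isolation permit form", "3", "cancelled", "latest"),
]

def search(term, include_non_current=False):
    """Return title matches. By default only the latest release of a
    current document is returned."""
    hits = [r for r in CATALOGUE if term.lower() in r.title.lower()]
    if include_non_current:
        return hits
    return [
        r for r in hits
        if r.doc_status == "current" and r.release_status == "latest"
    ]

default_hits = search("isolation")
all_hits = search("isolation", include_non_current=True)
```

The filter works purely on status metadata, so no file needs to be opened to decide what a user is shown.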

Given that users must be able to download document files for various reasons, a strategy must be applied to reduce the risk of non-current documents being used. The simplest and most efficient strategy for ensuring that users refer only to the current release of a document is to explicitly communicate on each document that the master copy resides in a single nominated repository. Documents viewed through channels outside of that repository are defined as uncontrolled by default. Thus, what constitutes a 'controlled copy' is determined not by what is displayed on the document, or by whether or not it is printed, but by the user interface channel through which it is being viewed.

Metadata

Metadata is used in a CDMS both for controlling documents and for searching, sorting and grouping documents and communicating information.

Some metadata is related to the document ID and some is related to the file. The metadata which is related to the document ID should only be that which does not change from one release to the next. For example, the title and subject matter would generally be related to the document ID, remaining constant across releases, while the specific release number, author and release date would be related to the file issued for a particular release. These relationships reflect the logical structure of the real-world data. The two are combined in search results data and the fact that they are stored separately should be invisible to the end user.
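As a sketch of this separation, the example below stores the title against the document ID and the author and release date against each release, then combines them in a view for search results. It assumes Python's built-in sqlite3 module. The table, view and column names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (
    document_id TEXT PRIMARY KEY,
    title       TEXT NOT NULL            -- constant across releases
);
CREATE TABLE release (
    document_id  TEXT NOT NULL REFERENCES document(document_id),
    release_no   TEXT NOT NULL,
    author       TEXT NOT NULL,          -- specific to each release
    release_date TEXT NOT NULL,
    PRIMARY KEY (document_id, release_no)
);
-- The end user sees one combined row per release.
CREATE VIEW search_result AS
SELECT d.document_id, d.title, r.release_no, r.author, r.release_date
FROM document d JOIN release r ON r.document_id = d.document_id;

INSERT INTO document VALUES ('POL-010', 'Records retention policy');
INSERT INTO release VALUES ('POL-010', 'A', 'J. Smith', '2024-02-01');
INSERT INTO release VALUES ('POL-010', 'B', 'R. Jones', '2025-02-01');
""")

# The title is stored once, so a single update is reflected in every
# search result row for that document ID.
with conn:
    conn.execute(
        "UPDATE document SET title = 'Records retention and disposal policy'"
        " WHERE document_id = 'POL-010'"
    )

rows = conn.execute(
    "SELECT title, release_no, author FROM search_result ORDER BY release_date"
).fetchall()
```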

Release identifiers are variously referred to as version or revision numbers. (Engineering drawing releases are generally referred to as revisions, and pre-release copies as versions, but the terminology is often reversed in other contexts.) Organizations will have different policies as to how releases are identified. While typically referred to as 'numbers', release identifiers can sometimes be letters, or even dates. The CDMS should not constrain how release identifiers are formatted. The set of release numbers for a given document ID must also remain as a single unified set, for example across changes in file type of the underlying source files.

Basic principles of metadata management dictate that document control metadata must be related to files, not embedded within them. For example, the status of a file as the current latest release or as superseded should be recorded by linking a reference to the file to an entry in a lookup table. The status is changed by updating that link, without accessing or modifying the file, and the status is read without needing to read the file.
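The following sketch shows a status held as a link to a lookup table rather than inside the file. It assumes Python's built-in sqlite3 and hashlib modules. The table names, the functions set_status, get_status and content_digest, and the sample file bytes are hypothetical. The digest comparison is included only to demonstrate that the file content is untouched by a status change.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE release_status (
    status_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
INSERT INTO release_status (status_id, name) VALUES (1, 'current');
INSERT INTO release_status (status_id, name) VALUES (2, 'superseded');
CREATE TABLE release_file (
    file_id   INTEGER PRIMARY KEY,
    filename  TEXT NOT NULL,
    content   BLOB NOT NULL,
    status_id INTEGER NOT NULL REFERENCES release_status(status_id)
);
""")

def set_status(file_id, status_name):
    """Change the status link only. The content column is not written."""
    with conn:
        conn.execute(
            "UPDATE release_file SET status_id ="
            " (SELECT status_id FROM release_status WHERE name = ?)"
            " WHERE file_id = ?",
            (status_name, file_id),
        )

def get_status(file_id):
    """Read the status by following the link, without reading the content."""
    return conn.execute(
        "SELECT s.name FROM release_file f"
        " JOIN release_status s ON s.status_id = f.status_id"
        " WHERE f.file_id = ?",
        (file_id,),
    ).fetchone()[0]

def content_digest(file_id):
    content = conn.execute(
        "SELECT content FROM release_file WHERE file_id = ?", (file_id,)
    ).fetchone()[0]
    return hashlib.sha256(content).hexdigest()

with conn:
    conn.execute(
        "INSERT INTO release_file (file_id, filename, content, status_id)"
        " VALUES (1, 'sop-001-rev1.pdf', ?, 1)",
        (b"%PDF-1.7 example bytes",),
    )

digest_before = content_digest(1)
set_status(1, "superseded")
digest_after = content_digest(1)
```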

In keeping with basic data structure principles, the metadata should be in a normalized structure. This ensures data integrity and efficiency by storing each piece of data only once, referencing it via keys across related tables.

Storing Files

The principle that keeps data consistent with the associated files is transactional integrity. Specifically, the transactions must be ACID-compliant (atomic, consistent, isolated and durable). Otherwise the data can readily become corrupted. A relational database engine can guarantee transactional integrity, but only if the binary content of the file is under the full control of the database.
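A small sketch can show what this guarantee means in practice when the file content is held inside the database. It assumes Python's built-in sqlite3 module, with the file content in a BLOB column. The table layout and the functions issue_release and current_release are hypothetical. Issuing a release involves two changes, superseding the old release and adding the new one, and the transaction ensures that either both happen or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE release_file ("
    " document_id TEXT NOT NULL,"
    " release_no  TEXT NOT NULL,"
    " status      TEXT NOT NULL CHECK (status IN ('current', 'superseded')),"
    " content     BLOB NOT NULL,"
    " PRIMARY KEY (document_id, release_no))"
)
with conn:
    conn.execute(
        "INSERT INTO release_file VALUES ('SOP-001', '1', 'current', ?)",
        (b"release 1 content",),
    )

def issue_release(document_id, release_no, content):
    """Supersede the current release and add the new one atomically."""
    with conn:  # commits if both statements succeed, otherwise rolls back
        conn.execute(
            "UPDATE release_file SET status = 'superseded'"
            " WHERE document_id = ? AND status = 'current'",
            (document_id,),
        )
        conn.execute(
            "INSERT INTO release_file VALUES (?, ?, 'current', ?)",
            (document_id, release_no, content),
        )

def current_release(document_id):
    row = conn.execute(
        "SELECT release_no FROM release_file"
        " WHERE document_id = ? AND status = 'current'",
        (document_id,),
    ).fetchone()
    return row[0] if row else None

# A failed issue: content of None violates the NOT NULL constraint after
# the UPDATE has already run, so the whole transaction is rolled back.
try:
    issue_release("SOP-001", "2", None)
except sqlite3.IntegrityError:
    pass
after_failure = current_release("SOP-001")

issue_release("SOP-001", "2", b"release 2 content")
after_success = current_release("SOP-001")
```

After the failed attempt, release 1 is still the current release in this sketch. There is never a point at which the document has no current release or a status that does not match its stored content.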

The approaches to storing files and associated metadata have evolved over time. The most recent development is cloud object storage. Cloud object storage offers extreme levels of scalability and other benefits, but has the drawback of compromising transactional integrity compared with frameworks where the file binary content is under the full control of the database. For some solution needs transactional integrity is less important. For a CDMS solution however the integrity of the data is critical. The prospect of users being presented with an obsolete document, or a sensitive and restricted document, due to data corruption caused by a failed transaction for example, would not be acceptable. Transactional integrity with cloud object storage can be managed by the application layer, but it inherently can never be as robust as where it is entirely managed within the relational database engine.
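The kind of mediation an application layer must perform when files are held outside the database can be sketched as follows. This is a simulation under stated assumptions, not a cloud implementation: a Python dictionary named object_store stands in for an object storage bucket, no real cloud SDK is called, and the function store_release and the table layout are hypothetical.

```python
import sqlite3
import uuid

object_store = {}  # stand-in for a cloud bucket: object key -> bytes

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE release_file ("
    " document_id TEXT NOT NULL,"
    " release_no  TEXT NOT NULL,"
    " object_key  TEXT NOT NULL UNIQUE,"
    " PRIMARY KEY (document_id, release_no))"
)

def store_release(document_id, release_no, content):
    """Upload the content, then record it. Undo the upload if recording fails."""
    key = uuid.uuid4().hex       # fresh key, so an upload never overwrites
    object_store[key] = content  # step 1: outside any database transaction
    try:
        with conn:               # step 2: database transaction
            conn.execute(
                "INSERT INTO release_file VALUES (?, ?, ?)",
                (document_id, release_no, key),
            )
    except sqlite3.Error:
        object_store.pop(key, None)  # step 3: compensating delete
        raise
    return key

first_key = store_release("SOP-001", "1", b"release 1")
try:
    store_release("SOP-001", "1", b"duplicate upload")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The compensating delete in step 3 is application code, not a database guarantee. If the process were to stop between step 1 and step 2, the uploaded object would remain with no database record, and some separate reconciliation would be needed to find it.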

Ultimately a CDMS user must be able to extract a document file from the system. As with document IDs, the CDMS should not constrain the user or organization with regard to how extracted files are named. The filename should generally simply match that given when the file was uploaded.

A SHORT HISTORY OF FILE STORAGE AND RELATIONAL DATABASES

The binary content of files can be stored directly within a binary column within a database table (referred to as a Binary Large Object, or BLOB). However, this approach tends to create scalability and maintainability issues. Conversely, when file binary data is stored external to the database, the database loses its inherent ability to guarantee transactional integrity, and the application layer must assume responsibility for managing it. In 2007/8 Oracle and Microsoft introduced solutions to this problem with SecureFiles and FILESTREAM respectively.

However, SecureFiles and FILESTREAM still present limitations in the context of extremely large and distributed architectures. Hence for very large scale and distributed requirements (e.g. petabyte scale volumes), cloud object storage services (such as Azure Blob Storage, Amazon S3 and Google Cloud Storage) have been developed as a more cost-effective solution. These solutions allow multi-tenanted, centrally managed storage for any scale, small or large.

The drawback of cloud object storage is that transactional integrity with an associated relational database must be mediated by an application layer. For solutions where transactional integrity is important, cloud object storage is inherently less robust in its integrity guarantees than options where the file binary data is under the full control of the relational database engine. Cloud object storage also introduces a security framework which is separate from and in addition to that of the relational database. All these challenges are manageable, but the solutions entail compromises. A security and integrity framework comprising an encapsulated single security model with transactional integrity fully assured, all within a single system without involvement of an application layer, is not possible with cloud object storage. However, this framework is possible with a solution based on a relational database which retains full control of the file binary data content.

Silkwood Software

March 2026