The Four Fundamentals of a Successful Data Architecture

There are lots of people who call themselves Data Architects, and they have produced even more Data Architectures. In fact, every data system has a Data Architecture, whether by design or by chance. Unfortunately, not all of these data systems have good ones. So how do you tell a good Data Architecture from a not-so-good one?

Here are my top four indicators.

1) Data Model(s)

These are the blueprints of your Data Architecture: your data models guide and shape everything else. Data models come in three flavours: Conceptual, Logical, and Physical. A good Data Architecture includes all three.

Conceptual

The conceptual model is the 10,000-foot view of your data. It captures the main data entities, ideally from a business perspective, and records how they relate to each other.

I'm going to be a bit controversial here (for a Data Architect, anyway) and state that this is the most important model in any Data Architecture. Unfortunately, it is also the most frequently skipped.

Why is this the most important model? Because it is the only model that depicts the data the way the business uses it. It is truly the foundation, the blueprint. Without a conceptual model, the rest of your data models (if there are any) are necessarily created in reactive mode, driven by individual requirements as they arrive rather than by an overall picture of the business.
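
To make this concrete, here is a minimal sketch of what a conceptual model records, assuming a hypothetical order-taking business with Customer, Order, and Product entities. There are no attributes, keys, or technology here; just the business entities and the relationships between them, rendered as sentences a business user could check.

```python
from dataclasses import dataclass

# Hypothetical conceptual model: business entities and how they relate.
# No attributes, keys or storage details belong at this level.
@dataclass(frozen=True)
class Relationship:
    subject: str      # entity the statement starts from
    verb: str         # business phrase describing the relationship
    obj: str          # entity the statement points to
    cardinality: str  # e.g. "one-to-many"

ENTITIES = {"Customer", "Order", "Product"}

RELATIONSHIPS = [
    Relationship("Customer", "places", "Order", "one-to-many"),
    Relationship("Order", "contains", "Product", "many-to-many"),
]

def describe(relationships):
    """Render each relationship as a sentence the business can review."""
    return [f"{r.subject} {r.verb} {r.obj} ({r.cardinality})" for r in relationships]

for sentence in describe(RELATIONSHIPS):
    print(sentence)
```

The point of the exercise is the review: if the business reads "Order contains Product (many-to-many)" and disagrees, that is far cheaper to fix now than after the tables exist.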

Logical

This is where the details get added: entities (which usually become tables), attributes, keys, and the fiddly structures that allow the data to be validated and later pulled back out of the database intact. This model is very similar to the physical model, except that it is kept at a level where it has no dependencies on any particular technical solution.
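
As an illustration only, the sketch below expresses a logical model for the same hypothetical Customer/Order/Product example as plain Python data. It uses generic logical types ("identifier", "text", "date") rather than any DBMS's types, resolves the many-to-many "Order contains Product" relationship into an associative OrderLine entity, and includes a small consistency check. The entity names, type labels, and the `check_model` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str
    logical_type: str      # generic type such as "identifier", "text", "date"
    required: bool = True

@dataclass
class Entity:
    name: str
    attributes: list
    primary_key: list
    # attribute name -> name of the entity it refers to
    foreign_keys: dict = field(default_factory=dict)

MODEL = {
    "Customer": Entity("Customer",
        [Attribute("customer_id", "identifier"), Attribute("customer_name", "text")],
        primary_key=["customer_id"]),
    "Product": Entity("Product",
        [Attribute("product_id", "identifier"), Attribute("product_name", "text")],
        primary_key=["product_id"]),
    "Order": Entity("Order",
        [Attribute("order_id", "identifier"), Attribute("customer_id", "identifier"),
         Attribute("order_date", "date")],
        primary_key=["order_id"], foreign_keys={"customer_id": "Customer"}),
    # The conceptual many-to-many relationship becomes an associative entity.
    "OrderLine": Entity("OrderLine",
        [Attribute("order_id", "identifier"), Attribute("product_id", "identifier"),
         Attribute("quantity", "quantity")],
        primary_key=["order_id", "product_id"],
        foreign_keys={"order_id": "Order", "product_id": "Product"}),
}

def check_model(model):
    """Return a list of problems; an empty list means the model is consistent."""
    problems = []
    for entity in model.values():
        names = {a.name for a in entity.attributes}
        for key in entity.primary_key:
            if key not in names:
                problems.append(f"{entity.name}: key {key} is not an attribute")
        for attr, target in entity.foreign_keys.items():
            if attr not in names:
                problems.append(f"{entity.name}: foreign key {attr} is not an attribute")
            if target not in model:
                problems.append(f"{entity.name}: {attr} references unknown entity {target}")
    return problems
```

Nothing in this model says which database, file format, or hardware will hold the data; that decision is deferred to the physical model.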

Physical

Where the rubber hits the road! This is where everything that was built before is tuned for the specific technical solution: hardware, DBMS, file system, whatever.
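
Here is a sketch of what that tuning might look like if the hypothetical Customer and Order entities were implemented in SQLite (chosen only because it ships with Python). The logical "Order" entity becomes a `customer_order` table because ORDER is a reserved word in SQL, dates are stored as ISO-8601 text because SQLite has no dedicated date storage class, and an index is added for an assumed "orders for a customer" lookup. None of these choices belong in the logical model.

```python
import sqlite3

DDL = """
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
-- "Order" renamed: ORDER is a reserved word in SQL.
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    order_date  TEXT NOT NULL   -- ISO-8601 text; SQLite has no date storage class
);
-- Index for an assumed access pattern: looking up a customer's orders.
CREATE INDEX ix_customer_order_customer_id ON customer_order (customer_id);
"""

def build_database():
    """Create an in-memory SQLite database from the physical model."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn

conn = build_database()
conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, '2024-01-31')")
```

Inserting an order for a customer that does not exist raises `sqlite3.IntegrityError`, so the relationship from the logical model is still enforced, just in SQLite's own terms.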

2) Meta Data

Meta data is something that is enjoying a bit of a renaissance right now, but it is still not taken as seriously as it should be. Meta data is all the extra data that enables the rest of the Data Architecture, and ultimately the data system, to be understood by someone who wasn't there when it was all built. Meta data also helps the data team keep the Data Architecture on track.

There are three main classifications of meta data according to the National Information Standards Organization (NISO). Yes, there are other classification systems, but this one is my favourite because it really helps differentiate between data and meta data.

Structural

Structural meta data is anything tied to the structure of the data: the data containers, if you will. It covers data types; field sizes; and table, file, and field descriptions. This is the stuff most people think of when they think of meta data. This classification also includes data lineage, the meta data that tells you where a particular piece of data came from (particularly important for any type of BI system).
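
Some structural meta data comes free with the DBMS; the rest has to be recorded deliberately. The sketch below, again using SQLite, combines the catalogue (column types and nullability from `PRAGMA table_info`) with a hypothetical `data_dictionary` table that holds descriptions and lineage. The dictionary's layout and the lineage value `crm.accounts.name` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
-- Hypothetical data dictionary: what the DBMS catalogue does not hold.
CREATE TABLE data_dictionary (
    table_name  TEXT NOT NULL,
    column_name TEXT NOT NULL,
    description TEXT NOT NULL,
    source      TEXT,              -- lineage: where the value comes from
    PRIMARY KEY (table_name, column_name)
);
INSERT INTO data_dictionary VALUES
    ('customer', 'customer_id',   'Surrogate key for a customer', NULL),
    ('customer', 'customer_name', 'Legal name of the customer',   'crm.accounts.name');
""")

def column_metadata(conn, table):
    """Merge catalogue facts (type, nullability) with dictionary descriptions."""
    notes = {
        column: (description, source)
        for column, description, source in conn.execute(
            "SELECT column_name, description, source FROM data_dictionary "
            "WHERE table_name = ?", (table,))
    }
    result = []
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated, so it must come from trusted code, not user input.
    for _cid, name, col_type, notnull, _default, _pk in conn.execute(
            f"PRAGMA table_info({table})"):
        description, source = notes.get(name, (None, None))
        result.append({"column": name, "type": col_type, "required": bool(notnull),
                       "description": description, "source": source})
    return result
```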

Descriptive

This is the meta data that often gets mislabeled as data, because in modern systems it lives with the data. The units that go with a measurement are a good example: a value of 42 means nothing until you know whether it is in kilograms or pounds.
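
A minimal sketch of descriptive meta data living next to the data it describes: each hypothetical `Measurement` carries its unit, and only because the unit travels with the value can readings recorded in different units be combined safely. The unit labels and the conversion table are assumptions for the example (the pound factor is the exact international definition).

```python
from dataclasses import dataclass

# Hypothetical conversion factors to one canonical unit (kilograms).
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

@dataclass(frozen=True)
class Measurement:
    value: float  # the data
    unit: str     # descriptive meta data: what the number means

    def in_kg(self) -> float:
        """Convert to kilograms; an unknown unit raises KeyError rather than guessing."""
        return self.value * TO_KG[self.unit]

readings = [Measurement(2.0, "kg"), Measurement(500.0, "g"), Measurement(1.0, "lb")]
total = sum(m.in_kg() for m in readings)
```

Strip the `unit` field out and the values 2.0, 500.0, and 1.0 can no longer be compared at all, which is why this is better treated as meta data than as just another column of data.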

Administrative

Finally, we have the meta data that helps us manage the data. This includes audit information, such as the user ID of the person who created a record and the time it was last updated, and security data that tells the system who can access it, or even specific parts of it.
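
The audit half of administrative meta data is often implemented as a few standard columns plus a trigger. The sketch below shows one way to do that in SQLite; the column names (`created_by`, `created_at`, `updated_at`) and the user `jsmith` are hypothetical, and a real system would take the user from its authentication layer rather than hard-coding it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    -- Administrative meta data (hypothetical column names):
    created_by    TEXT NOT NULL,
    created_at    TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at    TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Keep updated_at current on every change. The inner UPDATE does not
-- re-fire this trigger because SQLite's recursive_triggers is off by default.
CREATE TRIGGER trg_customer_updated_at
AFTER UPDATE ON customer
FOR EACH ROW
BEGIN
    UPDATE customer SET updated_at = datetime('now')
    WHERE customer_id = NEW.customer_id;
END;
""")

conn.execute("INSERT INTO customer (customer_id, customer_name, created_by) "
             "VALUES (1, 'Acme Ltd', 'jsmith')")
conn.execute("UPDATE customer SET customer_name = 'Acme Limited' WHERE customer_id = 1")
```

Because the trigger lives in the database, the audit trail is maintained no matter which application made the change.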

3) Standards

Standards, like data, often develop organically as the need for them becomes apparent. The problem with this approach is that it almost always leaves parts of the data system grandfathered, that is, still using the earlier, unstandardized approaches. You can never fully get away from this, but a good Data Architecture can, and will, put standards in place up front, preventing the majority of these issues from arising in the first place.

Data standards also make a system far easier to understand and maintain. No matter how good your data model and meta data are, bad standards, or no standards at all, will still result in a messy, hard-to-maintain data system.

There are two main types of standards that a good Data Architecture will define:

Naming standards

These standards relate, as you would expect, to the way that files, tables, fields, and so on are named. Good naming standards make navigating a model and understanding its components much simpler.
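
A naming standard is most useful when it can be checked automatically. Here is a minimal sketch that encodes a hypothetical standard (lower-case snake_case, at most 30 characters, key columns ending in `_id`) as rules a build or review step could run; your own standard will almost certainly have different rules.

```python
import re

# Hypothetical naming standard, expressed as checkable rules.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")  # e.g. customer_order
MAX_LENGTH = 30

def naming_violations(table, columns, key_columns=()):
    """Return a list of rule violations for one table; empty means compliant."""
    problems = []
    for name in [table, *columns]:
        if not SNAKE_CASE.fullmatch(name):
            problems.append(f"{name}: not lower-case snake_case")
        if len(name) > MAX_LENGTH:
            problems.append(f"{name}: longer than {MAX_LENGTH} characters")
    for key in key_columns:
        if not key.endswith("_id"):
            problems.append(f"{key}: key columns must end in _id")
    return problems

print(naming_violations("CustOrder", ["OrderDate"], key_columns=["ord"]))
```

Running a check like this against every new table means the standard is applied from day one, instead of being retrofitted to a grandfathered schema.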

Design standards

Design standards define standard approaches to recurring situations, such as hierarchies and code tables. Once a person learns how one part of the data system works, they recognize the same patterns in other parts much more quickly, which results in fewer mistakes.
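
As an example of a design standard, suppose (hypothetically) that every hierarchy in the system must use a self-referencing parent key, queried with the same recursive common table expression. The sketch below applies that pattern to an invented `org_unit` table in SQLite; anyone who has read one hierarchy query in the system can then read all of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical house pattern for hierarchies: a self-referencing parent key.
CREATE TABLE org_unit (
    org_unit_id        INTEGER PRIMARY KEY,
    org_unit_name      TEXT NOT NULL,
    parent_org_unit_id INTEGER REFERENCES org_unit (org_unit_id)  -- NULL at the root
);
INSERT INTO org_unit VALUES
    (1, 'Company', NULL),
    (2, 'Finance', 1),
    (3, 'Payroll', 2),
    (4, 'Sales',   1);
""")

def descendants(conn, root_id):
    """Return (name, depth) for every unit below root_id, using the standard recursive query."""
    return conn.execute("""
        WITH RECURSIVE tree (org_unit_id, org_unit_name, depth) AS (
            SELECT org_unit_id, org_unit_name, 0
            FROM org_unit WHERE org_unit_id = ?
            UNION ALL
            SELECT child.org_unit_id, child.org_unit_name, tree.depth + 1
            FROM org_unit AS child
            JOIN tree ON child.parent_org_unit_id = tree.org_unit_id
        )
        SELECT org_unit_name, depth FROM tree
        WHERE depth > 0
        ORDER BY depth, org_unit_name
    """, (root_id,)).fetchall()
```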

4) The ability to evolve

No Data Architecture exists in a vacuum. The business changes, and its data needs change with it. A good Data Architecture will anticipate some of these changes, but a great Data Architecture will recognize that change is inevitable, and actually plan for it.

How do you plan for a change you don't know about? By recognizing where assumptions about the data have shaped the structure, and having a plan in place for when that structure has to change. Your Data Architecture should answer the following questions:

  1. How do you change a structure without compromising the data contained in it? Add to it? Split it? Merge it with another structure?
  2. How do you deal with historic data for elements that are new to the Data Architecture?
  3. What happens if the source of the data changes?
  4. How do you retire a data element, or a whole structure?
  5. How do you handle a change in a validation rule?
  6. How do you handle a change in a relationship between two data entities?
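
One common way to answer the first two questions in practice is versioned schema migrations; this is a technique I'm choosing for illustration, not the only answer. The sketch below keeps an ordered list of changes, records the applied version with SQLite's `PRAGMA user_version`, and runs each change in its own transaction so a failed step leaves the data untouched. The `customer` table and the new `email` element are hypothetical, and leaving historic rows as NULL (meaning "not captured before this change") is one assumed policy for question 2.

```python
import sqlite3

# Hypothetical ordered migrations; entry n takes the schema from version n-1 to n.
MIGRATIONS = [
    # v1: the original structure
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL)",
    # v2: a new data element; existing rows get NULL, meaning "not captured
    #     before this change" rather than an invented value
    "ALTER TABLE customer ADD COLUMN email TEXT",
]

def migrate(conn, target=len(MIGRATIONS)):
    """Apply migrations up to `target`, one transaction per step.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT/ROLLBACK below control the transactions.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current + 1, target + 1):
        conn.execute("BEGIN")
        try:
            conn.execute(MIGRATIONS[version - 1])
            conn.execute(f"PRAGMA user_version = {version}")
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # the structure and data stay at the previous version
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]

conn = sqlite3.connect(":memory:", isolation_level=None)
migrate(conn, target=1)
conn.execute("INSERT INTO customer (customer_id, customer_name) VALUES (1, 'Acme Ltd')")
migrate(conn)  # applies only v2; the existing row keeps its data
```

Because every structural change is an explicit, versioned step, the questions about splitting, merging, and retiring structures become questions about how to write the next migration, rather than ad hoc edits to a live database.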