I was asked the other day to explain how I would implement an architecture for meta data management as opposed to data management. The question actually stopped me for a moment because it is not as straightforward as it first sounds. Then again, it may turn out to be simpler than it sounds.
What is meta data?
The textbook definition is that meta data is data about data. What the heck does that mean, you ask? It sounds like doublespeak, but it is accurate. In fact, there are many different types of meta data, and even data professionals sometimes don’t consider them all.
My favourite scheme for categorizing meta data was developed by NISO (National Information Standards Organization). NISO uses three broad categories:
- Structural meta data: meta data about the container
  - E.g. Database / Table / Field / File
  - This includes file/table/field names, data types, and business definitions
- Descriptive meta data: meta data about individual instances of the data
  - Usually stored with the data, and thus often mistaken for data
- Administrative meta data: information that helps manage a resource
  - E.g. audit information, access rights, etc.
I like the NISO scheme because it makes it easier to differentiate between data and meta data.
Let’s try an example: In an accounting system we may have a piece of numeric data called balance. Let’s say this particular instance of balance has the value 250. Great. You may say, “That makes sense, but what does it have to do with meta data?”
Well, in order to make sense of the number 250, we need meta data. The number by itself has no real meaning, but we have already been given some meta data. We know that balance is numeric (that’s structural meta data), so we can be sure 250 represents a number. You can do math with it. From its name (more structural meta data), we know it represents a balance. But that’s still not enough.
In days gone by, there may have been a data dictionary that would have held an explanation that balance referred to an account balance amount in Canadian Dollars (CAD). There would have been no question that this was meta data. More modern systems, however, are designed to be flexible, so a currency field would be added (this describes the individual instance of balance, so it is descriptive meta data). Currency would contain CAD in our example, but could contain a different currency in another record. The currency allows us to better understand the balance since 250 CAD is different than 250 USD or 250 JPY.
Now that the meta data is sitting in a field in the database right next to the data, the distinction becomes less clear. This leads to the fallacy that “one person’s meta data is another person’s data”.
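To make the balance example concrete, here is a minimal Python sketch. The class name AccountBalance, the updated_by audit field, and the user name are hypothetical, invented only for illustration. It shows how all three kinds of meta data end up sitting beside the one piece of business data.

```python
from dataclasses import dataclass, fields
from decimal import Decimal


@dataclass
class AccountBalance:
    # The field names and types below are structural meta data:
    # they describe the container, not any one record.
    balance: Decimal   # the business data itself
    currency: str      # descriptive meta data: qualifies this one balance
    updated_by: str    # administrative meta data (hypothetical audit field)


def describe(record: AccountBalance) -> str:
    """Combine the value with its descriptive meta data so it has meaning."""
    return f"{record.balance} {record.currency}"


# One instance of the data, with its meta data stored right next to it.
record = AccountBalance(balance=Decimal("250"), currency="CAD", updated_by="jsmith")

# Structural meta data can be read straight from the container definition.
structure = {f.name: f.type for f in fields(AccountBalance)}

print(describe(record))
print(structure)
```

Nothing in the record itself marks currency or updated_by as meta data. They are ordinary fields, which is exactly why the distinction blurs.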
Meta data IS Data
Back to the original question… Most people don’t differentiate between data and descriptive meta data because it all sits together in the same container. For the most part, administrative meta data also sits in the same containers, so it’s not that different either. These two types of meta data used to sit in different containers and have different processes around them, but thinking has evolved to merge them together.
Meta data does still matter, probably more now than ever before. The problem is that the lines are blurring, so it is becoming more difficult to differentiate between data and meta data. In fact, this may not be a problem at all.
If we treat data and meta data as two instances of the same thing, then we don’t need to have separate architectures around them. Meta data IS data. Business meta data and data lineage meta data are just more data. They are no different, except in what they describe.
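As a rough sketch of what "one architecture for both" could look like, the Python example below keeps business data and business meta data in the same in-memory SQLite database and reads both with ordinary queries. The table names account_balance and data_dictionary, and the definitions stored in them, are assumptions made up for this illustration, not a recommended design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One store, two tables: business data and the meta data that describes it.
conn.executescript("""
    CREATE TABLE account_balance (account_id INTEGER, balance NUMERIC, currency TEXT);
    CREATE TABLE data_dictionary (table_name TEXT, field_name TEXT, definition TEXT);
""")

conn.execute("INSERT INTO account_balance VALUES (?, ?, ?)", (1, 250, "CAD"))
conn.executemany(
    "INSERT INTO data_dictionary VALUES (?, ?, ?)",
    [
        ("account_balance", "balance", "Account balance amount"),
        ("account_balance", "currency", "Currency of the balance, e.g. CAD"),
    ],
)

# Business data and business meta data are read the same way.
business_rows = conn.execute("SELECT * FROM account_balance").fetchall()
meta_rows = conn.execute(
    "SELECT field_name, definition FROM data_dictionary ORDER BY field_name"
).fetchall()

# SQLite also returns its own structural meta data as plain rows.
# Each row is (cid, name, type, notnull, dflt_value, pk).
column_names = [row[1] for row in conn.execute("PRAGMA table_info(account_balance)")]

print(business_rows)
print(meta_rows)
print(column_names)
```

The same connection, the same query language, and the same access path serve both tables, so whatever backup, security, and quality processes apply to one can apply to the other.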
Here is the key: Business data is the data that results from doing business (whatever your business is). Business meta data is the data that results from managing business data. It’s data, and any data management architecture should be able to easily handle it.