Published on February 17, 2014
Information is at the heart of all architecture disciplines: There’s more to Data Modeling than you ever thought Chris Bradley, Chief Information Architect & Enterprise Services Director 19 Eastbourne Terrace, London, W2 6LG, London, UK firstname.lastname@example.org January 2014
CONTENTS
Data Modelling is a critical technique and at the heart of all architecture disciplines
Data Modelling Introduction
Background & history
Data Modelling for DBMS development
Data Modelling incorrectly taught at University
What needs to change?
Modelling for the “new” technologies
Demonstrating benefits
The greatest change required
What needs to stay the same?
About The Author
ENTERPRISE ARCHITECTS © 2014
DATA MODELLING IS A CRITICAL TECHNIQUE AND AT THE HEART OF ALL ARCHITECTURE DISCIPLINES

Many years ago people believed the World was flat and that if they sailed over the horizon they would fall off the edge. They also believed that the planet Earth was at the centre of the heavens and that all the planets orbited around Earth. But they were wrong. People who believe Data Modelling is just for DBMS design are just as misinformed.

Data modelling, particularly Conceptual Data Modelling, is an absolutely critical technique and is at the heart of all architecture disciplines. Here’s why: since data has to be understood to be managed, it stands to reason that something whose purpose is to gain agreement on the meaning and definition of concepts will be a key component. That is precisely what a data model provides.

But just what do I mean when I state that Data Modelling is at the heart of all architecture disciplines?

Figure 1: Data Modelling is at the heart of all architecture disciplines

At its heart, the data model provides the unifying language, the lingua franca, the common vocabulary upon which everything else is based. The modelling techniques within the complementary architecture disciplines interact with each other, forming a supportive, cross-checked, integrated and validated set of techniques. It’s not just (sometimes it’s never) about technical DBMS design.

A few simple examples:

The Business Architecture Domain: A Project Charter documents the rationale, the objectives, the business scope, and the success measures for the project. It uses the language of the high level data model to describe the business concepts.

The Process Architecture Domain: A Workflow Model describes the sequence of steps carried out by the actors involved in the process.

The Application & Systems Architecture Domain: A Use Case describes how an actor completes a step in the process by interacting with a system to obtain a service. A Service Specification describes some form of business service that is initiated to complete a business event.

The Information Architecture Domain: A Data Model depicts the critical data things, and the attributes or facts about them. These are the data things of importance that the organization wishes to know or hold information on, and they are the stuff that processes and systems act on.

Every type of model references the things of significance (the entities) in the conceptual data model, which shows why conceptual data modelling is such a vital technique. Getting agreement on the language and definition of the data concepts must always come first; the detail about processes can then be added:

• Initially we discover the Nouns, i.e. the things of interest to the organization, e.g. “Product”, “Customer”, “Location”
• Next we discover “Verb - Noun” pairs: these are activities that must be performed (process, sub-process, etc.) in order for the organization to operate, e.g. “Design Product”, “Ship Order”
• Then we discover “Actor - Verb - Noun” combinations: these form the Use Cases or the steps within a business process, e.g. “Lead Architect Designs New Product”
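The three discovery steps above can be sketched in code. This is an illustration only: the class names (Entity, Activity, UseCaseStep) are hypothetical and do not come from any tool or method named in this paper. The point it tries to show is that both the verb-noun pairs and the actor-verb-noun combinations refer back to nouns that were agreed first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A noun: a thing of interest to the organization."""
    name: str


@dataclass(frozen=True)
class Activity:
    """A verb-noun pair: an activity performed on an entity."""
    verb: str
    noun: Entity


@dataclass(frozen=True)
class UseCaseStep:
    """An actor-verb-noun combination: who performs the activity."""
    actor: str
    activity: Activity

    def describe(self) -> str:
        return f"{self.actor} {self.activity.verb} {self.activity.noun.name}"


# Step 1: agree the nouns first.
product = Entity("Product")
# Step 2: add the verb-noun pairs that act on them.
design_product = Activity("Designs", product)
# Step 3: add the actor to form a use case step.
step = UseCaseStep("Lead Architect", design_product)
print(step.describe())
```

Because Activity and UseCaseStep can only be built from an existing Entity, a process or use case cannot be written down in this sketch without first naming the data concept it acts on.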
DATA MODELLING INTRODUCTION

The problem for many Data Architects is that “Data modelling” has, in far too many companies, received a lot of bad press. Have you heard any of these? “It just gets in the way”; “It takes too much time”; “What’s the point of it?”; “It’s not relevant in today’s systems landscape”; “I don’t need to do modelling, the package has it all”.

Yet when Data modelling first came onto the radar in the mid 1970s the potential was enormous. We were told we’d realise benefits such as “a single consistent definition of data”, “master data records of reference”, “reduced development time”, “improved data quality” and “impact analysis”, to name but a few. Do organisations today want to reap these benefits? You bet, it’s a no-brainer.

So why is it that, 30+ years on, we see in many organisations that the benefits of data modelling still need to be “sold”, and in others the big benefits simply fail to be delivered? What’s happened? What needs to change? As with most things, a look back into the past is a good place to start.
BACKGROUND & HISTORY

Looking back into the history of data management, we see a number of key eras.

1950s to 70s: Information Technology (at that time often called Automated Data Processing (ADP)) was starting to enter the mainstream world of commerce. Later during this period we saw the introduction of the first database management systems such as DL1, IMS, IDMS and TOTAL. Who can remember a DBMS that could be implemented entirely on tapes? (It was IMS HISAM, if you really want to know.) At that time the cost of disc storage was exceptionally high, and the notion of exchangeable disc packs was just coming into the data centre. The concept of “database” operations came into being and the early mentions of “corporate central databases” appeared.

1970 to 1990: Data was “discovered”. Early mentions of managing data “as an asset” were seen and the concepts of Data requirements analysis and Data Modelling were introduced.

1990 to 2000: The “Enterprise” became flavour of the decade. We saw Enterprise data management coordination, Enterprise data integration, Enterprise data stewardship and Enterprise data use. An important change began to happen in this period: there was a dawning realisation that “technology” alone wasn’t the answer to many of the information issues, and we began to see Data Governance being talked about seriously.

2000 and beyond: Data quality, Data as a Service, Data Security & Compliance, Data Virtualisation, Services Oriented Architecture (SOA), Governance (still) and Alignment with the Business were (and still are) the data management challenges of this period. And all of this has to be undertaken in rapidly changing times when we have a “new” view of Information: Web 2.0, Blogs, Mash-ups, Data Virtualisation. It seems anyone can create data! At the same time we have a greater dependence on “packaged” or COTS applications such as the major ERPs.
Also there’s more and more use of SOA, XML and Business Intelligence, and less reliance on traditional “bespoke” development.

Notice I sneaked in “mash-ups” (or web application hybrids) there? See the Wiki article http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid) for more on mash-ups. There are many powerful facilities available now that enable you to create your own mash-ups. Make no mistake, these are becoming the new “shadow IT” of this decade. Remember the home grown departmental Excel macros of the 90s and onwards that became “critical” to parts of the business? Now mash-ups are doing the same thing. But just who is looking at the data definitions, the data standards, applicability and so on? Certainly not the data management group, because frequently they don’t even know that these things are being built in departmental silos, and anyway the “data team” is pigeonholed as being only involved in DBMS development.

That leads us on to examine the belief that many people (too many, unfortunately) still have: that Data Modelling is only for DBMS development. Why is that? Firstly we’ll look at Data Modelling for use in DBMS development.
DATA MODELLING FOR DBMS DEVELOPMENT

In its early days data modelling WAS primarily aimed at DBMS development. We’ll have a look at the two main techniques in a moment. To illustrate this we can look at 4 typical roles that may be considered as “customers” of the data modelling output:

The Enterprise data customer: This might be at Director or CxO level. The accuracy of data is critical to them, they are report users, and the data “products” that data professionals produce are key to serving the needs of this level of user.

The Data Architect: This person knows the business and its rules. He/she manages knowledge about the data and defines the conceptual direction and requirements for the capture of data.

The DBA: This person is production oriented, and manages data storage and the performance of databases. He/she also plans and manages data movement strategies and plays a major part in data architecture by working with architects to help optimise and implement their designs in databases.

The developer DBA: This role works closely with the development teams and is focused on DBMS development. They frequently move and transform data, often writing scripts and ETL to accomplish this.

Data models (more accurately the metadata) were (and are) seen as the glue or the lingua franca for integrating IT roles through the DBMS development lifecycle. All of the roles above depend on metadata from at least one of the other roles.

What then are the steps for developing a DBMS and utilising Data models? Firstly a word of warning: this could be the subject of a huge paper in its own right, but I’ll try to summarise it simply here. There are two “main” approaches to creating DBMSs from models: one is the “top down” or “to-be” approach and the other is the “bottom-up” or “as-is” approach.

Top Down (To-Be) approach

Step 1: Speaking with Business representatives, discover and then document the business requirements and agree on high-level scope. The output is typically some form of Business Requirements Document (BRD). Understand at a high, conceptual level where the data is used by business processes (and vice versa).

Step 2: Create a more detailed business requirements document with subscriber data requirements, business processes and business rules.

Step 3: Understand and document the business keys, attributes and definitions from business subject matter experts. From this create and continually refine a logical data model. Determine what the master entities are and what is common to other business areas.

Step 4: Verify the logical data model with the stakeholders. Walk a number of major business use cases through the model and refine the model. With knowledge of the technical environment on which you are going to implement the solution, apply the technical design rules, use known volumetric and performance criteria, and create a first cut physical data model. Remember that the same logical model could be implemented in different ways on differing technology platforms.

Step 5: Generate the Data Definition Language (DDL) from the physical model. Refine the physical design with DBA support and implement the DBMS using the refined physical model.

This top down approach has the advantage that the “new” or “To-Be” business and data requirements are foremost. In the early days there were not too many “existing systems” to consider, which was a good job because the approach doesn’t take into account any of the hidden nuances and rules that may be deep down within the existing systems.

Bottom up (As-Is) approach

The primary purpose of the Bottom-up (or As-Is) approach is to create a model of an existing system into which the new requirements can be added. Frequently, the bottom-up approach is used because a model of the current system simply doesn’t exist, often because the system has evolved and/or the original design staff have retired, died, or moved on in some other way, and the documentation has not been kept up to date. The main steps in the bottom-up approach are:

Step 1: Reverse engineer the database or file schema from the system that is already implemented. From this you will have the database catalog, table, column and index names, etc. Of course these will all be in “tech” language without any business definitions.

Step 2: Profile the real data by browsing and analysing the data from the tables. Scan through the ETLs to find any hidden relationships and constraints. Modern data profiling tools are invaluable here as they will allow you to gain real insight into the data, way beyond simply trying to understand it from the column names. You did know that SpareField6 really holds the alternative delivery location?

Step 3: Find out the foreign key relationships between tables from IT subject matter experts, and verify the findings. The typical output here is a refined physical model.
Step 4: Document the meanings of columns and tables from IT subject matter experts.

Step 5: Try to understand the business meanings of probable attributes and entities that may be candidates for the logical data model. From here the result is a “near logical” model.

Pretty obviously, the bottom up approach is great for capturing those hidden “gotchas” that are tucked away inside the current system. However it doesn’t give any serious attention to new requirements. Thus, a third way is a hybrid of these, frequently called the “middle out” approach, which employs the best parts of the top-down and bottom-up approaches. It’s this approach I favour when designing a new model which is likely to have a better than evens chance of ultimately being used for a technology solution.
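Steps 1 and 2 of the bottom-up approach can be illustrated with a small sketch. It assumes SQLite purely because it ships with Python; the tables (cust, ord) and their contents are invented for the example, and a real project would use a modelling or profiling tool against the actual platform. The sketch reads table, column and declared foreign key names from the catalog, then runs a very basic profile of one column.

```python
import sqlite3

# Hypothetical legacy schema, invented purely for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust (cust_id INTEGER PRIMARY KEY, cust_nm TEXT, SpareField6 TEXT);
    CREATE TABLE ord (
        ord_id INTEGER PRIMARY KEY,
        cust_id INTEGER REFERENCES cust(cust_id),
        ord_dt TEXT
    );
    INSERT INTO cust VALUES (1, 'Acme', 'Dock 4, Leeds'), (2, 'Bolt', NULL);
""")


def reverse_engineer(conn):
    """Step 1: read table, column and declared foreign key names from the catalog."""
    model = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        # foreign_key_list rows: (id, seq, parent_table, from_col, to_col, ...)
        fks = [(row[3], row[2], row[4])
               for row in conn.execute(f"PRAGMA foreign_key_list({table})")]
        model[table] = {"columns": columns, "foreign_keys": fks}
    return model


def profile_column(conn, table, column):
    """Step 2: basic profile of one column (row, non-null and distinct counts)."""
    total, filled, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    return {"rows": total, "non_null": filled, "distinct": distinct}


model = reverse_engineer(conn)
print(model["ord"]["foreign_keys"])
print(profile_column(conn, "cust", "SpareField6"))
```

Note what the catalog cannot tell you: nothing here reveals that SpareField6 holds a delivery location. That meaning only emerges from profiling the values and from Steps 3 to 5, talking to the subject matter experts.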
DATA MODELLING INCORRECTLY TAUGHT AT UNIVERSITY

As part of my DAMA education brief, and to be honest as a way of giving something back to the community, I am frequently asked to speak not just at conferences but with academic institutions. Over the past 10 years or so I have been taken aback at what I have observed regarding the way in which Data Modelling is portrayed on courses at many Universities in the UK and USA (and I suspect in other places too). Here are a few snippets I have pulled recently from 5 separate universities regarding data modelling on the Computer Science Bachelors & Masters courses:

“The purpose of a Data model is to design a relational database system”
“An ER Model is used to specify design and document Database design”
“A Data model is a pictorial representation of the structure of a relational database system”
“… it is a description of the objects represented by a computer system together with their properties and relationships”
“ER Modelling is a Database design method”

At one of these I dug deeper and examined several of the course assignments. One assignment asked students to prepare a model to represent an office environment, and part of the detailed description within the assignment brief mentioned the “Rolodex” and “IBM Selectric” that were on the desks in this office. Now, I’m not talking here of reading an assignment paper set for a course in 1975; this was one I saw in 2013!

Given all of the uses of data models that I’ve described so far, the history of data modelling, the way it’s still being taught in some Universities, and much of the literature from the data modelling tool vendors themselves, it is not surprising that many people are left with the impression that data modelling is just for DBMSs.
BUT THIS IS WRONG

WHAT NEEDS TO CHANGE?

The use and benefit of Data modelling is considerably greater than its current “one trick pony” press would suggest. To make data modelling relevant for today’s systems landscape we must show that it’s relevant for the “new” technologies such as:

• ERP packages
• SOA & XML
• Business Intelligence
• Data Lineage
• Data Virtualisation

And let’s not forget that a Data Model at an appropriate level is an awesome communication tool, so it can be used for communicating with the business. See also “Data Modelling For The Business: A Handbook for aligning the business with IT using high-level data models”; Technics Publishing; ISBN 978-0-9771400-7-7.

We also need to break away from the “you must read my detailed data model” mentality and make the information available in a format users can readily understand. This means, for example, that Data Architects need to recognize the different motivations of their users and re-purpose the model for the audience: don’t show a business user a data model! Information should be updated instantaneously, and we must make it easy for users to give feedback; after all, you’ll achieve common definitions quicker that way.

We need to recognize the real world commercial climate that we’re working in and break away from arcane academic arguments about notations, methodologies and the like. If we want to have Data modelling play a real part in our business then it’s up to us to demonstrate and communicate the genuine benefits that can be realized. Remember, Data modelling isn’t a belief system: just because you “get it”, don’t assume that the next person does.
MODELLING FOR THE “NEW” TECHNOLOGIES

I feel I must make a confession here. The technologies are not really all that new! It’s just that “traditionally” Data modelling has not been seen as being relevant to these areas. To break out of this “modelling is a one trick pony” view we need to show how and why data modelling IS relevant for today’s varied IT landscape. Thus we must show that it’s relevant for the “new(er)” technologies such as:

• ERP packages
• SOA & XML
• Business Intelligence
• Data Lineage
• Data Virtualisation

ERP packages

As data architects, when faced with projects that are embarking upon the introduction of a major ERP package, have you ever heard the cry: “We don’t need a data model, the package has it all”? But does it? Is data part of your business requirement? Of course it is. So just how do you know whether the package meets your overall business data requirements? You did assess the data component when doing your fitness for purpose evaluation, didn’t you? A data model will assist in both package configuration and fitness for purpose evaluation.

How can you assess whether the ERP package has data structures, definitions and meanings that are compatible with your legacy systems? Again a good data model will assist with this. What about data integration, legacy data take on and master data integration: how can these readily be accomplished? You guessed it, a data model can help here too.

The critics say that modelling isn’t needed for ERP packages. But that’s because they are wedded to the old-world view that modelling is only used for DBMS development. It’s not. When we are implementing ERP systems the model will NOT be required to generate a DBMS from; however, for all of the other aspects described above it IS invaluable.

So what’s the problem? Why can’t we just point our favourite data modelling tool at the underlying DBMS of the package? Simply put, for the most part the problem is that the Database System Catalog does not hold useful metadata.
Several well-known ERP systems do not hold any Primary Key (PK) or Foreign Key (FK) constraints in the Database itself. It’s only within their application layer that this knowledge is held. It is within the proprietary ERP Data Dictionary that anything resembling a “Logical View” of the data and the definitions is held.

Figure 2: Part of an ERP reverse engineered directly from the DBMS

What we really need is to be able to get the ERP metadata into a useful format similar to that shown in figure 3 below.
Figure 3: Useful model from an ERP

How can we do that? Well, there isn’t space in this article to go into the detail, and much of it varies from ERP to ERP. However with SAP, for example, there is an independently available metadata extraction facility called SAPHIR. Additionally, you can validate a model created from SAPHIR by examining key screen items, as in the example illustrated below.

Figure 4: Validating an ERP model from transaction screens
Summary: Why develop data models for package implementation

So why do we need to bother undertaking data modelling when implementing an ERP system?

1) For requirements gathering. If your business data is part of your requirement, you need to model it.
2) For a fit for purpose evaluation. Surely you must have evaluated the suitability of the package before deciding to implement it?
3) For gap analysis. Even if you are told “it’s a done deal, we are going with package X”, the data model will give you rich insight into gaps in key areas of functionality. I have used this many times with clients when implementing major well known packages to help spot areas where a workaround or manual implementation will be required.
4) For configuration. Using models as a communication vehicle to demonstrate use cases is invaluable. From these the many options in the ERP system can be examined and then configured with confidence.
5) For legacy data migration and take on.
6) For master data alignment. The ERP may have its own master data sets. You can use the model to ensure correct alignment of these with your corporate master data initiative. Don’t fall into the trap of letting the tail wag the dog!
7) Fundamentally, this is the key one. It’s all about ensuring that your ERP data can integrate within your overall Information Architecture.

SOA and XML

I don’t intend to give a detailed exposition on the subject of SOA, however it’s worth reminding ourselves of the fundamental components in the architecture.

The Bus in SOA is a “conceptual” construct, which helps to get us away from point to point thinking. One approach to integrating applications via a bus is to use Message Oriented Middleware (MOM).

The Message Broker is a dispatcher of messages and comes in many varieties. The broker operates upon a queue of messages within the routing table.

Adapters are where the different technology worlds are translated, e.g. UNIX, Windows, OS/390 and so on.
Fundamentally, SOA is built upon a message based set of interactions, i.e. all interaction between components is through messages. These are generally XML messages, so it is true to say that XML is at the core of SOA. But there is a potential problem. XML is a hierarchical structure, but the real world of data is not.
Figure 5: Book example

Let’s illustrate this with a real world example: a book. Looking at figure 5, we see that this book is entitled “Data Modelling For The Business”. When we look at this real example we see data such as Title, Author(s), ISBN, Price, Publisher, Amazon URL and so on. Looking at the authors (myself, Steve & Donna), there is also some information (on the back cover) relating to each of us.

We can develop a data model to represent this “real world” data and show it in an Entity Relationship format. Typically these ER models can represent real world data pretty accurately. Figure 6 shows an example ER model for, let’s say, the “book authoring” data subject area. The business assertion that this data model makes is that:

A book can be written (authored) by at least one and possibly several writers (in this case, me, Steve and Donna).
A writer may be the author of many books (e.g. Steve has also written “Data Modeling Made Simple”).

Thus Book <> Writer is a many to many relationship. However the intersection entity is a real world concept; it’s the “Book Authorship” entity, and this is shown in figure 6.
Figure 6: Book example ER model

Now, when we want to use data in this model within an XML based system we have to remember that XML messages are hierarchic; that is, a child entity can only have one parent entity, whereas an entity relationship (ER) model allows a child entity to have several parent entities. Thus we need to do something to turn the ER model representation into a hierarchic XML representation. To accomplish this we need to decide whether to make “Book” the parent of Book Authorship or to choose “Writer” to be the parent. In figure 7 below, the resultant XML model has been created after choosing Book as the parent.

Figure 7: Book XML model

Whilst simplistic (for the sake of the example), the XML model in figure 7 now represents the XML schema we’re going to use. Within our SOA based system, we may have a transaction which utilises an XML message called “Book Details”. Figure 8 below shows how the XML message has been created from the XML schema and is utilised (in the message queue) in our SOA solution.
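The choice of Book as the parent can be sketched in a few lines. This is an illustration under stated assumptions, not the schema from the figures: the element names (Book, Title, BookAuthorship, Writer) and the B1/W1 style identifiers are invented for the example, and the writers are given only the first names used in the text. It starts from ER-style data, including the Book Authorship intersection entity, and flattens it into a hierarchic message per book.

```python
import xml.etree.ElementTree as ET

# ER-style data: two entities plus the "Book Authorship" intersection entity.
# Identifiers (B1, W1, ...) are hypothetical keys invented for this sketch.
books = {"B1": "Data Modelling For The Business", "B2": "Data Modeling Made Simple"}
writers = {"W1": "Chris", "W2": "Steve", "W3": "Donna"}
authorships = [("B1", "W1"), ("B1", "W2"), ("B1", "W3"), ("B2", "W2")]


def to_book_details_xml(book_id):
    """Build a hierarchic message with Book as the parent of Book Authorship."""
    book = ET.Element("Book", {"id": book_id})
    ET.SubElement(book, "Title").text = books[book_id]
    for b_id, w_id in authorships:
        if b_id == book_id:
            authorship = ET.SubElement(book, "BookAuthorship")
            # The writer is repeated under every book it belongs to,
            # because an XML child element can only have one parent.
            ET.SubElement(authorship, "Writer", {"id": w_id}).text = writers[w_id]
    return book


for book_id in books:
    print(ET.tostring(to_book_details_xml(book_id), encoding="unicode"))
```

The writer with id W2 appears in both messages. That duplication is the price of forcing a many to many relationship into a hierarchy, and it is the data model that tells you the two occurrences are the same writer.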
Figure 8: Book details XML message

So clearly, data modelling IS a key component required in an SOA implementation. It’s kind of ironic that this “new” SOA concept and the representation of data in a hierarchic form (XML messages) draws heavily on the approaches we had to employ when designing a database schema for IMS & DL1, which were hierarchic DBMSs!

Business Intelligence

When looking at Business Intelligence and Data Warehouses, we are trying to ensure that the data utilised by the business for their queries and reports is reliable. In order to accomplish this, we need to manage not only the data that the business utilises, but also the metadata. We all know by now that much of this metadata is contained within the data models. So, what are the main reasons for managing this model metadata?

1. Reduce Cost: In addition to all the other points below, the goal here is to reduce the overall cost of managing a significant part of the IT infrastructure. Managing metadata helps automate processes, reduce costly mistakes of creating redundant/non-conformant data, and reduce the length of time to change systems according to business needs.
2. Higher Data Quality: Without proper management, the same type of data may be managed differently in the places it is used, degrading its quality/accuracy.
3. Simplified Integration: If data is understood and standardized, it reduces the need for complex and expensive coding and scripting to transform and massage data during integration.
4. Asset Inventory: Managing the knowledge about where data lives and what you store is critical for eliminating redundant creation.
5. Reporting: Creating a standard definition of data types and making it easy for the enterprise to find will reduce cost in application development (e.g. time to research and create new objects) as well as facilitate a general understanding of the enterprise’s data.
6. Regulatory Compliance: Without metadata management, you are not complying with regulations.
Bottom line: An audit trail of data, starting with its whereabouts, is critical to complying with government mandates.

The top 5 benefits from managing this model metadata for reporting are:
#5 Data Structure Quality. Models ensure that the business design of a data architecture is appropriately mapped to the logical design, providing comprehensive documentation on both sides.

#4 Data Consistency. By having standardized nomenclature for all data, including domains, sizing, and documentation formats, the risk of data redundancy or misalignment is greatly reduced.

#3 Data Advocacy. Models help to emphasize the critical nature of data within the organization, indicating direction of data strategy and tying data architecture to overall enterprise architecture plans, and ultimately to the business’s objectives.

#2 Data Reuse. Models, and encapsulation of the metadata underpinning data structures, ensure that data is easily identified and is leveraged correctly in the first place, speeding incremental tasks through reuse and minimizing the accidental building of redundant structures to manage the same content.

#1 Data Knowledge. Models, combined with an efficient modelling practice, enable the effective communication of metadata throughout an organization, and ensure all stakeholders are in agreement on the most fundamental requirement: the data.

ER Models vs. Dimensional models for reporting

Much has been written previously about the appropriateness of ER vs Dimensional models for BI and Data Warehousing. To dispel any myths it’s worth looking at the key features of each type of model:

Features of an ER model
• Optimised for transactional processing (arrival of new data)
• Normalised, typically in 3rd (or 5th) normal form
• Designed for low redundancy of data
• Relationships between business entities are explicit (e.g. Product determines Brand determines Manufacturer)
• Tightly coupled to current business model

Features of a dimensional model
• “Star Schema” (or snowflake or even star flake)
• Optimised for reporting
• Business entities are de-normalised
• More data redundancy to support faster query performance
• Relationships between business entities are implicit (it’s evident that a Product has a Brand and Manufacturer, but the nature of the relationship between these entities is not immediately obvious)
• Loosely coupled to business model: changes to the business model can often be accommodated via graceful changes without invalidating existing data or applications.

Data lineage

Don’t forget data lineage. It’s applicable to many aspects, and with regulatory compliance requirements in many sectors it is now a statutory need. In BI and DW applications, mappings and transformations determine how each field in the Dimensional Model is derived. The derivations could actually drive the ETL process. In lineage, as in BI, the metadata is vital!

What is the problem? Fundamentally we need to be able to help business users to answer questions or concerns such as:

That figure doesn’t look right! Where does it come from?
How can we prove to the auditor that financial data has been handled correctly?

Not only do we need to help our primary customers (the business folks), but we also need to be able to help IT staff to answer questions such as:

I need to integrate the data supplied from your system with the data in my system. How can I understand where your data has come from and what it means?

And finally, we need to be able to help systems to answer questions such as:

When a piece of source data is updated, which items in the Data Warehouse will need to be recalculated?

So why does data lineage matter? We aim to have an increased understanding of where data comes from and how it is used, which will lead to increased confidence in the accuracy of data. The knowledge of how data is transformed is itself valuable intellectual property that should be retained within a business. And, very importantly, it is absolutely necessary for compliance with the Basel II Accord and the Sarbanes-Oxley Act (SOX): SOX requires that lineage & transformation of financial data is recorded as it flows through business systems.

Two key aspects of Data Lineage

Transformations: What has been done to the data?

Business Processes: Which business processes can be applied to the data?
What type of actions do those processes perform (Create, Read, Update, Delete)?
Audit Trail – who has supplied, accessed, updated, approved and deleted the data and when? Which processes have acted on the data?

So where do I need Data Lineage?
For the design of ETL processes, the creation of Dimensional Models, the transformation of data to XML (typically from ER) and for workflow design.

Data Virtualisation
One of the great technologies to emerge recently is Data Virtualisation. Most of us will be familiar with Storage Virtualisation and even Server Virtualisation. The purpose of virtualisation in the IT world is to mask complexity and present a virtual representation of the thing as if it were a real instance itself. So with data virtualisation, data can be federated from a very wide variety of heterogeneous environments & data storage systems, but presented to an application as if it were a real SQL table, XML message, Web service, SOAP call or whatever. Figure 9 illustrates a typical data virtualisation architecture.

But what is going to be presented to the applications? We’ve got all sorts of different data formats, rules, characteristics and so on in the source data, so just what are we going to show in our nice new uniform view of the data that is presented to the applications? It’s the data model that is absolutely the language, the key which unlocks the potential of Data Virtualisation. The data model informs the federation layer of the DV toolset, and it is against the definitions & structures of the data model that the consuming applications access the data. You can almost imagine Data Virtualisation as being “views on steroids”.
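The idea that the data model drives the federation layer, and that the same mapping metadata answers the lineage question “Where does it come from?”, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular DV toolset works: the `Customer` attributes, the two sources (`crm_db`, `legacy_csv`) and every class and function name below are hypothetical.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass(frozen=True)
class FieldMapping:
    """How one attribute of the model entity is derived from a source field."""
    model_attribute: str
    source_field: str
    transformation: str                    # human-readable lineage note
    convert: Callable[[object], object]    # the transformation itself


@dataclass(frozen=True)
class SourceBinding:
    """Binds one physical source to the model entity (all names hypothetical)."""
    source_name: str
    fetch: Callable[[], Iterator[Dict[str, object]]]
    mappings: List[FieldMapping]


def make_crm_source() -> Callable[[], Iterator[Dict[str, object]]]:
    """Hypothetical relational source, held in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE crm_customer (cust_id INTEGER, full_name TEXT)")
    conn.execute("INSERT INTO crm_customer VALUES (1, 'Alice Smith')")

    def fetch() -> Iterator[Dict[str, object]]:
        for row in conn.execute("SELECT cust_id, full_name FROM crm_customer"):
            yield dict(row)

    return fetch


LEGACY_CSV = "CUSTNO,NAME\n0042, bob jones \n"   # hypothetical flat-file source


def fetch_legacy() -> Iterator[Dict[str, object]]:
    yield from csv.DictReader(io.StringIO(LEGACY_CSV))


def build_bindings() -> List[SourceBinding]:
    """Map both sources onto the model entity Customer(customer_id, customer_name)."""
    return [
        SourceBinding("crm_db", make_crm_source(), [
            FieldMapping("customer_id", "cust_id", "cast to integer", int),
            FieldMapping("customer_name", "full_name", "trim whitespace",
                         lambda v: str(v).strip()),
        ]),
        SourceBinding("legacy_csv", fetch_legacy, [
            FieldMapping("customer_id", "CUSTNO", "cast to integer", int),
            FieldMapping("customer_name", "NAME", "trim and title-case",
                         lambda v: str(v).strip().title()),
        ]),
    ]


def virtual_customer_view(bindings: List[SourceBinding]) -> Iterator[Dict[str, object]]:
    """Present every source row in the shape defined by the data model."""
    for binding in bindings:
        for raw in binding.fetch():
            yield {m.model_attribute: m.convert(raw[m.source_field])
                   for m in binding.mappings}


def lineage(bindings: List[SourceBinding], model_attribute: str) -> List[str]:
    """Answer 'where does this attribute come from, and what was done to it?'"""
    return [f"{b.source_name}.{m.source_field} ({m.transformation})"
            for b in bindings for m in b.mappings
            if m.model_attribute == model_attribute]
```

A consuming application would iterate over `virtual_customer_view(build_bindings())` and see only model-shaped rows, while `lineage(build_bindings(), "customer_name")` lists the contributing source fields and transformations. Keeping the mappings as data rather than burying them in code is the design choice that lets one definition serve both purposes.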
Figure 9: Typical Data Virtualisation Architecture

Communicating with the business
Finally, data modelling can play a very useful role in helping to communicate with the business. Data models can be produced at different levels (Enterprise, Conceptual, Logical, Physical). At the higher levels a model is a phenomenal tool for getting across ideas and concepts, and for gaining a good understanding of the language and meaning of the major data concepts in the business.

At the highest level, an Enterprise Data Model documents the very high level business data objects and definitions. Its scope is Enterprise wide and it is there to provide a strategic view of Enterprise data. The Enterprise data model is there to get across big picture, high level concepts.

In a Conceptual Data Model, the business key, attributes and definitions of major business data objects are developed. It also shows the relationships between major business data objects. Its purpose is to reach a common understanding before progressing too far into detail. It is used to communicate with the Business, to give an overview of the main entities, super types, attributes, and relationships. It will contain lots of many-to-many relationships and relationships that carry multiple meanings. All of this is addressed in the more detailed logical data model, after there is agreement on scope and definitions from these high level models.

Fundamentally, these high level models have different perspectives and levels of detail for different uses.
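To make concrete what “addressed in the more detailed logical data model” can mean for a many-to-many relationship, here is a minimal Python sketch of one common resolution: replacing the many-to-many with an associative entity and two one-to-many relationships. The entity names, the generated `_id` attributes and the `resolve_many_to_many` helper are assumptions made for illustration; in practice the name and attributes of the associative entity are agreed with the business.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Relationship:
    """A relationship between two entities, e.g. Customer M:N Product."""
    left: str
    right: str
    cardinality: str   # "1:N" or "M:N" in this sketch


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: Tuple[str, ...]


def resolve_many_to_many(
    rel: Relationship,
) -> Tuple[Optional[Entity], List[Relationship]]:
    """Turn a conceptual M:N relationship into logical-model structures.

    Returns the new associative entity (or None if nothing needed resolving)
    together with the relationships that replace the original one.
    """
    if rel.cardinality != "M:N":
        return None, [rel]

    # Placeholder name and keys; a real model would use a business term
    # (for example "Order") and add the attributes that belong to the link.
    link = Entity(
        name=f"{rel.left}{rel.right}",
        attributes=(f"{rel.left.lower()}_id", f"{rel.right.lower()}_id"),
    )
    return link, [
        Relationship(rel.left, link.name, "1:N"),
        Relationship(rel.right, link.name, "1:N"),
    ]
```

For example, `resolve_many_to_many(Relationship("Customer", "Product", "M:N"))` yields a `CustomerProduct` entity keyed by `customer_id` and `product_id`, with `Customer` and `Product` each related one-to-many to it. The conceptual model can stay simple for business conversations while the logical model carries this extra detail.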
DEMONSTRATING BENEFITS
As I mentioned earlier, we constantly need to demonstrate the benefits accruing from data modelling. Nobody owes us a living, and no matter how important WE believe the place of modelling to be, it is incumbent upon us to demonstrate (and sell) the benefits within our organisations. So just how can you gain traction, budget and Executive buy-in? Here are a few tips:
1. Be Visible about the program
• Identify key decision-makers in your organization and update them on your project and its value to the organization
• Focus first on the data that is most crucial to the business! Publish that and get buy-in before moving on (e.g. start small with a core set of data)
2. Monitor the progress of your project and show its value
3. Define deliverables, goals and key performance indicators (KPIs)
4. Start small – focus on core data that is highly visible in the organization. Don’t try to “boil the ocean” initially.
5. Track and Promote progress that is made
6. Measure Metrics where possible
• “Hard data” is easy (for example # data elements, # end users, money saved, etc.)
• “Softer data” is important as well (data quality, improved decision-making, etc.). Anecdotal examples help with business/executive users, e.g. “Did you realize we were using the wrong calculation for Total Revenue?” (based on data definitions)
Remember, soft skills are becoming critically important for Information professionals, and whilst you might not like it, the hard fact is that part of YOUR job nowadays IS marketing.
THE GREATEST CHANGE REQUIRED
As Information Professionals, we need to break away from the “you must read my detailed data model” mentality and make the appropriate model information available in a format users can readily understand. This means, for example, that Data Architects need to recognize the different motivations of their users and re-purpose the information they present to suit the audience: Don’t show a business user a data model! Information should be updated instantaneously, and we must make it easy for users to give feedback; after all, you’ll achieve common definitions more quickly that way.

We need to recognize the real world commercial climate that we’re working in and break away from arcane academic arguments about notations, methodologies and the like. If we want Data modelling to play a real part in our business then it’s up to us to demonstrate and communicate the benefits that are realized. Remember, Data modelling isn’t a belief system; just because you “get it”, don’t assume that the next person does.

So what can we do?
1. Provide Information to users in their “Language”
• Repurpose information into various tools: BI, ETL, DDL, etc.
• Publish to the Web
• Exploit collaboration tools / SharePoint / Wiki and so on. What about a Company Information Management Twitter channel?
• Business users like Excel, Word, Web tools, so make the relevant data available to them in these formats.
2. Document Metadata
• Data in Context (by Organization, Project, etc.)
• Data with Definitions
3. Provide the Right Amount of Information
• Don’t overwhelm with too much information. For business users, terms and definitions might be enough.
• Cater to your audience. Don’t show DDL to a business user or business definitions to a DBA.
4. Market, Market, Market!
• Provide Visibility to your project.
• Talk to teams in the organization that are looking for assistance
• Provide short-term results with a subset of information, and then move on.
5. Be aware of the differences in behaviour & motivations of different types of users.
For example, a DBA is typically:
• Cautious
• Analytical
• Structured
• Doesn’t like to talk
• “Just let me code!”
However a Data Architect is:
• Analytical
• Structured
• Passionate
• “Big Picture” focused
• Likes to Talk
• “Let me tell you about my data model!”
And a Business Executive is:
• Results-Oriented
• “Big Picture” focused
• Has little Time
• “How is this going to help me?”
• “I don’t care about your data model.”
• “I don’t have time.”

As Information professionals we’ve got to get these softer skills baked into ourselves and our colleagues. Some of the key things we can do as a profession are to:
• Develop Interpersonal skills
• Avoid methodology wars & notation bigots. Please don’t air discussions about Barker vs IE vs UML class diagrams in front of business users. Yes, sadly enough I have seen this done!
• Remember, nobody owes us a living, so we must constantly demonstrate benefits. As data professionals we constantly need to fight for our existence.
• Examine professional certification (CDMP / BCS etc). This shows we are serious about our profession.
WHAT NEEDS TO STAY THE SAME?
So having highlighted the areas that need to change in order to make modelling more relevant to our business colleagues, and the information environments of today, are there any things that should stay the same? Yes indeed. We must keep the disciplines and best practices that have existed in the modelling community for many years. These can be categorised into 3 major areas as follows:

1) Modelling rigour
Development of Conceptual, Logical and Physical Data models with good lineage and object re-use. Structures created in the most appropriate normal form (typically 3rd normal form). Good and consistent data definitions for all components of the data model.

2) Standards & Governance
These cover standards for both development and usage of information models, including aspects of data quality. Data Governance, including ownership, stewardship and operational control of the data.

3) Object reuse via a common repository
The metadata that is captured whilst developing Conceptual, Logical and Physical Data models is used for far more than data modelling; it is of immense use for many aspects of the business. Interestingly, several organisations are now beginning to use this metadata as the basis of their Business Data Dictionaries. The key here is holding the metadata in a common repository and reusing the objects where appropriate.

So, through this paper we have examined many aspects of data modelling, starting with its history, its use in DBMS development and the way it is taught in some Universities, and firmly refuting the criticism that it’s only appropriate for DBMS development. However, as data professionals, it’s up to us to make the biggest change necessary to make it appropriate to the new technologies and business environments of today. We need to grasp the nettle and engage effectively within our businesses. Go to it ………
ABOUT THE AUTHOR

INDUSTRY STANDING
Chris is Enterprise Services Director (UK) & Chief Information Architect at Enterprise Architects, an international strategy & architecture professional services firm headquartered in Melbourne, Australia, providing strategic architecture delivery and support services to Fortune 500 companies globally.

OVERVIEW
Christopher Bradley has spent over 30 years in the vanguard of Information Management, working for leading organisations in Information Management Strategy, Data Governance, Data Quality, Information Assurance, Master Data Management, Metadata Management, Data Warehouse and Business Intelligence.

After studying Chemical Engineering at University, Mr. Bradley started his career at the UK Ministry of Defence, where he worked on several major Naval Database systems and on the development of the ICL Data Dictionary System (DDS). His career spans: Volvo as Lead Database Architect, Thorn EMI as Head of Data Management, Readers Digest Inc as European CIO, and Coopers and Lybrand (later PWC) where he established and ran the International Data Management specialist practice. During this time he led many major international assignments including data management strategies, data warehouse implementations, the establishment of data governance structures and the largest data management strategy in Europe.

Chris established and ran the Business Consultancy practice at IPL, a UK based consultancy, and has worked for several years with prestigious international clients including GSK, TOTAL, Bank of England, American Express, BP, Statoil and Saudi Aramco.

Chris advises global organisations on Data Governance, Information Management best practice and how organisations can genuinely manage Information as a critical corporate asset. Frequently he is engaged to evangelise the Information Management and Data Governance message to executive management worldwide, to introduce data governance and new business processes for Information Management, and to deliver training and mentoring to information management professionals.

Chris is an officer of DAMA International, a contributor to DMBoK 2.0, a member of the Meta Data Professionals Organisation (MPO), and a holder at “master” level of, and examiner for, the DAMA CDMP professional certification. He recently co-authored a book “Data Modelling for the Business – A handbook for aligning the business with IT using high-level data models”. He is a regularly sought after and highly rated conference speaker at Information Management events worldwide. He also authors the Information Asset Management “Expert channel” on the BeyeNETWORK, and is a regular Tweeter as @InfoRacer and blogger on Information Management (http://infomanagementlifeandpetrol.blogspot.com/).

Chris can be contacted at email@example.com

RECENT SPEAKING ENGAGEMENTS
DAMA Australia (DAMA-A): 18-21 November 2013, Melbourne, Australia, “Exploiting DAMA DMBoK 2.0” Keynote; “Information Management Fundamentals workshop”; “CDMP Examination workshop”
Enterprise Data & BI Conference Europe: November 2013, London, UK, “Workshop: Data Modelling Fundamentals”
IPL / Embarcadero IM series: June 2013, London, UK, “Implementing Effective Data Governance”
Riyadh Information Exchange: May 2013, Riyadh, Saudi Arabia, “Big Data – What’s the big fuss?”
Enterprise Data World: May 2013, San Diego, USA, “Workshop: Data and Process Blueprinting – A practical approach for rapidly optimising Information Assets”
Data Governance & MDM Europe: April 2013, London, “Selecting the Optimum Business approach for MDM success…. Case study with Statoil”
E&P Information Management: February 2013, London, “Case Study, Using Data Virtualisation for Real Time BI & Analytics”
E&P Data Governance: January 2013, Marrakech, Morocco, “Workshop: Establishing a successful Data Governance program”
Big Data 2: December 2012, London, “Big data projects aren’t one man shows”
Financial Information Management Association: November 2012, London, “Data Strategy as a Business Enabler”
Data Modeling Zone: November 2012, Baltimore, USA, “Data Modelling for the business”
Data Management & Information Quality Europe: November 2012, London, “DAMA CDMP Workshop”
ECIM Exploration & Production: September 2012, Haugesund, Norway, “Enhancing communication through the use of industry standard models; case study in E&P using WITSML”
Preparing the Business for MDM success: Threadneedles Executive breakfast briefing series, July 2012, London