Par: Gregory Keller
Résumé: Jason Tiret's paper on designing and standardizing database designs
A bottom-up approach to designing and standardizing database design
By Jason Tiret
ER/Studio Product Manager, Embarcadero Technologies, Inc.
100 California Street, 12th Floor
San Francisco, CA 941111
Embarcadero Technologies, Ltd.
Thames House 17 Marlow RoadMaidenheadBerkshireSL6 7AAUnited Kingdom
Table of Contents
This paper takes a bottom-up approach to dissecting and resolving design problems. It then discusses how teams can work together in a secure collaborative environment where standards can be enforced in the maintenance and construction of database designs.
Organizations face many challenges when building and maintaining database systems. Enforcing standards in database construction is very important as a database structure evolves, development times decrease and teams grow. Without specific guidelines and standards in place, potential problems can arise in the database that can lead to data inconsistencies, data integration issues, and even data loss. From development to implementation to maintenance, various groups are responsible for care and evolution of the database. Developers down the hall from each other may be throwing new tables into existing databases as change requests are submitted, but who is to say that every one is on the same page? What happens if one developer is using VARCHAR(30) for “name” columns while another developer is using VARCHAR(50)? Inconsistencies in the design can arise if those developers don’t have a glossary of data elements and business rules defined with specific standards in mind. This can in turn, have costly trickle down effects such as complex data integration and reporting inaccuracies.
Modeling tools can help tremendously in our pursuit design standardization and data integrity analysis. For this problem we will use Embarcadero’s data modeling solution, ER/Studio, because of its sophisticated data dictionary system and advanced collaborative server, ER/Studio Repository. In order to perform our analysis we will need to visualize the database. We will start with reverse engineering an Oracle schema. ER/Studio will connect to the database and build a logical and physical model based on the metadata in the system catalog. This will then allow us to dissect the schema to see potential problems in the design that could lead to data integration problems. The problems can then be corrected directly in the modeling environment and the lessons learned can be applied to new designs that are created in the future.
The roadmap we will use will be the ER/Studio diagram. ER/Studio will build for us a data dictionary with a first-cut list of domains and tie them to a logical and physical model. Once we have a data model constructed, we will have the blueprint we need to assist in our standardization of the database.
IMPORTANT NOTE: This option will build a domain for every column that is unique by name AND data type. If ER/Studio comes across two columns that have the same name and data type, then they will map to the same domain, otherwise multiple domains are created and each respective column will have its own domain. This is how we will do our analysis of the database design.
Now that the model has been created and the domains have been inferred, the analysis can begin. We will start by looking at the various data elements in the database tables. This is done by going to the domains. The domains will be the “short list” of data elements (i.e., columns) from the database. The infer operation will show us where the properties of the data elements differ in any manner. Duplicate domains are a red flag, since ER/Studio will automatically create new domains if it finds the same column name with different properties, like data type, rule, constraint etc. Let’s take a look at the list.
Below is part of the list that was created from the infer operation.
Notice we have two DESCRIPTION domains. By taking a closer look in the respective domains editors, you will see that one is VARCHAR(255) and the other is VARCHAR(500). This is a potential example of a divergence from standards. The difference here may be on-purpose or not. If it is on-purpose, then we should look at possibly improving naming conventions to DESCRIPTION for short strings and NOTES for longer strings. If it was not on-purpose, then we have just isolated a problem in the design that could lead to data loss if data from each of these columns ever needed to be integrated.
To perform further analysis, you can right click on the domain and going to “View Domain Bindings.” This will show two things:
If we do this for DESCRIPTION we get:
This tells us exactly where this domain is used in the logical and physical models. You can also see this information in the “Binding Information” tab of the domain editor. That tab can also be used to assign the domain to other columns or attributes in either the physical or logical models.
Assuming that the conflicting data types are in fact a problem, the problem now needs to be corrected. Since domains are used, this is very easy to do. Because DESCRIPTION_2 is the domain that has the larger width – 500 – this is what we will use as our standard for description data. What needs to be done is all of the columns and attributes that reference DESCRIPTION need to be directed to DESCRIPTION_2. We can also look through the domain list to see any other domains that could be candidates for description data.
We have already seen how we can determine where the domain is used by using the context menu or the “Binding Information” tab. The easiest way to do this, since both domains are currently in use, is to leverage ER/Studio’s Automation Interface and employ one of the macros that ships with the product, Switch Domain Bindings.
We have now re-factored the domain list so that we have only one standard for description data listed. Now for any developer, DBA, or data architect who uses the domain on a new column, the standard properties will be used and the organizational standards will be followed.
What we have done thus far is analyze and enforce standards within the scope of a single Oracle schema. We have done nothing to make this available to our peers who may need to run their own analysis, make changes to the logical or physical design, or possibly generate reports on the metadata. We also haven’t incorporated any of our organizational business rules, standard data elements, or extended metadata that may be defined on a more global scope across applications. In order to broaden the scope of our development and standards enforcement across a team, we will leverage ER/Studio’s collaborative server, ER/Studio Repository. This will allow real-time, team-based access to diagrams managed within the Repository collaboration server in a safe and secure manner. The Repository installation can be deployed on Oracle 8.x/9.x, DB2 UDB 7.x/8.x, SQL Server 2000, Sybase 12.x.
In order to work collaboratively on the model that was reverse-engineered, it will need to be added to the ER/Studio Repository. This will save the diagram information into the Repository database so that any other ER/Studio user can access this model provided they have the proper permissions to do so.
The diagram can be initialized in the following dialog:
During the process of putting the diagram in the Repository, we can add any enterprise data dictionaries that exist in the Repository to it. The Enterprise Data Dictionary system will allow us to globally define domains and other dictionary objects, such as rules, attachments, defaults and UDT, so that we can enforce standards across any diagram in the ER/Studio Repository. Notice in the screen shot on the previous page that we added the EMBT dictionary to our diagram before it got added to the repository. This will attach all of the standard components from that dictionary to the diagram so that we can leverage everything contained in that dictionary.
We can slowly start to promote attributes and columns that don’t have domains or that are still leveraging the local data dictionary to reference domains in the EMBT dictionary allowing us to globally manage them more easily. This will also allow us to perform the same impact analysis, but this time across the entire Repository because that dictionary is used on other diagrams that are housed in the Repository. For example, we may have an Identifier domain in our EMBT dictionary that we use to manage all of our ID attributes and columns. By binding ID columns from various models (NOTE: This can be done through the Binding Information tab of the enterprise domain or by using the macro to promote attributes/columns to enterprise domains), we will be able to manage all of our ID attributes/columns through the use of one domain. We will also have traceability across our collaborative environment so that we can see where Identifier is used.
It is now time to decide what we want to manage through the enterprise dictionary and what may be okay if left referenced to the local dictionary. We still have the list of domains that were inferred during the reverse engineering operation. Some of those can be used for application-specific purposes on this particular diagram that manages the Oracle schema. Others we may want to start promoting to reference domains in EMBT dictionary. The description attributes/columns may be good candidates for that. We consolidated them to one domain in the local dictionary, but looking in the EMBT dictionary, there is a “Long String” domain that can be used for similar types of data elements. Looking at the properties of the domain, it looks like it is the right fit to manage the attributes/columns that were assigned to the DESCRIPTION_2 domains.
It makes sense to manage all of these by one global domain. That way if organization-wide initiatives are driving changes to database structures, they can be swiftly implemented, and standards are enforced on all applications throughout the development process. To promote the description data elements that reference the local DESCRIPTION_2 domain, we can employ the same Switch Domain Bindings macro. This time we need to be careful about selecting the proper source and target dictionaries and the domains.
The DESCRIPTION_2 domain can then be deleted from the local dictionary so that no one uses it by mistake.
With the help of ER/Studio and its collaboration server, ER/Studio Repository, we were able to isolate potential integration problems within an Oracle database. What we saw by leveraging domains and the infer functionality within the product was how conflicting data types between common data elements can surface in a live database. How they got there may not be known, but this can lead to a variety of data integration issues. The problem was efficiently corrected by consolidating on one standard while new standards were brought into the design process with the repository so that the problem wouldn’t surface again. Data integrity problems can arise in many ways, but can be prevented by incorporating and enforcing standards when working in a team environment.
Embarcadero, the Embarcadero Technologies logos and all other Embarcadero Technologies product or service names are trademarks of Embarcadero Technologies, Inc. All other trademarks are property of their respective owners.
Réponse serveur de: ETNASC04