Basic structures: classes and schemes
The UML class
A UML class (ER term: entity) is any “thing” in the enterprise that is to be represented in our database. It could be a physical “thing” or simply a fact about the enterprise or an event that happens in the real world.
Example: We’ll build a sales database—it could be for any kind of business. To sell anything, we need customers, so a Customer will be our first class (entity) type.
• The first step in modeling a class is to describe it in natural language. This helps us to know exactly what this class (“thing”) means in the enterprise. We can describe a customer like this:
“A customer is any person who has done business with us or who we think might do business with us in the future. We need to know this person’s name, phone number and address in order to contact him or her.”
• Each class is uniquely defined by its set of attributes (UML and ER), also called properties in some OO languages. Each attribute is one piece of information that characterizes each member of this class in the database. Together, they provide the structure for our database tables or code objects.
• In UML, we will only identify descriptive attributes—those which actually provide real-world information (relevant to the enterprise) about the class that we are modeling. (These are sometimes called natural attributes.) We will not add “ID numbers” or similar attributes that we make up to use only inside the database.
Class diagram
The class diagram shows the class name (always a singular noun) and its list of attributes.
Other views of this diagram: Large image - Data dictionary (text)
Relation scheme
In an OO programming language, each class is instantiated with objects of that class. In building a relational database, each class is first translated into a relational model scheme. The scheme is identified by the plural form of the class name, and starts with all of the attributes from the class diagram.
Other views of this diagram: Large image - Data dictionary (text)
• In the relational model, a scheme is defined as a set of attributes, together with an assignment rule that associates each attribute with a set of legal values that may be assigned to it. These values are called the domain of the attribute. We’ve chosen to show the scheme graphically, but we could also have written it in set notation:
Customers Scheme = {cFirstname, cLastname, cPhone, cStreet, cZipCode}.
• There is no convenient graphical way to represent domains; we’ll discuss this issue in a later page. For the moment, our Customers relation scheme looks exactly like the Customer class diagram, only drawn sideways. It won’t stay that way for long.
• It’s important to recognize that defining schemes or domains as sets of something automatically tells us a lot more about them:
- They cannot contain duplicate elements. Our Customers scheme, for example, cannot have two cPhone attributes (even if they are called cPhone1 and cPhone2).
- The elements in them are unordered. It doesn't matter if a customer's name is listed in order “Last, First” or “First Last”—they mean the same thing.
- We can develop rules for what can be included in them and what is excluded from them. For example, zip codes don’t belong in the domain (set) of phone numbers, and vice-versa.
- We can define subsets of them—for example, we can display only a selected set of attributes from a scheme, or we can limit the domain of an attribute to a specific range of values.
- They may be manipulated with the usual set operators. In a later page, we will show how both the union and the intersection of schemes are used to join (combine) the information from two or more tables based on different schemes.
Table structure
When we actually build the database, each relation scheme becomes the structure for one table. The SQL syntax for creating the table includes a data type for each attribute, which is needed for the database but is not the same as the domain of the attribute.
CREATE TABLE customers (
cfirstname VARCHAR(20) NOT NULL,
clastname VARCHAR(20) NOT NULL,
cphone VARCHAR(20) NOT NULL,
cstreet VARCHAR(50),
czipcode VARCHAR(5));
In this example, VARCHAR is simply a variable-length character string of no more than the number of characters in parentheses. Consult your own system documentation for supported data types. We will explain the extra keyword NOT NULL when we look at rows and tables.