Tidying Up

Before we get started on normalization proper, we need to do a little preparation and tidying up. Normalization relies on certain assumptions about the way data is represented, and we need to make sure that these are valid. There are also some problems the normalization does not solve, and is better to address these at the outset, rather than carrying excess baggage through the whole normalization process.

One Fact Per Column

Each column in the table should represent one fact only. The Priority/Date columns hold both a priority of project and deadline date. These two facts should be recorded in separate columns.

Hidden Data

The second piece of tidying up involves making sure that we have not lost any data in transition to tabular form. The most common problem here is that we cannot rely on the rows of the table being stored in any particular order. Suppose that the first employee in list is a team lead. If we wanted to preserve this data, we would need to add an Employee Sequence column. As this should be simple example we will skip this step.

Derivable Data

One of fundamental objective of good data model is non-redundancy. We should remove any data that can be derived from other data in table and amend the columns accordingly. The Total Stuffis derivable by counting assigned employees, and the Average Hourly Rate is derivable by adding hourly rates for all assigned employees and dividing that sum with the number of employees. So, we can remove this two columns.

Primary Key

Finally, we determine a primary key for table. Primary key is a minimal set of columns that contains a different combination of values for each row of the table. Another way of looking at primary keys is that each value of the primary key uniquely identifies one row of the table. In this case Project Code will do the job.

Figure below shows the result of tidying up initial model shown on previous figure. Initial data model after tidying up