Data WarehouseFrom the perspective of computer science in general and telecommunications in particular, data integrity refers to the wholeness or completeness of data during operations involving transfer, storage and retrieval. It also refers to the preservation of data so that whatever process in undergoes through, it will still remain to be what it has been intended for. In other words, data integrity is the assurance that data will always be correct, consistent and accessible.
Think of a furniture, say, a chair. You can a move a chair from place to another, and you will soon discover that at some point in its movement something will get broken.
Today data traveling and getting broken is frequent and inevitable circumstance. In fact, with today's internet protocols, data will be broken into packets. The point with data integrity is that when data breaks, it can be recovered.
There are many ways wherein data integrity can be compromised. Human error in data entry is one of the top causes. Another could be the instability of communications medium when transmitting data. Software applications having bug and viruses could also compromise data integrity. Also, hardware malfunctions such disk crashes is a cause of compromised data integrity.
In relational databases which are key technologies behind data warehouses, data integrity is focused on the correctness, validity and accuracy of data within the database. One of the most common types of database integrity is the referential integrity. This type of data integrity involves prevention of errors in the relationship between a foreign key and primary key. An example problem would be an orphan child record that is missing its parent record which was deleted and the keys in the relation were not properly defined.
Data integrity in relational database can be achieved by having careful database planning and design. A database designer or developer should use integrity constraints in order to enforce all business rules that are closely associated with the database. This can ensure that end users cannot enter invalid information or data consumers can alter data without the right privileges. And when someone with the appropriate privileges deletes or alters data, relationships through keys can be maintained so no record can be left orphaned.
In general sense, threats to data integrity can be minimized by having regular data backups, controlling access of data by defining roles and privileges and installing security tools, designing user interfaces that can warn or prevent invalid input, and using error detection and correction tools necessary in data transmission.
