GeekInterview.com
Series: Subject: Topic:
Question: 82 of 129

What is surrogate key ? Where we use it explain with examples?

Asked by: Interview Candidate | Asked on: Oct 18th, 2005

Editorial / Best Answer

Answered by: saravanan

Answered On : Dec 2nd, 2005

Surrogate key is a unique identification key, it is like an artificial or alternative key to production key, bz the production key may be alphanumeric or composite key but the surrogate key is always single numeric key. Assume the production key is an alphanumeric field if you create an index for this fields it will occupy more space, so it is not advisable to join/index, bz generally all the datawarehousing fact table are having historical data. These factable are linked with so many dimension table. if it's a numerical fields the performance is high

Showing Answers 1 - 23 of 23 Answers
mallakishore

Answered On : Oct 20th, 2005

View all answers by mallakishore

Surrogate key is the primary key for the Dimensional table.

  
Login to rate this answer.
rvishnudas

Answered On : Oct 30th, 2005

View all answers by rvishnudas

Surrogate key is a substitution for the natural primary key.

It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouses typically use a surrogate, (also known as artificial or identity key), key for the dimension tables primary keys. They can use Infa sequence generator, or Oracle sequence, or SQL Server Identity values for the surrogate key.

It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users) but, not only can these change, indexing on a numerical value is probably better and you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system and as far as the client is concerned you may display only the AIRPORT_NAME.

Another benefit you can get from surrogate keys (SID) is :

Tracking the SCD - Slowly Changing Dimension.

Let me give you a simple, classical example:

On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in your Employee Dimension). This employee has a turnover allocated to him on the Business Unit 'BU1' But on the 2nd of June the Employee 'E1' is muted from Business Unit 'BU1' to Business Unit 'BU2.' The entire new turnovers have to belong to the new Business Unit 'BU2' but the old one should belong to the Business Unit 'BU1.'

If you used the natural business key 'E1' for your employee within your datawarehouse everything would be allocated to Business Unit 'BU2' even what actually belongs to 'BU1.'

If you use surrogate keys, you could create on the 2nd of June a new record for the Employee 'E1' in your Employee Dimension with a new surrogate key.

This way, in your fact table, you have your old data (before 2nd of June) with the SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the SID of the employee 'E1' + 'BU2.'

You could consider Slowly Changing Dimension as an enlargement of your natural key: natural key of the Employee was Employee Code 'E1' but for you it becomes

Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2.' But the difference with the natural key enlargement process, is that you might not have all part of your new key within your fact table, so you might not be able to do the join on the new enlarge key -> so you need another id.

  
Login to rate this answer.
venkatkalpana

Answered On : Oct 31st, 2005

View all answers by venkatkalpana

Dear User,

When creating a dimension table in a data warehouse, we generally create the tables with a system generated key to uniquely identify a row in the dimension. This key is also known as a surrogate key. The surrogate key is used as the primary key in the dimension table.

The surrogate key will also be placed in the fact table and a foreign key will be defined between the two tables. When you ultimately join the data it will join just as any other join within the database.

Venkat

Yes  1 User has rated as useful.
  
Login to rate this answer.

A surrogate key to a datawarehouse is what a primary key is for an OLTP source. It is used to uniquely identify a record in dimension tables. it provides the solution for the critical col., problem.amar

  
Login to rate this answer.
girish

Answered On : Nov 28th, 2005

Surrogate key is system generated artificial primary key values

eg: any candidate key can be considered as surrogate key.

  
Login to rate this answer.
saravanan

Answered On : Dec 2nd, 2005

Surrogate key is a unique identification key, it is like an artificial or alternative key to production key, bz the production key may be alphanumeric or composite key but the surrogate key is always single numeric key.

Assume the production key is an alphanumeric field if you create an index for this fields it will occupy more space, so it is not advisable to join/index, bz generally all the datawarehousing fact table are having historical data. These factable are linked with so many dimension table. if it's a numerical fields the performance is high

Yes  1 User has rated as useful.
  
Login to rate this answer.
sithusithu

Answered On : Jan 3rd, 2006

View all answers by sithusithu

Surrogate key in a data warehouse is more than just a substitute for a natural key. In a data warehouse, a surrogate key is a necessary generalization of the natural production key and is one of the basic elements of data warehouse design. Sithu

  
Login to rate this answer.
ravi kumar guturi

Answered On : Mar 9th, 2006

Surrogate key is a system generated key, It is mainly used for critical in dwh,Here critical means nothing but it is a column which when we updated in the them most dwh in to OLTP systems

  
Login to rate this answer.
sunil

Answered On : Mar 17th, 2006

Surrogate keys are that which join dimension tables and fact table.

  
Login to rate this answer.
Giridhar

Answered On : Mar 29th, 2006

Surrogate Key is the solution for critical column problems.

For example the customer purchases different items in different locations,for this situation we have to maintain historical data.

By using surrogate key we can introduce the row in the data warehouse to maintain historical data.

  
Login to rate this answer.
Jyothsna Tallapally

Answered On : Apr 27th, 2006

A surrogate key is any column or set of columns that can be declared as the primary key instead of a "real" or natural key. Sometimes there can be several natural keys that could be declared as the primary key, and these are all called candidate keys.So a surrogate is a candidate key. A table could actually have more than one surrogate key, although this would be unusual. The most common type of surrogate key is an incrementing integer, such as an auto_increment column in MySQL, or a sequence in Oracle, or an identity column in SQL Server.Use of surrogate key:Every join between dimension tables and fact tables in a data warehouse environment should be based on surrogate keys, not natural keys. It is up to the data extract logic to systematically look up and replace every incoming natural key with a data warehouse surrogate key each time either a dimension record or a fact record is brought into the data warehouse environment.

  
Login to rate this answer.
Surajit Das

Answered On : Jun 13th, 2006

Nice example. One more point to add:

Surrogate key by nature is number , so for joining or while processing complex querry it will take lesser time for integer comparison comparative to character comparison.

  
Login to rate this answer.
Guest

Answered On : Jul 15th, 2006

Hai Friends

Surrogate Key a simple concept.

Correct n exact answer for SURROGATE KEY IS BELOW:

Definition of Surrogate Key:

Alternate of Primary Key that allows duplication of datas/records.

Need, Where & Why we use Surrogate Key:

OLTP is of "Normalised Form" whereas OLAP (i.e.) Datawarehouse is of "De-normalised form".

Actually theDWH concept is to maintain the historic datasfor analysing. So itsshould denormalized form.

To be denormalise form duplication should be allowedinDWH. Whendatas entering the DWHSurrogate keya new column namedserial number is introducedto allow duplication in OLAP Systems to maintain historic datas.

You all know one thing a single mobile is used by other person if it is not in use for more than one year. how is it posssible just because of this Surrogate Key.

Thanks

Suresh

  
Login to rate this answer.
suresh kanth.C

Answered On : Nov 20th, 2006

We can say "Surrogate key" is a User defined primary key..

  
Login to rate this answer.

I think there already enough statements of what a surrogate key is. From experience, the huge advantage:-

Our warehouse captured data from an existing system which identified SALES by a unique number. We implemented this as the "Natural" or "Business" Key but also generated a unique "Surrogate" key. Seems pointless as the incoming business key is numeric already. But we did it.

Six months into the project, the users announced the feeder system would be replaced, but the two SALES systems would run in parallel for a year. Our existing system had a numeric natural key, the new system had a unique ID of a 20 character field.

We simply added the 20 character field to the existing dimension, and the ETL looked up using the appropriate key (existing system key or new replacement ID). Either way it matched and was converted a source system independent surrogate key.

No fuss, no re-design. It just worked. Surrogate key importance proven.

So there's little room for confusion:-

Dimension: SALESMAN

SALES_ID Number not null /* Primary Key */
LEGACY_SALE_IDENTIFIER Number /* Legacy system business key */
NEW_SALE_IDENTIFIER varchar(20) /* New system business key */
...Other attributes...


The SALES_ID the the "Surrogate key" which is the Foreign Key link to the Fact Table(s). This key is simply a generated sequence whenever a new SALESMAN entry is created.

The design (as already described) can be extended to support type 2 Slowly changing dimensions.

  
Login to rate this answer.
sudhakarnbh

Answered On : May 12th, 2008

View all answers by sudhakarnbh

Surrogate is mainly used in slowly changing dimensions,it maintaining the uniqueness in the table.it is used to track the old value with the new one.And it is derived from primary key.

  
Login to rate this answer.

Surrogate is mainly used in slowly changing dimensions

Is not strictly true. While a surrogate key can be used to support a SCD, it's primary purpose is to insulate the warehouse from changes in the source systems.

As a by-product it can also reduce disk space usage in fact tables as foreign keys to dimension tables are simple integers.

I have implemented a number of systems with Dimension Tables which did not use type 2 or 3 SCDs, but did implement surrogate keys.

  
Login to rate this answer.

Consider,we are analyzing a table with columns "NAME,AGE,DOB"suppose the key for this table be "NAME".
So,as the "NAME" column was keyed, it was kept in the virtual memory for quicker retrieval.
If the "name" containing some characters was in the memory it cannot retrieve soon.
So,system is in need of some integers to store in the memory instead of storing characters.
Hence the arrival of surrogate key for that primary key"NAME" will be easier for the system to retrieve quicker than before.

  
Login to rate this answer.
somesh

Answered On : Nov 28th, 2010

View all answers by somesh

Actually your explanation is very useful. but still I am not clear. Can I use the surrogate key as a foreign key in another table and reference to the master table. In that case will it work ok. Is it a good practice to take surrogate key as primary key in each table except of some of like customer table, employee detalils table etc.

  
Login to rate this answer.
somesh

Answered On : Nov 28th, 2010

View all answers by somesh

Is it somewhere necessary for the table to have an additional key with surrogate key to uniquely identify each row or during join.

  
Login to rate this answer.
santosh kumar

Answered On : Apr 2nd, 2012

Surrogate key and primary key both are same, but point of application is different.

Primary key is part of transactional system but surrogate key is part of analytical system.

Surrogate key is used uniquely identify each and every records in dimension tables in data warehouse, primary key is used to identify ,each and every records in transactional system in OLTP.

  
Login to rate this answer.
rj_dwbi

Answered On : Apr 2nd, 2012

View all answers by rj_dwbi

Surrogate key is a numeric or integer value in a table to identify each row uniquely. Primary key and surrogate key are same but surrogate key is a system generated value has a define incremental value for each row in a table. Surrogate key does not have any business importance for the value it holds but primary key has a significant business value.

  
Login to rate this answer.

Give your answer:

If you think the above answer is not correct, Please select a reason and add your answer below.

Related Open Questions

Ads

Connect

twitter fb Linkedin GPlus RSS

Ads

Interview Question

 Ask Interview Question?

 

Latest Questions

Ads

Interview & Career Tips

Get invaluable Interview and Career Tips delivered directly to your inbox. Get your news alert set up today, Once you confirm your Email subscription, you will be able to download Job Inteview Questions Ebook . Please contact me if you there is any issue with the download.