What is a staging area? Do we need it? What is the purpose of a staging area?
Answered by Laxmi on 2005-05-13 16:42:53: Staging area is place where you hold temporary tables on data warehouse server. Staging tables are connected to work area or fact tables. We basically need staging area to hold the data , and perform data cleansing and merging , before loading the data into warehouse.
One or more database schema(s) or file stores used to “stage” data extracted from the source OLTP systems prior to being published to the “warehouse” where it is visible to end users.
Data in the staging area is NOT visible to end users for queries, reports or analysis of any kind.It does not hold completed data ready for querying.
It may hold intermediate results, (if data is pipelined through a process)
Equally it may hold “state” data – the keys of the data held on the warehouse, and used to detect whether incoming data includes New or Updated rows. (Or deleted for that matter).
It is likely to be equal in size (or maybe larger) than the “presentation area” itself.
Although the “state” data – eg. Last sequence loaded may be backed up, much of the staging area data is automatically replaced during the ETL load processes, and can with care avoid adding to the backup effort. The presentation area however, may need backup in many cases.
It may include some metadata, which may be used by analysts or operators monitoring the state of the previous loads (eg. audit information, summary totals of rows loaded etc).
It’s likely to hold details of “rejected” entries – data which has failed quality tests, and may need correction and re-submission to the ETL process.
It’s likely to have few indexes (compared to the “presentation area”), and hold data in a quite normalised form. The presentation area (the bit the end users see), is by comparison likely to be more highly indexed (mainly bitmap indexes), with highly denormalised tables (the Dimension tables anyway).
The staging area exists to be a separate “back room“ or “engine room”of the warehouse where the data can be transformed, corrected and prepared for the warehouse.
It should ONLY be accessible to the ETL processes working on the data, or administrators monitoring or managing the ETL process.
In summary.A typical warehouse generally has three distinct areas:-
Several source systems which provide data. This can include databases (Oracle, SQL Server, Sybase etc) or files or spreadsheets
A single “staging area” which may use one or more database schemas or file stores (depending upon warehouse load volumes).
One or more “visible” data marts or a single “warehouse presentation area” where data is made visible to end user queries.This is what many people think of as the warehouse – although the entire system is the warehouse – it depends upon your perspective.
The “staging area” is the middle bit.
Above answer was rated as good by the following members: njvijay, butair
RE: What is a staging area? Do we need it? What is the purpose of a staging area?
Staging area is place where you hold temporary tables on data warehouse server. Staging tables are connected to work area or fact tables. We basically need staging area to hold the data and perform data cleansing and merging before loading the data into warehouse.
RE: What is a staging area? Do we need it? What is the purpose of a staging area?
In the absence of a staging area the data load will have to go from the OLTP system to the OLAP system directly which in fact will severely hamper the performance of the OLTP system. This is the primary reason for the existence of a staging area. In addition it also offers a platform for carrying out data cleansing.
RE: What is a staging area? Do we need it? What is the purpose of a staging area?
Staging area is a temp schema used to 1. Do Flat mapping i.e dumping all the OLTP data in to it without applying any business rules.pushing data into staging will take less time becasuse there is no business rules or transformation applied on it.
2. Used for data cleansing and validation using First Logic.
RE: What is a staging area? Do we need it? What is the purpose of a staging area?
A staging area is like a large table with data separated from their sources to be loaded into a data warehouse in the required format. If we attempt to load data directly from OLTP it might mess up the OLTP because of format changes between a warehouse and OLTP. Keeping the OLTP data intact is very important for both the OLTP and the warehouse.
RE: What is a staging area? Do we need it? What is the...
According to the complexity of the business rule we may require staging area the basic need of staging area is to clean the OLTP source data and gather in a place. Basically it s a temporary database area. Staging area data is used for the further process and after that they can be deleted.
RE: What is a staging area? Do we need it? What is the...
Staging area is a common place to keep data from different source systems before performing transformation (cleansing)operations. The decision to store data in physical staging area versus processing it in memory is ultimately the choice of the ETL architect. The main reasons for staging data before it is loaded into the data warehouse: 1. Recoverability
RE: What is a staging area? Do we need it? What is the...
Staging area is a place where the data will be stored before applying transformation operations and from there the data will be stored to the data warehouse
RE: What is a staging area? Do we need it? What is the...
Staging area is a place to hold data from OLTP to OLAP until all transformation had been complete and without staging we can't do modify filter cleansing as per my knowledge