Monday, May 11, 2015

DataStage Introduction - Server vs Parallel


 

DataStage is an ETL tool

·        ETL stands for extract, transform and load

·        Earlier the product was owned by a company called stage → then Ascential DataStage → IBM InfoSphere in 2008
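To make the three ETL steps concrete, here is a minimal Python sketch. It is not DataStage code: the function names, the sample CSV text and the `warehouse` list standing in for a target table are all assumptions made for illustration.

```python
import csv
import io


def extract(source_text):
    """Extract: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transform: tidy names, convert amounts to float, drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # reject rows that have no amount
        cleaned.append(
            {"name": row["name"].strip().title(), "amount": float(row["amount"])}
        )
    return cleaned


def load(rows, target):
    """Load: append the transformed rows to the target (a list standing in for a table)."""
    target.extend(rows)
    return len(rows)


# Hypothetical source data; a real job would read from a file or database.
SOURCE = "name,amount\n alice ,10.5\nbob,\ncarol,3\n"

warehouse = []
loaded = load(transform(extract(SOURCE)), warehouse)
print(loaded, warehouse)
```

In a DataStage job the same three roles are played by source stages, processing stages and target stages connected by links.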

 

IBM Infosphere has several tools

·        Datastage

·        QualityStage

·        Information Analyzer

·        MDM – Master data management etc

 

Difference between version 7.5 (Ascential) and DataStage 8.0 (IBM InfoSphere)

 

| Ascential (7.5) | IBM InfoSphere (8.0) |
| --- | --- |
| File-based repository (table definitions etc.) | Database-based repository |
| 2 tier (Unix server + DataStage) | 3 tier (Unix server + xMETA + DataStage) |
| Clients: Director, Manager, Designer and Administrator | Manager is integrated into Designer, leaving Director, Designer and Administrator |
| Unix login is sufficient | DataStage needs a separate user group and access rights |
| - | Parameter sets were introduced |
| Surrogate keys restarted on every run, e.g. 1-100 on the first run and 1-100 again on the next | Enhanced Surrogate Key Generator continues from the last value: 1-100 on the first run, 101-200 on the next |
| - | New stages were introduced, such as connector stages and an improved Transformer stage |
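The surrogate key behaviour (1-100 on the first run, 101-200 on the next) comes down to remembering the last key issued between runs. The sketch below shows that idea in plain Python. It is not how DataStage implements it: the class name, the `next_keys` method and the use of a small state file are assumptions for illustration only.

```python
import os
import tempfile


class SurrogateKeyGenerator:
    """Hands out consecutive keys and remembers the last one in a state file."""

    def __init__(self, state_path):
        self.state_path = state_path

    def _last_key(self):
        # No state file yet means no key has been issued.
        if not os.path.exists(self.state_path):
            return 0
        with open(self.state_path) as f:
            return int(f.read().strip() or 0)

    def next_keys(self, count):
        start = self._last_key() + 1
        end = start + count - 1
        # Persist the highest key so the next run continues after it.
        with open(self.state_path, "w") as f:
            f.write(str(end))
        return list(range(start, end + 1))


# Hypothetical state file location; each generator object simulates one job run.
state_file = os.path.join(tempfile.mkdtemp(), "customer_key.state")
first_run = SurrogateKeyGenerator(state_file).next_keys(100)
second_run = SurrogateKeyGenerator(state_file).next_keys(100)

print(first_run[0], first_run[-1])    # 1 100
print(second_run[0], second_run[-1])  # 101 200
```

Without the persisted state, every run would start again at 1, which is the older behaviour described in the comparison.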

 

 

Director Client

·        Validates, runs, monitors and schedules jobs. We can do the same from the Designer client; however, Director lets us look at multiple running jobs at a time

 

Administrator Client

·        Creating and managing users and projects

·        Setting up project-specific parameters

 

Designer client

·        Designing the job

 

Types of jobs

·        Server Jobs

·        Parallel Jobs

·        Sequence Jobs

 

| Server Jobs | Parallel Jobs |
| --- | --- |
| Uses the BASIC compiler | Uses a C++ compiler. In the background, parallel jobs are converted to OSH, which requires a C++ compiler |
| Uses a single node | Uses multiple nodes |
| Executes on the DS Server Engine | Executes on the DS Parallel Engine |
| Handles less data | Handles huge data |
| Processing speed is slow | Processing speed is fast |
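The single node versus multiple node difference can be pictured as follows: a server-style run pushes every row through one worker, while a parallel-style run first partitions the rows and then processes each partition on its own node. The Python sketch below only illustrates that idea and is not how the DataStage engines work internally. The function names, the modulus partitioner and the use of threads as stand-ins for nodes are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, node_count):
    """Split rows into node_count partitions using the key modulus."""
    parts = [[] for _ in range(node_count)]
    for row in rows:
        parts[row["id"] % node_count].append(row)
    return parts


def transform_partition(rows):
    """The same transformation logic runs on every partition."""
    return [{"id": r["id"], "total": r["qty"] * r["price"]} for r in rows]


def run_server_style(rows):
    """Single node: one worker handles all rows."""
    return transform_partition(rows)


def run_parallel_style(rows, node_count=4):
    """Multiple nodes: partition, process each partition concurrently, then collect."""
    parts = partition(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        results = list(pool.map(transform_partition, parts))
    collected = [row for part in results for row in part]
    return sorted(collected, key=lambda r: r["id"])


# Hypothetical input rows.
rows = [{"id": i, "qty": i, "price": 2} for i in range(1, 9)]
serial = run_server_style(rows)
parallel = run_parallel_style(rows, node_count=4)
print(serial == parallel)
```

Both runs produce the same rows; the parallel-style run simply spreads the work, which is why it copes better with large volumes.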

 
