It is all about data !!! Data Science, Data analytics,Data warehouse : Datastage

Node: a logical processing unit that represents resources. Nodes help with load balancing, and the number of nodes can be tuned to suit the workload.

· Each node in a configuration file is identified by a logical name and describes the resources available to it: the physical host it runs on, the node pools it belongs to, and its disk and scratch disk locations.

· Node information is stored in the APT configuration file.

Server job → single node (e.g. a single-lane highway)

Parallel job → the data is distributed across the nodes, according to the number of nodes defined (e.g. a multi-lane highway). This is called parallelism.
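The highway analogy can be sketched in a few lines of Python. This is only an illustration of the idea, not DataStage code: the function name `round_robin_partition` and the sample rows are made up for the example, and round robin is just one of several ways data can be spread across nodes.

```python
def round_robin_partition(rows, node_count):
    """Distribute rows across node_count partitions in round-robin order."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    partitions = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        # Row 0 goes to partition 0, row 1 to partition 1, and so on, wrapping around.
        partitions[index % node_count].append(row)
    return partitions


rows = list(range(1, 9))  # eight sample rows (hypothetical data)

single_lane = round_robin_partition(rows, 1)  # everything flows through one node
multi_lane = round_robin_partition(rows, 2)   # rows are split across two nodes

print(single_lane)
print(multi_lane)
```

With one node, every row lands in the same partition; with two nodes, each partition receives half of the rows and the two halves can be processed side by side.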

APT Configuration file

It determines the degree of parallelism.

Four things to note in each node definition: fastname, pools, resource disk, and resource scratchdisk.

main_program: APT configuration file: /opt/IBM/InformationServer/Server/Configurations/default.apt

{

node "node1"

{

fastname "xxxx" → Physical host name of the machine

pools "" → Node pools this node belongs to ("" is the default pool). A named pool can dedicate the node to specific functionality, for example sort

resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""} → Physical storage. All the datasets are created here

resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""} → Temporary location used during processing

}

node "node2"

{

fastname "xxxx"

pools ""

resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}

resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}

}

}

Example:

node "node2"

{

fastname "xxxx"

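pools "" "sort" → This node belongs to both the default pool and the "sort" pool, so stages constrained to the "sort" pool run on it. To reserve the node exclusively for sort operations, leave out the default pool ""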

resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}

resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}

}
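To see how the degree of parallelism follows from the file, here is a minimal Python sketch that reads a configuration like the one above and lists each node with its pools. It is a simplified, hypothetical reader written for this post (the names `CONFIG_TEXT` and `parse_nodes` are made up), not an IBM tool, and it assumes one keyword per line as in the listing.

```python
import re

# Sample configuration text, mirroring the two-node listing in this post.
CONFIG_TEXT = '''
{
  node "node1"
  {
    fastname "xxxx"
    pools ""
    resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
  }
  node "node2"
  {
    fastname "xxxx"
    pools "" "sort"
    resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
  }
}
'''

QUOTED = re.compile(r'"([^"]*)"')  # captures the text inside each pair of double quotes


def parse_nodes(config_text):
    """Return {node name: {"fastname", "pools", "disk", "scratchdisk"}}."""
    nodes = {}
    current = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("node "):
            current = QUOTED.findall(line)[0]
            nodes[current] = {"fastname": None, "pools": [], "disk": [], "scratchdisk": []}
        elif current is None:
            continue  # nothing to record before the first node definition
        elif line.startswith("fastname "):
            nodes[current]["fastname"] = QUOTED.findall(line)[0]
        elif line.startswith("pools "):
            nodes[current]["pools"] = QUOTED.findall(line)  # node-level pools only
        elif line.startswith("resource "):
            kind = line.split()[1]  # "disk" or "scratchdisk"
            if kind in ("disk", "scratchdisk"):
                nodes[current][kind].append(QUOTED.findall(line)[0])
    return nodes


nodes = parse_nodes(CONFIG_TEXT)
print("Degree of parallelism:", len(nodes))
for name, info in nodes.items():
    print(name, "pools:", info["pools"])
```

The number of node entries is what sets the degree of parallelism, so adding or removing a node block in the file changes how many ways the data is split without touching the job design.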

How does DataStage decide which processing nodes a stage runs on?

1. If a job or stage is not constrained to run on specific nodes, the parallel engine executes a parallel stage on all nodes defined in the default node pool (the default behavior).

2. If the job or stage is constrained, the parallel stage runs only on the constrained processing nodes.
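The two rules above can be modelled with a small Python sketch. It is a simplification for illustration only: the function `select_nodes` and the node-to-pools mapping are hypothetical, and the real engine takes more into account (for example resource constraints) than this pool lookup.

```python
def select_nodes(node_pools, constraint_pool=None):
    """Return the names of the nodes a stage would run on in this simplified model.

    node_pools maps a node name to the list of pools it belongs to.
    With no constraint, the default pool "" is used (rule 1).
    With a constraint, only nodes in that pool are chosen (rule 2).
    """
    wanted = "" if constraint_pool is None else constraint_pool
    return [name for name, pools in node_pools.items() if wanted in pools]


# Pools as declared in the example: node2 is in the default pool and in "sort".
node_pools = {
    "node1": [""],
    "node2": ["", "sort"],
}

unconstrained = select_nodes(node_pools)            # rule 1: default node pool
sort_only = select_nodes(node_pools, "sort")        # rule 2: constrained to "sort"

print(unconstrained)
print(sort_only)
```

In this model an unconstrained stage runs on both nodes, while a stage constrained to the "sort" pool runs on node2 only.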


Monday, May 11, 2015

Datastage - Node and APT configuration file
