Monday, May 11, 2015

Datastage - Node and APT configuration file



Node: a logical processing unit that represents a set of resources. Defining several nodes helps with load balancing, and the number of nodes can be tuned to suit the workload.

· A node is a logical processing unit. Each node in a configuration file is distinguished by a virtual name and defines a number and speed of CPUs, memory availability, page and swap space, network connectivity details, etc.

· Node information is stored in the APT configuration file.

 

Server job → single node (e.g. a single-lane highway)

Parallel job → the data is distributed across the nodes, according to how many nodes are defined (e.g. a multi-lane highway). This is called parallelism.
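To make the highway analogy concrete, here is a minimal Python sketch of rows being dealt out across nodes. It assumes a simple round-robin scheme purely for illustration; the function name `partition_round_robin` is hypothetical, and the partitioning method a real job uses depends on the stage settings.

```python
def partition_round_robin(rows, node_count):
    """Deal rows out to node_count partitions in turn (illustrative only)."""
    partitions = [[] for _ in range(node_count)]
    for i, row in enumerate(rows):
        # Row i goes to partition i modulo the number of nodes.
        partitions[i % node_count].append(row)
    return partitions

rows = list(range(1, 7))

# One node: every row travels down the same "lane".
print(partition_round_robin(rows, 1))

# Two nodes: the rows are split between two "lanes".
print(partition_round_robin(rows, 2))
```

With one node all six rows land in a single partition; with two nodes each partition holds three rows, which is the sense in which more nodes means more parallelism.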

 

APT Configuration file

It defines the degree of parallelism, i.e. the number of nodes a parallel job can run on.

 

Four things to note in each node entry: fastname, pools, resource disk and resource scratchdisk.

main_program: APT configuration file: /opt/IBM/InformationServer/Server/Configurations/default.apt
{
    node "node1"
    {
        fastname "xxxx"  → Physical node (host) name
        pools ""  → Node pools this node belongs to; "" is the default pool. A named pool can set the node aside for a specific function, e.g. sort
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}  → Physical storage. All the datasets will be created here
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}  → Temporary location for processing
    }
    node "node2"
    {
        fastname "xxxx"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
}
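As a rough illustration of how this file fixes the degree of parallelism, the Python sketch below counts the node entries in a configuration text. The helper `node_names` and its regular expression are assumptions made for this example, not part of DataStage; the parallel engine reads the file itself.

```python
import re

# Sample configuration text, copied from the two-node listing above.
CONFIG = '''
{
    node "node1"
    {
        fastname "xxxx"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "xxxx"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
}
'''

def node_names(config_text):
    """Return the logical node names declared in the configuration text."""
    # Hypothetical helper: matches lines of the form  node "name"
    return re.findall(r'^\s*node\s+"([^"]+)"', config_text, flags=re.MULTILINE)

names = node_names(CONFIG)
print(names)       # logical node names found in the file
print(len(names))  # number of nodes, i.e. the degree of parallelism
```

For the listing above this finds two node entries, so a parallel job using this file runs two ways parallel by default.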

 

Example:

    node "node2"
    {
        fastname "xxxx"
        pools "" "sort"  → This node belongs to a "sort" pool as well as the default pool, so sort operations can be directed to it. To keep other stages off the node, leave out the "" entry
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }

 

How does DataStage decide which processing nodes a stage runs on?

1. If a job or stage is not constrained to run on specific nodes, the parallel engine executes the stage on all nodes defined in the default node pool. (Default behavior)

2. If the job or stage is constrained to a node pool, the stage runs only on the nodes that belong to that pool.
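The two rules above can be sketched in a few lines of Python. This is only a model of the selection logic under stated assumptions: the dictionary `NODE_POOLS`, the third node and the function `nodes_for_stage` are all hypothetical, and the real engine applies further rules that are not covered here.

```python
# Hypothetical model of a configuration: node name -> node pools it belongs to.
NODE_POOLS = {
    "node1": ["", "sort"],  # default pool plus a "sort" pool
    "node2": [""],          # default pool only
    "node3": ["sort"],      # "sort" pool only, not in the default pool
}

def nodes_for_stage(node_pools, constraint=None):
    """Pick nodes for a stage: the default pool "" unless a pool constraint is given."""
    pool = "" if constraint is None else constraint
    return sorted(name for name, pools in node_pools.items() if pool in pools)

# Rule 1: an unconstrained stage runs on every node in the default pool.
print(nodes_for_stage(NODE_POOLS))

# Rule 2: a stage constrained to the "sort" pool runs only on nodes in that pool.
print(nodes_for_stage(NODE_POOLS, "sort"))
```

In this model the unconstrained stage gets node1 and node2, while the constrained stage gets node1 and node3. Note that node3 never runs unconstrained stages because it is not in the default pool.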

 
 
