Node – Logical processing unit – Represents resources.
Nodes help with load balancing: the optimal number of nodes can be chosen for a job.
· A node is a logical processing unit. Each node in a configuration file is distinguished by a virtual name and defines the number and speed of CPUs, memory availability, page and swap space, network connectivity details, etc.
· Node information is stored in the APT configuration file.
Server job → single node (ex: single-lane highway)
Parallel job → the data is distributed across the nodes, based on the number of nodes (ex: multi-lane highway). This is called parallelism.
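As a rough illustration of the highway analogy, the Python sketch below deals rows out to a given number of nodes. The round-robin scheme and the `partition_round_robin` name are assumptions chosen for illustration; they are not taken from the parallel engine itself.

```python
def partition_round_robin(rows, node_count):
    """Deal rows out to node_count partitions, one row at a time."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    partitions = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        partitions[index % node_count].append(row)
    return partitions


rows = list(range(1, 9))  # eight sample rows

# Server job: one lane, every row goes through the same node.
single_lane = partition_round_robin(rows, 1)

# Parallel job on two nodes: the rows are split across two lanes.
two_lanes = partition_round_robin(rows, 2)
```

With one node all eight rows queue up in a single partition; with two nodes each partition holds half of them and the two halves can be processed side by side.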
APT Configuration file
It defines the degree of parallelism, that is, the number of nodes a parallel job runs on.
main_program: APT configuration file: /opt/IBM/InformationServer/Server/Configurations/default.apt
{
    node "node1"
    {
        fastname "xxxx"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "xxxx"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
}
4 things to note in each node entry:
· fastname → Physical node name (the host name of the machine).
· pools → Node pools the node belongs to; "" is the default pool. In some cases a named pool is used to dedicate nodes to specific functionality, for ex: sort.
· resource disk → Physical storage. All the datasets will be created here.
· resource scratchdisk → Temporary location for processing.
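Since the degree of parallelism follows from the number of node entries, it can be checked by counting them. The Python sketch below does this with a regular expression. `list_nodes` is a hypothetical helper, and the pattern assumes every node entry is written as `node "name"` as in the file above, so it is not a full parser for the configuration syntax.

```python
import re

# Matches a node entry such as: node "node1"
# Assumption: node names are double-quoted and follow the keyword "node".
NODE_PATTERN = re.compile(r'\bnode\s+"([^"]+)"')


def list_nodes(config_text):
    """Return the names of the nodes declared in an APT configuration text."""
    return NODE_PATTERN.findall(config_text)


sample_config = '''
{
    node "node1"
    {
        fastname "xxxx"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "xxxx"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
    }
}
'''

node_names = list_nodes(sample_config)
degree_of_parallelism = len(node_names)
```

For the two-node file shown here, `node_names` holds `node1` and `node2`, so the degree of parallelism is 2.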
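Example:
node "node2"
{
    fastname "xxxx"
    pools "" "sort"
    resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
}
pools "" "sort" → This node belongs to both the default pool ("") and the sort pool, so it is available for sort operations as well as for unconstrained stages. To keep a node exclusively for sort, list only "sort" and leave out "".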
How does DataStage decide on which processing node a stage should run?
1. If a job or stage is not constrained to run on specific nodes, the parallel engine executes a parallel stage on all nodes defined in the default node pool. (Default behavior)
2. If the job or stage is constrained, only the constrained processing nodes are chosen while executing the parallel stage.
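The two rules can be sketched in Python as follows. This is a simplified model, not the engine's actual implementation: the `Node` class, the `nodes_for_stage` function and the `constraint` parameter are hypothetical names, and a constraint is assumed to be the name of a single node pool.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DEFAULT_POOL = ""  # the default node pool is written as "" in the configuration file


@dataclass(frozen=True)
class Node:
    name: str
    pools: Tuple[str, ...]


def nodes_for_stage(nodes: List[Node], constraint: Optional[str] = None) -> List[str]:
    """Pick the nodes a stage runs on.

    Rule 1: no constraint -> every node in the default pool.
    Rule 2: constrained   -> only the nodes in the named pool.
    """
    pool = DEFAULT_POOL if constraint is None else constraint
    return [node.name for node in nodes if pool in node.pools]


# Hypothetical two-node setup: node2 is in the default pool and in "sort".
config = [
    Node("node1", ("",)),
    Node("node2", ("", "sort")),
]

unconstrained = nodes_for_stage(config)       # rule 1
sort_only = nodes_for_stage(config, "sort")   # rule 2
```

In this model an unconstrained stage runs on both nodes, while a stage constrained to the sort pool runs on node2 only.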