The key functions of the Integration Service (IS) are:
- Interpretation of the workflow and mapping metadata from the repository.
- Execution of the instructions in the metadata.
- Management of the data as it moves from the source system to the target system, in memory and on disk.
The three main components of the Integration Service that enable data movement are:
- Integration Service Process
- Load Balancer
- Data Transformation Manager
6.1 Integration Service Process (ISP)
The Integration Service starts one or more Integration Service processes to
run and monitor workflows. When we run a workflow, the ISP starts and locks
the workflow, runs the workflow tasks, and starts the process to run
sessions. The functions of the Integration Service Process are:
- Locks and reads the workflow
- Manages workflow scheduling, i.e., maintains session dependencies
- Reads the workflow parameter file
- Creates the workflow log
- Runs workflow tasks and evaluates the conditional links
- Starts the DTM process to run the session
- Writes historical run information to the repository
- Sends post-session emails
6.2 Load Balancer
The Load Balancer dispatches tasks to achieve optimal performance. It
dispatches tasks to a single node or across the nodes in a grid after
performing a sequence of steps. Before looking at these steps, we need to
understand Resources, Resource Provision Thresholds, Dispatch Modes, and
Service Levels:
- Resources – we can configure the Integration
Service to check the resources available on each node and match them
with the resources required to run the task. For example, if a session
uses an SAP source, the Load Balancer dispatches the session only to
nodes where the SAP client is installed
- Resource Provision Thresholds – there are three. Maximum CPU Run Queue
Length: the maximum number of runnable threads waiting for CPU resources on
the node. Maximum Memory %: the maximum percentage of virtual memory
allocated on the node relative to the total physical memory size. Maximum
Processes: the maximum number of running Session and Command tasks allowed
for each Integration Service process running on the node
- Dispatch Modes – there are three. Round-robin: the Load Balancer
dispatches tasks to available nodes in a round-robin fashion after checking
the Maximum Processes threshold. Metric-based: the Load Balancer checks all
three resource provision thresholds and dispatches tasks in a round-robin
fashion. Adaptive: the Load Balancer checks all three resource provision
thresholds and also ranks nodes according to current CPU availability
- Service Levels – these establish priority among tasks that are waiting to
be dispatched. The three components of a service level are Name, Dispatch
Priority, and Maximum Dispatch Wait Time. Maximum Dispatch Wait Time is the
amount of time a task can wait in the queue, which ensures that no task
waits forever
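To make the role of service levels concrete, here is a minimal Python sketch of how a dispatch queue might pick its next task. It is an illustration only, not PowerCenter code: the class and function names are hypothetical, and two rules are assumptions made for the sketch, namely that a lower Dispatch Priority number is dispatched first and that a task which has waited longer than its Maximum Dispatch Wait Time moves to the front of the queue.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevel:
    name: str
    dispatch_priority: int    # assumption: lower number is dispatched first
    max_dispatch_wait: float  # seconds a task may wait before it is escalated


@dataclass
class QueuedTask:
    name: str
    level: ServiceLevel
    enqueued_at: float        # time at which the task entered the dispatch queue


def next_task(queue, now):
    """Pick the next task to dispatch from a non-empty dispatch queue."""
    def sort_key(task):
        waited = now - task.enqueued_at
        overdue = waited > task.level.max_dispatch_wait
        # Assumed escalation rule: an overdue task jumps ahead of all others.
        # Otherwise tasks are ordered by priority, then by arrival time.
        effective_priority = 0 if overdue else task.level.dispatch_priority
        return (effective_priority, task.enqueued_at)
    return min(queue, key=sort_key)


low = ServiceLevel("Low", dispatch_priority=10, max_dispatch_wait=60)
high = ServiceLevel("High", dispatch_priority=1, max_dispatch_wait=600)
queue = [
    QueuedTask("nightly_load", low, enqueued_at=0),
    QueuedTask("urgent_fix", high, enqueued_at=50),
]
print(next_task(queue, now=55).name)   # urgent_fix: higher priority, nothing overdue
print(next_task(queue, now=120).name)  # nightly_load: waited 120 s, limit is 60 s
```

The second call shows why the wait limit matters: without it, a steady stream of high-priority tasks could keep the low-priority task in the queue indefinitely.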
A. Dispatching Tasks on a node
- The Load Balancer checks different resource provision thresholds on
the node depending on the Dispatch mode set. If dispatching the task
causes any threshold to be exceeded, the Load Balancer places the task
in the dispatch queue, and it dispatches the task later
- The Load Balancer dispatches all tasks to the node that runs the master Integration Service process
B. Dispatching Tasks on a grid
- The Load Balancer verifies which nodes are currently running and enabled
- The Load Balancer identifies nodes that have the PowerCenter resources required by the tasks in the workflow
- The Load Balancer verifies that the resource provision thresholds on
each candidate node are not exceeded. If dispatching the task causes a
threshold to be exceeded, the Load Balancer places the task in the
dispatch queue, and it dispatches the task later
- The Load Balancer selects a node based on the dispatch mode
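The grid dispatch steps above can be sketched in Python as a series of filters followed by a selection. This is a simplified model under stated assumptions, not the actual Load Balancer: the `Node` fields, the default threshold values, and `select_node` are hypothetical names, only the process count is projected forward by one task, and the adaptive ranking uses the CPU run queue length as a stand-in for "current CPU availability".

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    name: str
    running: bool = True
    enabled: bool = True
    resources: set = field(default_factory=set)
    cpu_run_queue: float = 0.0
    memory_pct: float = 0.0
    processes: int = 0
    # Illustrative threshold values, chosen only for this sketch.
    max_cpu_run_queue: float = 10.0
    max_memory_pct: float = 150.0
    max_processes: int = 10


def within_thresholds(node, mode):
    # Round-robin checks only Maximum Processes; the other modes check all three.
    ok = node.processes + 1 <= node.max_processes
    if mode != "round-robin":
        ok = ok and node.cpu_run_queue <= node.max_cpu_run_queue
        ok = ok and node.memory_pct <= node.max_memory_pct
    return ok


_turn = count()  # shared rotation counter for round-robin selection


def select_node(nodes, required, mode):
    """Return the chosen node, or None if the task must wait in the dispatch queue."""
    candidates = [n for n in nodes if n.running and n.enabled]            # step 1
    candidates = [n for n in candidates if required <= n.resources]       # step 2
    candidates = [n for n in candidates if within_thresholds(n, mode)]    # step 3
    if not candidates:
        return None
    if mode == "adaptive":                                                # step 4
        # Stand-in ranking: the shortest CPU run queue counts as most available.
        return min(candidates, key=lambda n: n.cpu_run_queue)
    return candidates[next(_turn) % len(candidates)]


grid = [
    Node("node1", resources={"SAP"}, cpu_run_queue=4.0),
    Node("node2", resources={"SAP"}, cpu_run_queue=1.0),
    Node("node3", cpu_run_queue=0.0),                 # no SAP client installed
    Node("node4", resources={"SAP"}, enabled=False),  # disabled node
]
print(select_node(grid, {"SAP"}, "adaptive").name)  # node2
```

A return value of `None` corresponds to the case where the task is placed in the dispatch queue and dispatched later.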
6.3 Data Transformation Manager (DTM) Process
When the workflow reaches a session, the
Integration Service Process starts the DTM process. The DTM is the
process associated with the session task. The DTM process performs the
following tasks:
- Retrieves and validates session information from the repository.
- Validates source and target code pages.
- Verifies connection object permissions.
- Performs pushdown optimization when the session is configured for pushdown optimization.
- Adds partitions to the session when the session is configured for dynamic partitioning.
- Expands the service process variables, session parameters, and mapping variables and parameters.
- Creates the session log.
- Runs pre-session shell commands, stored procedures, and SQL.
- Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.
- Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data
- Runs post-session stored procedures, SQL, and shell commands and sends post-session email
- After the session is complete, reports the execution result to the ISP
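The idea of separate reader, transformation, and writer threads can be illustrated with a small Python pipeline. This is only a sketch of the general pattern, assuming one thread per stage connected by bounded queues. The function names and the queue size are hypothetical, and the real DTM also handles partitions, commits, and error rows, which are left out here.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the row stream


def reader(source_rows, out_q):
    # Extract: push every source row downstream, then signal completion.
    for row in source_rows:
        out_q.put(row)
    out_q.put(DONE)


def transformer(in_q, out_q, transform):
    # Transform: apply the mapping logic to each row as it arrives.
    while (row := in_q.get()) is not DONE:
        out_q.put(transform(row))
    out_q.put(DONE)


def writer(in_q, target):
    # Load: append each transformed row to the target.
    while (row := in_q.get()) is not DONE:
        target.append(row)


def run_session(source_rows, transform):
    read_q = queue.Queue(maxsize=100)
    write_q = queue.Queue(maxsize=100)
    target = []
    threads = [
        threading.Thread(target=reader, args=(source_rows, read_q)),
        threading.Thread(target=transformer, args=(read_q, write_q, transform)),
        threading.Thread(target=writer, args=(write_q, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target


print(run_session([1, 2, 3], lambda x: x * 10))  # [10, 20, 30]
```

Because the stages run concurrently, the writer can start loading the first rows while the reader is still extracting later ones. The bounded queues keep a fast reader from filling memory ahead of a slow writer.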
Pictorial Representation of Workflow execution:
- A PowerCenter Client requests the IS to start a workflow
- The IS starts the ISP
- The ISP consults the Load Balancer (LB) to select a node
- The ISP starts the DTM on the node selected by the LB