Wednesday 22 April 2009

Informatica and Oracle hints in SQL overrides


Hints in a SQL statement send instructions to the Oracle optimizer, which can reduce query processing time. Can we make use of these hints in SQL overrides within our Informatica mappings to improve query performance?

On a general note, the Informatica help material says that you can enter any valid SQL statement supported by the source database in the SQL override of a Source Qualifier or a Lookup transformation, or at the session properties level.

While using hints in a Source Qualifier override poses no complications, using them in a Lookup SQL override gets a bit tricky. Use of a forward slash followed by an asterisk ("/*") in a Lookup SQL override (generally used for comments in SQL and, at times, for Oracle hints) results in session failure with an error like:
TE_7017 : Failed to Initialize Server Transformation lkp_transaction
2009-02-19 12:00:56 : DEBUG : (18785 | MAPPING) : (IS | Integration_Service_xxxx) : node01_UAT-xxxx : DBG_21263 : Invalid lookup override
SELECT SALES. SALESSEQ as SalesId, SALES.OrderID as ORDERID, SALES.OrderDATE as ORDERDATE FROM SALES, AC_SALES WHERE AC_SALES. OrderSeq >= (Select /*+ FULL(AC_Sales) PARALLEL(AC_Sales,12) */ min(OrderSeq) From AC_Sales)

This is because Informatica’s parser fails to recognize this character sequence when used in a Lookup override. Starting with the PowerCenter 7.1.3 release, a parameter is available that enables the use of the forward slash, and therefore of hints.
§ Infa 7.x
1. Using a text editor open the PowerCenter server configuration file (pmserver.cfg).
2. Add the following entry at the end of the file:
LookupOverrideParsingSetting=1
3. Re-start the PowerCenter server (pmserver).
§ Infa 8.x
1. Connect to the Administration Console.
2. Stop the Integration Service.
3. Select the Integration Service.
4. Under the Properties tab, click Edit in the Custom Properties section.
5. Under Name, enter LookupOverrideParsingSetting.
6. Under Value, enter 1.
7. Click OK.
8. Start the Integration Service.
§ Starting with PowerCenter 8.5, this change can be made at the session task itself as follows:
1. Edit the session.
2. Select the Config Object tab.
3. Under Custom Properties, add the attribute LookupOverrideParsingSetting and set its value to 1.
4. Save the session.

Thursday 19 March 2009

Informatica PowerCenter 8x Key Concepts – 6


6.  Integration Service (IS)

The key functions of the IS are:
  • Interpreting the workflow and mapping metadata from the repository
  • Executing the instructions in the metadata
  • Managing the movement of data from the source system to the target system, in memory and on disk
The three main components of the Integration Service that enable data movement are:
  • Integration Service Process
  • Load Balancer
  • Data Transformation Manager

6.1 Integration Service Process (ISP)

The Integration Service starts one or more Integration Service processes to run and monitor workflows. When we run a workflow, the ISP starts and locks the workflow, runs the workflow tasks, and starts the DTM process to run sessions. The functions of the Integration Service Process are:
  • Locks and reads the workflow
  • Manages workflow scheduling, i.e., maintains session dependencies
  • Reads the workflow parameter file
  • Creates the workflow log
  • Runs workflow tasks and evaluates the conditional links
  • Starts the DTM process to run the session
  • Writes historical run information to the repository
  • Sends post-session emails

6.2    Load Balancer

The Load Balancer dispatches tasks to achieve optimal performance. It dispatches tasks to a single node or across the nodes in a grid after performing a sequence of steps. Before understanding these steps, we need to know about Resources, Resource Provision Thresholds, Dispatch Modes, and Service Levels.
  • Resources – we can configure the Integration Service to check the resources available on each node and match them with the resources required to run the task. For example, if a session uses an SAP source, the Load Balancer dispatches the session only to nodes where the SAP client is installed
  • Resource Provision Thresholds – there are three: Maximum CPU Run Queue Length, the maximum number of runnable threads waiting for CPU resources on the node; Maximum Memory %, the maximum percentage of virtual memory allocated on the node relative to the total physical memory size; and Maximum Processes, the maximum number of running Session and Command tasks allowed for each Integration Service process running on the node
  • Dispatch Modes – there are three: Round-Robin, in which the Load Balancer dispatches tasks to available nodes in round-robin fashion after checking the Maximum Processes threshold; Metric-based, in which it checks all three resource provision thresholds and dispatches tasks in round-robin fashion; and Adaptive, in which it checks all three resource provision thresholds and also ranks nodes according to current CPU availability
  • Service Levels – establish priority among tasks that are waiting to be dispatched. The three components of a service level are Name, Dispatch Priority, and Maximum Dispatch Wait Time; the maximum dispatch wait time is the amount of time a task can wait in the queue, and this ensures that no task waits forever
A. Dispatching tasks on a node
  1. The Load Balancer checks different resource provision thresholds on the node depending on the Dispatch mode set. If dispatching the task causes any threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later
  2. The Load Balancer dispatches all tasks to the node that runs the master Integration Service process
B. Dispatching tasks on a grid
  1. The Load Balancer verifies which nodes are currently running and enabled
  2. The Load Balancer identifies nodes that have the PowerCenter resources required by the tasks in the workflow
  3. The Load Balancer verifies that the resource provision thresholds on each candidate node are not exceeded. If dispatching the task causes a threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later
  4. The Load Balancer selects a node based on the dispatch mode (see the sketch below)
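To make the dispatch decision easier to picture, here is a minimal sketch in Python. It is illustrative pseudocode only, not PowerCenter’s internal implementation; the node attributes, the threshold values, and the simple CPU ranking used for the Adaptive mode are assumptions made for the example.
# Illustrative sketch of Load Balancer dispatch, not PowerCenter internals.
# Node attributes, threshold values and the ranking rule are assumptions.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    resources: frozenset      # PowerCenter resources available on the node
    cpu_run_queue: int        # runnable threads waiting for CPU on the node
    memory_pct: float         # % of virtual memory allocated vs. physical memory
    running_tasks: int        # running Session and Command tasks on the node
    cpu_available: float      # used only by the Adaptive mode

MAX_CPU_RUN_QUEUE = 10        # Maximum CPU Run Queue Length (example value)
MAX_MEMORY_PCT = 150.0        # Maximum Memory % (example value)
MAX_PROCESSES = 10            # Maximum Processes (example value)

def within_thresholds(node):
    """Check the three resource provision thresholds on a node."""
    return (node.cpu_run_queue < MAX_CPU_RUN_QUEUE
            and node.memory_pct < MAX_MEMORY_PCT
            and node.running_tasks < MAX_PROCESSES)

def dispatch(task_resources, nodes, mode="Adaptive"):
    """Pick a node for a task, or return None to leave it in the dispatch queue."""
    # Keep only nodes that hold the PowerCenter resources the task requires.
    candidates = [n for n in nodes if task_resources <= n.resources]
    if mode == "Round-Robin":
        # Round-Robin checks only the Maximum Processes threshold.
        candidates = [n for n in candidates if n.running_tasks < MAX_PROCESSES]
    else:
        # Metric-based and Adaptive check all three thresholds.
        candidates = [n for n in candidates if within_thresholds(n)]
    if not candidates:
        return None           # the task stays queued and is dispatched later
    if mode == "Adaptive":
        # Adaptive additionally ranks the remaining nodes by CPU availability.
        return max(candidates, key=lambda n: n.cpu_available)
    return candidates[0]      # next node in round-robin order, simplified here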

6.3 Data Transformation Manager (DTM) Process

When the workflow reaches a session, the Integration Service Process starts the DTM process. The DTM is the process associated with the session task. The DTM process performs the following tasks:
  • Retrieves and validates session information from the repository.
  • Validates source and target code pages.
  • Verifies connection object permissions.
  • Performs pushdown optimization when the session is configured for pushdown optimization.
  • Adds partitions to the session when the session is configured for dynamic partitioning.
  • Expands the service process variables, session parameters, and mapping variables and parameters.
  • Creates the session log.
  • Runs pre-session shell commands, stored procedures, and SQL.
  • Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.
  • Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
  • Runs post-session stored procedures, SQL, and shell commands, and sends post-session email.
  • Reports the execution result to the ISP after the session is complete.
Workflow execution sequence:
  1. A PowerCenter Client requests the IS to start a workflow
  2. The IS starts an ISP
  3. The ISP consults the Load Balancer to select a node
  4. The ISP starts the DTM on the node selected by the Load Balancer

Friday 16 January 2009

Informatica PowerCenter 8x Key Concepts – 5


5. Repository Service
Having already discussed the metadata repository, we now turn to the Repository Service: a separate, multi-threaded process that retrieves, inserts, and updates metadata in the repository database tables.
The Repository Service manages connections to the PowerCenter repository from PowerCenter client applications such as the Designer, Workflow Manager, Workflow Monitor, and Repository Manager, as well as from the Administration Console and the Integration Service. The Repository Service is responsible for ensuring the consistency of metadata in the repository.

Creation & Properties:
Use the PowerCenter Administration Console Navigator window to create a Repository Service. The properties needed to create one are:
Service Name – name of the service like rep_SalesPerformanceDev
Location – Domain and folder where the service is created
License – license service name
Node, Primary Node & Backup Nodes – Node on which the service process runs
CodePage – The Repository Service uses the character set encoded in the repository code page when writing data to the repository
Database type & details – Type of database, username, password, connect string and tablespace name
The above properties are sufficient to create a Repository Service; however, the following features are also worth a look, as they are important for better performance and maintenance.
General Properties
> OperatingMode: Values are Normal and Exclusive. Use Exclusive mode to perform administrative tasks like enabling version control or promoting local to global repository
> EnableVersionControl: Creates a versioned repository
Node Assignments: the “High availability option” is a licensed feature that allows us to choose primary and backup nodes so the Repository Service keeps running continuously. Under a normal license, only one node is available to select.
Database Properties
> DatabaseArrayOperationSize: Number of rows to fetch each time an array database operation is issued, such as insert or fetch. Default is 100
> DatabasePoolSize: Maximum number of connections to the repository database that the Repository Service can establish. If the Repository Service tries to establish more connections than specified for DatabasePoolSize, it times out the connection attempt after the number of seconds specified for DatabaseConnectionTimeout (the sketch below illustrates this pool/timeout behaviour)
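As a rough, generic illustration of the pool-size and timeout semantics described above (this is not Informatica code; the values and helper names are assumptions for the example):
# Generic illustration of the DatabasePoolSize / DatabaseConnectionTimeout idea:
# a bounded pool plus a timed acquire. Not Informatica code; values are examples.
import threading

DATABASE_POOL_SIZE = 500             # example maximum number of connections
DATABASE_CONNECTION_TIMEOUT = 180    # example timeout, in seconds

_pool = threading.BoundedSemaphore(DATABASE_POOL_SIZE)

def acquire_connection():
    """Return True once a pool slot is obtained, False if the attempt times out."""
    return _pool.acquire(timeout=DATABASE_CONNECTION_TIMEOUT)

def release_connection():
    _pool.release()                  # free the slot when the connection is closed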
Advanced Properties
> CommentsRequiredForCheckin: Requires users to add comments when checking in repository objects.
> Error Severity Level: Level of error messages written to the Repository Service log. Specify one of the following message levels: Fatal, Error, Warning, Info, Trace & Debug
> EnableRepAgentCaching: Enables repository agent caching. Repository agent caching provides optimal performance of the repository when you run workflows. When you enable repository agent caching, the Repository Service process caches metadata requested by the Integration Service. Default is Yes.
> RACacheCapacity: Number of objects that the cache can contain when repository agent caching is enabled. You can increase the number of objects if there is available memory on the machine running the Repository Service process. The value must be between 100 and 10,000,000,000. Default is 10,000.
> AllowWritesWithRACaching: Allows you to modify metadata in the repository when repository agent caching is enabled. When you allow writes, the Repository Service process flushes the cache each time you save metadata through the PowerCenter Client tools. You might want to disable writes to improve performance in a production environment where the Integration Service makes all changes to repository metadata. Default is Yes.

Environment Variables

The database client code page on a node is usually controlled by an environment variable. For example, Oracle uses NLS_LANG, and IBM DB2 uses DB2CODEPAGE. All Integration Services and Repository Services that run on this node use the same environment variable. You can configure a Repository Service process to use a different value for the database client code page environment variable than the value set for the node.
You might want to configure the code page environment variable for a Repository Service process when the Repository Service process requires a different database client code page than the Integration Service process running on the same node.

For example, the Integration Service reads from and writes to databases using the UTF-8 code page. The Integration Service requires that the code page environment variable be set to UTF-8. However, you have a Shift-JIS repository that requires that the code page environment variable be set to Shift-JIS. Set the environment variable on the node to UTF-8. Then add the environment variable to the Repository Service process properties and set the value to Shift-JIS.
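The idea of overriding a node-level environment variable for a single service process can be pictured with a generic sketch. This is not how Informatica actually launches its services; the command path and the NLS_LANG values are placeholders used only to illustrate a per-process override:
# Generic illustration of a per-process environment variable override.
# The command path and the NLS_LANG values are placeholders, not real settings.
import os
import subprocess

# Node-level value, inherited by processes such as the Integration Service.
os.environ["NLS_LANG"] = "AMERICAN_AMERICA.AL32UTF8"   # UTF-8 client code page

# One service process gets its own value, e.g. for a Shift-JIS repository.
repository_env = dict(os.environ, NLS_LANG="JAPANESE_JAPAN.JA16SJIS")
subprocess.Popen(["/path/to/repository_service_process"], env=repository_env)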

Tuesday 9 December 2008

Informatica PowerCenter 8x Key Concepts – 4


PowerCenter Client (contd.)
Workflow Manager: In the Workflow Manager, we define a set of instructions called a workflow to execute the mappings we build in the Designer. Generally, a workflow contains a session and any other tasks we may want to perform when we run a session. Tasks can include a session, email notification, or scheduling information.
A set of tasks grouped together becomes a worklet. After we create a workflow, we run it in the Workflow Manager and monitor it in the Workflow Monitor. The Workflow Manager has three window panes: the Task Developer, where we create the tasks we want to accomplish in the workflow; the Worklet Designer, where we create worklets (a worklet is an object that groups a set of tasks; it is similar to a workflow but has no scheduling information, and worklets can be nested inside a workflow); and the Workflow Designer, where we create a workflow by connecting tasks with links, and where we can also create tasks as we develop the workflow. The ODBC connection details are defined in the Workflow Manager “Connections” menu.
Workflow Monitor: We can monitor workflows and tasks in the Workflow Monitor. We can view details about a workflow or task in Gantt Chart view or Task view. We can run, stop, abort, and resume workflows from the Workflow Monitor. We can view sessions and workflow log events in the Workflow Monitor Log Viewer.
The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor continuously receives information from the Integration Service and Repository Service. It also fetches information from the repository to display historic information.
The Workflow Monitor consists of the following windows:
Navigator window – Displays monitored repositories, servers, and repository objects.
Output window – Displays messages from the Integration Service and Repository Service.
Time window – Displays progress of workflow runs.
Gantt chart view – Displays details about workflow runs in chronological format.
Task view – Displays details about workflow runs in a report format.
Repository Manager
We can navigate through multiple folders and repositories and perform basic repository tasks with the Repository Manager. We use the Repository Manager to complete the following tasks:
1. Add domain connection information. We can configure connection information for a PowerCenter domain.
2. Add and connect to a repository. We can add repositories to the Navigator window and client registry and then connect to them.
3. Work with PowerCenter domain and repository connections. We can edit or remove domain connection information, connect to one or multiple repositories, export repository connection information from the client registry to a file, and import that file on a different machine to add the repository connection information to its client registry.
4. Change your password. We can change the password for our user account.
5. Search for repository objects or keywords. We can search for repository objects containing specified text; if we have added keywords to target definitions, we can use a keyword to search for a target definition.
6. View object dependencies. Before we remove or change an object, we can view dependencies to see the impact on other objects.
7. Compare repository objects. In the Repository Manager, we can compare two repository objects of the same type to identify differences between them.
8. Truncate session and workflow log entries. We can truncate the list of session and workflow logs that the Integration Service writes to the repository, either all logs or only logs older than a specified date.

Monday 24 November 2008

Business Intelligence Challenge – Product Upgrades & Migrations, Moving the Code – 4


Last time we discussed Impact Assessment; the next logical step is to perform the actual upgrade or migration of the code.

Moving the Code: Performing Upgrade or Migration of the Objects

When we talk about product upgrades, the product vendor always provides tools by which objects from the earlier version can be upgraded to the latest version. Some objects will fail to come through such tools; these are the ones that need rework after the upgrade process.

When we talk about product migration, like moving from Cognos to Business Objects or from Business Objects to Cognos, there is good scope to automate the code migration. Earlier discussions covered how to leverage metadata to understand the environment; now we look at how to manipulate or transform the metadata so that an object in platform ‘A’ becomes compliant with platform ‘B’.

Steps involved in building an automated product migration process

Perform metadata-level object mapping between the two platforms and determine the gaps. This would actually be a by-product of ‘Step 2’ in Impact Assessment.
Build individual components that would:
  • Read the metadata from the source platform and prepare a repository
  • Have the knowledge of the match & gap between the platforms, could be reference tables
  • Transform the ‘source’ metadata, using the reference tables, and write it out in a form understood by the ‘target’ platform
Benefits of Automated Migration
  • Helps avoid creation of objects from scratch
  • Ensures that time is available for testing (the core task) rather than for code development
  • Enables team to have a flexible skillset
  • A faster way of delivering things when a ‘one to one’ migration from the source platform is seen as a must
Automated Migration Challenges
Transforming the source metadata to the target platform would be a challenge, especially with data manipulation functions. Having a good understanding of the gaps will help; a reference table mapping the functions between the platforms would be useful. In scenarios where a function cannot be converted to the target platform, a comment can be written into a log file so that it gets quicker attention (a sketch of this idea follows).
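A rough sketch of such a function-mapping component is given below. The reference-table contents, the expression format, and the helper names are purely illustrative assumptions; a real bridge would work against the platform SDKs or exported XML metadata mentioned in this series.
# Illustrative sketch of a migration "bridge" step: map source-platform functions
# to the target platform via a reference table and log anything unconvertible.
# The mappings, expression format and file name are assumptions for the example.
import logging
import re

logging.basicConfig(filename="migration_gaps.log", level=logging.WARNING)

# Reference table: source-platform function -> target-platform equivalent.
FUNCTION_MAP = {
    "substring": "substr",
    "current_date": "sysdate",
}

def transform_expression(object_name, expression):
    """Rewrite function names in a source expression for the target platform."""
    def replace(match):
        func = match.group(1).lower()
        if func in FUNCTION_MAP:
            return FUNCTION_MAP[func] + "("
        # Gap: no known equivalent; keep the original and flag it for manual rework.
        logging.warning("%s: no target equivalent for '%s'", object_name, func)
        return match.group(0)
    return re.sub(r"(\w+)\s*\(", replace, expression)

# Example: one calculated-column expression read from the source metadata export.
print(transform_expression("SalesReport.Col1", "substring(current_date(), 1, 4)"))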

We have seen good success in writing such automated migration components, though not 100%. With almost every product providing good SDKs for reading as well as writing metadata, and with support for XML structures, writing such bridges for object migration is getting easier.

Whether or not the objects in a product are migrated or upgraded in an automated way, the activity that follows, ‘Validation’, plays a key role in ensuring the final quality. Next time, let us discuss some of the means for effective validation…