Informatica Way

Tuesday, 25 January 2011

Informatica Pushdown Optimization

What is Pushdown Optimization and things to consider

The process of pushing transformation logic to the source or target database by Informatica Integration service is known as Pushdown Optimization. When a session is configured to run for Pushdown Optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The Source or Target Database executes the SQL queries to process the transformations.

How does Pushdown Optimization (PO) Works?

The Integration Service generates SQL statements when native database driver is used. In case of ODBC drivers, the Integration Service cannot detect the database type and generates ANSI SQL. The Integration Service can usually push more transformation logic to a database if a native driver is used, instead of an ODBC driver.

For any SQL Override, Integration service creates a view (PM_*) in the database while executing the session task and drops the view after the task gets complete. Similarly it also create sequences (PM_*) in the database.

Database schema (SQ Connection, LKP connection), should have the Create View / Create Sequence Privilege, else the session will fail.

Few Benefits in using PO

There is no memory or disk space required to manage the cache in the Informatica server for Aggregator, Lookup, Sorter and Joiner Transformation, as the transformation logic is pushed to database.
SQL Generated by Informatica Integration service can be viewed before running the session through Optimizer viewer, making easier to debug.
When inserting into Targets, Integration Service do row by row processing using bind variable (only soft parse – only processing time, no parsing time). But In case of Pushdown Optimization, the statement will be executed once.

Without Using Pushdown optimization:

INSERT INTO EMPLOYEES(ID_EMPLOYEE, EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL,

PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT,

MANAGER_ID,MANAGER_NAME,

DEPARTMENT_ID) VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13) –executes 7012352 times

With Using Pushdown optimization

INSERT INTO EMPLOYEES(ID_EMPLOYEE, EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID, MANAGER_NAME, DEPARTMENT_ID) SELECT CAST(PM_SJEAIJTJRNWT45X3OO5ZZLJYJRY.NEXTVAL AS NUMBER(15, 2)), EMPLOYEES_SRC.EMPLOYEE_ID, EMPLOYEES_SRC.FIRST_NAME, EMPLOYEES_SRC.LAST_NAME, CAST((EMPLOYEES_SRC.EMAIL || ‘@gmail.com’) AS VARCHAR2(25)), EMPLOYEES_SRC.PHONE_NUMBER, CAST(EMPLOYEES_SRC.HIRE_DATE AS date), EMPLOYEES_SRC.JOB_ID, EMPLOYEES_SRC.SALARY, EMPLOYEES_SRC.COMMISSION_PCT, EMPLOYEES_SRC.MANAGER_ID, NULL, EMPLOYEES_SRC.DEPARTMENT_ID FROM (EMPLOYEES_SRC LEFT OUTER JOIN EMPLOYEES PM_Alkp_emp_mgr_1 ON (PM_Alkp_emp_mgr_1.EMPLOYEE_ID = EMPLOYEES_SRC.MANAGER_ID)) WHERE ((EMPLOYEES_SRC.MANAGER_ID = (SELECT PM_Alkp_emp_mgr_1.EMPLOYEE_ID FROM EMPLOYEES PM_Alkp_emp_mgr_1 WHERE (PM_Alkp_emp_mgr_1.EMPLOYEE_ID = EMPLOYEES_SRC.MANAGER_ID))) OR (0=0)) –executes 1 time

Things to note when using PO

There are cases where the Integration Service and Pushdown Optimization can produce different result sets for the same transformation logic. This can happen during data type conversion, handling null values, case sensitivity, sequence generation, and sorting of data.

The database and Integration Service produce different output when the following settings and conversions are different:

Nulls treated as the highest or lowest value: While sorting the data, the Integration Service can treat null values as lowest, but database treats null values as the highest value in the sort order.
SYSDATE built-in variable: Built-in Variable SYSDATE in the Integration Service returns the current date and time for the node running the service process. However, in the database, the SYSDATE returns the current date and time for the machine hosting the database. If the time zone of the machine hosting the database is not the same as the time zone of the machine running the Integration Service process, the results can vary.
Date Conversion: The Integration Service converts all dates before pushing transformations to the database and if the format is not supported by the database, the session fails.
Logging: When the Integration Service pushes transformation logic to the database, it cannot trace all the events that occur inside the database server. The statistics the Integration Service can trace depend on the type of pushdown optimization. When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.

To know more about Pushdown Optimization

Monday, 3 January 2011

Informatica Performance Improvement Tips

We often come across situations where Data Transformation Manager (DTM) takes more time to read from Source or when writing in to a Target. Following standards/guidelines can improve the overall performance.

Use Source Qualifier if the Source tables reside in the same schema
Make use of Source Qualifer “Filter” Properties if the Source type is Relational.
If the subsequent sessions are doing lookup on the same table, use persistent cache in the first session. Data remains in the Cache and available for the subsequent session for usage.
Use flags as integer, as the integer comparison is faster than the string comparison.
Use tables with lesser number of records as master table for joins.
While reading from Flat files, define the appropriate data type instead of reading as String and converting.
Have all Ports that are required connected to Subsequent Transformations else check whether we can remove these ports
Suppress ORDER BY using the ‘–‘ at the end of the query in Lookup Transformations
Minimize the number of Update strategies.
Group by simple columns in transformations like Aggregate, Source Qualifier
Use Router transformation in place of multiple Filter transformations.
Turn off the Verbose Logging while moving the mappings to UAT/Production environment.
For large volume of data drop index before loading and recreate indexes after load.
For large of volume of records Use Bulk load Increase the commit interval to a higher value large volume of data
Set ‘Commit on Target’ in the sessions

To know more about Informatica

Thursday, 23 December 2010

Leveraging Metadata in Informatica Workflow-Session/Analysis

We can leverage the metadata collected in the Informatica repository for many interesting analysis, few of the scenarios where I have leveraged the Informatica Metadata are as following.

This SQL Queries can be executed in Oracle database with no changes and requires little modification with other databases.

Failed Sessions

The following query lists the failed sessions. To make it work for the last ‘n’ days, replace SYSDATE-1 with SYSDATE – n

QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,

SESSION_NAME,

LAST_ERROR AS ERROR_MESSAGE,

DECODE (RUN_STATUS_CODE,3,’Failed’,4,’Stopped’,5,’Aborted’) AS STATUS,

ACTUAL_START AS START_TIME,

SESSION_TIMESTAMP

FROM REP_SESS_LOG

WHERE RUN_STATUS_CODE != 1

AND TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE -1) AND TRUNC(SYSDATE)

RESULT:

Long running Sessions

The following query lists long running sessions. To make it work for the last ‘n’ days, replace SYSDATE-1 with SYSDATE – n

QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,

SESSION_NAME,

SUCCESSFUL_SOURCE_ROWS AS SOURCE_ROWS,

SUCCESSFUL_ROWS AS TARGET_ROWS,

ACTUAL_START AS START_TIME,

SESSION_TIMESTAMP

FROM REP_SESS_LOG

WHERE RUN_STATUS_CODE = 1

AND TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE -1) AND TRUNC(SYSDATE)

AND (SESSION_TIMESTAMP – ACTUAL_START) > (10/(24*60))

ORDER BY SESSION_TIMESTAMP

RESULT:

Invalid Tasks

The following query lists folder names and task name, version number, and last saved for all invalid tasks.

QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,

DECODE(IS_REUSABLE,1,’Reusable’,’ ‘) || ‘ ‘ ||TASK_TYPE_NAME AS TASK_TYPE,

TASK_NAME AS OBJECT_NAME,

VERSION_NUMBER,

LAST_SAVED

FROM REP_ALL_TASKS

WHERE IS_VALID=0

AND IS_ENABLED=1

ORDER BY SUBJECT_AREA,TASK_NAME

RESULT:

Thanks for reading, do you have other scenarios where Workflow Metadata has been effective …wish you a very happy new year 2011.

To know more about Informatica Workflow

Tuesday, 26 October 2010

Impact Analysis on Source & Target Definition Changes

Changes to Source and Target definition will impact the current state of the Informatica mapping and this article list the possible changes at Source and the Target with impact.

Updating Source Definitions:When we update a source definition, the Designer propagates the changes to all mappings using that source. Some changes to source definitions can invalidate mappings.
Below table describes how the mappings get impacted when the source definition is edited:

Modification	Result of the source after modifying the source definition
Add a column.	Mappings are not invalidated.
Change a column Data type.	Mappings may be invalidated. If the column is connected to an input port that uses a Data type incompatible with the new one, the mapping is invalidated.
Change a column name.	Mapping may be invalidated. If you change the column name for a column you just added, the mapping remains valid. If you change the column name for an existing column, the mapping is invalidated.
Delete a column.	Mappings can be invalidated if the mapping uses values from the deleted column.

Adding a new column in the existing source definition:

When we add a new column to a source in the Source Analyzer, all mappings using the source definition remain valid.
However, when we add a new column and change some of its properties, the Designer invalidates mappings using the source definition.
We can change the following properties for a newly added source column without invalidating a mapping: 1. Name
2. Data type
3. Format
4. Usage
5. Redefines
6. Occurs
7. Key type

If the changes invalidate the mapping, we must open and edit the mapping. Then click Repository > Save to save the changes to the repository. If the invalidated mapping is used in a session, we must validate the session.
Updating Target Definitions:
When we change a target definition, the Designer propagates the changes to any mapping using that target. Some changes to target definitions can invalidate mappings.
The following table describes how the mappings get impacted when we edit target definitions:

Modification	Result of the source after modifying the target definition
Add a column.	Mapping not invalidated.
Change a column Data type.	Mapping may be invalidated. If the column is connected to an input port that uses a Data type that is incompatible with the new one (for example, Decimal to Date), the mapping is invalid.
Change a column name.	Mapping may be invalidated. If you change the column name for a column you just added, the mapping remains valid. If you change the column name for an existing column, the mapping is invalidated.
Delete a column.	Mapping may be invalidated if the mapping uses values from the deleted column.
Change the target definition type.	Mapping not invalidated.

Adding a new column in the existing target definition:

When we add a new column to a target in the Target Designer, all mappings using the target definition remain valid.
However, when you add a new column and change some of its properties, the Designer invalidates mappings using the target definition.
We can change the following properties for a newly added target column without invalidating a mapping:

1. Name
2. Data type
3. Format
If the changes invalidate the mapping, validate the mapping and any session using the mapping. We can validate objects from the Query Results or View Dependencies window or from the Repository Navigator. We can validate multiple objects from these locations without opening them in the workspace. If we cannot validate the mapping or session from one of these locations, open the object in the workspace and edit it.

Re-importing a Relational Target Definition:
If a target table changes, such as when we change a column data type, we can edit the definition or we can re-import the target definition. When we re-import the target, we can either replace the existing target definition or rename the new target definition to avoid a naming conflict with the existing target definition.

To re-import a target definition:

In the Target Designer, follow the same steps to import the target definition, and select the Target to import. The Designer notifies us that a target definition with that name already exists in the repository. If we have multiple tables to import and replace, select apply to All Tables.
Click Rename, Replace, Skip, or Compare.
If we click Rename, enter the name of the target definition and click OK.
If we have a relational target definition and click Replace, specify whether we want to retain primary key-foreign key information and target descriptions

The following table describes the options available in the Table Exists dialog box when re-importing and replacing a relational target definition:

Option	Description
Apply to all Tables	Select this option to apply rename, replaces, or skips all tables in the folder.
Retain User-Defined PK-FK Relationships	Select this option to keep the primary key-foreign key relationships in the target definition being replaced. This option is disabled when the target definition is non-relational.
Retain User-Defined Descriptions	Select this option to retain the target description and column and port descriptions of the target definition being replaced.

To know more about Target Definition

Thursday, 14 October 2010

Output Files in Informatica

The Integration Service process generates output files when we run workflows and sessions. By default, the Integration Service logs status and error messages to log event files.

Log event files are binary files that the Log Manager uses to display log events. When we run each session, the Integration Service also creates a reject file. Depending on transformation cache settings and target types, the Integration Service may create additional files as well.

The Integration Service creates the following output files:

Output Files

Session Details/logs:

When we run a session, the Integration service creates session log file with the load statistics/table names/Error information/threads created etc based on the tracing level that have set in the session properties.
We can monitor session details in the session run properties while session running/failed/succeeded.

Workflow Log:

Workflow log is available in Workflow Monitor.
The Integration Service process creates a workflow log for each workflow it runs.
It writes information in the workflow log such as
- Initialization of processes,
- Workflow task run information,
- Errors encountered and
- Workflows run summary.
The Integration Service can also be configured to suppress writing messages to the workflow log file.
As with Integration Service logs and session logs, the Integration Service process enters a code number into the workflow log file message along with message text.

Performance Detail File:

The Integration Service process generates performance details for session runs.
Through the performance details file we can determine where session performance can be improved.
Performance details provide transformation-by-transformation information on the flow of data through the session.

Reject Files:

By default, the Integration Service process creates a reject file for each target in the session. The reject file contains rows of data that the writer does not write to targets.
The writer may reject a row in the following circumstances:
- It is flagged for reject by an Update Strategy or Custom transformation.
- It violates a database constraint such as primary key constraint
- A field in the row was truncated or overflowed
- The target database is configured to reject truncated or overflowed data.

Note: By default, the Integration Service process saves the reject file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager, and names the reject file target_table_name.bad. We can view this file name in session level.

Open Session – Select any of the target View the options
- Reject File directory.
- Reject file name.
If you enable row error logging, the Integration Service process does not create a reject file.

Row Error Logs:

When we configure a session, we can choose to log row errors in a central location.
When a row error occurs, the Integration Service process logs error information that allows to determine the cause and source of the error.
The Integration Service process logs information such as source name, row ID, current row data, transformation, timestamp, error code, error message, repository name, folder name, session name, and mapping information.
we enable flat file logging, by default, the Integration Service process saves the file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager.

Recovery Tables Files:

The Integration Service process creates recovery tables on the target database system when it runs a session enabled for recovery.
When you run a session in recovery mode, the Integration Service process uses information in the recovery tables to complete the session.
When the Integration Service process performs recovery, it restores the state of operations to recover the workflow from the point of interruption.
The workflow state of operations includes information such as active service requests, completed and running status, workflow variable values, running workflows and sessions, and workflow schedules.

Control File:

When we run a session that uses an external loader, the Integration Service process creates a control file and a target flat file.
The control file contains information about the target flat file such as data format and loading instructions for the external loader.
The control file has an extension of .ctl. The Integration Service process creates the control file and the target flat file in the Integration Service variable directory, $PMTargetFileDir, by default.

Email:

We can compose and send email messages by creating an Email task in the Workflow Designer or Task Developer and the Email task can be placed in a workflow, or can be associated it with a session.
The Email task allows to automatically communicate information about a workflow or session run to designated recipients.
Email tasks in the workflow send email depending on the conditional links connected to the task. For post-session email, we can create two different messages, one to be sent if the session completes successfully, the other if the session fails.
We can also use variables to generate information about the session name, status, and total rows loaded.

Indicator File:

If we use a flat file as a target, we can configure the Integration Service to create an indicator file for target row type information.
For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
The Integration Service process names this file target_name.ind and stores it in the Integration Service variable directory, $PMTargetFileDir, by default.

Target or Output File:

If the session writes to a target file, the Integration Service process creates the target file based on a file target definition.
By default, the Integration Service process names the target file based on the target definition name.
If a mapping contains multiple instances of the same target, the Integration Service process names the target files based on the target instance name.
The Integration Service process creates this file in the Integration Service variable directory, $PMTargetFileDir, by default.

Cache Files:

When the Integration Service process creates memory cache, it also creates cache files. The Integration Service process creates cache files for the following mapping objects:
- Aggregator transformation
- Joiner transformation
- Rank transformation
- Lookup transformation
- Sorter transformation
- XML target
By default, the DTM creates the index and data files for Aggregator, Rank, Joiner, and Lookup transformations and XML targets in the directory configured for the $PMCacheDir service process variable.

Informatica Way

Ads 468x60px

Pages

Labels

Blog Archive

Labels

Blogroll

About

Blogger templates

Blogger news

Tuesday, 25 January 2011

Informatica Pushdown Optimization

What is Pushdown Optimization and things to consider

How does Pushdown Optimization (PO) Works?

Few Benefits in using PO

Things to note when using PO

Monday, 3 January 2011

Informatica Performance Improvement Tips

Thursday, 23 December 2010

Leveraging Metadata in Informatica Workflow-Session/Analysis

Tuesday, 26 October 2010

Impact Analysis on Source & Target Definition Changes

Thursday, 14 October 2010

Output Files in Informatica

My Favourite Links

ERP- Oracle

Popular Posts

Mamta @ Twitter

Blog Archive

Labels

About Me