Informatica Way: Informatica

Tuesday, 15 March 2011

Configuring Informatica File Transfer Protocol

Informatica File Transfer Protocol (FTP) support can be used to transfer files from different environments into a pre-defined landing zone. It can also be used to transfer files to destination folders or directories. The Integration Service can use FTP to access any machine it can connect to, including mainframes.

Configuring FTP in Informatica Workflow

To use FTP file sources and targets in a session:

Create an FTP connection object in the Workflow Manager and configure the connection attributes.
Configure the session to use the FTP connection object in the session properties.
Specify the remote filename in the connection value of the session properties.

Guidelines

Specify the source or target output directory in the session properties. If it is not specified, the Integration Service stages the file in the directory where the Integration Service runs on UNIX, or in the Windows system directory.

Sessions cannot run concurrently if they use the same FTP source file or target file located on a mainframe.

If a workflow containing a session that stages an FTP source or target from a mainframe is aborted, the same workflow cannot be run again until it has timed out.

Configure an FTP connection to use SSH File Transfer Protocol (SFTP) while connecting to an SFTP server. SFTP enables file transfer over a secure data stream. The Integration Service creates an SSH2 transport layer that enables a secure connection and access to the files on an SFTP server.

To run a session using an FTP connection for an SFTP server that requires public key authentication, the public key and private key files must be accessible on nodes where the session will run.

Configuring Remote Filename

Attribute	Description
Remote Filename	The remote file name for the source or target. For an indirect source file, enter the indirect source file name. Use 7-bit ASCII characters for the file name. The session fails if it encounters a remote file name with Unicode characters. If a path is provided with the source file name, the Integration Service ignores the path entered in the Default Remote Directory field. The session fails if the file name with path is enclosed in single or double quotation marks.
Is Staged	Stages the source or target file on the Integration Service. Default is “Not staged”.
Is Transfer Mode ASCII	Changes the transfer mode. When enabled, the Integration Service uses ASCII transfer mode. - Use ASCII mode when transferring files on Windows machines to ensure that the end of line character is translated properly in text files. When disabled, the Integration Service uses Binary Transfer mode. - Use Binary Transfer mode when transferring files on UNIX machines. Default is disabled.
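
The Remote Filename rules in the table above can be sketched as a small pre-flight check. This is only an illustration in Python, not part of Informatica: the function names are hypothetical, the quotation-mark check is an approximation of the rule, and UNIX-style "/" separators are assumed.

```python
def check_remote_filename(name):
    """Return the reasons, per the table above, why a session would fail."""
    problems = []
    # Only 7-bit ASCII characters are allowed in the remote file name.
    if not name.isascii():
        problems.append("non-ASCII characters")
    # The file name must not be given with single or double quotation marks.
    if "'" in name or '"' in name:
        problems.append("quotation marks")
    return problems


def effective_remote_path(remote_filename, default_remote_dir):
    """A path in the file name overrides the Default Remote Directory."""
    if "/" in remote_filename:  # assumption: UNIX-style path separators
        return remote_filename
    return default_remote_dir.rstrip("/") + "/" + remote_filename


print(check_remote_filename("daily_sales.csv"))
print(check_remote_filename("'/data/in/sales.csv'"))
print(effective_remote_path("/data/in/sales.csv", "/landing"))
```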

Tuesday, 1 March 2011

Informatica – User Defined Functions

Informatica User Defined Functions are similar to built-in functions: they are created once and can be executed many times. Transformation logic that is common across ports is an ideal candidate for a User Defined Function.

Transformation Logic implemented without User Defined Functions

The validation IIF(ISNULL(LTRIM(RTRIM(INPUT))), 'TRUE', 'FALSE') is performed in multiple ports.

The disadvantage of this approach is that any change to the validation has to be made in every port.

This can be addressed by creating a User Defined Function and incorporating the logic there.
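
The same idea, one shared definition instead of a copy per port, can be shown outside Informatica. The sketch below is a Python analogy only: is_null_flag is a hypothetical stand-in for the UDF, and the dictionary keys play the role of ports.

```python
def is_null_flag(value):
    """Python stand-in for IIF(ISNULL(LTRIM(RTRIM(INPUT))), 'TRUE', 'FALSE')."""
    # Trim leading and trailing spaces; a NULL (None) input stays NULL.
    trimmed = value.strip(" ") if value is not None else None
    return "TRUE" if trimmed is None else "FALSE"


# The same function is applied to every "port" instead of repeating the logic.
row = {"first_name": " Ann ", "last_name": None, "email": "ann@example.com"}
flags = {port: is_null_flag(value) for port, value in row.items()}
print(flags)
```

A change to the validation now happens in one place, which is the point of moving the expression into a User Defined Function.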

Steps to Create User Defined Functions

Step 1: Right-click the User-Defined Functions folder in a repository folder in the Designer.

Click “New”.

Step 2: In the Editor, add the transformation logic or validation that needs to be performed.

Click OK and validate the UDF.

User Defined Function – Type:

Public if the function is callable from any expression. Private if the function is only callable from another user-defined function.

To Call User Defined functions from Port:

Tuesday, 25 January 2011

Informatica Pushdown Optimization

What is Pushdown Optimization and things to consider

The process of pushing transformation logic to the source or target database by Informatica Integration service is known as Pushdown Optimization. When a session is configured to run for Pushdown Optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The Source or Target Database executes the SQL queries to process the transformations.

How Does Pushdown Optimization (PO) Work?

The Integration Service generates database-specific SQL statements when a native database driver is used. With ODBC drivers, the Integration Service cannot detect the database type and generates ANSI SQL. The Integration Service can usually push more transformation logic to a database if a native driver is used instead of an ODBC driver.

For any SQL override, the Integration Service creates a view (PM_*) in the database while executing the session task and drops the view after the task completes. Similarly, it also creates sequences (PM_*) in the database.

The database schema (SQ connection, LKP connection) should have the Create View / Create Sequence privilege; otherwise the session will fail.

A Few Benefits of Using PO

No memory or disk space is required to manage the cache on the Informatica server for Aggregator, Lookup, Sorter and Joiner transformations, as the transformation logic is pushed to the database.
The SQL generated by the Integration Service can be viewed before running the session through the Pushdown Optimization Viewer, making it easier to debug.
When inserting into targets, the Integration Service processes row by row using bind variables (soft parse only: processing time, no parsing time). With Pushdown Optimization, the statement is executed once.

Without Pushdown Optimization:

INSERT INTO EMPLOYEES(ID_EMPLOYEE, EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID, MANAGER_NAME, DEPARTMENT_ID)
VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13)
-- executes 7012352 times

With Pushdown Optimization:

INSERT INTO EMPLOYEES(ID_EMPLOYEE, EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID, MANAGER_NAME, DEPARTMENT_ID)
SELECT CAST(PM_SJEAIJTJRNWT45X3OO5ZZLJYJRY.NEXTVAL AS NUMBER(15, 2)), EMPLOYEES_SRC.EMPLOYEE_ID, EMPLOYEES_SRC.FIRST_NAME, EMPLOYEES_SRC.LAST_NAME, CAST((EMPLOYEES_SRC.EMAIL || '@gmail.com') AS VARCHAR2(25)), EMPLOYEES_SRC.PHONE_NUMBER, CAST(EMPLOYEES_SRC.HIRE_DATE AS date), EMPLOYEES_SRC.JOB_ID, EMPLOYEES_SRC.SALARY, EMPLOYEES_SRC.COMMISSION_PCT, EMPLOYEES_SRC.MANAGER_ID, NULL, EMPLOYEES_SRC.DEPARTMENT_ID
FROM (EMPLOYEES_SRC LEFT OUTER JOIN EMPLOYEES PM_Alkp_emp_mgr_1 ON (PM_Alkp_emp_mgr_1.EMPLOYEE_ID = EMPLOYEES_SRC.MANAGER_ID))
WHERE ((EMPLOYEES_SRC.MANAGER_ID = (SELECT PM_Alkp_emp_mgr_1.EMPLOYEE_ID FROM EMPLOYEES PM_Alkp_emp_mgr_1 WHERE (PM_Alkp_emp_mgr_1.EMPLOYEE_ID = EMPLOYEES_SRC.MANAGER_ID))) OR (0=0))
-- executes 1 time
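
The difference between the two statements above can be reproduced on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in database; the table names and the 1,000-row volume are made up for illustration, and it does not involve the Integration Service at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees_src (employee_id INTEGER, email TEXT);
    CREATE TABLE employees_rowwise (employee_id INTEGER, email TEXT);
    CREATE TABLE employees_pushdown (employee_id INTEGER, email TEXT);
""")
conn.executemany(
    "INSERT INTO employees_src VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 1001)],
)

# Without pushdown: rows are read out, transformed by the client,
# and inserted one at a time with bind variables.
statements_rowwise = 0
source_rows = conn.execute("SELECT employee_id, email FROM employees_src").fetchall()
for employee_id, email in source_rows:
    conn.execute(
        "INSERT INTO employees_rowwise VALUES (?, ?)",
        (employee_id, email + "@gmail.com"),
    )
    statements_rowwise += 1

# With pushdown: the transformation is expressed in SQL and the
# database runs a single INSERT ... SELECT.
conn.execute(
    "INSERT INTO employees_pushdown "
    "SELECT employee_id, email || '@gmail.com' FROM employees_src"
)
statements_pushdown = 1

print(statements_rowwise, statements_pushdown)
```

Both target tables end up with the same rows; only the number of statements the database has to execute differs.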

Things to note when using PO

There are cases where the Integration Service and Pushdown Optimization can produce different result sets for the same transformation logic. This can happen during data type conversion, handling null values, case sensitivity, sequence generation, and sorting of data.

The database and Integration Service produce different output when the following settings and conversions are different:

Nulls treated as the highest or lowest value: While sorting the data, the Integration Service can treat null values as lowest, but database treats null values as the highest value in the sort order.
SYSDATE built-in variable: Built-in Variable SYSDATE in the Integration Service returns the current date and time for the node running the service process. However, in the database, the SYSDATE returns the current date and time for the machine hosting the database. If the time zone of the machine hosting the database is not the same as the time zone of the machine running the Integration Service process, the results can vary.
Date Conversion: The Integration Service converts all dates before pushing transformations to the database and if the format is not supported by the database, the session fails.
Logging: When the Integration Service pushes transformation logic to the database, it cannot trace all the events that occur inside the database server. The statistics the Integration Service can trace depend on the type of pushdown optimization. When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.
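
The null-ordering difference described above is easy to see with a few values. This is a plain Python sketch with made-up data; the two functions simply model "nulls lowest" and "nulls highest" and are not taken from Informatica or any database.

```python
values = [30, None, 10, 20]


def sort_nulls_low(items):
    # (False, ...) sorts before (True, ...), so None comes first.
    return sorted(items, key=lambda v: (v is not None, v if v is not None else 0))


def sort_nulls_high(items):
    # (True, ...) sorts after (False, ...), so None comes last.
    return sorted(items, key=lambda v: (v is None, v if v is not None else 0))


print(sort_nulls_low(values))   # nulls treated as the lowest value
print(sort_nulls_high(values))  # nulls treated as the highest value
```

Any downstream logic that depends on the first or last row of a sorted set can therefore behave differently once the sort is pushed to the database.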

Monday, 3 January 2011

Informatica Performance Improvement Tips

We often come across situations where the Data Transformation Manager (DTM) takes a long time to read from a source or write to a target. The following standards/guidelines can improve overall performance.

Use the Source Qualifier to join source tables if they reside in the same schema.
Make use of the Source Qualifier “Filter” property if the source type is relational.
If subsequent sessions do a lookup on the same table, use a persistent cache in the first session. The data remains in the cache and is available to the subsequent sessions.
Use integer flags, as integer comparison is faster than string comparison.
Use the table with the smaller number of records as the master table for joins.
While reading from flat files, define the appropriate data type instead of reading as String and converting.
Connect only the ports that are required by subsequent transformations; check whether unconnected ports can be removed.
Suppress the generated ORDER BY by placing '--' at the end of the query in Lookup transformations.
Minimize the number of Update Strategy transformations.
Group by simple columns in transformations like Aggregator and Source Qualifier.
Use a Router transformation in place of multiple Filter transformations.
Turn off verbose logging when moving mappings to UAT/Production environments.
For large volumes of data, drop indexes before loading and recreate them after the load.
For large volumes of records, use bulk load and increase the commit interval to a higher value.
Set ‘Commit on Target’ in the sessions.
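
The ORDER BY tip in the list above relies on a SQL comment swallowing whatever is appended after the override. The sketch below demonstrates the effect with Python's sqlite3 module; the dept table and the appended clause are invented for illustration and only stand in for the clause that gets added to a lookup query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_id INTEGER, dept_name TEXT)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [(1, "Sales"), (2, "Audit"), (3, "HR")],
)

# Override written by the developer, ending with the comment marker.
override = "SELECT dept_id, dept_name FROM dept ORDER BY dept_id --"
# Hypothetical clause appended after the override text.
appended = " ORDER BY dept_name, dept_id"

# The appended clause lands behind '--' and is treated as a comment,
# so only the developer's ORDER BY takes effect.
rows = conn.execute(override + appended).fetchall()
print(rows)
```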

Tuesday, 26 October 2010

Impact Analysis on Source & Target Definition Changes

Changes to source and target definitions affect the current state of an Informatica mapping. This article lists the possible changes to sources and targets along with their impact.

Updating Source Definitions: When we update a source definition, the Designer propagates the changes to all mappings using that source. Some changes to source definitions can invalidate mappings.
The table below describes how mappings are impacted when the source definition is edited:

Modification	Result
Add a column.	Mappings are not invalidated.
Change a column Data type.	Mappings may be invalidated. If the column is connected to an input port that uses a Data type incompatible with the new one, the mapping is invalidated.
Change a column name.	Mapping may be invalidated. If you change the column name for a column you just added, the mapping remains valid. If you change the column name for an existing column, the mapping is invalidated.
Delete a column.	Mappings can be invalidated if the mapping uses values from the deleted column.

Adding a new column in the existing source definition:

When we add a new column to a source in the Source Analyzer, all mappings using the source definition remain valid.
However, when we add a new column and change some of its properties, the Designer invalidates mappings using the source definition.
We can change the following properties for a newly added source column without invalidating a mapping:
1. Name
2. Data type
3. Format
4. Usage
5. Redefines
6. Occurs
7. Key type

If the changes invalidate the mapping, we must open and edit the mapping. Then click Repository > Save to save the changes to the repository. If the invalidated mapping is used in a session, we must validate the session.
Updating Target Definitions:
When we change a target definition, the Designer propagates the changes to any mapping using that target. Some changes to target definitions can invalidate mappings.
The following table describes how the mappings get impacted when we edit target definitions:

Modification	Result
Add a column.	Mapping not invalidated.
Change a column Data type.	Mapping may be invalidated. If the column is connected to an input port that uses a Data type that is incompatible with the new one (for example, Decimal to Date), the mapping is invalid.
Change a column name.	Mapping may be invalidated. If you change the column name for a column you just added, the mapping remains valid. If you change the column name for an existing column, the mapping is invalidated.
Delete a column.	Mapping may be invalidated if the mapping uses values from the deleted column.
Change the target definition type.	Mapping not invalidated.

Adding a new column in the existing target definition:

When we add a new column to a target in the Target Designer, all mappings using the target definition remain valid.
However, when you add a new column and change some of its properties, the Designer invalidates mappings using the target definition.
We can change the following properties for a newly added target column without invalidating a mapping:

1. Name
2. Data type
3. Format
If the changes invalidate the mapping, validate the mapping and any session using the mapping. We can validate objects from the Query Results or View Dependencies window or from the Repository Navigator. We can validate multiple objects from these locations without opening them in the workspace. If we cannot validate the mapping or session from one of these locations, open the object in the workspace and edit it.

Re-importing a Relational Target Definition:
If a target table changes, such as when we change a column data type, we can edit the definition or we can re-import the target definition. When we re-import the target, we can either replace the existing target definition or rename the new target definition to avoid a naming conflict with the existing target definition.

To re-import a target definition:

In the Target Designer, follow the same steps used to import the target definition, and select the target to import. The Designer notifies us that a target definition with that name already exists in the repository. If we have multiple tables to import and replace, select Apply to All Tables.
Click Rename, Replace, Skip, or Compare.
If we click Rename, enter the name of the target definition and click OK.
If we have a relational target definition and click Replace, specify whether we want to retain primary key-foreign key information and target descriptions.

The following table describes the options available in the Table Exists dialog box when re-importing and replacing a relational target definition:

Option	Description
Apply to all Tables	Select this option to apply the rename, replace, or skip action to all tables in the folder.
Retain User-Defined PK-FK Relationships	Select this option to keep the primary key-foreign key relationships in the target definition being replaced. This option is disabled when the target definition is non-relational.
Retain User-Defined Descriptions	Select this option to retain the target description and column and port descriptions of the target definition being replaced.

Thursday, 14 October 2010

Output Files in Informatica

The Integration Service process generates output files when we run workflows and sessions. By default, the Integration Service logs status and error messages to log event files.

Log event files are binary files that the Log Manager uses to display log events. When we run each session, the Integration Service also creates a reject file. Depending on transformation cache settings and target types, the Integration Service may create additional files as well.

The Integration Service creates the following output files:

Output Files

Session Details/logs:

When we run a session, the Integration Service creates a session log file containing load statistics, table names, error information, threads created and so on, based on the tracing level set in the session properties.
We can monitor session details in the session run properties while the session is running, or after it has failed or succeeded.

Workflow Log:

Workflow log is available in Workflow Monitor.
The Integration Service process creates a workflow log for each workflow it runs.
It writes information in the workflow log such as
- Initialization of processes,
- Workflow task run information,
- Errors encountered and
- Workflow run summary.
The Integration Service can also be configured to suppress writing messages to the workflow log file.
As with Integration Service logs and session logs, the Integration Service process enters a code number into the workflow log file message along with message text.

Performance Detail File:

The Integration Service process generates performance details for session runs.
Through the performance details file we can determine where session performance can be improved.
Performance details provide transformation-by-transformation information on the flow of data through the session.

Reject Files:

By default, the Integration Service process creates a reject file for each target in the session. The reject file contains rows of data that the writer does not write to targets.
The writer may reject a row in the following circumstances:
- It is flagged for reject by an Update Strategy or Custom transformation.
- It violates a database constraint such as primary key constraint
- A field in the row was truncated or overflowed
- The target database is configured to reject truncated or overflowed data.

Note: By default, the Integration Service process saves the reject file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager, and names the reject file target_table_name.bad. We can view this file name at the session level.

Open the session and select any of the targets to view the options:
- Reject file directory.
- Reject file name.
If you enable row error logging, the Integration Service process does not create a reject file.

Row Error Logs:

When we configure a session, we can choose to log row errors in a central location.
When a row error occurs, the Integration Service process logs error information that allows us to determine the cause and source of the error.
The Integration Service process logs information such as source name, row ID, current row data, transformation, timestamp, error code, error message, repository name, folder name, session name, and mapping information.
If we enable flat file logging, by default, the Integration Service process saves the file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager.

Recovery Tables Files:

The Integration Service process creates recovery tables on the target database system when it runs a session enabled for recovery.
When you run a session in recovery mode, the Integration Service process uses information in the recovery tables to complete the session.
When the Integration Service process performs recovery, it restores the state of operations to recover the workflow from the point of interruption.
The workflow state of operations includes information such as active service requests, completed and running status, workflow variable values, running workflows and sessions, and workflow schedules.

Control File:

When we run a session that uses an external loader, the Integration Service process creates a control file and a target flat file.
The control file contains information about the target flat file such as data format and loading instructions for the external loader.
The control file has an extension of .ctl. The Integration Service process creates the control file and the target flat file in the Integration Service variable directory, $PMTargetFileDir, by default.

Email:

We can compose and send email messages by creating an Email task in the Workflow Designer or Task Developer. The Email task can be placed in a workflow or associated with a session.
The Email task allows us to automatically communicate information about a workflow or session run to designated recipients.
Email tasks in the workflow send email depending on the conditional links connected to the task. For post-session email, we can create two different messages, one to be sent if the session completes successfully, the other if the session fails.
We can also use variables to generate information about the session name, status, and total rows loaded.

Indicator File:

If we use a flat file as a target, we can configure the Integration Service to create an indicator file for target row type information.
For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
The Integration Service process names this file target_name.ind and stores it in the Integration Service variable directory, $PMTargetFileDir, by default.
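
The default naming rules mentioned so far (target_table_name.bad under $PMBadFileDir, target_name.ind under $PMTargetFileDir) can be summarised in a few lines. This is a Python sketch for illustration only: the function name, the example directories, and the use of "/" as separator are assumptions, and real values come from the service process variables.

```python
import posixpath


def default_output_files(target_name, pm_bad_file_dir, pm_target_file_dir):
    """Build the default reject and indicator file paths for one target."""
    return {
        # Reject file: <$PMBadFileDir>/<target name>.bad
        "reject": posixpath.join(pm_bad_file_dir, target_name + ".bad"),
        # Indicator file: <$PMTargetFileDir>/<target name>.ind
        "indicator": posixpath.join(pm_target_file_dir, target_name + ".ind"),
    }


# Hypothetical directory values standing in for the service process variables.
print(default_output_files("T_EMPLOYEES", "/infa/BadFiles", "/infa/TgtFiles"))
```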

Target or Output File:

If the session writes to a target file, the Integration Service process creates the target file based on a file target definition.
By default, the Integration Service process names the target file based on the target definition name.
If a mapping contains multiple instances of the same target, the Integration Service process names the target files based on the target instance name.
The Integration Service process creates this file in the Integration Service variable directory, $PMTargetFileDir, by default.

Cache Files:

When the Integration Service process creates memory cache, it also creates cache files. The Integration Service process creates cache files for the following mapping objects:
- Aggregator transformation
- Joiner transformation
- Rank transformation
- Lookup transformation
- Sorter transformation
- XML target
By default, the DTM creates the index and data files for Aggregator, Rank, Joiner, and Lookup transformations and XML targets in the directory configured for the $PMCacheDir service process variable.

Monday, 30 August 2010

Informatica Development Best Practice – Workflow

Workflow Manager default properties can be modified to improve overall performance; a few of them are listed below. These properties can impact the ETL runtime directly and need to be configured based on:

i) Source Database
ii) Target Database
iii) Data Volume

Category

Technique

Session Properties

While loading staging tables for FULL LOADS, the “Truncate target table” option should be checked. Based on the target database and whether a primary key is defined, the Integration Service fires a TRUNCATE or DELETE statement.

Database	Primary Key Defined	No Primary Key
DB2	TRUNCATE	TRUNCATE
INFORMIX	DELETE	DELETE
ODBC	DELETE	DELETE
ORACLE	DELETE UNRECOVERABLE	TRUNCATE
MSSQL	DELETE	TRUNCATE
SYBASE	TRUNCATE	TRUNCATE

The workflow property “Commit interval” (default value: 10,000) should be increased for volumes of more than 1 million records. The database rollback segment size should also be updated when increasing “Commit Interval”.
Insert/Update/Delete options should be set as determined by the target population method.

Target Option	Integration Service
Insert	Uses the target update option (Update as Update, Update as Insert, Update else Insert)
Update as Update	Updates all rows as updates
Update as Insert	Inserts all rows
Update else Insert	Updates existing rows, else inserts
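
The truncate behaviour listed above amounts to a two-key lookup. The sketch below encodes exactly those six database rows in Python; the dictionary and function names are hypothetical and the values are copied from this post, not queried from Informatica.

```python
# (statement when a primary key is defined, statement when there is none)
TRUNCATE_BEHAVIOUR = {
    "DB2": ("TRUNCATE", "TRUNCATE"),
    "INFORMIX": ("DELETE", "DELETE"),
    "ODBC": ("DELETE", "DELETE"),
    "ORACLE": ("DELETE UNRECOVERABLE", "TRUNCATE"),
    "MSSQL": ("DELETE", "TRUNCATE"),
    "SYBASE": ("TRUNCATE", "TRUNCATE"),
}


def truncate_statement(database, has_primary_key):
    """Return the statement type fired for 'Truncate target table'."""
    with_pk, without_pk = TRUNCATE_BEHAVIOUR[database.upper()]
    return with_pk if has_primary_key else without_pk


print(truncate_statement("Oracle", has_primary_key=True))
print(truncate_statement("MSSQL", has_primary_key=False))
```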

Partition

The maximum number of partitions for a session should be 1.5 times the number of processors on the Informatica server, e.g. 1.5 x 4 processors = 6 partitions.

Key Value partitions should be used only when an even Distribution of data can be obtained. In other cases, Pass Through partitions should be used.

A source filter should be added to distribute the data evenly between pass-through partitions. The key value should have ONLY numeric values. Filter pattern: MOD(NVL(<Numeric Key Value>,0), <number of partitions defined>). Example: MOD(NVL(product_sys_no,0),6)
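
To see how the MOD/NVL filter spreads rows, the expression can be replayed in Python. This is a sketch under assumptions: the keys are a made-up run of non-negative integers plus one NULL, and partition_for and max_partitions are hypothetical helper names.

```python
from collections import Counter


def max_partitions(processors):
    """Rule of thumb from the text: 1.5 times the number of processors."""
    return int(1.5 * processors)


def partition_for(key, partitions):
    """Python equivalent of MOD(NVL(key, 0), partitions) for non-negative keys."""
    return (key if key is not None else 0) % partitions


partitions = max_partitions(4)
keys = list(range(1, 1201)) + [None]  # made-up numeric keys plus one NULL
counts = Counter(partition_for(k, partitions) for k in keys)
for partition in sorted(counts):
    print(partition, counts[partition])
```

Each pass-through partition would then filter on one of the resulting values (0 to 5 in this example). With evenly spread numeric keys the partitions receive almost the same number of rows; NULL keys all fall into partition 0.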

If a session contains “N” partition, increase the DTM Buffer Size to at least “N” times the value for the session with One partition

If the source or target database is an MPP (Massively Parallel Processing) database, enable Pushdown Optimization. With this enabled, the Integration Service pushes as much transformation logic as possible to the source database, the target database, or both (FULL), based on the settings. This property can be ignored for conventional databases.

Thursday, 12 August 2010

Change Data Capture in Informatica

Change data capture (CDC) is a technique to identify changes, and only changes, in the source. I have seen applications that were built without CDC and later had to implement it at a higher cost. Building an ETL application without CDC is a costly miss and usually means backtracking later. In this article we discuss different methods of implementing CDC.

Scenario #01: Change detection using timestamp on source rows
In this typical scenario the source rows have two extra columns, say row_created_time and last_modified_time. row_created_time: the time at which the record was first created; last_modified_time: the time at which the record was last modified.

In the mapping, create a mapping variable $$LAST_ETL_RUN_TIME of datetime data type.
Evaluate the condition SetMaxVariable($$LAST_ETL_RUN_TIME, SESSSTARTTIME); this step stores the time at which the session was started in $$LAST_ETL_RUN_TIME.
Use $$LAST_ETL_RUN_TIME in the ‘where’ clause of the source SQL. During the first run or initial seed the mapping variable has a default value and pulls all the records from the source, like: select * from employee where last_modified_time > '01/01/1900 00:00:000'
Now let us assume the session is run on '01/01/2010 00:00:000' for the initial seed.
When the session is executed on '02/01/2010 00:00:000', the SQL would be: select * from employee where last_modified_time > '01/01/2010 00:00:000', thereby pulling only the records that changed between successive runs.
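
The two runs described above can be simulated end to end. The sketch below uses Python's sqlite3 module with a made-up employee table and ISO-formatted timestamps; the last_run variable plays the role of $$LAST_ETL_RUN_TIME, and extract_changes is a hypothetical helper, not an Informatica API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, last_modified_time TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", "2009-12-30 10:00:00"), (2, "Bob", "2009-12-31 09:00:00")],
)


def extract_changes(conn, last_etl_run_time, session_start_time):
    """Return rows changed since the last run and the new variable value."""
    rows = conn.execute(
        "SELECT id, name FROM employee WHERE last_modified_time > ? ORDER BY id",
        (last_etl_run_time,),
    ).fetchall()
    # Like SetMaxVariable: keep the larger of the stored and the new value.
    return rows, max(last_etl_run_time, session_start_time)


last_run = "1900-01-01 00:00:00"  # default value for the initial seed
seed_rows, last_run = extract_changes(conn, last_run, "2010-01-01 00:00:00")

# One record is modified between the two runs.
conn.execute(
    "UPDATE employee SET name = 'Bobby', last_modified_time = '2010-01-01 15:30:00' "
    "WHERE id = 2"
)
delta_rows, last_run = extract_changes(conn, last_run, "2010-01-02 00:00:00")

print(seed_rows)
print(delta_rows)
```

The initial seed pulls every row, while the second run pulls only the record modified after the first session start time.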

Scenario #02: Change detection using load_id or Run_id
Under this scenario the source rows have a column, say load_id, a positive running number. The load_id is updated whenever the record is updated.

In the mapping, create a mapping variable $$LAST_READ_LOAD_ID of integer data type.
Evaluate the condition SetMaxVariable($$LAST_READ_LOAD_ID, load_id); the maximum load_id is stored in the mapping variable.
Use $$LAST_READ_LOAD_ID in the ‘where’ clause of the source SQL. During the first run or initial seed the mapping variable has a default value and pulls all the records from the source, like: select * from employee where load_id > 0. Assuming all records during the initial seed have load_id = 1, the mapping variable stores 1 in the repository.
Now let us assume the session is run after five loads into the source; the SQL would be select * from employee where load_id > 1, so the source read is limited to the records that have changed after the initial seed.
Consecutive runs take care of updating the load_id and pulling the delta in sequence.
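
The load_id approach follows the same pattern with an integer high-water mark. A minimal Python sketch, assuming made-up rows held in a list; pull_delta is a hypothetical helper and watermark stands in for $$LAST_READ_LOAD_ID.

```python
def pull_delta(rows, last_read_load_id):
    """Return rows with a newer load_id and the updated high-water mark."""
    delta = [row for row in rows if row["load_id"] > last_read_load_id]
    # Like SetMaxVariable: the variable only ever moves forward.
    new_mark = max([last_read_load_id] + [row["load_id"] for row in delta])
    return delta, new_mark


source = [{"id": 1, "load_id": 1}, {"id": 2, "load_id": 1}]
seed, watermark = pull_delta(source, 0)  # initial seed: load_id > 0

# Later loads touch record 2 and add record 3.
source[1]["load_id"] = 5
source.append({"id": 3, "load_id": 6})
delta, watermark = pull_delta(source, watermark)  # load_id > 1

print([row["id"] for row in seed], [row["id"] for row in delta], watermark)
```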

In the next blog we can see how to implement CDC when reading from Salesforce.com

Wednesday, 3 March 2010

Processing Multiple XML Files through Informatica – 1

Problem Statement: The data to be processed in Informatica came as XML files. The number of XML files to be processed was dynamic. There was also a need to capture the name of the XML file from which data was being processed.

Resolution:
Option 1 – Using File list as part of Indirect File Sources in session
Option 2 – Using Parameter File and workflow variable

Implementation Details for option 1: Using File list

The XML file names to be processed were read using a batch script, and a file list containing the XML file names was created. This file list name was set under source properties at the session level. The XML files were read sequentially and the data pertaining to every XML file was processed. Since the number of XML files to be processed was dynamic, the need of the hour was to achieve looping in Informatica.

Challenge in using File List – A file list is used in a session to run multiple source files for one source instance in the mapping. When a file list is used in a mapping as multiple source files for one source instance, the properties of all files must match the source definition. File lists are configured in session properties by entering the file name of the file list in the Source Filename field and the location of the file list in the Source File Directory field. When the session starts, the Integration Service reads the file list, then locates and reads the first file source in the list. After the Integration Service reads the first file, it locates and reads the next file in the list. The issue with using XML file names in a file list was further compounded by Informatica grouping records pertaining to similar XML nodes together. This led to difficulty in identifying which record belonged to which XML file.

Batch Script – Batch scripts controlled the overall looping in Informatica by covering the tasks below:
• Reading XML file names from the staging location and creating a file list containing the XML file names.
• Moving XML files from the staging location to the archive location.
• Verifying whether there are any more XML files to be processed and, depending on the outcome, either looping the process by invoking the first workflow or ending the process.
• Using PMCMD commands to invoke the appropriate workflows.
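
The file-handling part of the batch script tasks above can be sketched as follows. The original used a DOS batch script; this Python version is only an illustration, with hypothetical function names and directory layout, and it leaves out the PMCMD calls that start the workflows.

```python
import os
import shutil
import tempfile


def build_file_list(staging_dir, file_list_path):
    """Write one XML file path per line and return the file names found."""
    names = sorted(n for n in os.listdir(staging_dir) if n.lower().endswith(".xml"))
    with open(file_list_path, "w") as handle:
        for name in names:
            handle.write(os.path.join(staging_dir, name) + "\n")
    return names


def archive_files(staging_dir, archive_dir, names):
    """Move processed XML files from the staging to the archive location."""
    os.makedirs(archive_dir, exist_ok=True)
    for name in names:
        shutil.move(os.path.join(staging_dir, name), os.path.join(archive_dir, name))


def more_files_pending(staging_dir):
    """True if another loop iteration is needed."""
    return any(n.lower().endswith(".xml") for n in os.listdir(staging_dir))


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    staging = os.path.join(root, "staging")
    os.makedirs(staging)
    for name in ("orders_1.xml", "orders_2.xml"):
        open(os.path.join(staging, name), "w").close()
    processed = build_file_list(staging, os.path.join(root, "xml_files.lst"))
    archive_files(staging, os.path.join(root, "archive"), processed)
    print(processed, more_files_pending(staging))
```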

Workflow Details –
There were two Informatica workflows designed to achieve looping:
• First workflow – creates the indirect file to be used as the source in session properties and triggers the second workflow. Details of the workflow:
o A command task executes a DOS batch script which creates the indirect file after reading XML file names from a pre-defined location on the server.
o A command task executes the second workflow to process data within the XML files.

• Second workflow – reads and processes the XML files and populates the staging tables. Details of the workflow:
o A session reads the XML files using the indirect file and loads them into staging tables.
o A command task moves the XML file just processed into an archive folder, using a batch script.
o A command task executes a batch script which will:
- Check whether there are any more XML files to be processed.
- If yes, trigger the first workflow. This ensures all XML files are processed and loaded into staging tables.
- If no, the process completes.

Thanks for reading. Please let me know if you have faced a similar situation.

Wednesday, 2 September 2009

Process Control / Audit of Workflows in Informatica

1. Process Control – Definition
Process control or Auditing of a workflow in an Informatica is capturing the job information like start time, end time, read count, insert count, update count and delete count. This information is captured and written into table as the workflow executes

2. Structure of Process Control/Audit table
The structure of the process control table is given below.
Table 1: Process Control structure

Column Name	Data Type	Length	Description
PROCESS_RUN_ID	Number(p,s)	11	A unique number used to identify a specific process run.
PROCESS_NME	Varchar2	120	The name of the process (this column is populated with the names of the Informatica mappings).
START_TMST	Date	19	The date/time when the process started.
END_TMST	Date	19	The date/time when the process ended.
ROW_READ_CNT	Number(p,s)	16	The number of rows read by the process.
ROW_INSERT_CNT	Number(p,s)	16	The number of rows inserted by the process.
ROW_UPDATE_CNT	Number(p,s)	16	The number of rows updated by the process.
ROW_DELETE_CNT	Number(p,s)	16	The number of rows deleted by the process.
ROW_REJECT_CNT	Number(p,s)	16	The number of rows rejected by the process.
USER_ID	Varchar2	32	The ETL user identifier associated with the process.

3. Mapping Logic and Build Steps
The process control flow has two data flows: an insert flow and an update flow. The insert flow runs before the main mapping and the update flow runs after it; this order is set in the “Target Load Plan”. The source for both flows can be a dummy source that returns one record, for example select 'process' from dual or select count(1) from Table_A. The following mapping variables need to be created:

Table 2: Mapping Parameter and variables

$$PROCESS_RUN_ID

$$PROCESS_NAME

$$READ_COUNT

$$INSERT_COUNT

$$UPDATE_COUNT

$$DELETE_COUNT

$$REJECT_COUNT

Steps to create Insert flow:

1. Use “select 'process' from dual” as the SQL query in the source qualifier
2. Use a sequence generator to create running process_run_id values
3. In an expression, call SetVariable ($$PROCESS_RUN_ID, NEXTVAL) and assign $$PROCESS_NAME to o_process_name, an output-only field
4. In the expression, assign $$SessionStarttime to o_Starttime, an output-only field
5. In the expression, accept the sequence id from the sequence generator
6. Insert into the target process control table with the above three values (run id, process name and start time)

Table 3: Process Control Image after Insert flow

PROCESS_RUN_ID	1
PROCESS_NME	VENDOR_DIM_LOAD
START_TMST	8/23/2009 12:23
END_TMST
ROW_READ_CNT
ROW_INSERT_CNT
ROW_UPDATE_CNT
ROW_DELETE_CNT
ROW_REJECT_CNT
USER_ID	INFA8USER

Steps in the main mapping:

1. After the source qualifier, increment the read count in a variable (v_read_count) in an expression for each record read, and call SetMaxVariable ($$READ_COUNT, v_read_count)
2. Before the update strategy of each target instance, do the same for the insert, update and delete counts; all the variables are then set to their respective counts

Steps to create Update flow:

1. Use “select 'process' from dual” as the SQL query in the source qualifier
2. Use SetMaxVariable to get the process_run_id created in the insert flow
3. In an expression, assign $$INSERT_COUNT to o_insert_count, an output-only field; assign all the other counts in the same way
4. In the expression, assign $$SessionEndtime to o_Endtime, an output-only field
5. Update the target process control table with the above values (the counts and the end time) where process_run_id equals the process_run_id generated in the insert flow

Table 4: Process Control Image after Update flow

PROCESS_RUN_ID	1
PROCESS_NME	VENDOR_DIM_LOAD
START_TMST	8/23/2009 12:23
END_TMST	8/23/2009 12:30
ROW_READ_CNT	1000
ROW_INSERT_CNT	900
ROW_UPDATE_CNT	60
ROW_DELETE_CNT	40
ROW_REJECT_CNT	0
USER_ID	INFA8USER
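
The insert-then-update pattern shown in Tables 3 and 4 can be tried outside Informatica with a small Python sketch. It uses an in-memory SQLite table as a stand-in for the Oracle process control table; the function names and the simplified column types are assumptions made for the illustration, not part of the mapping described above.

```python
import sqlite3
from datetime import datetime

# Simplified stand-in for the process control table (SQLite types, not Oracle).
DDL = """
CREATE TABLE process_control (
    process_run_id INTEGER PRIMARY KEY AUTOINCREMENT,
    process_nme    TEXT NOT NULL,
    start_tmst     TEXT NOT NULL,
    end_tmst       TEXT,
    row_read_cnt   INTEGER,
    row_insert_cnt INTEGER,
    row_update_cnt INTEGER,
    row_delete_cnt INTEGER,
    row_reject_cnt INTEGER,
    user_id        TEXT
)
"""


def _now() -> str:
    return datetime.now().isoformat(sep=" ", timespec="seconds")


def start_run(conn, process_name: str, user_id: str) -> int:
    """Insert flow: record the run id, process name and start time."""
    cur = conn.execute(
        "INSERT INTO process_control (process_nme, start_tmst, user_id) "
        "VALUES (?, ?, ?)",
        (process_name, _now(), user_id),
    )
    conn.commit()
    return cur.lastrowid


def finish_run(conn, run_id: int, read: int, inserted: int,
               updated: int, deleted: int, rejected: int) -> None:
    """Update flow: fill in the end time and row counts for the same run id."""
    conn.execute(
        "UPDATE process_control SET end_tmst = ?, row_read_cnt = ?, "
        "row_insert_cnt = ?, row_update_cnt = ?, row_delete_cnt = ?, "
        "row_reject_cnt = ? WHERE process_run_id = ?",
        (_now(), read, inserted, updated, deleted, rejected, run_id),
    )
    conn.commit()
```

Calling start_run before the load and finish_run(conn, run_id, 1000, 900, 60, 40, 0) after it reproduces the two table images above.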

4. Merits over Informatica Metadata
This information is also available in the Informatica metadata; however, maintaining it within our own system has the following benefits:

No need to write complex queries to fetch the data from the metadata tables
Job names need not be mapping names and can be user-friendly names
Insert/delete/update counts can be audited for all targets together as well as for individual targets
This audit information can be maintained outside the metadata security level and can be used by other mappings in their transformations
Can be used by mappings that build parameter files
Can be used by mappings that govern data volume
Can be used by Production support to find out the quick status of load

To know more about Informatica Process control audit

Monday, 11 May 2009

Informatica Upgrade Challenge – Default SQL Join for a Source Qualifier in 7x vs. 8x

Default SQL Query Generation for a Source Qualifier:

When relational sources are joined in one Source Qualifier transformation, the PowerCenter Server joins the tables based on the related keys in each table. This default join is an equijoin of the form:

Source1.column_name = Source2.column_name

For default joins to work, the columns in the default join must have:

A primary key-foreign key relationship
Matching data types

In practice, most data warehouses are designed so that primary key – foreign key relationships are enforced in the load logic rather than on the physical tables. Where fact tables are joined with dimension tables, the developer typically writes the join condition explicitly in the User Defined Join property of the Source Qualifier. The same result can be achieved with default joins by creating the relationships between the tables in Informatica instead of creating them physically on the tables.

Creating relationships between tables in Informatica is simple: drag and drop the column from one source definition onto the other in the Source Analyzer.

PowerCenter Server and SQL Query Generation

When a session is executed, the PowerCenter Server has two options:

1. If the ‘SQL Query’ property contains text, use the SQL query typed by the developer.
2. If the ‘SQL Query’ property is blank, generate a query for each Source Qualifier transformation when the session runs.
The SQL query generation process for option 2 is a bit different in PowerCenter 7x and 8x.

The default query in PowerCenter 7x is built in the following order:

SELECT keyword
Field/port names that are linked from the Source Qualifier to the next transformation
FROM keyword
List of table names from the source definitions connected to the Source Qualifier, separated by commas
WHERE keyword
[Value present in the “User Defined Join” property]
[AND keyword] combined with the default join condition formed by PowerCenter based on the relationship (if the User Defined Join is not present)
[AND keyword] combined with the value present in the “Source Filter” property
[ORDER BY keyword; by default, it uses the first field listed after the SELECT keyword]

Whereas in PowerCenter 8x, the default query is built in the following order:

SELECT keyword
Field/port names that are linked from the Source Qualifier to the next transformation
FROM keyword
List of table names from the source definitions connected to the Source Qualifier, separated by commas
WHERE keyword
[Value present in the “User Defined Join” property]
[AND keyword] combined with the value present in the “Source Filter” property
[AND keyword] combined with the default join condition formed by PowerCenter based on the relationship (if the User Defined Join is not present)
[ORDER BY keyword; by default, it uses the first field listed after the SELECT keyword]

In 8x the default join condition is appended after the Source Filter, whereas in 7x it is appended before the Source Filter.

I came across an issue in a recent upgrade project because of this difference in behavior. A mapping that ran properly in 7x and extracted the required data from the source ran into a problem in 8x: the upgraded mapping generated a Cartesian join. On analysis, we found that the last line of the source filter was commented out with ‘--’. In 8x this caused the default join condition, which is appended after the filter, to be commented out as well, resulting in a Cartesian product of the source tables.
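
The effect can be reproduced with a small Python sketch that assembles a WHERE clause in the two orders described above. It is a simplified model, not PowerCenter's actual query generator: the function names, the sample table and column names, and the assumption that the join is appended on the same line as the end of the filter are all illustrative.

```python
def build_where_clause(source_filter: str, default_join: str, version: str) -> str:
    """Assemble a WHERE clause in the order described for 7x or 8x."""
    if version == "7x":
        parts = [default_join, source_filter]   # join first, filter last
    elif version == "8x":
        parts = [source_filter, default_join]   # filter first, join last
    else:
        raise ValueError(f"unknown version: {version}")
    return "WHERE " + " AND ".join(parts)


def effective_sql(sql: str) -> str:
    """Drop everything after '--' on each line, roughly as the database would.

    Naive on purpose: it does not handle '--' inside string literals.
    """
    kept = [line.split("--", 1)[0].rstrip() for line in sql.splitlines()]
    return "\n".join(line for line in kept if line)


# Hypothetical filter whose last line is commented out, plus a default join.
SOURCE_FILTER = "ORDERS.STATUS = 'OPEN'\n-- AND ORDERS.REGION = 'EU'"
DEFAULT_JOIN = "ORDERS.CUST_ID = CUSTOMER.CUST_ID"

for version in ("7x", "8x"):
    clause = effective_sql(build_where_clause(SOURCE_FILTER, DEFAULT_JOIN, version))
    print(version, "join survives:", DEFAULT_JOIN in clause)
```

In the 7x order the join sits ahead of the commented line and survives; in the 8x order it lands after the ‘--’ and disappears from the effective query.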

So the key is to determine how many Informatica mappings/sessions have a Source Filter property containing a ‘--’ comment; this helps identify the issue much earlier in the upgrade.
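
One possible way to run that check is sketched below in Python. It assumes the Source Filter text of each Source Qualifier has already been extracted into a dictionary keyed by a name of your choosing (for example from a repository export); the extraction step, the function names and the sample entries are hypothetical.

```python
def last_line_is_commented(source_filter: str) -> bool:
    """True when the last non-blank line of the filter contains a '--' comment."""
    lines = [line.strip() for line in source_filter.splitlines() if line.strip()]
    return bool(lines) and "--" in lines[-1]


def find_risky_filters(filters: dict) -> list:
    """Return the names whose Source Filter could swallow an appended join."""
    return sorted(name for name, text in filters.items()
                  if last_line_is_commented(text))


# Hypothetical sample: Source Qualifier name -> Source Filter text.
sample_filters = {
    "sq_orders": "ORDERS.STATUS = 'OPEN'\n-- AND ORDERS.REGION = 'EU'",
    "sq_customer": "CUSTOMER.ACTIVE = 1",
    "sq_empty": "",
}
print(find_risky_filters(sample_filters))
```

The check is deliberately broad: a trailing comment on the last line is flagged as well, because anything appended to that line would also be commented out.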

Thanks for reading. Please share any other upgrade challenges that you have faced.

To know more about Informatica Upgrade Challenge

Wednesday, 22 April 2009

Informatica and Oracle hints in SQL overrides

Hints used in a SQL statement send instructions to the Oracle optimizer, which can shorten query processing time. Can we use these hints in SQL overrides within our Informatica mappings to improve query performance?

As a general rule, the Informatica help material states that you can enter any valid SQL statement supported by the source database in the SQL override of a Source Qualifier or Lookup transformation, or at the session properties level.

While using hints in a Source Qualifier has no complications, using them in a Lookup SQL override is a bit tricky. A forward slash followed by an asterisk (“/*”) in a Lookup SQL override (generally used for comments in SQL and for Oracle hints) results in session failure with an error like:

TE_7017 : Failed to Initialize Server Transformation lkp_transaction

2009-02-19 12:00:56 : DEBUG : (18785 | MAPPING) : (IS | Integration_Service_xxxx) : node01_UAT-xxxx : DBG_21263 : Invalid lookup override

SELECT SALES. SALESSEQ as SalesId, SALES.OrderID as ORDERID, SALES.OrderDATE as ORDERDATE FROM SALES, AC_SALES WHERE AC_SALES. OrderSeq >= (Select /*+ FULL(AC_Sales) PARALLEL(AC_Sales,12) */ min(OrderSeq) From AC_Sales)

This is because Informatica’s parser fails to recognize these special characters when they are used in a Lookup override. Starting with the PowerCenter 7.1.3 release, a parameter is available that enables the use of the forward slash, and therefore hints.

• Infa 7.x

1. Using a text editor open the PowerCenter server configuration file (pmserver.cfg).

2. Add the following entry at the end of the file:

LookupOverrideParsingSetting=1

3. Re-start the PowerCenter server (pmserver).

• Infa 8.x

1. Connect to the Administration Console.

2. Stop the Integration Service.

3. Select the Integration Service.

4. Under the Properties tab, click Edit in the Custom Properties section.

5. Under Name enter LookupOverrideParsingSetting

6. Under Value enter 1.

7. Click OK.

8. Start the Integration Service.

• Starting with PowerCenter 8.5, this change can be made on the session task itself as follows:

1. Edit the session.

2. Select Config Object tab.

3. Under Custom Properties add the attribute LookupOverrideParsingSetting and set the Value to 1.

4. Save the session.

Thanks for reading this blog. To know more about Informatica

Thursday, 19 March 2009

Informatica PowerCenter 8x Key Concepts – 6

6. Integration Service (IS)

The key functions of the IS are:

Interpreting the workflow and mapping metadata from the repository
Executing the instructions in the metadata
Managing the data moving from the source system to the target system, in memory and on disk

The three main components of the Integration Service that enable data movement are:

Integration Service Process
Load Balancer
Data Transformation Manager

6.1 Integration Service Process (ISP)

The Integration Service starts one or more Integration Service processes to run and monitor workflows. When we run a workflow, the ISP starts and locks the workflow, runs the workflow tasks, and starts the process to run sessions. The functions of the Integration Service Process are:

Locks and reads the workflow
Manages workflow scheduling, i.e., maintains session dependencies
Reads the workflow parameter file
Creates the workflow log
Runs workflow tasks and evaluates the conditional links
Starts the DTM process to run the session
Writes historical run information to the repository
Sends post-session emails

6.2 Load Balancer

The Load Balancer dispatches tasks to achieve optimal performance. It dispatches tasks to a single node or across the nodes in a grid after performing a sequence of steps. Before looking at these steps, we need to understand Resources, Resource Provision Thresholds, Dispatch Modes and Service Levels.

Resources – We can configure the Integration Service to check the resources available on each node and match them with the resources required to run the task. For example, if a session uses an SAP source, the Load Balancer dispatches the session only to nodes where the SAP client is installed.
Resource Provision Thresholds – There are three:
• Maximum CPU Run Queue Length: the maximum number of runnable threads waiting for CPU resources on the node.
• Maximum Memory %: the maximum percentage of virtual memory allocated on the node relative to the total physical memory size.
• Maximum Processes: the maximum number of running Session and Command tasks allowed for each Integration Service process running on the node.
Dispatch Modes – There are three:
• Round-Robin: the Load Balancer dispatches tasks to available nodes in a round-robin fashion after checking the “Maximum Processes” threshold.
• Metric-based: checks all three resource provision thresholds and dispatches tasks in a round-robin fashion.
• Adaptive: checks all three resource provision thresholds and also ranks nodes according to current CPU availability.
Service Levels – These establish priority among tasks that are waiting to be dispatched. The three components of a service level are Name, Dispatch Priority and Maximum Dispatch Wait Time. “Maximum Dispatch Wait Time” is the amount of time a task can wait in the queue, which ensures that no task waits forever.

A. Dispatching tasks on a node

The Load Balancer checks different resource provision thresholds on the node depending on the Dispatch mode set. If dispatching the task causes any threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later
The Load Balancer dispatches all tasks to the node that runs the master Integration Service process

B. Dispatching tasks on a grid

The Load Balancer verifies which nodes are currently running and enabled
The Load Balancer identifies nodes that have the PowerCenter resources required by the tasks in the workflow
The Load Balancer verifies that the resource provision thresholds on each candidate node are not exceeded. If dispatching the task causes a threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later
The Load Balancer selects a node based on the dispatch mode
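
The node selection described above can be modelled with a short Python sketch. It is a simplified illustration, not Informatica's implementation: the Node class, its field names, the cpu_available score used for adaptive ranking, and the way thresholds are compared are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    running_processes: int
    max_processes: int
    cpu_run_queue: float
    max_cpu_run_queue: float
    memory_pct: float
    max_memory_pct: float
    cpu_available: float = 0.0  # assumed score used only for adaptive ranking

    def within_process_limit(self) -> bool:
        # One more task must still fit under Maximum Processes.
        return self.running_processes + 1 <= self.max_processes

    def within_all_thresholds(self) -> bool:
        return (self.within_process_limit()
                and self.cpu_run_queue <= self.max_cpu_run_queue
                and self.memory_pct <= self.max_memory_pct)


def select_node(nodes: List[Node], mode: str, rr_start: int = 0) -> Optional[Node]:
    """Pick a node for one task, or None if the task must wait in the queue."""
    if mode == "round-robin":
        eligible = Node.within_process_limit
    elif mode in ("metric-based", "adaptive"):
        eligible = Node.within_all_thresholds
    else:
        raise ValueError(f"unknown dispatch mode: {mode}")
    # Walk the nodes in round-robin order starting at rr_start.
    ordered = nodes[rr_start:] + nodes[:rr_start]
    candidates = [node for node in ordered if eligible(node)]
    if not candidates:
        return None
    if mode == "adaptive":
        return max(candidates, key=lambda node: node.cpu_available)
    return candidates[0]
```

Returning None stands in for placing the task in the dispatch queue; a fuller model would also apply service-level priority and the maximum dispatch wait time.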

6.3 Data Transformation Manager (DTM) Process

When the workflow reaches a session, the Integration Service Process starts the DTM process. The DTM is the process associated with the session task. The DTM process performs the following tasks:

Retrieves and validates session information from the repository.
Validates source and target code pages.
Verifies connection object permissions.
Performs pushdown optimization when the session is configured for pushdown optimization.
Adds partitions to the session when the session is configured for dynamic partitioning.
Expands the service process variables, session parameters, and mapping variables and parameters.
Creates the session log.
Runs pre-session shell commands, stored procedures, and SQL.
Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.
Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data
Runs post-session stored procedures, SQL, and shell commands and sends post-session email
After the session is complete, reports the execution result to the ISP

Pictorial representation of workflow execution:

1. A PowerCenter Client requests the IS to start a workflow
2. The IS starts the ISP
3. The ISP consults the LB to select a node
4. The ISP starts the DTM on the node selected by the LB

Thanks for reading this blog. To know more about Informatica PowerCenter 8x

Friday, 8 August 2008

Informatica and Stored Procedures

A. Described below is a scenario where the requirement is to use a stored procedure that returns a cursor as a source. By and large, PowerCenter does not support a stored procedure that returns a cursor as a source. The workaround is as follows:

1. Create a procedure that will load the data into a new table:

CREATE OR REPLACE PROCEDURE load (p_initial_date IN DATE, p_final_date IN DATE) AS

str_load VARCHAR2 (500);
str_clean VARCHAR2 (500);
BEGIN
-- Empty the staging table, then reload it for the requested date range.
str_clean := 'DELETE FROM EMP';
str_load := 'INSERT INTO EMP SELECT * FROM EMPLOYEE WHERE DOJ BETWEEN TRUNC (:1) AND TRUNC (:2)';
EXECUTE IMMEDIATE str_clean;
-- Dynamic SQL cannot see the procedure parameters, so pass them as bind variables.
EXECUTE IMMEDIATE str_load USING p_initial_date, p_final_date;
EXCEPTION
WHEN OTHERS
THEN
ROLLBACK;
END load;

2. Create the table that will receive the data from the procedure:

SQL> create table EMP as SELECT * from EMPLOYEE where 1 > 2;

3. Add a Stored Procedure transformation to the PowerCenter mapping. This transformation will execute the new procedure, called LOAD in this example.

4. Set the run method to Source Pre Load so that it is executed before the source table is read.

5. Import the EMP table as a Source Definition. This table will be populated by the new stored procedure.

If the original stored procedure is used by the customer application and you cannot change its source code, you can create a new stored procedure that calls the original one (which does not insert into a table) and performs the insert into the new table by looping over the returned cursor.

B. Given below is a situation where you want to pass a mapping variable to a Stored Procedure transformation (which can be either connected or unconnected).

Connected Stored Procedure

The parameters passed to a connected Stored Procedure must be linked from another transformation.
Follow these steps to pass a mapping variable to a connected Stored Procedure transformation:

Create an Expression transformation.
Create an output port in the Expression transformation with the following expression:

$$mapping_variable

This sets the value of this output port to the mapping variable.

Link this output port to the Stored Procedure transformation.

Unconnected Stored Procedure

For unconnected Stored Procedure transformations, you can use the mapping variable in the expression that calls the stored procedure.
Follow the steps below to pass a mapping variable to an unconnected Stored Procedure transformation:

Create an Expression transformation.
Create an output port in the Expression transformation with the following expression:

:SP.GET_NAME_FROM_ID ($$mapping_variable, PROC_RESULT)

If you attempt to use a mapping variable to store the output value of the stored procedure, the session fails with the error below.

“TE_7002 Transformation Parse Fatal Error; transformation stopped: invalid function reference. Failed to Initialize Server Transformation.”

To resolve the issue, replace the mapping variable with the PROC_RESULT system variable.

Example:

Incorrect, using a mapping variable:

:SP.PROCEDURE(FIELD1, $$mapping_variable)

Correct, using the PROC_RESULT system variable:

:SP.PROCEDURE(FIELD1,PROC_RESULT)

:SP.PROCEDURE($$mapping_variable,PROC_RESULT)

The PROC_RESULT system variable assigns the stored procedure output to the port with this expression.

Read More about Informatica

Tuesday, 8 July 2008

Informatica Exceptions – 3

Here are a few more exceptions:

1. There are occasions where sessions fail with the following error in the Workflow Monitor:

“First error code [36401], message [ERROR: Session task instance [session XXXX]: Execution terminated unexpectedly.] “

where XXXX is the session name.

The server log/workflow log shows the following:

“LM_36401 Execution terminated unexpectedly.”

To determine the error do the following:

a. If the session fails before initialization and no session log is created, look for errors in the workflow log and pmrepagent log files.

b. If the session log is created and if the log shows errors like

“Caught a fatal signal/exception” or

“Unexpected condition detected at file [xxx] line yy”

then a core dump has been created on the server machine. In this case Informatica Technical Support should be contacted with specific details. This error may also occur when the PowerCenter server log becomes too large and the server is no longer able to write to it. In this case a workflow and session log may not be completed. Deleting or renaming the PowerCenter Server log (pmserver.log) file will resolve the issue.

2. The following is not an exception but a scenario that most of us have come across.

A rounding problem occurs with columns defined in the source as Numeric with precision and scale, or lookups fail to match on the same columns. Floating-point arithmetic is always prone to rounding errors (e.g. the number 1562.99 may be stored internally as a value that is very close to, but not exactly, 1562.99). This can also affect functions that work with scale, such as the ROUND() function. To resolve this, do the following:

a. Select the Enable high precision option for the session.

b. Define all numeric ports as the Decimal datatype with the exact precision and scale desired. When high precision processing is enabled, the PowerCenter Server supports numeric values of up to 28 digits. However, the tradeoff is a performance hit (the actual impact depends on how many decimal ports there are).
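
The same binary floating-point behaviour can be seen in any language. The short Python sketch below is not Informatica code; it only illustrates why an exact decimal type avoids the mismatch, and the helper name round_half_up is an assumption for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent most decimal fractions exactly.
float_equal = (0.1 + 0.2 == 0.3)
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
print(float_equal)    # False
print(decimal_equal)  # True


def round_half_up(value: str, places: int) -> Decimal:
    """Round a decimal string to a fixed scale without going through a float."""
    quantum = Decimal(10) ** -places
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


# The float literal 2.675 is stored slightly below 2.675, so it rounds down.
print(round(2.675, 2))            # 2.67
print(round_half_up("2.675", 2))  # 2.68
```

This is the trade-off described above: exact decimal arithmetic gives predictable matches and rounding, at the cost of slower processing than native floating point.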

Read More about Informatica

Friday, 27 June 2008

Exceptions in Informatica – 2

Let us look at a few more strange exceptions in Informatica.

1. Sometimes the Session fails with the below error message.
“FATAL ERROR : Caught a fatal signal/exception
FATAL ERROR : Aborting the DTM process due to fatal signal/exception.”

There might be several reasons for this. One possible reason is the way the SUBSTR function is used in the mappings, for example when its length argument is specified incorrectly.
Example:

IIF(SUBSTR(MOBILE_NUMBER, 1, 1) = '9',
SUBSTR(MOBILE_NUMBER, 2, 24),
MOBILE_NUMBER)

In this example MOBILE_NUMBER is a variable port that is 24 characters long.
Because the field itself is 24 characters long, a SUBSTR that starts at position 2 with a length of 24 runs to the 25th character, which is past the end of the field.

To solve this, correct the length argument so that it does not go beyond the length of the field, or omit the length argument to return the entire string from the start position.
Example:

In this example, modify the expression in either of the following ways:

IIF(SUBSTR(MOBILE_NUMBER, 1, 1) = '9',
SUBSTR(MOBILE_NUMBER, 2, 23),
MOBILE_NUMBER)

IIF(SUBSTR(MOBILE_NUMBER, 1, 1) = '9',
SUBSTR(MOBILE_NUMBER, 2),
MOBILE_NUMBER)
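
The arithmetic behind the corrected length can be checked with a couple of small Python helpers. They are hypothetical and only mirror the 1-based positions used by SUBSTR; they do not call or emulate Informatica itself.

```python
def last_position(start: int, length: int) -> int:
    """1-based position of the last character a SUBSTR(start, length) reaches."""
    return start + length - 1


def max_safe_length(field_length: int, start: int) -> int:
    """Longest length that stays inside a field, for a 1-based start position."""
    if not 1 <= start <= field_length:
        raise ValueError("start must lie within the field")
    return field_length - start + 1


print(last_position(2, 24))     # 25, one past the end of a 24-character field
print(max_safe_length(24, 2))   # 23, the length used in the corrected expression
```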

2. The following error can occur at times when a session is run

“TE_11015 Error in xxx: No matching input port found for output port OUTPUT_PORT TM_6006 Error initializing DTM for session…”

Where xxx is a Transformation Name.

This error occurs when the transformation is corrupted.
To resolve this, recreate the transformation in the mapping that has this error.

3. At times you may run into the following problems:

1. When opening the Designer, you get “Exception access violation” or “Unexpected condition detected”.

2. You are unable to see the Navigator window, Output window or Overview window in the Designer even after toggling it on.

3. Toolbars or checkboxes do not show up correctly.

These are all indications that the pmdesign.ini file might be corrupted. To solve this, follow the steps below.

1. Close Informatica Designer
2. Rename the pmdesign.ini (in c:\winnt\system32 or c:\windows\system).
3. Re-open the designer.

When PowerMart opens the Designer, it creates a new pmdesign.ini if it does not find an existing one. Reinstalling the PowerMart clients will not recreate this file if one already exists.

Read More about Informatica

Thursday, 5 June 2008

Exceptions in Informatica

No product or tool is free of strange exceptions and errors; here are some of them.

1. You get the error below when you use “Generate SQL” in a Source Qualifier and try to validate it.
“Query should return exactly n field(s) to match field(s) projected from the Source Qualifier”
Where n is the number of fields projected from the Source Qualifier.

Possible reasons for this to occur are:

1. The order of the ports may be wrong.
2. The number of ports in the transformation may be more or less than required.
3. Sometimes the number and order of the ports are correct but you still get this error. In that case, make sure that the owner name and schema name are specified correctly for the tables used in the Source Qualifier query.
E.g., TC_0002.EXP_AUTH@TIMEP

2. The following error occurs at times when an Oracle table is used

“[/export/home/build80/zeusbuild/vobs/powrmart/common/odl/oracle8/oradriver.cpp] line [xxx]”
where xxx is a line number, mostly 241, 291 or 416.

Possible reasons are

1. Use DataDirect Oracle ODBC driver instead of the driver “Oracle in
2. If the table was imported using Oracle drivers that are not supported, columns with the Varchar2 data type are replaced by the String data type and Number columns are imported with a precision of zero (0).

3. Recently I encountered the below error while trying to save a Mapping.

Unexpected Condition Detected
Warning: Unexpected condition at: statbar.cpp: 268
Contact Informatica Technical Support for assistance

This happens when there is not enough memory on the system. To resolve it, we can either:

1. Increase the virtual memory on the system
2. If the error persists even after increasing the virtual memory, in the Designer go to Tools > Options, open the General tab and clear the “Save MX Data” option.

Read More about Informatica
