Informatica Way: Metadata


Thursday, 23 December 2010

Leveraging Metadata in Informatica Workflow-Session/Analysis

We can leverage the metadata collected in the Informatica repository for many interesting analyses. A few scenarios where I have used the Informatica metadata are as follows.

These SQL queries can be executed on an Oracle database with no changes and require only minor modification for other databases.

Failed Sessions

The following query lists the sessions that did not succeed (failed, stopped or aborted) since yesterday. To make it work for the last 'n' days, replace SYSDATE - 1 with SYSDATE - n.

QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,
       SESSION_NAME,
       LAST_ERROR AS ERROR_MESSAGE,
       DECODE (RUN_STATUS_CODE, 3, 'Failed', 4, 'Stopped', 5, 'Aborted') AS STATUS,
       ACTUAL_START AS START_TIME,
       SESSION_TIMESTAMP
FROM REP_SESS_LOG
WHERE RUN_STATUS_CODE != 1
AND TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE - 1) AND TRUNC(SYSDATE)
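
A minimal sketch of how the 'n' days substitution could be parameterized rather than edited by hand. The function name build_failed_sessions_query and the bind name :n_days are illustrative assumptions, not part of the repository or of any Informatica tool. The sketch only builds the SQL text and the bind values; running it would need an Oracle driver, which is not shown here.

```python
def build_failed_sessions_query(n_days=1):
    """Return (sql, binds) for sessions that did not succeed in the last n_days days.

    Hypothetical helper: the number of days is passed as an Oracle-style
    named bind (:n_days) instead of being edited into the SQL text.
    """
    if not isinstance(n_days, int) or n_days < 1:
        raise ValueError("n_days must be a positive integer")
    sql = """
SELECT SUBJECT_AREA AS FOLDER_NAME,
       SESSION_NAME,
       LAST_ERROR AS ERROR_MESSAGE,
       DECODE (RUN_STATUS_CODE, 3, 'Failed', 4, 'Stopped', 5, 'Aborted') AS STATUS,
       ACTUAL_START AS START_TIME,
       SESSION_TIMESTAMP
FROM REP_SESS_LOG
WHERE RUN_STATUS_CODE != 1
AND TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE - :n_days) AND TRUNC(SYSDATE)
""".strip()
    return sql, {"n_days": n_days}


# Example: look back over the last 7 days.
sql, binds = build_failed_sessions_query(7)
```

Passing the value as a bind keeps the query text constant, so the same statement can be reused for any look-back window.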


Long running Sessions

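The following query lists long-running sessions: successful sessions that ran for more than 10 minutes. Subtracting two Oracle dates gives a result in days, so 10/(24*60) is 10 minutes expressed as a fraction of a day. To make it work for the last 'n' days, replace SYSDATE - 1 with SYSDATE - n.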

QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,
       SESSION_NAME,
       SUCCESSFUL_SOURCE_ROWS AS SOURCE_ROWS,
       SUCCESSFUL_ROWS AS TARGET_ROWS,
       ACTUAL_START AS START_TIME,
       SESSION_TIMESTAMP
FROM REP_SESS_LOG
WHERE RUN_STATUS_CODE = 1
AND TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE - 1) AND TRUNC(SYSDATE)
AND (SESSION_TIMESTAMP - ACTUAL_START) > (10/(24*60))
ORDER BY SESSION_TIMESTAMP
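
The duration filter in the query relies on date arithmetic measured in days. The sketch below mirrors that comparison in Python so the threshold is easier to reason about. The function name is_long_running and its parameters are illustrative assumptions; the query itself does the same work inside the database.

```python
from datetime import datetime

MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60


def is_long_running(actual_start, session_timestamp, threshold_minutes=10):
    """Mirror the predicate (SESSION_TIMESTAMP - ACTUAL_START) > (10/(24*60)).

    Both sides are expressed as fractions of a day, the unit Oracle uses
    when one DATE is subtracted from another.
    """
    elapsed_days = (session_timestamp - actual_start).total_seconds() / SECONDS_PER_DAY
    return elapsed_days > threshold_minutes / MINUTES_PER_DAY


start = datetime(2010, 12, 23, 1, 0)
short_run = is_long_running(start, datetime(2010, 12, 23, 1, 7))   # 7 minutes
long_run = is_long_running(start, datetime(2010, 12, 23, 1, 30))   # 30 minutes
```

To flag sessions longer than 30 minutes instead, the query's constant would become 30/(24*60).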


Invalid Tasks

The following query lists the folder name, task type, task name, version number and last-saved time for every invalid task that is enabled.

QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,
       DECODE(IS_REUSABLE, 1, 'Reusable', ' ') || ' ' || TASK_TYPE_NAME AS TASK_TYPE,
       TASK_NAME AS OBJECT_NAME,
       VERSION_NUMBER,
       LAST_SAVED
FROM REP_ALL_TASKS
WHERE IS_VALID = 0
AND IS_ENABLED = 1
ORDER BY SUBJECT_AREA, TASK_NAME


Thanks for reading. Do you have other scenarios where workflow metadata has been effective? Wish you a very happy new year 2011.

Wednesday, 2 September 2009

Process Control / Audit of Workflows in Informatica

1. Process Control – Definition
Process control, or auditing, of a workflow in Informatica means capturing job information such as start time, end time, read count, insert count, update count and delete count. This information is captured and written to a table as the workflow executes.

2. Structure of Process Control/Audit table
The structure of the process control table is given below.
Table 1: Process Control structure

Column	Data type	Length	Description
PROCESS_RUN_ID	Number(p,s)	11	A unique number used to identify a specific process run.
PROCESS_NME	Varchar2	120	The name of the process (this column will be populated with the names of the Informatica mappings).
START_TMST	Date	19	The date/time when the process started.
END_TMST	Date	19	The date/time when the process ended.
ROW_READ_CNT	Number(p,s)	16	The number of rows read by the process.
ROW_INSERT_CNT	Number(p,s)	16	The number of rows inserted by the process.
ROW_UPDATE_CNT	Number(p,s)	16	The number of rows updated by the process.
ROW_DELETE_CNT	Number(p,s)	16	The number of rows deleted by the process.
ROW_REJECT_CNT	Number(p,s)	16	The number of rows rejected by the process.
USER_ID	Varchar2	32	The ETL user identifier associated with the process.

3. Mapping Logic and Build Steps
The process control flow has two data flows: an insert flow and an update flow. The insert flow runs before the main mapping and the update flow runs after it; this order is set in the "Target Load Plan". The source for both flows can be a dummy source that returns one record, for example select 'process' from dual or select count(1) from Table_A. The following mapping parameters and variables need to be created:

Table 2: Mapping Parameter and variables

$$PROCESS_RUN_ID

$$PROCESS_NAME

$$READ_COUNT

$$INSERT_COUNT

$$UPDATE_COUNT

$$DELETE_COUNT

$$REJECT_COUNT

Steps to create Insert flow:

1. Use "select 'process' from dual" as the SQL query in the Source Qualifier
2. Use a Sequence Generator to create running process_run_id values
3. In an Expression, call SetVariable($$PROCESS_RUN_ID, NEXTVAL) and assign $$PROCESS_NAME to o_process_name, an output-only field
4. In the Expression, assign $$SessionStarttime to o_Starttime, an output-only field
5. In the Expression, accept the sequence ID from the Sequence Generator
6. Insert into the target 'Process Control Table' with the three values above

Table 3: Process Control Image after Insert flow

PROCESS_RUN_ID	1
PROCESS_NME	VENDOR_DIM_LOAD
START_TMST	8/23/2009 12:23
END_TMST
ROW_READ_CNT
ROW_INSERT_CNT
ROW_UPDATE_CNT
ROW_DELETE_CNT
ROW_REJECT_CNT
USER_ID	INFA8USER

Steps in the main mapping:

1. After the Source Qualifier, increment the read count in a variable (v_read_count) in an Expression for each record read, and call SetMaxVariable($$READ_COUNT, v_read_count)
2. Before the Update Strategy of the target instances, do the same for the insert, update and delete counts; all the variables are then set with their respective counts
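
The counting steps above can be sketched outside Informatica to show why a running counter combined with SetMaxVariable ends up holding the total: the counter only ever grows, so the maximum seen is the final count. This is a plain Python illustration under that assumption, not PowerCenter code; the function names and the 'insert'/'update'/'delete' row labels are hypothetical.

```python
def set_max_variable(current, candidate):
    """Keep the larger of the stored value and the new value."""
    return max(current, candidate)


def audit_counts(operations):
    """Count reads and per-operation rows for a stream of row operations.

    `operations` is an iterable of 'insert', 'update' or 'delete' labels,
    one per row; any other label raises KeyError.
    """
    names = ("READ_COUNT", "INSERT_COUNT", "UPDATE_COUNT", "DELETE_COUNT")
    variables = dict.fromkeys(names, 0)   # stands in for the $$ mapping variables
    running = dict.fromkeys(names, 0)     # stands in for v_read_count and friends
    for operation in operations:
        keys = ("READ_COUNT", operation.upper() + "_COUNT")
        for key in keys:
            running[key] += 1
            variables[key] = set_max_variable(variables[key], running[key])
    return variables


rows = ["insert"] * 900 + ["update"] * 60 + ["delete"] * 40
counts = audit_counts(rows)
```

With this input the counts match the figures shown in the process control image after the update flow: 1000 read, 900 inserted, 60 updated and 40 deleted.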

Steps to create Update flow:

1. Use "select 'process' from dual" as the SQL query in the Source Qualifier
2. Use SetMaxVariable to get the process_run_id created in the insert flow
3. In an Expression, assign $$INSERT_COUNT to o_insert_count, an output-only field; assign all the other counts in the same way
4. In the Expression, assign $$SessionEndtime to o_Endtime, an output-only field
5. Update the target 'Process Control Table' with the values above where process_run_id equals the process_run_id generated in the insert flow

Table 4: Process Control Image after Update flow

PROCESS_RUN_ID	1
PROCESS_NME	VENDOR_DIM_LOAD
START_TMST	8/23/2009 12:23
END_TMST	8/23/2009 12:30
ROW_READ_CNT	1000
ROW_INSERT_CNT	900
ROW_UPDATE_CNT	60
ROW_DELETE_CNT	40
ROW_REJECT_CNT	0
USER_ID	INFA8USER
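
The insert and update flows amount to opening an audit row before the load and closing it afterwards. The sketch below reproduces that pattern with Python's built-in sqlite3 module standing in for the target database. The table name PROCESS_CONTROL and the functions start_run and end_run are assumptions made for illustration, and the column types are simplified; only the column names come from Table 1.

```python
import sqlite3
from datetime import datetime

# Hypothetical table name; column names follow Table 1, types are simplified.
DDL = """
CREATE TABLE PROCESS_CONTROL (
    PROCESS_RUN_ID  INTEGER PRIMARY KEY,
    PROCESS_NME     TEXT NOT NULL,
    START_TMST      TEXT NOT NULL,
    END_TMST        TEXT,
    ROW_READ_CNT    INTEGER,
    ROW_INSERT_CNT  INTEGER,
    ROW_UPDATE_CNT  INTEGER,
    ROW_DELETE_CNT  INTEGER,
    ROW_REJECT_CNT  INTEGER,
    USER_ID         TEXT
)
"""


def start_run(conn, process_name, user_id, started_at):
    """Insert flow: open an audit row and return its PROCESS_RUN_ID."""
    cur = conn.execute(
        "INSERT INTO PROCESS_CONTROL (PROCESS_NME, START_TMST, USER_ID) "
        "VALUES (?, ?, ?)",
        (process_name, started_at.isoformat(sep=" "), user_id),
    )
    return cur.lastrowid


def end_run(conn, run_id, ended_at, read, inserted, updated, deleted, rejected):
    """Update flow: close the audit row with the end time and row counts."""
    conn.execute(
        "UPDATE PROCESS_CONTROL "
        "SET END_TMST = ?, ROW_READ_CNT = ?, ROW_INSERT_CNT = ?, "
        "    ROW_UPDATE_CNT = ?, ROW_DELETE_CNT = ?, ROW_REJECT_CNT = ? "
        "WHERE PROCESS_RUN_ID = ?",
        (ended_at.isoformat(sep=" "), read, inserted, updated, deleted, rejected, run_id),
    )


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
run_id = start_run(conn, "VENDOR_DIM_LOAD", "INFA8USER", datetime(2009, 8, 23, 12, 23))
# The main mapping would run here and accumulate the counts.
end_run(conn, run_id, datetime(2009, 8, 23, 12, 30), 1000, 900, 60, 40, 0)
row = conn.execute(
    "SELECT * FROM PROCESS_CONTROL WHERE PROCESS_RUN_ID = ?", (run_id,)
).fetchone()
```

Between the two calls the row looks like Table 3 (end time and counts empty); after end_run it matches Table 4.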

4. Merits over Informatica Metadata
This information is also available in the Informatica metadata; however, maintaining it within our own system has the following benefits:

No need to write complex queries to pull the data from the metadata tables
Job names need not be mapping names and can be user-friendly names
Insert/Delete/Update counts can be audited for all targets together as well as for individual targets
The audit information can be maintained outside the metadata security level and can be used by other mappings in their transformations
Can be used by mappings that build parameter files
Can be used by mappings that govern data volume
Can be used by Production support to quickly find the status of a load

Friday, 16 January 2009

Informatica PowerCenter 8x Key Concepts – 5

5. Repository Service

Having already discussed the metadata repository, we now look at the Repository Service: a separate, multi-threaded process that retrieves, inserts and updates metadata in the repository database tables.
The Repository Service manages connections to the PowerCenter repository from PowerCenter client applications such as the Designer, Workflow Manager, Workflow Monitor, Repository Manager and the console, and from the Integration Service. The Repository Service is responsible for ensuring the consistency of metadata in the repository.

Creation & Properties:

Use the PowerCenter Administration Console Navigator window to create a Repository Service. The properties needed to create it are:

Service Name – name of the service like rep_SalesPerformanceDev

Location – Domain and folder where the service is created

License – license service name

Node, Primary Node & Backup Nodes – Node on which the service process runs

CodePage – The Repository Service uses the character set encoded in the repository code page when writing data to the repository

Database type & details – Type of database, username, password, connect string and tablespace name

The above properties are sufficient to create a Repository Service; however, the following features are also worth a look because they matter for performance and maintenance.

General Properties

> OperatingMode: Values are Normal and Exclusive. Use Exclusive mode to perform administrative tasks like enabling version control or promoting local to global repository

> EnableVersionControl: Creates a versioned repository

Node Assignments: The "High availability option" is a licensed feature that allows us to choose Primary & Backup nodes so that the Repository Service keeps running continuously. Under a normal license you would see only one node to select from.

Database Properties

> DatabaseArrayOperationSize: Number of rows to fetch each time an array database operation is issued, such as insert or fetch. Default is 100

> DatabasePoolSize: Maximum number of connections to the repository database that the Repository Service can establish. If the Repository Service tries to establish more connections than specified for DatabasePoolSize, it times out the connection attempt after the number of seconds specified for DatabaseConnectionTimeout

Advanced Properties

> CommentsRequiredForCheckin: Requires users to add comments when checking in repository objects.

> Error Severity Level: Level of error messages written to the Repository Service log. Specify one of the following message levels: Fatal, Error, Warning, Info, Trace & Debug

> EnableRepAgentCaching: Enables repository agent caching. Repository agent caching provides optimal performance of the repository when you run workflows. When you enable repository agent caching, the Repository Service process caches metadata requested by the Integration Service. Default is Yes.

> RACacheCapacity: Number of objects that the cache can contain when repository agent caching is enabled. You can increase the number of objects if there is available memory on the machine running the Repository Service process. The value must be between 100 and 10,000,000,000. Default is 10,000

> AllowWritesWithRACaching: Allows you to modify metadata in the repository when repository agent caching is enabled. When you allow writes, the Repository Service process flushes the cache each time you save metadata through the PowerCenter Client tools. You might want to disable writes to improve performance in a production environment where the Integration Service makes all changes to repository metadata. Default is Yes.

Environment Variables

The database client code page on a node is usually controlled by an environment variable. For example, Oracle uses NLS_LANG, and IBM DB2 uses DB2CODEPAGE. All Integration Services and Repository Services that run on this node use the same environment variable. You can configure a Repository Service process to use a different value for the database client code page environment variable than the value set for the node.

You might want to configure the code page environment variable for a Repository Service process when the Repository Service process requires a different database client code page than the Integration Service process running on the same node.

For example, the Integration Service reads from and writes to databases using the UTF-8 code page. The Integration Service requires that the code page environment variable be set to UTF-8. However, you have a Shift-JIS repository that requires that the code page environment variable be set to Shift-JIS. Set the environment variable on the node to UTF-8. Then add the environment variable to the Repository Service process properties and set the value to Shift-JIS.
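
The override described above behaves like a two-level lookup: the service process uses its own value when one is configured and otherwise falls back to the node's value. The sketch below models only that precedence. The function effective_environment is hypothetical, and 'UTF-8' and 'Shift-JIS' are the symbolic values from the example above; the actual strings to set depend on the database client (NLS_LANG is used here because it is the Oracle variable named earlier).

```python
def effective_environment(node_env, process_overrides):
    """Return the environment a service process sees.

    Values set on the service process take precedence over the node's values;
    anything not overridden is inherited from the node.
    """
    merged = dict(node_env)
    merged.update(process_overrides)
    return merged


# Node-level setting, shared by every service process on the node.
node_env = {"NLS_LANG": "UTF-8"}

# The Integration Service process has no override and inherits the node value.
integration_env = effective_environment(node_env, {})

# The Repository Service process overrides the variable for its own use.
repository_env = effective_environment(node_env, {"NLS_LANG": "Shift-JIS"})
```

The node's own setting is left unchanged, so other services on the same node are not affected by the Repository Service override.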
