Ads 468x60px

Pages

Subscribe:

Labels

Informatica (18) Integration Service (10) Siebel Business Intelligence (6) ETL (5) Informatica PowerCenter (4) Informatica PowerCenter 8x (4) Oracle (4) Metadata (3) DTM (2) Data Transformation Manager (2) Hexaware Technologies (2) OUD (2) Oracle Unified Directory (2) PowerCenter (2) XML (2) business (2) ASCII (1) Administration Console (1) Application Services (1) Automated Migration (1) BFSI (1) Binary (1) Bulk Load (1) Business Intelligence (1) Business Intelligence Challenge (1) Business Intelligence Company (1) Business Intelligence Consulting (1) Business Objects (1) ByTree (1) CDC (1) CNBC News (1) CNBCTV18’s Shreya Roy (1) COBOL (1) Change Data Capture (1) Collaborative (1) Collaborative Data Management (1) Computing Expression Evaluator (1) Convert Rows To Columns In Inforamtica (1) Data (1) Data Governance (1) Data Management (1) Data Mart (1) Data Type (1) Data Virtualization Services (1) Database (1) Datawarehouse (1) ETL Developers (1) Expression Evaluator (1) Expression Evaluator Debugging (1) Extract (1) FTP (1) File List (1) Flash or Java Applets (1) Flat Files (1) Function (1) HP Diagnostics-Identify (1) HP Diagnostics-Identify bottlenecks (1) HTTP Headers (1) Hexaware Technologies Limited (1) IT Metrics (1) IT companies (1) IT company (1) Index (1) Indirect Source (1) Informatica 8.6 (1) Informatica 8.x (1) Informatica Data Integration Service (1) Informatica Debugger (1) Informatica Debugging Transformation (1) Informatica Development (1) Informatica File Transfer (1) Informatica Power Center (1) Informatica Powercenter 8x Key Concepts (1) Informatica Process Control Audit (1) Informatica Repository Restoration (1) Informatica Server Re-Installation (1) Informatica Server Recovery System (1) Informatica Upgrade Challenge (1) Informatica Workflow (1) Informatica Workflow Process Control (1) Integration Services (1) Integration and Repository services (1) Invalid Objects (1) JavaScript Functions (1) Joiner Transformation (1) LDAP (1) LDAP Directory (1) LDAP Replication (1) Load Balancer (1) LoadRunner 11.5 (1) Looping (1) Manual Correlation (1) Mappings (1) NTLM Resource (1) NTLM authentication (1) Native Driver (1) New Column (1) ODBC (1) ODBC Driver (1) OID (1) OUD Configuration (1) OUD Directory Server (1) OUD Replication (1) Oracle Applications (1) Oracle Client (1) Oracle Hints In SQL (1) Oracle Internet Director (1) Oracle Optimizer (1) Oracle R12 (1) Oracle Solutions (1) Oracle loadrunner (1) OracleErrorActionFile (1) PeopleSoft Jobs (1) PeopleSoft Jobs In Hexaware (1) Peoplesoft Tester In Chennai (1) Peoplesoft Tester Jobs In Chennai (1) Performance (1) Performance Testing (1) PowerCenter 8.5 (1) PowerCenter Server (1) PowerCenter Server Support (1) Powercenter 8.5.1 (1) Pushdown Optimization (1) Re-Import (1) Relational (1) Remote Filename (1) Repository Services (1) Reverse Of A Normalizer In Informatica (1) Rows Read (1) SFTP (1) SJSDS (1) SMP (1) SQL Statement (1) SSH2 (1) SUBSTR Function (1) Session Failed (1) Source Data (1) Source Definition (1) Source Row (1) Store Procedure (1) SuppressNilContentMethod (1) Symmetric Multi-Processing (1) Target Definition (1) Target Row (1) Task Developer (1) Text Flags (1) Transfer Protocol (1) Transformation (1) Transformation Logics (1) Transpose Records (1) UDF (1) User Defined Functions (1) WriteNullXMLFile (1) XML File (1) XML Optimization (1) XML Target (1) XML Tuning (1) XMLSendChildFirst (1) XMLWarnDupRows (1) Zero byte XML file (1) accelerate application (1) employee performance (1) mid-cap it (1) web_reg_save_param (1)

Labels

Blogroll

About

Blogger templates

Blogger news

Showing posts with label CDC. Show all posts
Showing posts with label CDC. Show all posts

Thursday, 12 August 2010

Change Data Capture in Informatica


Change data capture (CDC) is an approach or a technique to identify changes, only changes, in the source. I have seen applications that are built without CDC and later mandate to implement CDC at a higher cost. Building an ETL application without CDC is a costly miss and usually a backtracking step. In this article we can discuss different methods of implementing CDC.


Scenario #01: Change detection using timestamp on source rows
In this typical scenario the source rows have extra two columns say row_created_time & last_modified_time. Row_created_time : time at which the record was first created ; Last_modified_time: time at which the record was last modified
  1. In the mapping create mapping variable $$LAST_ETL_RUN_TIME of datetime data type
  2. Evaluate condition SetMaxVariable ($$LAST_ETL_RUN_TIME, SessionStartTime); this steps stores the time at which the Session was started to $$LAST_ETL_RUN_TIME
  3. Use $$LAST_ETL_RUN_TIME in the ‘where’ clause of the source SQL. During the first run or initial seed the mapping variable would have a default value and pull all the records from the source, like: select * from employee where last_modified_date > ’01/01/1900 00:00:000’
  4. Now let us assume the session is run on ’01/01/2010 00:00:000’ for initial seed
  5. When the session is executed on ’02/01/2010 00:00:000’, the sequel would be like : select * from employee where last_modified_date > ’01/01/2010 00:00:000’, hereby pulling records that had only got changed in between successive runs
Scenario #02: Change detection using load_id or Run_id
Under this scenario the source rows have a column say load_id, a positive running number. The load_id is updated as and when the record is updated
  1. In the mapping create mapping variable $$LAST_READ_LOAD_ID of integer data type
  2. Evaluate condition SetMaxVariable ($$LAST_READ_LOAD_ID,load_id); the maximum load_id is stored into mapping variable
  3. Use $$LAST_READ_LOAD_ID in the ‘where’ clause of the source SQL. During the first run or initial seed the mapping variable would have a default value and pull all the records from the source, like: select * from employee where load_id > 0; Assuming all records during initial seed have load_id =1, the mapping variable would store ‘1’ into the repository.
  4. Now let us assume the session is run after five load’s into the source, the sequel would be select * from employee where load_id >1 ; hereby we limit the source read only to the records that have been changed after the initial seed
  5. Consecutive runs would take care of updating the load_id & pulling the delta in sequence
In the next blog we can see how to implement CDC when reading from Salesforce.com