Problem Statement: Data
to be processed in Informatica were XML files in nature. The number of
XML files to be processed was dynamic in nature. The need was also to
ensure that the XML file name from which data is being processed is to
be captured.
Resolution:
Option 1 – Using File list as part of Indirect File Sources in session
Option 2 – Using Parameter File and workflow variable
Option 1 – Using File list as part of Indirect File Sources in session
Option 2 – Using Parameter File and workflow variable
Implementation Details for option 1: Using File list
XML file names to be processed were read
using batch script and file list was created containing XML file. This
file list name was set under source properties at session level. XML
file were read sequentially and data pertaining to every XML file was
processed. Since the number of XML files to be processed was dynamic the
need of the hour was to achieve looping in Informatica.
Challenge in using File List
– Created in a session to run multiple source files for one source
instance in the mapping. When file list is used in a mapping as multiple
source files for one source instance, the properties of all files must
match the source definition. File list are configured in session
properties by mentioning the file name of the file list in the Source
Filename field and location of the file list in the Source File
Directory field. When the session starts, the Integration Service reads
the file list, then locates and reads the first file source in the list.
After the Integration Service reads the first file, it locates and
reads the next file in the list. The issue using XML file names in file
list was further compounded by Informatica grouping records pertaining
to similar XML node together. This lead to difficultly in identifying
which record belonged to which XML file.
Batch Script – batch scripts controlled over all looping in Informatica by encompassing below mentioned tasks:
• Reading XML file names from staging location and creating file list containing XML file names.
• Moving XML files from staging location to archive location.
• Verifying whether there are any more XML files to be processed and depending on the outcome either loop the process by invoking first workflow or end the process
• Using PMCMD commands invoke appropriate workflows.
• Reading XML file names from staging location and creating file list containing XML file names.
• Moving XML files from staging location to archive location.
• Verifying whether there are any more XML files to be processed and depending on the outcome either loop the process by invoking first workflow or end the process
• Using PMCMD commands invoke appropriate workflows.
Workflow Details –
There were two Informatica workflows designed to achieve looping:
• First workflow –created indirect file to be used as source in session properties and will trigger second workflow. Details of workflow are:
o Command task will execute a DOS batch script which will create indirect file after reading XML filenames from a pre-defined location on server.
o Command task which will execute the second workflow to process data within XML files.
There were two Informatica workflows designed to achieve looping:
• First workflow –created indirect file to be used as source in session properties and will trigger second workflow. Details of workflow are:
o Command task will execute a DOS batch script which will create indirect file after reading XML filenames from a pre-defined location on server.
o Command task which will execute the second workflow to process data within XML files.
• Second workflow will read process XML files and populate staging tables. Details of workflow are:
o A session will read XML file names using indirect file and load into staging tables.
o A command task will move the XML file just processed in file into an archive folder. Using batch script
o A command task will execute a batch script which will:
Check whether there are any more XML files to be processed.
If yes then it will trigger the first workflow. This will ensure all XML files are processed and loaded into staging tables.
If no then process will complete.
o A session will read XML file names using indirect file and load into staging tables.
o A command task will move the XML file just processed in file into an archive folder. Using batch script
o A command task will execute a batch script which will:
Check whether there are any more XML files to be processed.
If yes then it will trigger the first workflow. This will ensure all XML files are processed and loaded into staging tables.
If no then process will complete.
Thanks for reading, pls let me know have you faced any similar situation.