Requirement: Converting rows to columns
Customer
|
Product
|
Cost
|
Cust1
|
P1
|
10
|
Cust1
|
P2
|
20
|
Cust1
|
P3
|
30
|
Cust2
|
ABC
|
10
|
Cust2
|
P2
|
25
|
Cust2
|
Def
|
10
|
Customer
|
Product1
|
Cost1
|
Product2
|
Cost2
|
Product3
|
Cost3
|
Cust1
|
P1
|
10
|
P2
|
20
|
P3
|
30
|
Cust2
|
ABC
|
10
|
P2
|
25
|
Def
|
10
|
|
|
|
|
|
|
|
|
The
above illustration would help in understanding the requirement. We had
to merge multiple records into one record based on certain criteria. The
design had to be reusable since each dimension within the data mart
required this flattening logic.
1. Approach:
The
use of aggregator transformation would group the records by a key, but
retrieval of the values for a particular column as individual columns is
a challenge, hence designed a component ‘Flattener’ based on expression
transformation.
Flattener is a reusable component, a mapplet that performs the function of flattening records.
Flattener
consists of an Expression and a Filter transformation. The expression
is used to club each incoming record based on certain logic. Decision to
write the record to target is taken using the Filter transformation.
2. Design:
The mapplet can receive up to five inputs, of the following data types:
i_Col1 (string), Customer
i_Col2 (string), Product
i_Col3 (decimal), Cost
i_Col4 (decimal) and
i_Col5 (date/time)
Have
kept the names generic trying to accept different data types, so that
the mapplet can be used in any scenario where there is a need for
flattening records.
The mapplet gives out 15×5 sets of output, in the following manner:
o_F1_1 (string), Customer
o_F2_1 (string), Product1
o_F3_1 (decimal), Cost1
o_F4_1 (decimal) and
o_F5_1 (date/time)
o_F1_2 (string), Customer
o_F2_2 (string), Product2
o_F3_2 (decimal), Cost2
o_F4_2 (decimal) and
o_F5_2 (date/time)
… … and so on
The
output record is going to have repetitive sets of 5 columns each (Each
set would refer to one incoming row). Based on the requirement the
number of occurrence of these sets can be increased. The required fields
alone can be used / mapped. For the above example we use just 2 strings
and one decimal for mapping Customer, Product and Cost.
The
mapplet receives records from its parent mapping. The Expression would
initially save each incoming value to a variable and compare it with its
counterpart that came in earlier and is held in its cache as long as
the condition to flatten is satisfied.
Syntax to store current and previous values:
i_Col2 string i
prv_Col2 string v curr_Col2
curr_Col2 string v i_Col2
The
condition/logic to flatten records is parameterized and decided before
mapping is called thus increasing codes’ scalability. The parameterized
logic is passed to the Expression transformation via a Mapplet
parameter. The value is used as an expression to perform the evaluation
and the result is a flag value either ‘1’ or ‘2’.
Syntax for port – flag
Flag integer v $$Expr_compare
An example for parameterized expression
$$Expr_compare = iif (curr_Col1 = prv_Col1 AND curr_Col2 !=
prv_Col2, 1, iif (curr_Col1 != prv_Col1,2))
A variable port named “rec_count” is incremented, based on the flag.
Syntax for port – rec_count
rec_count integer v iif (flag=2,0, iif (flag=1,rec_count + 1,rec_count))
The expression transformation now uses the value in ports “flag” and “rec_count”
to decide the place holder for each incoming input value, i.e. the
column in target table where this data would move into ultimately. This
process is an iterative one and goes on until the comparison logic ($$Expr_compare) holds good, i.e. until all records get flattened per the logic. An example of the place holder expression is shown below:
v_Field1 data type v iif(flag=2 AND rec_count=0,curr_Col1, v_Field1)
Port
“write_flag_1” is set to 1 when the comparison logic fails (meaning
flattening is complete). Filter transformation filters out the record
once it is completely transposed.
Filter condition:
write_flag_1 integer v iif (flag=2 AND write_flag>1 ,1,0)
3. Outcome:
After developing the code and implementing the same we found it to be a useful utility, so thought of sharing it
and would like to hear suggestions from readers on performing the same
functionality in a different way. Please do share your views.