Tuesday 28 January 2014

Routines in Talend ETL -- finding date is in which format in the form of string then converting strings to a standard date

Hello every one,
Today I've tried routines concepts in Talend ETL and would like to share with you how we can write java code and use it in through out the job in various places using routines

Aim of this article is
1) Identifying the format of the string in which date format it is
2) Converting the identified string of date data to standard date.

Software compatibility 
1) OS: W7
2) Talend : TOS_DI-r96646-V5.1.3


To achieve the above,we write java code in ETL using routines concept.

What are routines in Talend ETL ?
* Routines are reusable pieces of Java code.
* Routines enable you to write custom code in Java to optimize data processing, improve Job capacity, and to extend Talend Studio features. 
* There are two types of routines in Talend Studio:
  • System routines: a number of system routines are provided. They are classified according to the type of data they process, such as: numerical, string, date. These types of routines are read-only, and you can call them directly in a Talend Job.
  • User routines: You can create your own new user routines or adapt existing routines.
 Example that I am going to explain is a User routines 

1. In the Repository Pane, Expand Code then Right click on Routines.
2. Give the routine name for example lets say routine name is StringToDate.
3. Remember that your routine name is similar to class name when you create it.
4. Also, the default package for routines is package routines; on the first row.
5. Write the below java code which taken from here to achieve the aim of the article.

package routines;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.text.DateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class ThreadSafeSimpleDateFormat {
   
     private DateFormat df;
   
     public ThreadSafeSimpleDateFormat(String format) {
         this.df = new SimpleDateFormat(format);
     }
   
     public synchronized String format(Date date) {
         return df.format(date);
     }
   
     public synchronized Date parse(String string) throws ParseException {
         return df.parse(string);
     }

    }


public class StringToDate {
     private static List<ThreadSafeSimpleDateFormat> dateFormats = new ArrayList<ThreadSafeSimpleDateFormat>(){
         /**
         *
         */
        private static final long serialVersionUID = 1L;

        {
             add(new ThreadSafeSimpleDateFormat("EEE MMM dd HH:mm:ss z yyyy"));
           
             add(new ThreadSafeSimpleDateFormat("yyyy-MM-dd"));
             add(new ThreadSafeSimpleDateFormat("yyyy/MM/dd"));
             add(new ThreadSafeSimpleDateFormat("yyyy.MM.dd"));
           
             add(new ThreadSafeSimpleDateFormat("MM/dd/yyyy"));
             add(new ThreadSafeSimpleDateFormat("MM-dd-yyyy"));
             add(new ThreadSafeSimpleDateFormat("MM.dd.yyyy"));
           
             add(new ThreadSafeSimpleDateFormat("dd-MM-yyyy"));
             add(new ThreadSafeSimpleDateFormat("dd/MM/yyyy"));
             add(new ThreadSafeSimpleDateFormat("dd.MM.yyyy"));
           
           
            add(new ThreadSafeSimpleDateFormat("M/dd/yyyy"));
            add(new ThreadSafeSimpleDateFormat("dd.M.yyyy"));
            add(new ThreadSafeSimpleDateFormat("M/dd/yyyy hh:mm:ss a"));
            add(new ThreadSafeSimpleDateFormat("dd.M.yyyy hh:mm:ss a"));
            add(new ThreadSafeSimpleDateFormat("dd.MMM.yyyy"));
            add(new ThreadSafeSimpleDateFormat("dd-MMM-yyyy"));
            add(new ThreadSafeSimpleDateFormat("yyyy/dd/MM"));
                              
                }
     };
       
  
       
            public static Date convertToDate(String input) {
                //System.out.println("input is "+input);
                Date date = null;
                if(null == input) {
                    return null;
                }
                for (ThreadSafeSimpleDateFormat format : dateFormats) {
                    try {
                        format.setLenient(true);
                        date = format.parse(input);
                        //System.out.println("date is======================="+date);
                    } catch (ParseException e) {
                     
                    }
                    if (date != null) {
                        break;
                    }
                }
        
                return date;
            }
        }
 

6. Made your code error free by importing all the required packages.
7. Now it is the time to use the created routine in your Talend job.
8. Syntax to use the methods/logic implemented in your Talend job is
    routines.rouituneName.methodName(Parameter/fieldName)  - routines. is optional.
9. To meet the 2nd requirement of this article you need to take a java component in your talend job
    (Assume you it is a new job created for testing this functionality).
10. Drag and drop tJava component from Palette section and write below code.

System.out.println("24-07-2012" + " = " + StringToDate.convertToDate("24-07-2012"));
System.out.println("2012/14/10" + " = " + StringToDate.convertToDate("2012/14/10"));
System.out.println("2013-03-15" + " = " + StringToDate.convertToDate("2013-03-15"));

    System.out.println("10/14/2012" + " = " + StringToDate.convertToDate("10/14/2012"));
    System.out.println("10-Jan-2012" + " = " + StringToDate.convertToDate("10-Jan-2012"));
    System.out.println("01.03.2002" + " = " + StringToDate.convertToDate("01.03.2002"));
    System.out.println("12/03/2010" + " = " + StringToDate.convertToDate("12/03/2010"));
    System.out.println("19.Feb.2011" + " = " + StringToDate.convertToDate("19.Feb.2011" ));
    System.out.println("4/20/2012" + " = " + StringToDate.convertToDate("4/20/2012"));
    System.out.println("some string" + " = " + StringToDate.convertToDate("some string"));
    System.out.println("123456" + " = " + StringToDate.convertToDate("123456"));
    System.out.println("null" + " = " + StringToDate.convertToDate(null));
   

NOTE: In the above code
Routine Name : StirngToDate
Method Name from routine which has logic is : convertToDate
Parameter/Fields : a string in the above code.

11. Sample output of the above job is

Starting job date_1 at 18:29 28/01/2014.
[statistics] connecting to socket on port 3364
[statistics] connected
24-07-2012 = Mon Jan 02 00:00:00 IST 30
2012/14/10 = Sun Feb 10 00:00:00 IST 2013
2013-03-15 = Fri Mar 15 00:00:00 IST 2013
10/14/2012 = Tue Aug 04 00:00:00 IST 16
10-Jan-2012 = Tue Jan 10 00:00:00 IST 2012
01.03.2002 = Mon Aug 23 00:00:00 IST 6
12/03/2010 = Tue Aug 31 00:00:00 IST 17
19.Feb.2011 = Sat Feb 19 00:00:00 IST 2011
4/20/2012 = Mon Feb 02 00:00:00 IST 11
some string = null
123456 = null
null = null
[statistics] disconnected
Job date_1 ended at 18:29 28/01/2014. [exit code=0]

   
12. This is the how you can find out the given string is which format of date by defining all the formats in your routine java class and then converting every format of date to a single standard date format.

Java code is taken from this link 
http://www.verious.com/tutorial/check-if-string-is-valid-date-in-java/

References:

1) http://www.vikramtakkar.com/2013/03/creatng-custom-routines-in-talend-or.html

2) https://help.talend.com/display/KB/Creating+a+user+routine?focusedCommentId=27989710#comment-27989710

3) https://help.talend.com/pages/viewpage.action?pageId=5671119


Wednesday 8 January 2014

Converting Types in tMap Component - Tips & Tricks

Checking whether the field is having null values or not. If it is null value print the NULL(nothing) , if it is NOT null then convert the type and display the filed in the output(output could be any database output table).
 
1) String to Integer
  Relational.ISNULL(row1.EmployeeID) == true ? null:Integer.parseInt(row1.EmployeeID)



2) String to Double
Relational.ISNULL(row1.Revenue)==true? null:Double.parseDouble(row1.Revenue)


3) String to Long
Relational.ISNULL(row1.Category_ID) == true ? null:Long.parseLong(row1.Category_ID)

4) String to Date
Relational.ISNULL(row1.Date)==true?
null:new SimpleDateFormat ("EEE mm/DD/yyyy").parse(row1.Date)   


5) Integer to Double
Relational.ISNULL(row1.NumberOfEmployees)==true? null:Double.parseDouble(Integer.toString(row1.NumberOfEmployees)) 
 
6) Integer to String

(Integer.toString(row1.id)).equals(Integer.toString(row2.id))
((new Integer(row1.id)).toString()).equals((new Integer(row2.id)).toString())==false



This post will be up-datable one.  

Run if example in Talend ETL [How to write conditions]

Run If conditions is used to develop the logics in your ETL job.
For example below jod design describes you how to work with it.

Drag and drop the below shown components (tJava_1, tFileExist_1, tFileExist_2)
Let's say you are passing a file name using tJava component 
Refer this post for tJava explantion : here
Lets say file 1 name is "ABC.XLSX" 
and file 2 name is "PQR.XLSX" 

Execute upper portion if the file name is "ABC.XLSX" or execute below part of the job if the file name is "PQR.XLSX"

Right click on tJava component ->Click on Trigger -> select Run If

Condition 1 :
context.filename.substring(0, context.filename.lastIndexOf("_")).trim().equals("ABC")

Condition 2:
context.filename.substring(0, context.filename.lastIndexOf("_")).trim().equals("PQR")

To edit the condition, click on "If(order1)" and press F2.

In this way you can work with the tJava component along with some conditions provided on it.



tJavaComponent Example in Talend

tJavaComponent complete referral
https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide521EN/5.3+tJava

 This post is about the excel file name from context variables

Let's say I have a file stored in my system location is : SADAKAR_DEMO_TABLE_1387461868.XLSX

Below java code gives the usage of tJava component

System.out.print(context.filename);

output:  
SADAKAR_DEMO_TABLE_1387461868.XLSX

String fileName = context.filename;
fileName.replaceAll("_\\d+.*", "").trim();

System.out.print("\n"+
           fileName.substring(0, fileName.lastIndexOf("_"))+"\n"+
           fileName.replaceAll("_[0-9]+", "")+"\n"+
            fileName.replaceAll("_\\d+.*", ""));
output:
SADAKAR_DEMO_TABLE
SADAKAR_DEMO_TABLE.XLSX

SADAKAR_DEMO_TABLE

System.out.print(context.filepath+context.filename);
output:
 \Users\sadakar002\Desktop\myproject\my_folder\my_sub_folder\SADAKAR_DEMO_TABLE.XLSX


Based up the file name you are inputting from context variable it'll display the corresponding results of your java code.