Fundamentals of DWH and Pentaho, Talend ETL : 2014

Monday, 22 September 2014

Database table Look up Example in Talend

Database table Look up Example in Talend

Written on : 22nd Sep 2014

Update 1 : 22nd Sep 2014 (Find the updates at end of the post ).
[Sending Duplicate Rows to a flat file after adding CRC column]

Update 2 : 24th Sep 2014 (Find the updates at end of the post ).
Will the look up table executes for the first time ?

Scenario :

There is a flat file of few(let's say 6 rows) rows. Insert this( 6 rows of flat file) data to database table. Again you will get same flat file with newly added rows(Lets say 1 row is added). Now, when you run the job the new rows only will have to inserted to database table and reject the existing rows to a flat file by using Lookup of same table.

Find the below quick snapshot of Scenario :

Sample Data file: largecities.csv

Content in it :

City;Population(x1000);LandArea(km2);PopulationDensity(people/km2)

Beijing;10233;1418;7620

Moscow;10452;1081;9644

Seoul;10422;605;17215

Tokyo;8731;617;14151

Jakarta;8490;664;12738

New York;8310;789;10452

Hyderabad;4000;543;12345

Observer the data first, there is not unique identification of data using key(primary).

Step 1 :

Create meta data of the CSV file (Repository->MetaData->File Delimited).
Drag and drop the created .csv metadata to the design area.
Drag and drop tAddCRCRow component(Palette ->DataQuality->tAddCRC).
Connect these two components as Main flow.
As there is no unique identification of rows we are adding this component . i.e., making the rows to uniquely identify.
Check “Use in CRC” in Implication properties of the component.
NOTE : If you have a unique identifier in your data file you need not to add this component.

Step 2:

Drag and drop tMap compoent & tPostgreSQLOutput
Connect CRC component with tMap then tMap with tPostgreSQLOutput.
Give all the connection details information in tPostgreSQLOutput component.(For example : localhost, 5432, postgres, postgres, public as schema, student as database, largecities as table name).
Action on the table is : Drop table if exists and create and Action on Data is : Insert for the first time later change the action on the table to Create table if does not exist
Open the tMap component to Map the rows from source to target( i.e., file meta data to create table metadata).
In the schema editor of tMapComponent opened copy all the rows from left and paste in right side(This is quick way instead typing manually) and then click on Auto Map ( or else hold the meta data from left and release on the right to corresponding rows.
Convert CRC from Long type to String using in table meta data using this expression String.valueOf(row2.CRC)

Step 3 :

Save the job and run. i.e., up to here this is a normal table creation.

Step 4:

Now the look up scenario starts here. ..

Let's assume the source file is updated with new rows and you would like to insert the updated/added rows to the table.(Remember, the file consists of old records + new records and you need to insert only newly updated records)

That means you will have to look up the existing rows(formally the table) if the rows are existing reject them else insert them to the table.

Drag and drop the tPostgreSQLinput component to the canvas.
Give all the connection details.
Table Name : “largecities” .
Query : "SELECT * FROM largecities" (Should be enclosed in double quotes).
Connect it to tMap component.
Drag and drop “tFileOutputDelimited” on the canvas and connect “reject” from tMap component.
The scenario you can find in Image-1 (Go back to Image-1 and observe how the connections were given).

Now, Open the tMap component

Observe there will be 2 rows (row2 and row3 on the left side) one is of flat file and another is of lookup table one.
On the right side there will be 2 outputs.(output and reject) one is for table output and one is for reject.

Left side of tMap

Hold CRC on row2 and drop it on CRC of row3, you will be getting a connection with purple color.

For CRC in row3 give this expression “String.valueOf(row2.CRC) “

In row3 header there are 3 icons – Click the first one , it will show the Property and value paired default settings.

Change the Join Model from Left outer join to Inner join.

Right side of tMap

output

catch lookup inner join reject = true

and in the expression editor write below

!(String.valueOf(row2.CRC).equals(String.valueOf(row3.CRC)))

In above expression, we are comparing the string value of CRC (key) of row2 and row3..

If the expression is true it will pass the new rows to the table and if the expression is false it'll send the data to dilimited file.

As quick as you write above expression you will get two join lines colored orange.

Reject

copy paste the meta data of row2 to reject metadata.(You can manually type also).Map the row2 meta data to reject meta data.

Click on Apply and then Ok.

Save the job & and let's test it.

Test-1:

Data : largecities.csv

Beijing;10233;7620;3199468552

Moscow;10452;9644;1970518573

Seoul;10422;17215;2044616669

Tokyo;8731;14151;2119623448

Jakarta;8490;12738;555965170

New York;8310;10452;2301332887

PostgreSQL table output: largecities

City Population_x1000_ LandArea_km2_ PopulationDensity_people_km2_ CRC

Beijing 10233 1418 7620 3199468552

Moscow 10452 1081 9644 1970518573

Seoul 10422 605 17215 2044616669

Tokyo 8731 617 14151 2119623448

Jakarta 8490 664 12738 555965170

New York 8310 789 10452 2301332887

reject: reject.csv

As we are processing the file for the first time, no data will go (i.e., no rows) into this file as rejection.

Test-2:

Lets say the file is updated with 3 new records and the sample is

City;Population(x1000);LandArea(km2);PopulationDensity(people/km2)

Beijing;10233;1418;7620

Moscow;10452;1081;9644

Seoul;10422;605;17215

Tokyo;8731;617;14151

Jakarta;8490;664;12738

New York;8310;789;10452

Hyderabad;6767;899;2345

Bellampalli;6767;234;78788

Warangal;78787;3445;90909

PostgreSQL table output: largecities

City Population_x1000_ LandArea_km2_ PopulationDensity_people_km2_ CRC

Beijing 10233 1418 7620 3199468552

Moscow 10452 1081 9644 1970518573

Seoul 10422 605 17215 2044616669

Tokyo 8731 617 14151 2119623448

Jakarta 8490 664 12738 555965170

New York 8310 789 10452 2301332887

Hyderabad 6767 899 2345 1096786685

Bellampalli 6767 234 78788 1693815431

Warangal 78787 3445 90909 3569283860

reject: reject.csv

Beijing;10233;7620;3199468552

Moscow;10452;9644;1970518573

Seoul;10422;17215;2044616669

Tokyo;8731;14151;2119623448

Jakarta;8490;12738;555965170

New York;8310;10452;2301332887

Update 1 : (Eliminating duplicate rows from source file).
When we have duplicate rows in the source file, the duplicate rows are not getting inserted into the table.

To over come this scenario, drag and drop tUniqueRows component b/w tAddCRC and tMap components.

It'll list the columns coming from tAddCRC row. Check the CRC field as key attribute.

Let's take one more flat file to store duplicate rows.. for example if you have a row 3 times what the job has do is insert only 1 row and the remaining 2 duplicate rows have to go into a file.

find the image below to understand better.

Sample data for this design:

City;Population(x1000);LandArea(km2);PopulationDensity(people/km2)
Beijing;10233;1418;7620
Moscow;10452;1081;9644
Seoul;10422;605;17215
Tokyo;8731;617;14151
Msoft;3333;7777;8989
Jakarta;8490;664;12738
New York;8310;789;10452
Hyderabad;6767;899;2345
Bellampalli;6767;234;78788
Msoft;3333;7777;8989
Warangal;78787;3445;90909
Sadakar;1000;2222;3333
Raju;8888;9999;1111
Raju;8888;9999;1111
Msoft;3333;7777;8989

Update 2 : 24th Sep 2014 (Find the updates at end of the post ).
Will the look up table executes for the first time ?

Let us assume you have designed the job as shown in update 1 and running the job.

When you run the job for the 1st time the look up step will give you the error as you do not have table created in database.

My finding here is with out look up we have to run the job 1st time then it will create table in the database with it's data & then apply look up. ( not sure whether it is right way or not )

What I my questions in this scenario are :

1) Does Talend allows to create DDL using metadata before we run the job ?

2) Why sequential execution ? Will it not support parallel execution of components as pentaho kettle ETL does ?

References :

http://www.deepinopensource.com/scd-2-implementations-in-talend/

http://www.vikramtakkar.com/2013/04/tmap-vs-tjoin-talend-open-studio.html

http://www.vikramtakkar.com/2013/01/dynamic-database-lookup-using.html

:-) :-)

Thursday, 18 September 2014

Tip : Converting String to BigDecimal and in tMap column expression of Talend ETL

The below expression in tMap column editor for a particular column will convert string to BigDecimal and replace the "?" mark with empty space .

new BigDecimal(StringHandling.CHANGE("12345?","\\?",""))

:-)

Usage Of Var feature in tMap component in Talend ?

The below article explains the usage of Var in tMap Component.

http://www.etladvisors.com/2012/11/26/using-variables-in-the-tmap-component/

Thank you.

My workout image from the reference is :

Job:

tMap Logic

Thank you :-)

Friday, 7 February 2014

Convert Excel file to CSV file using Java Code in Talend ETL - Why we need to convert Excel file using java code instead components do it ???

Hi Guys,

Aim of this post :
Converting excel file(.xlsx and/or .xls) to CSV file(.csv) using Java Code in Talend Job.

Why to convert Excel to CSV using java code in Talend ETL when you can directly input ExcelInput to CSVoutput?

B'z some times when you dump data taking from
2007 Excel work book to CSV file OR
2007 Excel work book to Table output (Any database table : Postgres or MySQL or SQL)

the data in the few columns in the work book will fall into infinite loop and causes to give "Stack Over flow error "

What I found with Excel Work book 2007 to cause this "Stack over flow " Exception
Categories of fields in Excel
- Few fields could be General
- Few fields could be Date
- Few fields could be Formulas
- Few fields could be Number

Though we Tick the check box "Read excel2007 file format(xlsx)" some times we may not able to dump the all rows of data to the target .

In my case I have an Excel work book of 2007 with 2000 rows and 10+ fields where few fields like date1, date2 having date type in the excel work book and ETL was not allowed me to dump all the data to the target instead giving me "Stack Over Flow" Exception..

I thought it of Memory problem but it was not and then converted Excel 2007 work book to CSV file and dumped the data to Target.

I've directly mapped ExcelComponent Input to CSV component Output but it also given me the same " Stack Over Flow" Exception and

the above situation leads me to convert Excel Workbook to CSV file.... I've tried to

Software setup for this example
ETL
1) TOS_DI-r96646-V5.1.3
2) 2007 Excel Work book with data

Code Execution in Eclipse (Testing Externally)
1) Eclipse version :
     Version: Juno Service Release 2
     Build id: 20130225-0426
2) Code compiled/Executed on JDK 1.7 in eclipse

I've taken code from this site

http://bigsnowball.com/content/convert-excel-xlsx-and-xls-csv?language=ar

You can download the same code with Eclipse project here(.zipfile)
NOTE:
The code uses library files of Apache POI - the Java API for Microsoft Document
You need to download the all the jar files of POI and need to add them to your eclipse project
Download jar files from here poi-bin-3.9-20121203.zip

ETL job design and code implementation of the same

1) Drag and drop "tLibraryLoad" component from Palette to Design area.
2) Add all the required jar files of Apache POI - the Java API for Microsoft Document
3) How to add multiple jar files - You can learn from here(Click this link)
4) Drag and drop "tJava" component.
5) Connect "tLibrayLoad" component to "tJava" component.
    a) Right click on "tLibraryLoad'
    b) Click on Trigger then drop On sub job ok to "tJava" component.

6) Write below java code in "tJava" component Code area ,which will convert the input Excel work book(.xlsx or .xls) to CSV file(.csv).

class CSVUtils {

 /*
 * Convert Excel spreadsheet to CSV.
 * Works for Excel 97-2003 Workbook (.xls) and Excel Workbook (.xlsx).
* Does not work for the XML Spreadsheet 2003 (*.xml) format produced by BIRT.
 * @param fileName
* @throws InvalidFormatException
 * @throws IOException
*/
public void convertExcelToCSV(String fileName) throws InvalidFormatException, IOException {

 BufferedWriter output = new BufferedWriter(new FileWriter(fileName.substring(0, fileName.lastIndexOf(".")) + ".csv"));

 InputStream is = new FileInputStream(fileName);

 Workbook wb = WorkbookFactory.create(is);

 Sheet sheet = wb.getSheetAt(0);

 // hopefully the first row is a header and has a full compliment of
 // cells, else you'll have to pass in a max (yuck)
 int maxColumns = sheet.getRow(0).getLastCellNum();

 for (Row row : sheet) {

 // row.getFirstCellNum() and row.getLastCellNum() don't return the
 // first and last index when the first or last column is blank
 int minCol = 0; // row.getFirstCellNum()
 int maxCol = maxColumns; // row.getLastCellNum()

 for (int i = minCol; i < maxCol; i++) {

 Cell cell = row.getCell(i);
 String buf = "";
 if (i > 0) {
 buf = "|";
 }

 if (cell == null) {
 output.write(buf);
 //System.out.print(buf);
 } else {

 String v = null;

 switch (cell.getCellType()) {
 case Cell.CELL_TYPE_STRING:
 v = cell.getRichStringCellValue().getString();
 break;
 case Cell.CELL_TYPE_NUMERIC:
 if (DateUtil.isCellDateFormatted(cell)) {

 // SimpleDateFormat datetemp = new SimpleDateFormat("yyyy-MM-dd");
 //v = datetemp.format(cell.getDateCellValue());

 v = cell.getDateCellValue().toString();
 } else {
 //v = String.valueOf(cell.getNumericCellValue());
 v = new HSSFDataFormatter().formatCellValue( cell );
 }
 break;
 case Cell.CELL_TYPE_BOOLEAN:
 v = String.valueOf(cell.getBooleanCellValue());
 break;
 case Cell.CELL_TYPE_FORMULA:
 v = cell.getCellFormula();

 switch(cell.getCachedFormulaResultType()) {
 case Cell.CELL_TYPE_NUMERIC:
 v = String.valueOf(cell.getNumericCellValue());
 //v = new HSSFDataFormatter().formatCellValue( cell );
 break;
 case Cell.CELL_TYPE_STRING:
 v = String.valueOf(cell.getRichStringCellValue());
 break;
 }
 //v = String.valueOf(cell.getDateCellValue());
 break;
 default:
 }

 if (v != null) {
 buf = buf + toCSV(v);
 }
 output.write(buf);
 //System.out.print(buf);
 }
 }

 output.write("\n");
 //System.out.println();
 }
 is.close();
 output.close();

 }
 /*
 * Escape the given value for output to a CSV file.
 * Assumes the value does not have a double quote wrapper.
* @return
*/
 public String toCSV(String value) {

 String v = null;
 boolean doWrap = true;

 if (value != null) {

 v = value;

 if (v.contains("\"")) {
 v = v.replace("\"", "\"\""); // escape embedded double quotes
 doWrap = true;
 }

 if (v.contains(",") || v.contains("\n")) {
 doWrap = true;
 }

 if (doWrap) {
 v = "\"" + v + "\""; // wrap with double quotes to hide the comma
 }
 }

 return v;

 }

}

CSVUtils csv = new CSVUtils();
csv.convertExcelToCSV("C:/Users/sadakar002/Desktop/TestBySadha/FilesTemplates/Sadakar.xlsx");

7) You need to import all the packages to run the above code.
You can do that by Clicking " Advanced settings" of tJava Component and writing below code at "Import" area.

//import java.util.List;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

import java.text.SimpleDateFormat;

import org.apache.poi.hssf.usermodel.HSSFDataFormatter;

8) Save and Run the job
9) Find the converted file in the same location where you are referring your Excel Work book.

NOTE:
Note that when you are taking code from Eclipse console to talend you need to skip few code lines like static key word from the methods.

Add multiple jar files using tLibraryLoad component in Talend ETL

Hi guys,
As part of my project I've explored tLibraryLoad component usage.

I needed to add multiple jar files using tLibraryLoad component and initially I used as many tLibraryLoad components for as many jar files and tired of adding them.

Then after googling and seeking help from expertise I've done the below.. This might be useful for Talend developers.

Aim of the post:
Adding multiple jar files using tLibraryLoad component.
Environment
TOS_DI-r96646-V5.1.3

Steps:

1) Drag and drop tLibraryLoad component from Palette section to ETL design area.

2) Click on the component to View it’s basic settings.

3) Whatever the files you want to use in your job , you need to add all those jar files to the lib folder of Talend installed location.

· For instance : D:\Talend\TOS_DI-r96646-V5.1.3\TOS_DI-r96646-V5.1.3\lib\java

· Restart your talend

4) When you use tLibraryLoad component, initially you must add a default jar file.(For eg: from the drop down select jdbc14.jar file even though there is no use of this jar file with in your job).

5) Next, click on Advanced settings and then keep on clicking + Symbol for the number of jar files you want to add.

6) Give the jar file names in double quotes.

a. For instance : "poi-examples-3.9-20121203.jar"

b. "poi-excelant-3.9-20121203.jar"

c. "poi-ooxml-3.9-20121203.jar"

d. And etc.

(These jar files are taken from "Apache POI - for Microsoft Excel)

7) Test whether the jar files are coming or not when you export the job.

Answer is yes, when you export the job containing tLibraryLoad, it’ll export all the jars added in it to lib folder of export.

A quick understanding image is given below.

Monday, 3 February 2014

Error Time in Talend 5.1.3 Open Studio - "Execution failed :generationManager is not initialized by the perljet/javajet!"

Hello every one,
It has given me a learning experience with Talend 5.1.3...

Yesterday(3rd feb 2014) every thing(Talend project jobs) working fine .. I have developed few jobs and executed successfully..

Today I have executed the same jobs again and I received an error saying
"Execution failed :generationManager is not initialized by the perljet/javajet!"

I had done a small googling and tried the tip from here

What I did to work properly..
1) What ever the newly created jobs(I've 2 jobs for testing purpose for a functionality) created yesterday , I deactivated them.
2) Deleted org folder from \.JETEmitters\src folder and then tried to execute the job. I got null pointer exception.
3) Restored the org folder and executed required job.. it is working now..

What I understand from this error is :
What ever the jobs we create, we need to maintain all the information used by those jobs..
In my case I've created a job for testing purpose and taken a metadata file... later I removed metadata file and kept the job as it is.. so when I execute the other talend jobs.. the project was looking for all the resources available for the remaining jobs as well. and caused for the error.

Routine example in Talend ETL - Convert strings to dates

Hello every one..

In this example you will input different formats of strings and convert them to standard date format...
What ever the format you provide in the form of strings the below code converts into a single date format...
For example if your date is dd-MM-yyyy or dd/MM-yyyy which are strings in your meta data such as .csv or .xls or .xlsx ... by using this code the format of the string is changed to Date with a single formatted output.

Coding credit goes to Vaibhav ji(Data warehouse architect - one of my colleagues)...

From the Repository, right click on Code -> Create routine and give the name of it..
Write below code in your routine. Routine name and class name should be equal.

package routines;

import java.util.Date;
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class DateLogics{
    private static Date get_date = null;

    /**
    * @param get_date the get_date to set
    */
    public static void setGet_date(String get_dates) {
       if ((StringHandling.INDEX(get_dates,"-")==2))
       {
           try {
               get_date =(Date) new SimpleDateFormat("dd-MM-yyyy").parse(get_dates);
           } catch (ParseException e) {
               // TODO Auto-generated catch block
               e.printStackTrace();
           }
       }
       if ((StringHandling.INDEX(get_dates,".")==2))
       {
           try {
               get_date =(Date) new SimpleDateFormat("dd.MM.yyyy").parse(get_dates);
           } catch (ParseException e) {
               // TODO Auto-generated catch block
               e.printStackTrace();
           }
       }
       if ((StringHandling.INDEX(get_dates,"/")==2))
       {
           try {
               get_date =(Date) new SimpleDateFormat("dd/MM/yyyy").parse(get_dates);
           } catch (ParseException e) {
               // TODO Auto-generated catch block
               e.printStackTrace();
           }
       }

    }

    /**
    * @return the get_date
    */
    public static Date getGet_date() {
       return get_date;
    }

}
Now, create a job and drag and drop a tJava component and write below code..
String date_input = "10.10.2014";
System.out.println(StringHandling.LEN(date_input));
DateLogics.setGet_date(date_input);
System.out.println(DateLogics.getGet_date());

NOTE:
You need to provide as many if conditions for the different kinds of date formats as strings so that you can generalize the code ..

version tested: TOS : 5.1.3

Thank you.

Tuesday, 28 January 2014

Routines in Talend ETL -- finding date is in which format in the form of string then converting strings to a standard date

Hello every one,
Today I've tried routines concepts in Talend ETL and would like to share with you how we can write java code and use it in through out the job in various places using routines

Aim of this article is
1) Identifying the format of the string in which date format it is
2) Converting the identified string of date data to standard date.

Software compatibility
1) OS: W7
2) Talend : TOS_DI-r96646-V5.1.3

To achieve the above,we write java code in ETL using routines concept.

What are routines in Talend ETL ?
* Routines are reusable pieces of Java code.
* Routines enable you to write custom code in Java to optimize data processing, improve Job capacity, and to extend Talend Studio features.
* There are two types of routines in Talend Studio:

System routines: a number of system routines are provided. They are classified according to the type of data they process, such as: numerical, string, date. These types of routines are read-only, and you can call them directly in a Talend Job.
User routines: You can create your own new user routines or adapt existing routines.

Example that I am going to explain is a User routines

1. In the Repository Pane, Expand Code then Right click on Routines.
2. Give the routine name for example lets say routine name is StringToDate.
3. Remember that your routine name is similar to class name when you create it.
4. Also, the default package for routines is package routines; on the first row.
5. Write the below java code which taken from here to achieve the aim of the article.

package routines;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.text.DateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class ThreadSafeSimpleDateFormat {

 private DateFormat df;

 public ThreadSafeSimpleDateFormat(String format) {
 this.df = new SimpleDateFormat(format);
 }

 public synchronized String format(Date date) {
 return df.format(date);
 }

 public synchronized Date parse(String string) throws ParseException {
 return df.parse(string);
 }

 }

public class StringToDate {
 private static List<ThreadSafeSimpleDateFormat> dateFormats = new ArrayList<ThreadSafeSimpleDateFormat>(){
 /**
 *
 */
 private static final long serialVersionUID = 1L;

 {
 add(new ThreadSafeSimpleDateFormat("EEE MMM dd HH:mm:ss z yyyy"));

 add(new ThreadSafeSimpleDateFormat("yyyy-MM-dd"));
 add(new ThreadSafeSimpleDateFormat("yyyy/MM/dd"));
 add(new ThreadSafeSimpleDateFormat("yyyy.MM.dd"));

 add(new ThreadSafeSimpleDateFormat("MM/dd/yyyy"));
 add(new ThreadSafeSimpleDateFormat("MM-dd-yyyy"));
 add(new ThreadSafeSimpleDateFormat("MM.dd.yyyy"));

 add(new ThreadSafeSimpleDateFormat("dd-MM-yyyy"));
 add(new ThreadSafeSimpleDateFormat("dd/MM/yyyy"));
 add(new ThreadSafeSimpleDateFormat("dd.MM.yyyy"));


 add(new ThreadSafeSimpleDateFormat("M/dd/yyyy"));
 add(new ThreadSafeSimpleDateFormat("dd.M.yyyy"));
 add(new ThreadSafeSimpleDateFormat("M/dd/yyyy hh:mm:ss a"));
 add(new ThreadSafeSimpleDateFormat("dd.M.yyyy hh:mm:ss a"));
 add(new ThreadSafeSimpleDateFormat("dd.MMM.yyyy"));
 add(new ThreadSafeSimpleDateFormat("dd-MMM-yyyy"));
 add(new ThreadSafeSimpleDateFormat("yyyy/dd/MM"));

 }
 };



 public static Date convertToDate(String input) {
 //System.out.println("input is "+input);
 Date date = null;
 if(null == input) {
 return null;
 }
 for (ThreadSafeSimpleDateFormat format : dateFormats) {
 try {
 format.setLenient(true);
 date = format.parse(input);
 //System.out.println("date is======================="+date);
 } catch (ParseException e) {

 }
 if (date != null) {
 break;
 }
 }

 return date;
 }
 }

6. Made your code error free by importing all the required packages.
7. Now it is the time to use the created routine in your Talend job.
8. Syntax to use the methods/logic implemented in your Talend job is
routines.rouituneName.methodName(Parameter/fieldName) - routines. is optional.
9. To meet the 2nd requirement of this article you need to take a java component in your talend job
 (Assume you it is a new job created for testing this functionality).
10. Drag and drop tJava component from Palette section and write below code.

System.out.println("24-07-2012" + " = " + StringToDate.convertToDate("24-07-2012"));
System.out.println("2012/14/10" + " = " + StringToDate.convertToDate("2012/14/10"));
System.out.println("2013-03-15" + " = " + StringToDate.convertToDate("2013-03-15"));

 System.out.println("10/14/2012" + " = " + StringToDate.convertToDate("10/14/2012"));
 System.out.println("10-Jan-2012" + " = " + StringToDate.convertToDate("10-Jan-2012"));
 System.out.println("01.03.2002" + " = " + StringToDate.convertToDate("01.03.2002"));
 System.out.println("12/03/2010" + " = " + StringToDate.convertToDate("12/03/2010"));
 System.out.println("19.Feb.2011" + " = " + StringToDate.convertToDate("19.Feb.2011" ));
 System.out.println("4/20/2012" + " = " + StringToDate.convertToDate("4/20/2012"));
 System.out.println("some string" + " = " + StringToDate.convertToDate("some string"));
 System.out.println("123456" + " = " + StringToDate.convertToDate("123456"));
 System.out.println("null" + " = " + StringToDate.convertToDate(null));

NOTE: In the above code
Routine Name : StirngToDate
Method Name from routine which has logic is : convertToDate
Parameter/Fields : a string in the above code.

11. Sample output of the above job is

Starting job date_1 at 18:29 28/01/2014.
[statistics] connecting to socket on port 3364
[statistics] connected
24-07-2012 = Mon Jan 02 00:00:00 IST 30
2012/14/10 = Sun Feb 10 00:00:00 IST 2013
2013-03-15 = Fri Mar 15 00:00:00 IST 2013
10/14/2012 = Tue Aug 04 00:00:00 IST 16
10-Jan-2012 = Tue Jan 10 00:00:00 IST 2012
01.03.2002 = Mon Aug 23 00:00:00 IST 6
12/03/2010 = Tue Aug 31 00:00:00 IST 17
19.Feb.2011 = Sat Feb 19 00:00:00 IST 2011
4/20/2012 = Mon Feb 02 00:00:00 IST 11
some string = null
123456 = null
null = null
[statistics] disconnected
Job date_1 ended at 18:29 28/01/2014. [exit code=0]

12. This is the how you can find out the given string is which format of date by defining all the formats in your routine java class and then converting every format of date to a single standard date format.

Java code is taken from this link
http://www.verious.com/tutorial/check-if-string-is-valid-date-in-java/

References:

1) http://www.vikramtakkar.com/2013/03/creatng-custom-routines-in-talend-or.html

2) https://help.talend.com/display/KB/Creating+a+user+routine?focusedCommentId=27989710#comment-27989710

3) https://help.talend.com/pages/viewpage.action?pageId=5671119

Wednesday, 8 January 2014

Converting Types in tMap Component - Tips & Tricks

Checking whether the field is having null values or not. If it is null value print the NULL(nothing) , if it is NOT null then convert the type and display the filed in the output(output could be any database output table).

1) String to Integer
Relational.ISNULL(row1.EmployeeID) == true ? null:Integer.parseInt(row1.EmployeeID)

2) String to Double

Relational.ISNULL(row1.Revenue)==true? null:Double.parseDouble(row1.Revenue)

3) String to Long

Relational.ISNULL(row1.Category_ID) == true ? null:Long.parseLong(row1.Category_ID)

4) String to Date

Relational.ISNULL(row1.Date)==true?

null:new SimpleDateFormat ("EEE mm/DD/yyyy").parse(row1.Date)

5) Integer to Double

Relational.ISNULL(row1.NumberOfEmployees)==true? null:Double.parseDouble(Integer.toString(row1.NumberOfEmployees))

6) Integer to String

(Integer.toString(row1.id)).equals(Integer.toString(row2.id))

((new Integer(row1.id)).toString()).equals((new Integer(row2.id)).toString())==false

This post will be up-datable one.

Pages