Monday, 24 April 2017

Incorrect String value issue (\xC4 \x90 .... ) while loading data from MySQL source to MySQL target (utf-8, latin1 character set usecases)

In this post, you will learn how to fix special character issue while loading data from source MySQL to destination MySQL table.

You may see below kinda of error during the ETL execution.

Starting job Test at 18:17 24/04/2017.

[statistics] connecting to socket on port 3885
[statistics] connected
Incorrect string value: '\xC4\x90in\xC4\x91.......' for column 'ProjectDetails' at row 56
Incorrect string value: '\xC4\x90in\xC4\x91......' for column 'ProjectDetails' at row 76
Incorrect string value: '\xC4\x90in\xC4\x91.......' for column 'ProjectDetails' at row 98
Incorrect string value: \xC4\x90in\xC4\x91.......' for column 'ProjectDetails' at row 11
[statistics] disconnected
Job Test ended at 18:20 24/04/2017. [exit code=0]


Sample job design : 
source(MySQL input) --> target(MySQL output)

When you try to load text data from mysql source to mysql target you may come across issues as shown in above error message. 

In those cases, setting -Dfile.encoding=utf-8 in advanced settings of Arguments may not solve problem. Also, removing here and keeping it  in Advanced settings of component Additional JDBC parameters also not solves the problem. 

The actual solution worked for me below: 
In component advanced properties : 
For Additional JDBC parameters give value as "useUnicode=true&useOldUTF8Behavior=true"

This will solve the problem and source data will be loaded with out any special character issues. 

NOTE : MySQL default character set is utf-8.

References : 


Hope this helps some one in community :-)