Sqoop Export a Long Text Field

I have a text field on HDFS that can hold a very long value, more than 4000 characters.

I had to export this to Oracle using Sqoop. In the Oracle table, the field is defined as VARCHAR2(4000 BYTE).
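
For reference, here is a minimal sketch of the target table; the table and column names are hypothetical stand-ins for the real ones:

CREATE TABLE Oracle_Table_Name (
  col1 NUMBER,
  col2 VARCHAR2(100),
  col3 VARCHAR2(4000 BYTE)  -- the long text field goes here
);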

I get an error when Sqoop’ing:

Caused by: java.sql.SQLException: ORA-01461: can bind a LONG value only for insert into a LONG column

Why?

Oracle treats a bound value longer than the column’s defined limit as a LONG, and a LONG can only be inserted into a LONG column. Hence the error, though the message is not very informative.

Solution:

Use the CLOB data type for that column on Oracle. A CLOB can store values longer than 4000 characters.
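
Note that Oracle does not allow altering a VARCHAR2 column to CLOB in place (it fails with ORA-22858). A common workaround, sketched here with the hypothetical names from above, is to add a new CLOB column, copy the data, and swap the names:

ALTER TABLE Oracle_Table_Name ADD (col3_clob CLOB);
UPDATE Oracle_Table_Name SET col3_clob = col3;
COMMIT;
ALTER TABLE Oracle_Table_Name DROP COLUMN col3;
ALTER TABLE Oracle_Table_Name RENAME COLUMN col3_clob TO col3;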

Don’t forget to add the --map-column-java option to the Sqoop export command. Since there is no CLOB type in Java/Hive, Sqoop has to be told to treat this value as a Java String. The full command is shown below.

sqoop export --connect jdbc:oracle:thin:@hostname:1521/sid \
   --username user --password passwd \
   --table Oracle_Table_Name \
   --columns col1,col2,col3 \
   --map-column-java col3=String \
   --export-dir 'location' \
   --input-fields-terminated-by '\001' --input-lines-terminated-by "\n" \
   --input-null-string '\\N' --input-null-non-string '\\N'

Issues with Sqoop Export to Oracle

Sqoop is a tool for moving bulk data between an RDBMS such as MySQL, Oracle, or Postgres and HDFS (or AWS S3).

A couple of issues I faced with Sqoop export are summarized below.

Timestamp Format Issue

The HDFS/S3 records have the date and time in the format ‘2015-03-03T08:28:47.484Z’. This value goes to an Oracle column transaction_datetime with data type TIMESTAMP(9).

I got this error from Sqoop:

Caused by: java.lang.RuntimeException: Can't parse input data: '2015-03-03T08:28:47.484Z'
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]

I could not find a direct solution within Sqoop, so what I did was re-insert the data into the Hive table, converting the date and time to a format the Sqoop export can parse.

INSERT OVERWRITE TABLE hiveTable
SELECT
 id,
 from_unixtime(unix_timestamp(regexp_replace(transaction_datetime,'[TZ]',' ')), 'yyyy-MM-dd HH:mm:ss'),
 .
 .
FROM sourceHiveTable;

Now the value stored in the HDFS files is ‘2015-03-03 08:28:47’, which works for Oracle.
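
A quick way to sanity-check the conversion expression from the Hive shell; note that the milliseconds are dropped, which is fine because the fractional seconds are optional in the expected format:

SELECT from_unixtime(unix_timestamp(regexp_replace('2015-03-03T08:28:47.484Z', '[TZ]', ' ')), 'yyyy-MM-dd HH:mm:ss');
-- returns: 2015-03-03 08:28:47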

Null Columns Issue

HDFS/Hive stores NULL values as ‘\N’. These get exported to Oracle as the literal string ‘\N’, which does not look good on Oracle.

Adding the null-handling options --input-null-string and --input-null-non-string to the Sqoop command makes it recognize NULLs correctly. The full command is shown below.

sqoop export --connect jdbc:oracle:thin:@dbhost:1521/ORCL --username user --password password \
 --table Table_In_Oracle \
 --columns col1,col2 \
 --export-dir 'src-directory' \
 --input-fields-terminated-by ',' --input-lines-terminated-by "\n" \
 --input-null-string '\\N' --input-null-non-string '\\N'

After running that command, I now see a value of ‘(null)’ for that column on Oracle.
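
To confirm the column now holds real NULLs rather than the literal string ‘\N’, a quick check on the Oracle side (using the same table and column names as the command above):

SELECT COUNT(*) FROM Table_In_Oracle WHERE col2 IS NULL;  -- rows exported as NULL
SELECT COUNT(*) FROM Table_In_Oracle WHERE col2 = '\N';   -- leftover literals; should be 0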

That solves the problem somewhat, but I was hoping to see an empty value ('') in Oracle. Still trying to find a solution…

Kill a Running Sqoop Job

Sometimes I start a Sqoop export job but want to kill it midway for some reason.

I write the above command into a shell script and run it with nohup:

nohup sqoop_export.sh > /tmp/job.out 2>&1 &

To kill it, I find the process id using ‘ps -ef | grep sqoop_export’ and kill it using ‘kill <processid>’.

The shell script is killed, but here is the problem: killing the local process does not kill the Hadoop job behind it, which keeps running and holds a lock on the Oracle table. If you try to truncate the table on Oracle later, you get an error like the one below. I tried to find which Oracle session had locked the table, but could not, and had to drop the table to solve the problem quickly.

ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
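
In hindsight, one way to find the blocking session, assuming you can query the v$ views, is to join v$locked_object to v$session (the table name here is a hypothetical example):

SELECT s.sid, s.serial#, s.username, o.object_name
  FROM v$session s
  JOIN v$locked_object l ON s.sid = l.session_id
  JOIN dba_objects o ON o.object_id = l.object_id
 WHERE o.object_name = 'TABLE_IN_ORACLE';

-- then, with DBA privileges:
-- ALTER SYSTEM KILL SESSION '<sid>,<serial#>';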

The best way to kill the underlying Hadoop job is with the ‘hadoop job’ command:

$ hadoop job -list
Total jobs:1
JobId     State     StartTime     UserName       Queue   Priority UsedContainers RsvdContainers UsedMem RsvdMem NeededMem   AM info
job_1452010418661_0082   RUNNING 1452198342153       userid     default     NORMAL               9               0   15360M       0M     15360M ...

$ hadoop job -kill job_1452010418661_0082
Killed job job_1452010418661_0082
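
On YARN clusters the same thing can be done with the yarn CLI; the application id is just the job id with a different prefix:

$ yarn application -list
$ yarn application -kill application_1452010418661_0082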