Issues with Sqoop Export to Oracle

Sqoop is a tool for moving bulk data between an RDBMS such as MySQL, Oracle, or Postgres and HDFS (or AWS S3).

A couple of issues I faced with Sqoop export are summarized below.

Timestamp Format Issue

The records on HDFS/S3 have date-time values in the format '2015-03-03T08:28:47.484Z'. These values go into an Oracle column, transaction_datetime, of data type TIMESTAMP(9).

I got the following error from Sqoop:

Caused by: java.lang.RuntimeException: Can't parse input data: '2015-03-03T08:28:47.484Z'
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]

I could not find a direct solution with Sqoop, so what I did was insert the data into a Hive table, converting the date and time to the format Oracle expects.

INSERT OVERWRITE TABLE hiveTable
SELECT
 id,
 from_unixtime(unix_timestamp(regexp_replace(transaction_datetime,'[TZ]',' ')), 'yyyy-MM-dd HH:mm:ss'),
 .
 .
FROM sourceHiveTable;

Now the date-time stored in the HDFS files is '2015-03-03 08:28:47', which Oracle can load without complaint.
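
To sanity-check the conversion on a single value before rewriting the whole table, a standalone Hive query works (a minimal sketch, assuming a Hive version that supports SELECT without a FROM clause; note that the 'Z' marker is simply discarded, so the value is interpreted in the cluster's local time zone):

SELECT from_unixtime(
    unix_timestamp(regexp_replace('2015-03-03T08:28:47.484Z', '[TZ]', ' ')),
    'yyyy-MM-dd HH:mm:ss');
-- returns 2015-03-03 08:28:47 (the trailing '.484' fraction is dropped by unix_timestamp)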

Null Columns Issue

HDFS/Hive stores NULL values as the literal '\N'. This string would be exported to Oracle as-is, which is not what you want on the Oracle side.

Adding the --input-null-string and --input-null-non-string options to the Sqoop command makes it recognize '\N' as NULL. The full command is shown below.

sqoop export --connect jdbc:oracle:thin:@dbhost:1521/ORCL --username user --password password \
 --table Table_In_Oracle \
 --columns "col1,col2" \
 --export-dir 'src-directory' \
 --input-fields-terminated-by ',' --input-lines-terminated-by "\n" \
 --input-null-string '\\N' --input-null-non-string '\\N'

After running that command, I now see a value of '(null)' for that column in Oracle.

That solves the problem somewhat, but I was hoping to see an empty value ('') in Oracle instead. Trying to find a solution…
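
One thing worth noting is that Oracle treats an empty string in a VARCHAR2 column as NULL anyway, so the '(null)' above is most likely just how the SQL client renders a genuine NULL. If the goal is only to show a blank when querying, NVL can substitute a literal at read time (a sketch; the column name optional_col is made up for illustration):

-- NULL and '' are the same thing in an Oracle VARCHAR2 column;
-- substitute a single space (or any placeholder) purely for display
SELECT NVL(optional_col, ' ') AS optional_col
FROM Table_In_Oracle;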

Kill a Running Sqoop Job

Sometimes I start a Sqoop export job but want to kill it midway for some reason.

I put the above command in a shell script (sqoop_export.sh) and run it using nohup:

nohup sqoop_export.sh > /tmp/job.out 2>&1 &

To kill it, I get the process ID using 'ps -aef | grep sqoop_export' and then run 'kill <processid>'.

The shell process is killed, but here is the problem: the underlying MapReduce job keeps running on the cluster, and its open transactions keep the Oracle table locked. If you try to truncate the table in Oracle later, you get an error like the one below. I tried to find which Oracle process was holding the lock but could not, and I had to drop the table to solve the problem quickly.

ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
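
In hindsight, the session holding the lock can usually be traced through Oracle's v$locked_object and v$session views, given sufficient privileges (a sketch, assuming the export table is TABLE_IN_ORACLE; killing that session releases the lock without dropping the table):

-- find the session(s) holding locks on the export table
SELECT s.sid, s.serial#, s.username, s.program
FROM v$locked_object l
JOIN dba_objects o ON o.object_id = l.object_id
JOIN v$session s ON s.sid = l.session_id
WHERE o.object_name = 'TABLE_IN_ORACLE';

-- then terminate it, using the sid and serial# returned above
ALTER SYSTEM KILL SESSION 'sid,serial#';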

The best way to kill the Hadoop job itself is with the 'hadoop job' command:

$ hadoop job -list
Total jobs:1
JobId     State     StartTime     UserName       Queue   Priority UsedContainers RsvdContainers UsedMem RsvdMem NeededMem   AM info
job_1452010418661_0082   RUNNING 1452198342153       userid     default     NORMAL               9               0   15360M       0M     15360M ...

$ hadoop job -kill job_1452010418661_0082
Killed job job_1452010418661_0082
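
On newer YARN clusters the 'hadoop job' command is deprecated in favor of 'mapred job', and the same job can also be killed at the application level with the yarn CLI (a sketch; the application ID is the job ID with the 'job_' prefix replaced by 'application_'):

$ yarn application -list
$ yarn application -kill application_1452010418661_0082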
