Text Processing using AWK

Posted in Linux by yeskay

I came across a scenario where in I had to parse a file that has many INSERT hql commands in a single file and I had to create a separate file for each INSERT command.

My input file insert_all.hql is as below.

INSERT INTO TABLE table1
SELECT
 col1,
 col2,
 col3
FROM table11
WHERE date = '2017-07-12'
;

INSERT INTO TABLE table2
SELECT
 col1,
 col2,
 col3
FROM table22
WHERE date = '2017-07-12'
;

INSERT INTO TABLE table3
SELECT
 col1,
 col2,
 col3
FROM table33
WHERE date = '2017-07-12'

You can do this manually by creating each file and copy the required code from original file to target file for each table. But it’s error-prone and tedious.

I wrote a little script using Linux/Unix tool called awk. awk is very good text processing tool that can be used for this kind of text parsing tasks.

awk '
> BEGIN {RS=";"}
> { content=$0;
> if(content ~ /INSERT INTO TABLE ([A-Za-z].*)/) filename=$4;
> print content > filename;
> close filename;
> }
>' insert_all.hql

Explanation

In the BEGIN section, RS (Record separator) is initialized as semicolon. The default is newline, but this doesn’t work for us here.
Next section, I am copying the content of that record to a variable.
Next, I am extracting table name using REGEX. filename is table name
Next, I am copying the content into the file
This will generate files like table1, table2, etc.

You can run this code from a terminal on Mac or a Linux machine.

Leave a comment Cancel reply