I came across a scenario where in I had to parse a file that has many INSERT hql commands in a single file and I had to create a separate file for each INSERT command.
My input file insert_all.hql is as below.
INSERT INTO TABLE table1 SELECT col1, col2, col3 FROM table11 WHERE date = '2017-07-12' ; INSERT INTO TABLE table2 SELECT col1, col2, col3 FROM table22 WHERE date = '2017-07-12' ; INSERT INTO TABLE table3 SELECT col1, col2, col3 FROM table33 WHERE date = '2017-07-12'
You can do this manually by creating each file and copy the required code from original file to target file for each table. But it’s error-prone and tedious.
I wrote a little script using Linux/Unix tool called awk. awk is very good text processing tool that can be used for this kind of text parsing tasks.
awk ' > BEGIN {RS=";"} > { content=$0; > if(content ~ /INSERT INTO TABLE ([A-Za-z].*)/) filename=$4; > print content > filename; > close filename; > } >' insert_all.hql
Explanation
- In the BEGIN section, RS (Record separator) is initialized as semicolon. The default is newline, but this doesn’t work for us here.
- Next section, I am copying the content of that record to a variable.
- Next, I am extracting table name using REGEX. filename is table name
- Next, I am copying the content into the file
- This will generate files like table1, table2, etc.
You can run this code from a terminal on Mac or a Linux machine.