awk - Multi-line data to be stored in variable


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk - Multi-line data to be stored in variable
# 1  
Old 04-18-2016
awk - Multi-line data to be stored in variable

Greetings Experts,

As part of automating the sql generation, I have the source table name, target table name, join condition stored in a file join_conditions.txt which is a delimited file (I can edit the file if for any reason). The reason I needed to store is I have built SELECT list without FROM clause according to source and target tables taken from another file and stored it in an array and now I need to build the complete sql query of select list and the from list;

Code:
Src_tbl       Tgt_tbl     join condition
Src_1        @Tgt_1       @FROM ( SELECT SRC_1.* FROM SRC_1 INNER JOIN SRC_TEMP_1 ON
                          SRC_1.COL1=SRC_TEMP_1.COL1) SRC JOIN (SELECT TGT_1.* FROM TGT_1 
                          INNER JOIN TGT_TEMP_1 ON TGT_1.COL1=TGT_TEMP_1.COL1) TGT ON 
                          SRC.COL1=TGT.COL1;
Src_2        @Tgt_2        ...................

Please note that I have aligned the 3rd column LPAD with spaces for better view; the provided join condition column can span multiple lines;
Similarly there are corresponding join conditions for other source and target tables in the file which is not mentioned here.

Code:
awk -F "@" 'BEGIN{RS=";"}
{
join_cond=$3
a[$1 OFS $2]=join_cond
}
END {
for (i in a) {
print a[i] 
}
}' < join_conditions.txt > sql_query.txt

Moderator's Comments:
Mod Comment Please use code tags as required by forum rules!


Expected assignment:
Code:
FROM ( SELECT SRC_1.* FROM SRC_1 INNER JOIN SRC_TEMP_1 ON
                          SRC_1.COL1=SRC_TEMP_1.COL1) SRC JOIN (SELECT TGT_1.* FROM TGT_1 
                          INNER JOIN TGT_TEMP_1 ON TGT_1.COL1=TGT_TEMP_1.COL1) TGT ON 
                          SRC.COL1=TGT.COL1;

Current assignment:
Code:
FROM ( SELECT SRC_1.* FROM SRC_1 INNER JOIN SRC_TEMP_1 ON

Even though I specified the RS=; I think it is still considering the \n as record separator; How can I assign the multi-line content present in
the join_conditions.txt to awk variable jon_cond for frame the sql query; I am struck up with this as I have no clue on how to proceed;

I can break the file manually to contain the join_condition with source and target repeating multiple times as

Code:
Src_tbl1   Tgt_tbl1     join condition
Src_1      Tgt_1        FROM ( SELECT SRC_1.* FROM SRC_1 INNER JOIN
Src_1      Tgt_1        SRC_TEMP_1 ON SRC_1.COL1=SRC_TEMP_1.COL1) SRC
Src_1      Tgt_1        JOIN (SELECT TGT_1.* FROM TGT_1 INNER JOIN TGT_TEMP_1 ON 
Src_1      Tgt_1        TGT_1.COL1=TGT_TEMP_1.COL1) TGT ON SRC.COL1=TGT.COL1;

and then carry out the required operation. But I assume that there is more elegant approach

Many thanks for your time..

---------- Post updated at 11:27 AM ---------- Previous update was at 10:52 AM ----------

Hi RavinderSingh,

I have a input delimited file whose layout is

Code:
Src_tbl@Tgt_tbl@join_condition
Src_1@Tgt_1@FROM ( SELECT SRC_1.* FROM SRC_1 INNER JOIN SRC_TEMP_1 ON  SRC_1.COL1=SRC_TEMP_1.COL1) SRC JOIN (SELECT TGT_1.* FROM TGT_1  INNER JOIN TGT_TEMP_1 ON TGT_1.COL1=TGT_TEMP_1.COL1) TGT ON  SRC.COL1=TGT.COL1;

The data in join_condition can span multiple lines here.

I need to get the data in the join_condition column and assign it to an array for further operations;

Code:
awk -F "@" ' BEGIN{RS=";"}
{
a[$1 OFS $2]=$3
print a[$1 OFS $2]
}'

Expected Output:
Code:
FROM ( SELECT SRC_1.* FROM SRC_1 INNER JOIN SRC_TEMP_1 ON  SRC_1.COL1=SRC_TEMP_1.COL1) SRC JOIN (SELECT TGT_1.* FROM TGT_1  INNER JOIN TGT_TEMP_1 ON TGT_1.COL1=TGT_TEMP_1.COL1) TGT ON  SRC.COL1=TGT.COL1;

Output what I am seeing:
Code:
FROM ( SELECT SRC_1.* FROM SRC_1 INNER JOIN

How to get the expected output here please.

Last edited by RudiC; 04-19-2016 at 01:48 AM.. Reason: Added tags around CODE (while it was around data...)
# 2  
Old 04-18-2016
Interesting, nawk variants treat an FS="@" in a way that line feeds are counted aw well.
Work-around:
Code:
awk -F"[@]" 'BEGIN {RS=";"} {print $3}' < join_conditions.txt

or
Code:
awk 'BEGIN {FS="\@"; RS=";"} {print $3}' < join_conditions.txt

This User Gave Thanks to MadeInGermany For This Post:
# 3  
Old 04-18-2016
Thank You MadeInGermany; The first solution did work; however the second solution did produce the same output as my post (not able to recognize ; as RS) Can you please explain how
Code:
[@]

different is to that of a simple ones;
Code:
awk -F "@"....

Code:
awk -F "[@]"...

I am not able to understand the difference as I am learning awk Smilie

Edit:
Does [] ignore the conventional "\n" as RS unless specified other than this in all cases and does awk -F "@" considers the RS as "\n" when the specified RS is not specified or found in the entire line..

Last edited by chill3chee; 04-18-2016 at 02:08 PM..
# 4  
Old 04-18-2016
You are right, some nawk derivates require
Code:
awk 'BEGIN {FS="[@]"; RS=";"} {print $3}' join_conditions.txt

The RS=";" works multi-line, but the FS="@" treats line feeds as a field delimiter.
You see it with
Code:
awk 'BEGIN {FS="@"; RS=";"} {print NR,NF}' join_conditions.txt
awk 'BEGIN {FS="[@]"; RS=";"} {print NR,NF}' join_conditions.txt

(record number, number of fields)
Why is this? I am surprised myself...maybe a bug...there should be no difference between FS="@" and FS="[@]"
This User Gave Thanks to MadeInGermany For This Post:
# 5  
Old 04-20-2016
Hi MadeinGermany,
The NR is same for both the FS="@"and FS="[@]" However the NF differs in the above cases; as you specified for FS="@"it is treating line feeds as a delimiter. Thanks again
# 6  
Old 04-20-2016
Interesting. I had a look at this and it appears this behavior is with any character, not just "@", but also for "*", "a" or "b".

I only saw this with AIX and HPUX awk..

A possible explanation for the difference between [@] and @ may be that in the POSIX specification when a single character is used as FS, then this is regarded as a string and not as a regular expression. When [c] is used (where c is a character, for example @) then this is a regular expression.

In these two awks, it appears that, if an RS other than\n is used, and a single character is used as FS then a newline is still seen as a field separator, whereas this is not the case if FS is a regular expression.

I could not find anything in the POSIX specification that describes this behavior, so it is not as it should be. It does remind me a bit of how newlines are field separators irrespective of FS value when RS="" is used, so maybe it is a remnant behavior of some sort.

Last edited by Scrutinizer; 04-20-2016 at 07:29 PM..
These 2 Users Gave Thanks to Scrutinizer For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to lookup stored variable in file and print matching line

The bash bash below extracts the oldest folder from a directory and stores it in filename That result will match a line in bold in input. In the matching line there is an_xxx digit in italics that (once the leading zero is removed) will match a line in link. That is the lint to print in output.... (2 Replies)
Discussion started by: cmccabe
2 Replies

2. Shell Programming and Scripting

awk - Multiple files - 1 file with multi-line data

Greetings experts, Have 2 input files, of which 1 file has 1 record per line; in 2nd file, multiple lines constitute 1 record; Hence declared the RS=";" Now in the first file which ends with ";" at each line of the line; But \nis also being considered as part of the data due to which I am... (1 Reply)
Discussion started by: chill3chee
1 Replies

3. Shell Programming and Scripting

Multi-line filtering based on multi-line pattern in a file

I have a file with data records separated by multiple equals signs, as below. ========== RECORD 1 ========== RECORD 2 DATA LINE ========== RECORD 3 ========== RECORD 4 DATA LINE ========== RECORD 5 DATA LINE ========== I need to filter out all data from this file where the... (2 Replies)
Discussion started by: Finja
2 Replies

4. UNIX for Dummies Questions & Answers

How Do I Create A Multi Line Menu Variable?

I want something that would show up basically like: Menu ----- 1) Option 1 2) Option 2 3) Option 3 Pick one: I tried menu = " Menu \r\n ----- \r\n 1)Option 1 \r\n..............etc etc etc" but that didnt work (just got the whole menu one one line, with the... (2 Replies)
Discussion started by: SoVi3t
2 Replies

5. Shell Programming and Scripting

Multi line variable script... needs help.

I am trying to write a script that will help me put a file into excel with little manipulation. Below is a sample of the file im using. Group1:*:gid1:user,user Group2:*:gid2:user,user Group3:*:gid3:user,user,user,user,user,user,user Group4:*:gid4:user,user I marked in red the part that is... (1 Reply)
Discussion started by: rookieuxixsa
1 Replies

6. Shell Programming and Scripting

AWK: pattern not properly stored in variable?

Hey there, I have a table of contents file of the form 1 Title1 1.1 Subtitle1 1.1.1 Subsubtitle1 1.1.2 Subsubtitle2 ... and want to count the number of dots in the first field to find out the level of the section. I use the gsub function for the job, which works if I pass the pattern... (2 Replies)
Discussion started by: herrsimon
2 Replies

7. Shell Programming and Scripting

Manipulating Pick multi dimensional data with awk.

Hi. I am reasonably new to awk, but have done quite a lot of unix scripting in the past. I have resolved the issues below with unix scripting but it runs like a dog. Moved to awk for speed and functionality but running up a big learning curve in a hurry, so hope there is some help here. I... (6 Replies)
Discussion started by: mike.strategis
6 Replies

8. Shell Programming and Scripting

how to get the string stored in a variable in a line???

Hi all, I want to search for a data type in a line.For this in a loop i am checking for $DATA_TYPE in a line using grep.But grep is not able to find when i give this. Can any one tell me how to check string in $DATA_TYPE variable in line usign grep (or) any other way to do the above task. ... (4 Replies)
Discussion started by: jisha
4 Replies

9. Shell Programming and Scripting

AWK Multi-Line Records Numbering Problem

I have a set of files of multi-line records with the records separated by a blank line. I needed to add a record number to the front of each line followed by a colon and did the following: awk 'BEGIN {FS = "\n"; RS = ""}{for (i=1; i<=NF; i++)print NR,":",$i}' ~/Desktop/data98-1-25.txt >... (3 Replies)
Discussion started by: RacerX
3 Replies

10. Shell Programming and Scripting

AWK Multi-Line Records Processing

I am an Awk newbie and cannot wrap my brain around my problem: Given multi-line records of varying lengths separated by a blank line I need to skip the first two lines of every record and extract every-other line in each record unless the first line of the record has the word "(CONT)" in the... (10 Replies)
Discussion started by: RacerX
10 Replies
Login or Register to Ask a Question