Using awk and grep for sql generation


 
# 1  
Old 10-17-2016

Hi,

I have a file, pk.txt, that holds table-related data in the following format:


Code:
TableName | PK
Employee | id
Contact|name,phone,country

I have another file desc.txt which lists datatype of each field like this:
Code:
Table|Field|Type
Employee|id|int
Contact|name|string
Contact|country|string
Contact|phone|bigint

My Output should be

Code:
Employee | t1.id=s.id
Contact| TRIM(t1.name)=TRIM(s.name) AND t1.phone=t2.phone AND TRIM(t1.country)=TRIM(s.country)

Note: TRIM should be applied only to string fields

I'm able to get the output without the TRIMs by using code below, not sure how to incorporate the TRIM logic.

Code:
awk -F "[ |]*" '
NR>1{
  printf "%s|%s|", $1, $2
  flds=split($2, F, ",")
  for(i=0;i<flds;i++)
     printf "%s%s.%s=%s.%s", i?" AND ":"", S, F[i+1], D, F[i+1]
  printf "\n"
}' S=t1 D=t2  pk.txt

I'm guessing I have to grep each fld element and check whether it is a string, but I'm unable to tie the whole output together on one line.

Thanks


Moderator's Comments:
Please don't modify posts after people have answered referencing them, pulling the rug from under their feet!

Last edited by RudiC; 10-18-2016 at 05:24 AM.. Reason: Wrong code pasted
# 2  
Old 10-17-2016
Not sure about the contents of the part-m-00000 file but this should do the trick:

Code:
awk -F "[ |]*" '
FNR==NR{T[$1"."$2]=$3; next}
FNR>1{
  printf "%s|%s|", $1, $2          # never use input data as the format string
  flds=split($2, F, ",")
  for(i=0;i<flds;i++) {
    if(T[$1"."F[i+1]]=="string")
      fmt="%sTRIM(%s.%s)=TRIM(%s.%s)"
    else
      fmt="%s%s.%s=%s.%s"
    printf fmt, i?" AND ":"", S, F[i+1], D, F[i+1]
  }
  printf "\n"
}' S=t1 D=t2 desc.txt /mapr/datalake/optum/optuminsight/onepay_dev/fs_dev/raw/pklist/op/part-m-00000 pk.txt

These 3 Users Gave Thanks to Chubler_XL For This Post:
# 3  
Old 10-18-2016
Hi wahi80,
Assuming that /mapr/datalake/optum/optuminsight/onepay_dev/fs_dev/raw/pklist/op/part-m-00000 is an empty file, and desc.txt and pk.txt contain the sample input shown in post #1 in this thread, Chubler_XL's code produces the output:
Code:
Employee|id|t1.id=t2.id
Contact|name,phone,country|TRIM(t1.name)=TRIM(t2.name) AND t1.phone=t2.phone AND TRIM(t1.country)=TRIM(t2.country)

Which differs from the output you said you wanted:
Code:
Employee | t1.id=s.id
Contact| TRIM(t1.name)=TRIM(s.name) AND t1.phone=t2.phone AND TRIM(t1.country)=TRIM(s.country)

in the number of fields, the spacing around |, and the use of s instead of t2.

I have no idea whether you want two | separated fields in lines in your output (as shown in the output you said you wanted) or three (as created by the code you were using). And, I have no idea why you sometimes have s.field_name and sometimes have t2.field_name in your sample output. And, I have no idea why you seem to want inconsistent field separators in your input and output (sometimes "|", sometimes " | ", and sometimes "| ").

The following comes closer to providing the output you said you want, but:
  1. it assumes that there should just be two input files (desc.txt and pk.txt),
  2. it always uses just | as the output field separator,
  3. it always produces two output fields (as shown in the output you said you wanted), and
  4. it always uses the table name specified by the awk variable D (instead of sometimes using the string s).
Code:
awk -F "[ |]*" '
BEGIN {	fmt[0] = "%s%s.%s=%s.%s"
	fmt[1] = "%sTRIM(%s.%s)=TRIM(%s.%s)"
}
FNR == NR {
	if($3 == "string")
		trim[$1, $2]
	next
}
FNR > 1 {
	printf("%s|", $1)
	flds = split($2, F, ",")
	for(i = 1; i <= flds; i++)
		printf(fmt[(($1, F[i]) in trim)], (i > 1) ? " AND " : "", S,
		    F[i], D, F[i])
	print ""
}' S=t1 D=t2 desc.txt pk.txt

which produces the output:
Code:
Employee|t1.id=t2.id
Contact|TRIM(t1.name)=TRIM(t2.name) AND t1.phone=t2.phone AND TRIM(t1.country)=TRIM(t2.country)

when given the sample input you provided in post #1.
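
The format selection in the code above relies on the in operator returning 1 or 0, which is then used as the subscript into fmt[]. The following is only a minimal sketch of that one idea in isolation; the hard-coded table name "Contact", the field list "name phone", and the variable names fld and is_str are made up for the illustration and are not part of the code above:

```shell
in_demo=$(awk 'BEGIN {
    trim["Contact", "name"]          # a bare reference is enough to create the element
    fmt[0] = "%s.%s=%s.%s"
    fmt[1] = "TRIM(%s.%s)=TRIM(%s.%s)"
    n = split("name phone", fld, " ")
    for (i = 1; i <= n; i++) {
        # (a, b) in arr tests membership without creating arr[a, b]
        is_str = (("Contact", fld[i]) in trim)
        printf("%d ", is_str)
        printf(fmt[is_str], "t1", fld[i], "t2", fld[i])
        print ""
    }
}')
printf '%s\n' "$in_demo"
```

Each output line starts with the value of the membership test, followed by the comparison built from the format that value selected.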

As always, if someone wants to try this code on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

I hope this helps,
Don
This User Gave Thanks to Don Cragun For This Post:
# 4  
Old 10-18-2016
Hi,
Chubler_XL's code works perfectly!
part-m-00000 was a mistake on my part while pasting the code, it can be removed.

I have a couple of questions regarding the working of the code, mainly how these two lines work
Code:
T[$1"."$2]=$3

Code:
T[$1"."F[i+1]]=="string"

FNR==NR would process desc.txt
FNR>1 would process pk.txt

So how does the interdependency between the two work?

I thought I knew a little bit of awk, but after looking at this code I need to revisit it.

Thanks

Last edited by wahi80; 10-18-2016 at 12:14 PM.. Reason: added info
# 5  
Old 10-18-2016
Hi,

Chubler_XL's and Don's solutions are perfect.

I hope the following details help.

Code:
T[$1"."$2]=$3
T[$1"."F[i+1]]=="string"

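For the sample input in desc.txt, the field separator is set with -F "[ |]*".
Code:
 T[$1"."$2]=$3

Here $3 holds the type (for example int or string), and it is stored in T under the key built from $1 and $2.

In the second statement, i comes from the for loop, and split() fills F[1] through F[flds] (there is no F[0]), which is why the code indexes F[i+1] with i starting at 0.
Note that "string" is a string literal, not a variable; the type stored in T is compared against it.
Code:
T[$1"."F[i+1]]=="string"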

Quote:
FNR==NR would process desc.txt
FNR>1 would process pk.txt
FNR - Number of the current input record within the current file. When awk starts processing a new input file, FNR starts again from 1 for its first record.
NR - Number of the current input record counted from the start of the run, across all input files.
FNR==NR is therefore true only while the first input file is being read.
The next command in the first block {} makes awk skip the second pattern (FNR>1) and its statement block ({}) for those records.
Hence the first input file, desc.txt, is completely processed (here 5 lines, including the header) before pk.txt is handled by the second block.
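
To watch the two counters yourself, something like the following should do. It is just a sketch that recreates the sample files from post #1 in a scratch directory; the tmp directory, the nr_demo variable, and the first-file/later-file labels are made up for the illustration:

```shell
# Recreate the two sample files from post #1 in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/desc.txt" <<'EOF'
Table|Field|Type
Employee|id|int
Contact|name|string
Contact|country|string
Contact|phone|bigint
EOF
cat > "$tmp/pk.txt" <<'EOF'
TableName | PK
Employee | id
Contact|name,phone,country
EOF

# Print both counters for every record; FNR == NR holds only in the first file.
nr_demo=$(cd "$tmp" && awk '{
    printf "%s NR=%d FNR=%d %s\n", FILENAME, NR, FNR,
           (FNR == NR) ? "first-file" : "later-file"
}' desc.txt pk.txt)
printf '%s\n' "$nr_demo"
rm -rf "$tmp"
```

NR keeps counting up across both files, while FNR starts over at 1 on the first record of pk.txt, so the two stop being equal as soon as the second file begins.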
# 6  
Old 10-18-2016
Code:
T[$1"."$2]=$3

This statement first concatenates the first and second field with a dot separator, then uses this string as an index for array T and assigns the third field's value to it. Reading all lines from "desc.txt":
Code:
Table|Field|Type
Employee|id|int
Contact|name|string
Contact|country|string
Contact|phone|bigint

would yield
Code:
T["Employee.id"] = "int"
T["Contact.name"] = "string"
T["Contact.country"] = "string"
T["Contact.phone"] = "bigint"

With the flds=split($2, F, ",") function and the "pk.txt" file
Code:
TableName | PK
Employee | id
Contact|name,phone,country

, for the Contact line the T[$1"."F[i+1]] expression would loop across name,phone,country and reference
Code:
T["Contact.name"] = "string"
T["Contact.phone"] = "bigint"
T["Contact.country"] = "string"

to be able to select the output format.
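
The lookups described above can be printed directly. This is only a sketch under a few assumptions: it recreates the sample files from post #1 in a scratch directory, it uses ' *[|] *' as the field separator (my choice, not the separator used in the posts above), and the names tmp, lookup_demo, n, and key are made up for the illustration:

```shell
tmp=$(mktemp -d)
cat > "$tmp/desc.txt" <<'EOF'
Table|Field|Type
Employee|id|int
Contact|name|string
Contact|country|string
Contact|phone|bigint
EOF
cat > "$tmp/pk.txt" <<'EOF'
TableName | PK
Employee | id
Contact|name,phone,country
EOF

lookup_demo=$(awk -F ' *[|] *' '
FNR == NR { T[$1 "." $2] = $3; next }   # first file: build the lookup table
FNR > 1 {                               # second file: skip the header line
    n = split($2, F, ",")               # F[1]..F[n] hold the PK field names
    for (i = 1; i <= n; i++) {
        key = $1 "." F[i]
        printf "T[\"%s\"] is \"%s\"\n", key, T[key]
    }
}' "$tmp/desc.txt" "$tmp/pk.txt")
printf '%s\n' "$lookup_demo"
rm -rf "$tmp"
```

One line is printed per primary-key field, in the order the fields appear in pk.txt, showing the key that is looked up and the type found for it.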
This User Gave Thanks to RudiC For This Post:
# 7  
Old 10-18-2016
Pure Joy!!
Thoroughly schooled!