Using awk and grep for sql generation

10-17-2016

Hi,

I have a file, pk.txt, which holds table-related data in the following format:

Code:

TableName | PK
Employee | id
Contact|name,phone,country

I have another file, desc.txt, which lists the data type of each field like this:

Code:

Table|Field|Type
Employee|id|int
Contact|name|string
Contact|country|string
Contact|phone|bigint

My Output should be

Code:

Employee | t1.id=s.id
Contact| TRIM(t1.name)=TRIM(s.name) AND t1.phone=t2.phone AND TRIM(t1.country)=TRIM(s.country)

Note: TRIM should be applied only to string fields

I can get the output without the TRIMs using the code below, but I'm not sure how to incorporate the TRIM logic.

Code:

awk -F "[ |]*" '
NR>1{
  printf "%s|%s|", $1, $2
  flds=split($2, F, ",")
  for(i=0;i<flds;i++)
     printf "%s%s.%s=%s.%s", i?" AND ":"", S, F[i+1], D, F[i+1]
  printf "\n"
}' S=t1 D=t2  pk.txt

I'm guessing I have to grep each field element and check whether it is a string, but I'm unable to tie the whole output together on one line.

Thanks

Moderator's Comments:

Please don't modify posts after people have answered referencing it, pulling the rug from under their feet!

Last edited by RudiC; 10-18-2016 at 05:24 AM.. Reason: Wrong code pasted

wahi80

10-17-2016

Not sure about the contents of the part-m-00000 file but this should do the trick:

Code:

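awk -F "[ |]*" '
# First file (desc.txt): remember each field type, keyed by "table.field"
FNR==NR{T[$1"."$2]=$3; next}
# Later files: skip the header line, then build one join condition per key field
FNR>1{
  printf "%s|%s|", $1, $2
  flds=split($2, F, ",")
  for(i=0;i<flds;i++) {
    # string fields are compared through TRIM(); all other types directly
    if(T[$1"."F[i+1]]=="string")
      fmt="%sTRIM(%s.%s)=TRIM(%s.%s)"
    else fmt="%s%s.%s=%s.%s"

    printf fmt,
        i?" AND ":"",
        S, F[i+1],
        D, F[i+1];
  }
  printf "\n"
}' S=t1 D=t2 desc.txt /mapr/datalake/optum/optuminsight/onepay_dev/fs_dev/raw/pklist/op/part-m-00000 pk.txt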

Chubler_XL

10-18-2016

Hi wahi80,
Assuming that /mapr/datalake/optum/optuminsight/onepay_dev/fs_dev/raw/pklist/op/part-m-00000 is an empty file, and desc.txt and pk.txt contain the sample input shown in post #1 in this thread, Chubler_XL's code produces the output:

Code:

Employee|id|t1.id=t2.id
Contact|name,phone,country|TRIM(t1.name)=TRIM(t2.name) AND t1.phone=t2.phone AND TRIM(t1.country)=TRIM(t2.country)

Which differs from the output you said you wanted:

Code:

Employee | t1.id=s.id
Contact| TRIM(t1.name)=TRIM(s.name) AND t1.phone=t2.phone AND TRIM(t1.country)=TRIM(s.country)

in the field separators, the number of fields, and the table alias (s versus t2).

I have no idea whether you want two | separated fields in lines in your output (as shown in the output you said you wanted) or three (as created by the code you were using). And, I have no idea why you sometimes have s.field_name and sometimes have t2.field_name in your sample output. And, I have no idea why you seem to want inconsistent field separators in your input and output (sometimes "|", sometimes "| ", and sometimes " | ").

The following comes closer to providing the output you said you want, but:

it assumes that there should just be two input files (desc.txt and pk.txt),
it always uses just | as the output field separator,
it always produces two output fields (as shown in the output you said you wanted), and
it always uses the table name specified by the awk variable D (instead of sometimes using the string s).

Code:

awk -F "[ |]*" '
BEGIN {	fmt[0] = "%s%s.%s=%s.%s"
	fmt[1] = "%sTRIM(%s.%s)=TRIM(%s.%s)"
}
FNR == NR {
	if($3 == "string")
		trim[$1, $2]
	next
}
FNR > 1 {
	printf("%s|", $1)
	flds = split($2, F, ",")
	for(i = 1; i <= flds; i++)
		printf(fmt[(($1, F[i]) in trim)], (i > 1) ? " AND " : "", S,
		    F[i], D, F[i])
	print ""
}' S=t1 D=t2 desc.txt pk.txt

which produces the output:

Code:

Employee|t1.id=t2.id
Contact|TRIM(t1.name)=TRIM(t2.name) AND t1.phone=t2.phone AND TRIM(t1.country)=TRIM(t2.country)

when given the sample input you provided in post #1.
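For readers who do want the s alias from the requested output, the alias is just the value of the command-line variable D, so it can be changed without touching the program. The following is only a sketch of that idea, not the code posted above: it recreates the sample files in a temporary directory, spells the two formats out in an if/else, and uses the field separator ' *[|] *' (an assumption of this sketch, chosen because it cannot match an empty string).

```shell
# Recreate the sample input from post #1 in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/desc.txt" <<'EOF'
Table|Field|Type
Employee|id|int
Contact|name|string
Contact|country|string
Contact|phone|bigint
EOF
cat > "$tmp/pk.txt" <<'EOF'
TableName | PK
Employee | id
Contact|name,phone,country
EOF

# D=s selects the alias used on the right-hand side of every comparison.
out=$(awk -F ' *[|] *' '
FNR == NR { if ($3 == "string") trim[$1, $2]; next }   # remember string columns
FNR > 1 {                                               # skip the pk.txt header
  printf "%s|", $1
  n = split($2, F, ",")
  for (i = 1; i <= n; i++) {
    sep = (i > 1) ? " AND " : ""
    if (($1, F[i]) in trim)
      printf "%sTRIM(%s.%s)=TRIM(%s.%s)", sep, S, F[i], D, F[i]
    else
      printf "%s%s.%s=%s.%s", sep, S, F[i], D, F[i]
  }
  print ""
}' S=t1 D=s "$tmp/desc.txt" "$tmp/pk.txt")
printf '%s\n' "$out"
rm -r "$tmp"
```

With the sample data this prints s.id, s.name, s.phone and s.country on the right-hand sides, still separated by a plain |.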

As always, if someone wants to try this code on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

I hope this helps,
Don

Don Cragun

10-18-2016

Hi,
Chubler_XL's code works perfectly!
part-m-00000 was a mistake on my part while pasting the code, it can be removed.

I have a couple of questions about how the code works, mainly how these two lines work:

Code:

T[$1"."$2]=$3

Code:

T[$1"."F[i+1]]=="string"

FNR==NR would process desc.txt
FNR>1 would process pk.txt

So how does the interdependency between the two work?

I thought I knew a little bit of awk, but after looking at this code I need to revisit it.

Thanks

Last edited by wahi80; 10-18-2016 at 12:14 PM.. Reason: added info

wahi80

10-18-2016

Hi,

Chubler_XL's and Don's solutions are perfect.

I hope the following details help.

Code:

T[$1"."$2]=$3
T[$1"."F[i+1]]=="string"

Considering the sample input in desc.txt, the field separator is set with -F "[ |]*".

Code:

 T[$1"."$2]=$3

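$3 is the third field of each desc.txt line, that is, the type (int, string or bigint).

In the next expression, i comes from the for loop, and the array F is filled by the split function, which stores the pieces in F[1] through F[flds] (F[0] is not set).
Note that "string" is a quoted string constant, not an unset variable.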

Code:

T[$1"."F[i+1]]=="string"
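A small sketch may make the indexing clearer. It feeds one hypothetical line (mirroring the Contact row of pk.txt) to awk and prints what split stores; the plain -F'|' separator is an assumption made only to keep the example short.

```shell
# One made-up input line in the same shape as the Contact row of pk.txt.
out=$(printf 'Contact|name,phone,country\n' |
awk -F'|' '{
  n = split($2, F, ",")        # n is the number of pieces, stored in F[1]..F[n]
  for (i = 0; i < n; i++)      # same 0-based loop as in the thread, hence F[i+1]
    printf "i=%d F[%d]=%s\n", i, i + 1, F[i + 1]
}')
printf '%s\n' "$out"
```

The loop counter starts at 0 only so that i?" AND ":"" is empty for the first field; the array itself is indexed from 1.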

Quote:

FNR==NR would process desc.txt
FNR>1 would process pk.txt

FNR - Number of the current input record within the current file. When awk starts processing a new input file, it resets FNR to zero, so the first record of each file has FNR equal to 1.
NR - Number of input records read so far, counted across all input files since the script started.
FNR==NR is true only while the first input file is being read.
It is the next command in the first block {} that makes awk skip the second pattern (FNR>1) and its statement block ({}) for those records.
Hence the first input file, i.e. desc.txt, is completely processed by the first block (here 5 lines, including the header).
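The two counters can be watched directly with a throwaway experiment. This is only an illustration: first.txt and second.txt are hypothetical stand-ins for desc.txt and pk.txt, created in a temporary directory.

```shell
tmp=$(mktemp -d)
printf 'a\nb\n'    > "$tmp/first.txt"    # stand-in for desc.txt
printf 'x\ny\nz\n' > "$tmp/second.txt"   # stand-in for pk.txt

out=$(awk '
FNR == NR { print "file1", "NR=" NR, "FNR=" FNR, $0; next }  # first file only
FNR > 1   { print "file2", "NR=" NR, "FNR=" FNR, $0 }        # later files, minus line 1
' "$tmp/first.txt" "$tmp/second.txt")
printf '%s\n' "$out"
rm -r "$tmp"
```

The line x never appears: it is record 3 overall but record 1 of its own file, so it matches neither pattern, which is exactly how the header of pk.txt gets skipped.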

greet_sed

10-18-2016

Code:

T[$1"."$2]=$3

This statement first concatenates the first and second field with a dot separator, then uses this string as an index for array T and assigns the third field's value to it. Reading all lines from "desc.txt":

Code:

Table|Field|Type
Employee|id|int
Contact|name|string
Contact|country|string
Contact|phone|bigint

would yield (in addition to a harmless T["Table.Field"] = "Type" entry created from the header line)

Code:

T["Employee.id"] = "int"
T["Contact.name"] = "string"
T["Contact.country"] = "string"
T["Contact.phone"] = "bigint"

With the flds=split($2, F, ",") function and the "pk.txt" file

Code:

TableName | PK
Employee | id
Contact|name,phone,country

in the Contact line the T[$1"."F[i+1]] expression would be evaluated for name, phone and country in turn and reference

Code:

T["Contact.name"] = "string"
T["Contact.country"] = "string"
T["Contact.phone"] = "bigint"

to be able to select the output format.
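As a rough way to see the lookup in action, the sketch below loads the desc.txt sample from standard input and then queries the three Contact keys in an END block. The END block, the key variable and the TRIM/plain labels are illustrative choices of this sketch, not part of the posted solution.

```shell
out=$(printf '%s\n' 'Table|Field|Type' 'Employee|id|int' 'Contact|name|string' \
                    'Contact|country|string' 'Contact|phone|bigint' |
awk -F'|' '
{ T[$1 "." $2] = $3 }                      # same "table.field" keying as in the thread
END {
  n = split("name,phone,country", F, ",")  # the PK list of the Contact row
  for (i = 1; i <= n; i++) {
    key = "Contact." F[i]
    print key, T[key], (T[key] == "string" ? "TRIM" : "plain")
  }
}')
printf '%s\n' "$out"
```

Each key field is looked up by its "table.field" name, and only the entries whose stored type is "string" would take the TRIM format.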

RudiC

10-18-2016

Pure Joy!!

Thoroughly schooled!

wahi80
