Joining 3 AWK scripts to avoid using "temp" files


 
# 1  
Old 04-28-2010

Hi everyone,

I'm looking for suggestions to improve the script below, which I've been working on.

I have 3 separate AWK scripts that I need to apply to inputfile, and scripts (2) and (3) each have to read a "temp" file as input (inputfile_temp and inputfile_temp1, respectively).

I would like to join these 3 scripts into a single AWK script with "inputfile" as the only source file, without using temp files.

inputfile is as follows ($7 is empty):

Code:
HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,

The script that works (let's call it Divided_Script) is as follows (with each routine briefly explained):
Code:
### 1-) Filter to search "pattern1" and "pattern2" within any column in inputfile ###
awk  'BEGIN{FS=OFS=","} /HEADER/||/pattern1/||/pattern2.*pattern1/||/pattern1.*pattern2/' inputfile > inputfile_temp

### 2-) 2nd filter to exclude lines containing "pattern4", "pattern5" and "pattern6" in column 2 ####
awk  'BEGIN{FS=OFS=","} $2 !~ /pattern4|pattern5|pattern6/' inputfile_temp > inputfile_temp1

### 3-) Make column 7 = Column 2 and after that renaming column 7 header with "NEW_HEADER" ###
awk  'BEGIN{FS=OFS=","} {$7=$2} NR==1{$7="NEW_HEADER"} 

### 3.1-) Deleting the string "/Sub data1/Sub data2" for every line in column 7, which now has the same data of $2 ###
{sub(/\/.*/,"",$7)}

### 3.2-) Printing the final output in a new desired order ###
{print $1,$7,$3,$4,$5,$6,$2}' inputfile_temp1 > outputfile
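
As a first step toward dropping the temp files, the three stages above could simply be chained with pipes, since each awk only needs the previous stage's output. This is a minimal sketch, not the only way to do it; it assumes the sample rows are held in a shell variable (the names `input` and `result` are made up for illustration) rather than read from inputfile:

```shell
# Sample rows from the post stand in for "inputfile".
input='HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,'

# Same three stages as Divided_Script, connected by pipes instead of temp files.
result=$(printf '%s\n' "$input" |
  awk 'BEGIN{FS=OFS=","} /HEADER/||/pattern1/||/pattern2.*pattern1/||/pattern1.*pattern2/' |
  awk 'BEGIN{FS=OFS=","} $2 !~ /pattern4|pattern5|pattern6/' |
  awk 'BEGIN{FS=OFS=","} {$7=$2} NR==1{$7="NEW_HEADER"} {sub(/\/.*/,"",$7)} {print $1,$7,$3,$4,$5,$6,$2}')

printf '%s\n' "$result"
```

With a real file, the first awk would read inputfile directly and the last one would redirect to outputfile. This still starts awk three times, so it removes the temp files but not the extra processes.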

The desired, correct output from running Divided_Script on inputfile is:
Code:
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2

But when I try to merge the routines into a single AWK script that invokes awk only once (let's call it Unified_Script), basically by removing "awk 'BEGIN{FS=OFS=","}" from routines (2) and (3):
Code:
### 1-) Filter to search "pattern1" and "pattern2" within any column in inputfile ###
awk  'BEGIN{FS=OFS=","} {/HEADER/||/pattern1/||/pattern2.*pattern1/||/pattern1.*pattern2/}

### 2-) 2nd filter to exclude lines containing "pattern4", "pattern5" and "pattern6" in column 2 ####
$2 !~ /pattern4|pattern5|pattern6/

### 3-) Make column 7 = Column 2 and after that renaming column 7 header with "NEW_HEADER" ###
{$7=$2} NR==1{$7="NEW_HEADER"} 

### 4-) Deleting the string "/Sub data1/Sub data2" for every line in column 7 ###
{sub(/\/.*/,"",$7)}

### 5-) Printing the final output in a new desired order ###
{print $1,$7,$3,$4,$5,$6,$2}' inputfile > outputfile

The resulting output from running Unified_Script on inputfile is wrong. It seems to print the original file merged with the lines processed by routines 3, 4 and 5 of Unified_Script, but without the filtering that routines 1 and 2 should apply, because lines appear that don't contain pattern1 or pattern2.

Code:
HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern3,pattern6,pattern7,pattern3,pattern5,pattern1,pattern6/Sub data1/Sub data2
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern1,pattern5,pattern5,pattern2,pattern5,pattern2,pattern5/Sub data1/Sub data2
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern3,pattern8,pattern9,pattern7,pattern5,pattern8,pattern8/Sub data1/Sub data2
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
pattern2,pattern4,pattern9,pattern1,pattern9,pattern9,pattern4/Sub data1/Sub data2
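
The doubled lines appear to follow from how awk reads a program: it is a list of `pattern { action }` rules, and each rule is applied independently to every record. In Unified_Script, routine 1 sits inside braces, so it is an action whose result is discarded. Routine 2 is a pattern with no action, which means "print the record as it is" whenever it matches. The remaining blocks have no pattern, so they run on every line. One possible way to fold the three stages into a single call is to combine both filters into one pattern and keep the edits inside its action. A sketch under those assumptions (the shell variables `input` and `result` are only for illustration, and the header is assumed to be the first line):

```shell
# Sample rows from the post stand in for "inputfile".
input='HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,'

result=$(printf '%s\n' "$input" | awk '
BEGIN { FS = OFS = "," }
# Both filters must hold before anything is modified or printed.
(/HEADER/ || /pattern1/) && $2 !~ /pattern4|pattern5|pattern6/ {
    $7 = (NR == 1) ? "NEW_HEADER" : $2   # header gets the new name
    sub(/\/.*/, "", $7)                  # drop "/Sub data1/Sub data2"
    print $1, $7, $3, $4, $5, $6, $2
}')

printf '%s\n' "$result"
```

The `/pattern2.*pattern1/` and `/pattern1.*pattern2/` alternatives from routine 1 are left out here because any line they match also matches `/pattern1/`.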

I hope somebody can help me join these 3 scripts so they work as I've explained.

Thanks in advance for any suggestion.
# 2  
Old 04-29-2010
Code:
awk  '
 BEGIN { FS = OFS = "," }

 NR == 1 { $7="NEW_HEADER" }

 $2 == /pattern4|pattern5|pattern6/ { next }

 {
   $7 = $2
   sub(/\/.*/,"",$7)
 }

 /HEADER/ || ( /pattern1/ && /pattern2/ ) {
   print $1, $7, $3, $4, $5, $6, $2
 }
' inputfile > outputfile

# 3  
Old 04-29-2010
Hi cfajonshon,

Thanks for your help. I think I'm close to getting the script to work as I want.

The only change I made to your script was to move
"NR == 1 { $7="NEW_HEADER" }" from line 3 to line 8 (not counting blank lines), placing it after "$7 = $2 sub(/\/.*/,"",$7)".

The output is almost correct, but I want to exclude pattern4, pattern5 and pattern6 from column 2.
The output so far is:
Code:
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern3,pattern6,pattern7,pattern3,pattern5,pattern1,pattern6/Sub data1/Sub data2
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern1,pattern5,pattern5,pattern2,pattern5,pattern2,pattern5/Sub data1/Sub data2
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
pattern2,pattern4,pattern9,pattern1,pattern9,pattern9,pattern4/Sub data1/Sub data2

My question is:
What does this statement mean?
Code:
 " $2 == /pattern4|pattern5|pattern6/ { next }"

I was trying to exclude those patterns from column 2 using
Code:
$2 !~ /pattern4|pattern5|pattern6/

but if I replace "$2==" with "$2!~", it doesn't work.
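
As far as I understand awk, a bare `/regex/` used as a value is shorthand for `$0 ~ /regex/`, so it evaluates to 1 or 0. `$2 == /pattern4|pattern5|pattern6/` therefore compares column 2 with that 1 or 0 instead of matching column 2 against the pattern. For this data the comparison is never true, so `next` never runs. The match operator `~` is probably what was intended. A small sketch with two of the sample rows (the names `rows`, `flags`, `eq` and `re` are made up for illustration):

```shell
rows='pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,'

# For each row print two flags: the "==" comparison, then the "~" match.
flags=$(printf '%s\n' "$rows" | awk -F, '{
    eq = ($2 == /pattern4|pattern5|pattern6/)   # compares $2 with 0 or 1
    re = ($2 ~  /pattern4|pattern5|pattern6/)   # regex match against $2
    print eq, re
}')

printf '%s\n' "$flags"
```

Only the second row has pattern4 in column 2, and only the `~` flag picks that up. Swapping `==` for `!~` in front of `{ next }` inverts the logic, because it skips exactly the lines that should be kept; `$2 ~ /pattern4|pattern5|pattern6/ { next }` skips the unwanted ones.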

The updated code is:
Code:
awk  '
 BEGIN { FS = OFS = "," }

 $2 == /pattern4|pattern5|pattern6/ { next }

 {
   $7 = $2
   sub(/\/.*/,"",$7)
 }
 
 NR == 1 { $7="NEW_HEADER" }
 
 /HEADER/ || ( /pattern1/ && /pattern2/ ) {
   print $1, $7, $3, $4, $5, $6, $2
 }
' inputfile > outputfile

How can I add a statement to the above script that excludes pattern4, 5 and 6?

Thanks in advance.

---------- Post updated at 02:27 PM ---------- Previous update was at 01:47 AM ----------

Hi guys,

I need some help with 2 questions:


I have the same input file:
Code:
HEADER_1,HEADER_2,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_7
pattern2,pattern7/Sub data1/Sub data2,pattern8,pattern9,pattern2,pattern2,
pattern3,pattern6/Sub data1/Sub data2,pattern7,pattern3,pattern5,pattern1,
pattern5,pattern9/Sub data1/Sub data2,pattern4,pattern8,pattern4,pattern1,
pattern1,pattern5/Sub data1/Sub data2,pattern5,pattern2,pattern5,pattern2,
pattern6,pattern7/Sub data1/Sub data2,pattern1,pattern6,pattern2,pattern3,
pattern3,pattern8/Sub data1/Sub data2,pattern9,pattern7,pattern5,pattern8,
pattern8,pattern2/Sub data1/Sub data2,pattern3,pattern2,pattern8,pattern1,
pattern2,pattern4/Sub data1/Sub data2,pattern9,pattern1,pattern9,pattern9,


I have this script:

Code:
awk  'BEGIN { FS = OFS = "," } #-1
 
{ $7 = $2; sub(/\/.*/,"",$7) } #-2 (to delete "/Sub data1/Sub data2" from $7)
 
 NR == 1 { $7="NEW_HEADER" } #-3
 
 {/HEADER/||/pattern1/||/pattern2/ } #-4
 
 {$2 !~ /pattern4|pattern5|pattern6/ } #-5
 
 {print $1, $7, $3, $4, $5, $6, $2}' inputfile #-6

And I get the following output:
Code:
HEADER_1,NEW_HEADER,HEADER_3,HEADER_4,HEADER_5,HEADER_6,HEADER_2
pattern2,pattern7,pattern8,pattern9,pattern2,pattern2,pattern7/Sub data1/Sub data2
pattern3,pattern6,pattern7,pattern3,pattern5,pattern1,pattern6/Sub data1/Sub data2
pattern5,pattern9,pattern4,pattern8,pattern4,pattern1,pattern9/Sub data1/Sub data2
pattern1,pattern5,pattern5,pattern2,pattern5,pattern2,pattern5/Sub data1/Sub data2
pattern6,pattern7,pattern1,pattern6,pattern2,pattern3,pattern7/Sub data1/Sub data2
pattern3,pattern8,pattern9,pattern7,pattern5,pattern8,pattern8/Sub data1/Sub data2
pattern8,pattern2,pattern3,pattern2,pattern8,pattern1,pattern2/Sub data1/Sub data2
pattern2,pattern4,pattern9,pattern1,pattern9,pattern9,pattern4/Sub data1/Sub data2

Question 1:
Line 5 of the script, " {$2 !~ /pattern4|pattern5|pattern6/ }", is meant to delete lines containing pattern4, pattern5 or pattern6 in column 2, but it doesn't seem to be working.

What am I doing wrong in this case?


Question 2:
In the output, the line highlighted in red (the one starting with pattern3,pattern8) is present but should not appear, because it contains neither HEADER nor pattern1 nor pattern2.

Why does this happen?

Maybe somebody could help me with this.

Thanks in advance.
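
Both questions seem to come down to the same point: in rules #-4 and #-5 the test is written inside braces, which makes it an action. awk evaluates the expression and throws the result away, so nothing is filtered, and the final `{print ...}` rule, which has no pattern, runs for every line. A test only selects lines when it stands outside the braces, as the pattern of a rule. A tiny sketch of the difference, using made-up three-line input (all names here are illustrative):

```shell
sample='keep,pattern1
drop,pattern8
keep,pattern2'

# Test inside braces: evaluated and discarded, so every line reaches the print.
as_action=$(printf '%s\n' "$sample" |
  awk -F, '{ /pattern1/ || /pattern2/ } { print $1 }')

# Test outside braces: it is the pattern that guards the print action.
as_pattern=$(printf '%s\n' "$sample" |
  awk -F, '/pattern1/ || /pattern2/ { print $1 }')

printf 'inside braces:\n%s\n' "$as_action"
printf 'as a pattern:\n%s\n' "$as_pattern"
```

The first command prints all three lines, while the second prints only the two "keep" lines.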

---------- Post updated at 07:27 PM ---------- Previous update was at 02:27 PM ----------

Hi, I fixed the problem by adding an if statement, as shown below.
Code:
awk  '
 BEGIN { FS = OFS = "," }
 
 { $7 = $2; sub(/\/.*/,"",$7) }
 
 NR == 1 { $7="NEW_HEADER" }
 
 /HEADER/ || /pattern1/ || /pattern2/ {
   if ($2 !~ /pattern4|pattern5|pattern6/)
     print $1, $7, $3, $4, $5, $6, $2
 }
' inputfile


Many thanks for your help!

Regards,