awk to get multiple strings in one variable

# 1  
Old 01-24-2018

I am processing a file with awk to extract a few input variables that I'll use later in my script. I am still learning awk, so please point out any mistakes in my code. A sample of the file follows:
Code:
# cat junk1.jnk
  Folder1                    : test_file     (File)
                                test1_file    (File)
                                test2_file    (File)
   Lines (9):
    00140  Li                      CHAR                         188
    00141  Li                      CHAR                         188
    00142  Li                      CHAR                         188
    00143  Li                      CHAR                         188
    00144  Li                      CHAR                         188
    00145  Li                      CHAR                         375
    00146  Li                      CHAR                         375
    00147  Li                      CHAR                         375

I am trying to extract three things: a comma-separated list of the file names (the field just before a last field of "(File)"), the number of lines given in parentheses after "Lines" (9 here), and a comma-separated list of the unique CHAR values, that is, the last field of each line that starts with a five-digit hex value after the "Lines (9):" line. I am using the following code. I get the file names and the line count, but I am unable to get the comma-separated list of unique CHAR values. In this case it should be 188,375.

Code:
cat junk1.jnk | awk 'BEGIN { printf ("%-23s %-4s %-5s\n", "File Names"," Lines", "CHARS")
printf ("%-23s %-4s %-5s\n", "--------------"," ----"," ------")}
{
if ($0 ~ /Folder1/){
FLAG=1
}

if (FLAG == 1) {
if (($0 ~/Folder/) || ($0 ~ /^[ \t]+|[ \t]+\(File\)$/) || ($0 ~ /Lines/) || ($1 ~ /^[0-9A-Fa-f]{5}+$/)) {
split ($0,VAL,FS)

if ($NF ~ /\(File\)/) {
CSG=$(NF-1);printf ("%s,", CSG)
}
if ($0 ~ /Lines/) {
## split ($0,VAL,FS)
        LN=VAL[2]
        LNN=(substr( LN,2,length(LN)-2))
}

if ($1 ~ /^[0-9A-Fa-f]{5}+$/) {
## split ($0,VAL,FS)
        CHR=VAL[NF]
        }
      }
   }
}
END {printf ("%s %s %s\n", CSG, (substr(LNN, 1, length(LNN)-1)), CHR)}'

My current output is as follows. As you can see, the only value I get for CHAR is the last one, 375. Could you also help me understand why test2_file appears twice in the file-name list?
Code:
File Names               Lines CHARS
--------------           ----  ------
test_file,test1_file,test2_file,test2_file 9 375

I am expecting the following output:
Code:
File Names               Lines CHARS
--------------              ----  ------
test_file,test1_file,test2_file  9 188,375

As usual, you guys are rock stars, and I would appreciate your help.
# 2  
Old 01-24-2018
Code:
awk 'BEGIN {
   printf ("%-40s %-5s %-15s\n", "File Names","Lines", "CHARS")
   printf ("%-40s %-5s %-15s\n", "--------------","-----","------")
}

$NF ~ /\(File\)/ {
   CSG=CSG $(NF-1) ","
}

$0 ~ /Lines/ {
   gsub("[^0-9]", "")
   LNN=$1
}

$1 ~ /^[0-9A-Fa-f]+$/ && length($1)==5 {
   if (! c[$NF]) CHR=CHR $NF ","
   c[$NF]=$NF
}

END {
   sub(",*$", "", CSG)
   sub(",*$", "", CHR)
   printf ("%-40s %-5s %-15s\n", CSG, LNN, CHR)
}' junk1.jnk

# 3  
Old 01-24-2018
Hi rdrtx1...this is superb!

Could you explain the following lines a little?

Code:
gsub("[^0-9]", "")

Code:
if (! c[$NF]) CHR=CHR $NF ","
   c[$NF]=$NF

Thank you for your help!
# 4  
Old 01-24-2018
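gsub("[^0-9]", "") # delete every non-digit character from the current record ($0); awk then re-splits the record, so $1 is the number

if (! c[$NF]) CHR=CHR $NF ","
c[$NF]=$NF
# if the last field has not yet been stored in the c array, append it to the CHR string (this eliminates duplicates)

Better yet, use if (! ($NF in c)) CHR=CHR $NF "," in case the $NF values include zero: a stored 0 tests as false, so ! c[$NF] would add it again every time it appears.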
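To see the difference between the two tests, here is a small sketch that runs both forms on the same input. The input values (0 and 188, each twice) and the function names dedupe_truthy and dedupe_in are made up for illustration; they are not from the data file in this thread.

```shell
# Hypothetical input: the value 0 appears twice, 188 appears twice.
input='0
188
0
188'

# Truthiness test: c["0"] holds 0, which is false, so 0 is appended again.
dedupe_truthy() {
    awk '{ if (! c[$NF]) out = out $NF ","; c[$NF] = $NF }
         END { sub(/,*$/, "", out); print out }'
}

# Membership test: ($NF in c) only asks whether the index exists.
dedupe_in() {
    awk '{ if (! ($NF in c)) out = out $NF ","; c[$NF] = $NF }
         END { sub(/,*$/, "", out); print out }'
}

printf '%s\n' "$input" | dedupe_truthy   # prints 0,188,0
printf '%s\n' "$input" | dedupe_in       # prints 0,188
```

With the sample file in this thread (values 188 and 375) both forms give the same result; the difference only shows up when a value is 0 or empty.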

Last edited by rdrtx1; 01-24-2018 at 05:00 PM..
# 5  
Old 01-24-2018
Quote:
Originally Posted by shunya
I am processing a file using awk to get few input variables which I'll use later in my script. ...
There is no reason to use cat to feed data to awk; awk is perfectly capable of reading files on its own. Using cat causes all of the data to be read and written an extra time, consumes more system resources, and slows down your script.
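As a small illustration of that point, the sketch below runs the same awk program both ways on a throwaway file. The file contents and the variable names sample, with_cat and direct are made up for this example.

```shell
# Create a small throwaway sample file (hypothetical content).
sample=$(mktemp)
printf 'a 1\nb 2\n' > "$sample"

# Extra process, and the data is copied through a pipe first:
with_cat=$(cat "$sample" | awk '{ print $2 }')

# Same program, but awk opens the file itself:
direct=$(awk '{ print $2 }' "$sample")

[ "$with_cat" = "$direct" ] && echo "same output"
rm -f "$sample"
```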

Note that the statement CSG=$(NF-1);printf ("%s,", CSG) in your code prints each filename value (followed by a comma) as soon as you find one, but you then print the last filename found a second time when you get to the END clause in your awk script. That is why test2_file appears twice.

You don't do that with the values you store in the CHR variable: each new value overwrites the previous one, so the END clause prints only the last value found instead of all of them. There also isn't any check in your code to look for matching values and eliminate duplicates.
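The difference between overwriting and accumulating can be seen in isolation with a minimal sketch. The three-line input stands in for the last field of the data lines, and the function names last_only and accumulate are hypothetical.

```shell
# Hypothetical input standing in for the last field of three data lines.
vals='188
188
375'

# Plain assignment: each record overwrites CHR, so END sees only the last value.
last_only() {
    awk '{ CHR = $NF } END { print CHR }'
}

# Concatenation keeps every value (duplicates included; no uniqueness check yet).
accumulate() {
    awk '{ CHR = (CHR == "" ? "" : CHR ",") $NF } END { print CHR }'
}

printf '%s\n' "$vals" | last_only    # prints 375
printf '%s\n' "$vals" | accumulate   # prints 188,188,375
```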

You might have also noticed that your two heading lines don't line up with each other nor with the data line that you print at the end.

The code rdrtx1 suggested accumulates the comma-separated value strings, always adding a comma to the end of the string when a new value is added, and then removes the trailing comma in the END clause. That code also lines up header columns and data columns as long as the list of filenames isn't more than 40 characters long.

The following code self-adjusts the headings to match the data found in the file being processed. It takes a shortcut by assuming that no field will contain data that is longer than 61 characters. If your real data will have one or more fields longer than that, the DASHES variable needs to have more dashes added to its value, or the second printf in the END clause needs to be replaced by three loops that print as many dashes as are needed for each of the three headings. (I will leave that adjustment as an exercise for the reader.)
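For readers who want a starting point for that exercise, one possible sketch is a helper that builds a dash string of any length in a loop. The names Dashes and print_rule and the example widths are assumptions for illustration only; in the script below the widths would be fnl, ll and vall.

```shell
# print_rule W1 W2 W3: print three runs of dashes with the given widths.
print_rule() {
    awk -v w1="$1" -v w2="$2" -v w3="$3" '
    # Hypothetical helper: return a string of n dashes, built in a loop.
    function Dashes(n,    s) {
        s = ""
        while (n-- > 0)
            s = s "-"
        return s
    }
    BEGIN {
        printf("%s %s %s\n", Dashes(w1), Dashes(w2), Dashes(w3))
    }'
}

print_rule 10 5 7   # prints ---------- ----- -------
```

Because the string is built on demand, there is no fixed upper limit on the field width, at the cost of a short loop per heading.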

It also uses a function to add values to the two string variables and only adds a comma as a subfield-separator when the string isn't empty to start with.
Code:
awk '
function AddVal(Value, String) {
	# Add "Value" to a comma-separated value string identified by "String"
	# or, if it does not already exist, create it.
	String = ((String == "" ? "" : String ",")) Value

	# Return the new value for "String".
	return(String)
}

$NF == "(File)" {
	# Add a filename to the CSG variable.
	CSG = AddVal($(NF - 1), CSG)	
	next
}

$1 == "Lines" {
	# Grab the number of lines to be reported.
	match($0, /[[:digit:]]+/)	# I assume this is a decimal number.
	LNN = substr($0, RSTART, RLENGTH)
	next
}

$1 ~ /^[[:xdigit:]]{5}$/ {
	# We found a 5 hexadecimal digit string in $1, determine if we have
	# seen the value in the last field before...
	if($NF in seen)
		next	# We have seen it, move on to the next input record.
	# We have not seen it before.  Note that we have seen it now...
	seen[$NF]
	# and add this value to the CHR variable.
	CHR = AddVal($NF, CHR)
}

END {	# Set DASHES to a long string of dashes...
	DASHES = "-------------------------------------------------------------"
	# Calculate the longest string to be printed in the filenames field...
	fnl = ((l1 = length("File Names")) > (l2 = length(CSG))) ? l1 : l2
	# and in the lines field...
	ll = ((l1 = length("Lines")) > (l2 = length(LNN))) ? l1 : l2
	# and in the CHARS field.
	vall = ((l1 = length("CHARS")) > (l2 = length(CHR))) ? l1 : l2

	# Print the two line header adjusted to fit the actual data.
	printf("%-*.*s %-*.*s %-*.*s\n", fnl, fnl, "File Names",
	    ll, ll, "Lines", vall, vall, "CHARS")
	printf("%-*.*s %-*.*s %-*.*s\n", fnl, fnl, DASHES,
	    ll, ll, DASHES, vall, vall, DASHES)
	# Print the accumulated data.
	printf ("%*.*s %*.*s %*.*s\n", fnl, fnl, CSG,
	    ll, ll, LNN, vall, vall, CHR)
}' junk1.jnk

The code above produces the output:
Code:
File Names                      Lines CHARS  
------------------------------- ----- -------
test_file,test1_file,test2_file     9 188,375

while the code suggested by rdrtx1 produces the output:
Code:
File Names                               Lines CHARS          
--------------                           ----- ------         
test_file,test1_file,test2_file          9     188,375

and with a different input file containing:
Code:
  Folder1                    : test_file     (File)
                                test1_file    (File)
                                test2_file    (File)
                                test3_file    (File)
   Lines (8):
    00140  Li                      CHAR                         188
    00141  Li                      CHAR                         188
    00142  Li                      CHAR                         190
    00143  Li                      CHAR                         190
    00144  Li                      CHAR                         192
    00145  Li                      CHAR                         375
    00146  Li                      CHAR                         375
    00147  Li                      CHAR                         395

the code above produces the output:
Code:
File Names                                 Lines CHARS              
------------------------------------------ ----- -------------------
test_file,test1_file,test2_file,test3_file     8 188,190,192,375,395

while the code suggested by rdrtx1 would produce the output:
Code:
File Names                               Lines CHARS          
--------------                           ----- ------         
test_file,test1_file,test2_file,test3_file 8     188,190,192,375,395

Hopefully, these two suggestions will give you some ideas you can use as you hone your awk expertise.
# 6  
Old 01-25-2018
Awesome Don! You explained each and every line ... This is very helpful. Thank you!