Files summary using awk based on index key


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Files summary using awk based on index key
# 1  
Old 03-18-2017
Files summary using awk based on index key

Hello , I have several files which are looking similar to :

file01.txt
Code:
keyA001  350 X string001 value001
keyA001  450 X string002 value007
keyA001  454 X string002 value004
keyA001  500 X string003 value005
keyA001  255 X string004 value006
keyA001  388 X string005 value008
keyA001 1278 X string005 value012
keyA001 2555 X string006 value018
keyA002  299 X string001 value022
keyA002 1455 X string002 value032
keyA002  267 X string002 value019
keyA002  322 X string004 value002
keyA002  555 X string006 value003
keyA003  124 X string002 value004
keyA003  650 X string003 value013
keyA003 8285 X string004 value006
keyA003  388 X string005 value021
keyA003  255 X string006 value027
keyA003  277 X string007 value028
keyA003  444 X string008 value048
keyA003  444 X string009 value185
keyA004  280 X string001 value076
keyA004 3170 Y Z string002 value085
keyA004  270 X string003 value005
keyA004   25 X string004 value006
keyA004   78 X string005 value012
keyA004    5 X string006 value018

file02.txt
Code:
keyB002 2555 X string006 value018
keyB003 8285 X string004 value006
keyB003  366 X string005 value021
keyB003 2445 Y Z string006 value027
keyB006  270 X string003 value005
keyB006   25 X string004 value006
keyB006   78 X string005 value012
keyB008 1245 X string006 value018
keyB008  245 X string006 value020

file03.txt
Code:
keyC008  450 X string002 value007
keyC008  500 X string003 value005
keyC008  500 X string003 value029
keyC008  255 X string004 value006
keyC010  299 X string001 value022
keyC010 1455 X string002 value032
keyC010  322 X string004 value002
keyC010  299 X string004 value026
keyC010  546 X string005 value003
keyC010  124 X string006 value004
keyC011  650 X string003 value513
keyC011 8285 X string004 value067

I would like to generate a summary for individual files and for all files using string??? as index key (most of the rows are 6 columns but can have records with 7 columns).

Desired output should look like:

Code:
file01.txt summary:
string001 3 records
string002 4 records
string003 3 records
string004 4 records
string005 3 records
string006 4 records
string007 1 records
string008 1 records
string009 1 records

file02.txt summary:
string003 1 records
string004 2 records
string005 2 records
string006 3 records

file03.txt summary:
string001 1 records
string002 2 records
string003 2 records
string004 3 records
string005 1 records
string006 1 records

Summary of all files
string001 4 records
string002 6 records
string003 6 records
string004 9 records
string005 6 records
string006 8 records
string007 1 records
string008 1 records
string009 1 records

The bellow script gets me closer to the desired results:

Code:
FNR == 1 {
	if (file) 
		{
			print file " summary :"
			for (STR in string)
				print STR, string[STR] " records"
				print " "				
				split(x, string)
				counter = 0
		}
	file = FILENAME
}    
{
	if ( $4 ~ /^(string001|string002|string003|string004|string005|string006|string007|string008|string009)$/ && !counter ) 
		{ 
			{ string[$4]++; totalstring[$4]++ }			
			if (s && $3 == "X")
				{ counter = 1}
		} 
	else 
		{ 
                         { string[$5]++; totalstring[$5]++ }
                         if (s && $4 == "Y")
	                         { counter = 1}
		} 
}
END {
        print file " summary : "
        for (STR in string)
		print STR, string[STR] " records"                
                print " "
                print "Files summary: "
        for (STR in totalstring)
		print STR, totalstring[STR] " records"                
                print "End of summary."
}

Running the following command:

Code:
awk -f files-summary.awk file*.txt

Produces:

Code:
file01.txt summary:
string001 3 records
string002 6 records (instead of 4 records)
string003 3 records
string004 4 records
string005 4 records (instead of 3 records)
string006 4 records
string007 1 records
string008 1 records
string009 1 records

file02.txt summary:
string003 1 records
string004 2 records
string005 2 records 
string006 4 records (instead of 3 records)

file03.txt summary:
string001 1 records
string002 2 records 
string003 3 records (instead of 2 records)
string004 4 records (instead of 3 records)
string005 1 records 
string006 1 records

Summary of all files
string001 4 records
string002 8 records (instead of 6 records)
string003 7 records (instead of 6 records)
string004 10 records (instead of 9 records)
string005 7 records (instead of 6 records)
string006 9 records (instead of 8 records)
string007 1 records
string008 1 records
string009 1 records

The part I'm missing, and do not know how to include it in the above code, is how to deal with duplicate index keys if first column has the same value.

Examples:

Code:
keyA001  450 X string002 value007
keyA001  454 X string002 value004

keyA001  388 X string005 value008
keyA001 1278 X string005 value012

Both of them should be counted as single record.
# 2  
Old 03-18-2017
Hi alex2005, see if this adaptation works for you:

Code:
function pr_sum (txt,arr,end,   i) {
  print txt
  for (i in arr)
    print i, arr[i] " records"
  print end
}

FNR == 1 {
  if (NR>FNR) {
    pr_sum(file " summary :",string,"")
  }
  file = FILENAME
  split(x, string)
}

!seen[$1,$(NF-1)]++ {
  string[$(NF-1)]++
  totalstring[$(NF-1)]++
}

END {
  pr_sum(file " summary :",string,"")
  pr_sum("Summary of all files :",totalstring,"End of summary.")
}

This User Gave Thanks to Scrutinizer For This Post:
# 3  
Old 03-18-2017
Sorry, I have to come back to this thread, but I need more help.

Even Scrutinizer script works like a charm, for the case I have presented, I have discovered that some of the files have one or more spaces in the last column values (valueXXX).

Hence cannot use:

Code:
!seen[$1,$(NF-1)]++ {
  string[$(NF-1)]++
  totalstring[$(NF-1)]++
}

and
Code:
$(NF-1)

needs o be replaced with
Code:
$4

or
Code:
$5

I spend the entire afternoon making different trials, and the closest trial was:
Code:
{       
        if ( $3 == "IN" )
        {
                !seen[$1,$4]++ 
                                {       
                                        string[$4]++
                                        totalstring[$4]++
                                }
        }
        else 
        {       
                !seen[$1,$5]++
                                {       
                                        string[$5]++
                                        totalstring[$5]++
                                }
        }
}

This didn't work for me I need to ask for your help once more.

Thanks in advance and sorry opening again closed issue.

---------- Post updated at 01:35 PM ---------- Previous update was at 01:34 PM ----------
# 4  
Old 03-18-2017
Perhaps this :

Code:
function pr_sum (txt,arr,end,   i) {
    print txt
    for (i in arr)
      print i, arr[i] " records"
    print end
}


FNR == 1 {
  if (NR>FNR) {
    pr_sum(file " summary :",string,"")
  }
  file = FILENAME
  split(x, string)
}

{
  i=$3=="IN"?4:5
}

!seen[$1,$i]++ {
  string[$i]++
  totalstring[$i]++
}

END {
  pr_sum(file " summary :",string,"")
  pr_sum("Summary of all files :",totalstring,"End of summary.")
}


For the earlier example you could use:
Code:
{
  i=$3=="X"?4:5
}

instead
This User Gave Thanks to Scrutinizer For This Post:
# 5  
Old 03-18-2017
Many thanks for your help ,and sorry for introducing the "IN" instead of "X".

Now everything works like a charm.

Again very smart solution provided.

Kind Regards
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Matching 2 files based on key

Hi all I have two files I need to match record from first file and second file on column 1,8 and and output only match records on file1 File1: 020059801803180116130926800002090000800231000245204003160000000002000461OUNCE000000350000100152500BM01007W0000 ... (5 Replies)
Discussion started by: arunkumar_mca
5 Replies

2. UNIX for Beginners Questions & Answers

Match tab-delimited files based on key

I thought I had this figured out but was wrong so am humbly asking for help. The task is to add an additional column to FILE 1 based on records in FILE 2. The key is in COLUMN 1 for FILE 1 and in COLUMN 1 OR COLUMN 2 for FILE 2. I want to add the third column from FILE 2 to the beginning of... (8 Replies)
Discussion started by: andmal
8 Replies

3. Shell Programming and Scripting

awk - Merge two files based on one key

Hi, I am struggling with the an awk command to merge two files based on a common key. I want to append the value from File2 ($2) onto the end of File1 where $1 from each file matches - If no match then nothing is apended File1 COL1|COL2|COL3|COL4|COL5|COL6|COL7... (3 Replies)
Discussion started by: Ads89
3 Replies

4. Shell Programming and Scripting

Fetch the values based on a Key using awk from single file

Hi, Please help to fetch the values for a key from below data format in linux. Sample Input Data Format 11055005|PurchaseCondition|GiftQuantity|1 11055005|PurchaseCondition|MinimumPurchase|400 11055005|GiftCatalogEntryIdentifier|Id|207328014 11429510|PurchaseCondition|GiftQuantity|1... (2 Replies)
Discussion started by: mohanalakshmi
2 Replies

5. Shell Programming and Scripting

Using AWK to format output based on key field

I have file which contains gene lines something like this Transcript Name GO POPTR_0016s06290.1 98654 POPTR_2158s00200.1 11324 POPTR_0004s22390.1 12897 POPTR_0001s11490.1 POPTR_0016s13950.1 14532 POPTR_0015s05840.1 13455 POPTR_0013s06470.1 12344... (6 Replies)
Discussion started by: shen
6 Replies

6. Shell Programming and Scripting

Merge two files based on a 3rd key file

Hi, I want to merge the two files based on the key file's columns. The key file: DATE~DATE HOUSE~IN_HOUSE CUST~IN_CUST PRODUCT~PRODUCT ADDRESS~CUST_ADDR BASIS_POINTS~BASIS_POINTS ... The other 2 files are From_file & To_file - The From_file: DATE|date/time|29|9 ... (9 Replies)
Discussion started by: dips_ag
9 Replies

7. Shell Programming and Scripting

Gawk / Awk Merge Lines based on Key

Hi Guys, After windows died on my netbook I installed Lubuntu and discovered Gawk about a month ago. After using Excel for 10+ years I'm amazed how quick and easily Gawk can process data but I'm stuck with a little problem merging data from multiple lines. I'm an SEO Consultant and provide... (9 Replies)
Discussion started by: Jamesfirst
9 Replies

8. Shell Programming and Scripting

joining files based on key column

Hi I have to join two files based on 1st column where 4th column of a2.txt=at and take 2nd column of a1.txt and 3rd column of a2.txt and check against source files ,if matches list those source file names. a1.txt a1|20090809|20090810 a2|20090907|20090908 a2.txt a1|d|file1.txt|at... (9 Replies)
Discussion started by: akil
9 Replies

9. Shell Programming and Scripting

Merge files based on key

Hi Friends, Can any one help me with merging these file based on two columns : File1: A|123|99|SAMS B|456|95|GEORGE D|789|85|HOVARD File2: S|123|99|NANcY|6357 S|123|99|GREGRO|83748 A|456|95|HARRY|827|somers S|456|95|ANTONY|546841|RUDOLPH|7263 B|456|95|SMITH|827|BOISE STATE|834... (3 Replies)
Discussion started by: sbasetty
3 Replies

10. Shell Programming and Scripting

merging two files based on some key

I have to merge two files: The files are having the same format like A0this is first line TOlast line silmilarly other lines. I have to search for A0 line in the second file also and then put the data in the third file under A0 heading ,then for A1 and so on. A0 portion will be treminated... (1 Reply)
Discussion started by: Vandana Yadav
1 Replies
Login or Register to Ask a Question