The UNIX and Linux Forums  

#1 | 06-12-2009 | kajolo (Registered User)
AWK- delimiting the strings and matching the fields

Hello,

I am a newbie to awk; I have just started learning it.

1) I have an input file that looks like this:
{4812 4009 1602 2756 306} {4814 4010 1603 2757 309} {8116 9362 10779 }
{10779 10121 9193 10963 10908} {1602 2756 306 957 1025} {1603 2757 307}
and so on.....

2) In the output:

a) the numbers in the first {} should be treated as the first string, the second {} delimits the second string, and the third {} delimits the third string;
b) then the first number from the first {} should be matched with the first number from the second {} and the first number from the third {}; similarly, the second number from the first {} should be matched with the second number from the second {} and the second number from the third {};

c) so the output should look like this:
4812 4814 8116
4009 4010 9362
and so on...

Thanks,
Kajolo

#2 | 06-12-2009 | vgersh99 (Forum Staff, Moderator)
nawk -f kaj.awk myFile

kaj.awk:

Code:
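BEGIN {
  FS="[{}]"      # split on braces; group contents land in even-numbered fields
  SEPlist=" "    # numbers inside a group are separated by whitespace
}
{
  split("",a)    # clear the per-line accumulator
  min=0          # reset the minimum group size for every line
  for (i=2; i<=NF; i=i+2) {
    n=split($i, list, SEPlist)
    min=(min==0) ? n : (n<min)?n:min
    for(j=1; j<=n; j++)
      a[j]=(j in a) ? a[j] OFS list[j] : list[j]
  }
  # print only the positions that are present in every group of this line
  for(i=1; i<=min; i++)
    print a[i]
}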


#3 | 06-12-2009 | kajolo (Registered User)
Hi!

Works perfectly!!
Thank you!

By the way, I was trying to count the number of occurrences of each record, for example:
4812 4814 8116 : 8. However, when I run uniq -c input > out, it doesn't work. Instead,
it prints 1: 4812 4814 8116, and then, say, 10 lines below, when it finds the same record again, it prints
1: 4812 4814 8116 once more, and so on.

Thanks again,
Kajolo

#4 | 06-12-2009 | vgersh99 (Forum Staff, Moderator)
'uniq' assumes a sorted file: it only collapses duplicate lines that are adjacent.
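
To illustrate the point, here is a minimal sketch with a made-up three-line sample (the function name sample is hypothetical). It shows sorting before uniq -c, plus an awk alternative that counts records in an array and so needs no sorting; the order of the awk output is unspecified.

```shell
# Hypothetical sample: the same record appears on non-adjacent lines.
sample() {
  printf '%s\n' '4812 4814 8116' '4009 4010 9362' '4812 4814 8116'
}

# uniq -c only merges adjacent duplicates, so sort the records first.
sample | sort | uniq -c

# Alternative without sorting: count each whole line in an awk array.
sample | awk '{ count[$0]++ } END { for (rec in count) print count[rec], rec }'
```

Both commands should report the record 4812 4814 8116 with a count of 2.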

#5 | 06-12-2009 | kajolo (Registered User)
OK, I found it:
awk ' { print $0 }' input |sort |uniq -c

Thanks again for help.
Kajolo

-----Post Update-----

Hello again,

I still have some problems.
The input file has 21564 words (counted with wc -w input).
However, the output contains only 6207 words. It seems that the AWK script prints or analyzes only the first
few hundred lines and then stops?

Thanks,
Kajolo


#6 | 06-12-2009 | vgersh99 (Forum Staff, Moderator)
well....
here's a quote from your original post:

Code:
{4812 4009 1602 2756 306} {4814 4010 1603 2757 309} {8116 9362 10779 }
{10779 10121 9193 10963 10908} {1602 2756 306 957 1025} {1603 2757 307}

The code has been written in such a way that it figures out the MINIMUM number of elements per group per record/line. In the quote above these were highlighted in green (here, the first three elements of each group). Any elements that go beyond the 'minimum' (highlighted in red) are dropped from the output.

This is the algorithm I've inferred from your data sample and the desired output.

If that's not what you want, please provide a better desired output for the sample in the original post.

#7 | 06-12-2009 | kajolo (Registered User)
Sorry, I wasn't precise.
1) The input file contains 1005 lines,
2) The number of words in each line (maximum and minimum) differs. A word is defined as a single number,
3) We have three strings in each line, delimited by the first {}, second {} and third {},
4) The important thing is that, within any given row, the number of words in the first {}, second {} and third {}
is exactly identical,
5) And that's why I would like to pair ALL elements with each other in the way I described before,

6) Here is a fragment of the original input (I am not sure how it will be displayed here, but each set of three {} is on a single line):

{4812 4009 2357 1602 2756 1025 3199 951 957 0 99} {4814 4010 2358 1603 2758 1028 3200 952 958 1 100} {8116 9362 10121 10779 10120 10908 9274 10962 10963 10564 10602}

{4812 4009 2357 1602 957 951 1025 99} {4814 4010 2358 1603 958 952 1028 100} {8116 9362 10121 10779 10963 10962 10908 10602}

{4812 4009 2357 1602 1025 901 957 951 99} {4814 4010 2358 1603 1028 902 958 952 100} {8116 9362 10120 10779 10908 11012 10963 10962 10602}

{10121 10779 10120 10908 9274 11012 10962 10963 10564 10602} {2357 1602 2756 1025 3199 901 951 957 0 99} {2358 1603 2757 1028 3200 902 952 958 1 100}

{4812 1602 2756 951 957 99} {4814 1603 2757 952 958 100} {8116 10779 10120 10962 10963 10602}

{4009 1602 2756 2357 2357 99 719} {4010 1603 2758 2358 2358 100 720} {9362 10779 10120 10120 10121 10602 10375}

{4812 2756 2357 1025 3199 901 99} {4814 2759 2358 1028 3200 902 100} {8116 10120 10120 10908 9274 11012 10602}

{4812 1602 2756 2357 1025 3199 951 0 99 719} {4814 1603 2757 2358 1028 3200 952 1 100 720} {8116 10779 10120 10120 10909 9274 10962 10564 10602 10375}

{4812 3680 1602 2756 2357 3199 957 951 99 719} {4814 3682 1603 2757 2358 3200 958 952 100 720} {8116 9352 10779 10120 10121 9274 10963 10962 10602 10375}

In the OUTPUT I would like (based on the first line of the input):

4812 4814 8116
4009 4010 9362
2357 2358 10121
1602 1603 10779
... so the last record should be:
99 100 10602
and so on for all lines.


Regards,
Kajolo
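
One possible way to pair every element, sketched under the assumptions stated in points 3 and 4 above: exactly three {} groups per line, all of the same length within a line. The function name pair_groups and the array names g1, g2, g3 are hypothetical, and blank lines between records are skipped. This is only a sketch, not a tested solution for the full 1005-line file.

```shell
pair_groups() {
  awk -F'[{}]' '
  NF >= 6 {                         # skip blank lines; a three-group line has at least 6 fields
    n = split($2, g1, " ")          # first group sets the number of rows
    split($4, g2, " ")              # second group
    split($6, g3, " ")              # third group
    for (j = 1; j <= n; j++)
      print g1[j], g2[j], g3[j]
  }' "$@"
}

# Usage with one line taken from the fragment above:
printf '%s\n' '{4812 1602 2756 951 957 99} {4814 1603 2757 952 958 100} {8116 10779 10120 10962 10963 10602}' | pair_groups
# prints:
# 4812 4814 8116
# 1602 1603 10779
# 2756 2757 10120
# 951 952 10962
# 957 958 10963
# 99 100 10602
```

Because the arrays are refilled on every line, nothing carries over between records, so long and short lines can be mixed freely. It can also be run on a file, for example pair_groups input.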