AWK- delimiting the strings and matching the fields


 
# 1  
Old 06-12-2009

Hello,

I am a newbie in awk; I have just started learning it.

1) I have an input file which looks like:
{4812 4009 1602 2756 306} {4814 4010 1603 2757 309} {8116 9362 10779 }
{10779 10121 9193 10963 10908} {1602 2756 306 957 1025} {1603 2757 307}
and so on.....

2) In output:

a) the numbers in the first {} should be treated as a first string, those in the second {} as a second string, and those in the third {} as a third string,
b) then the first number from the first {} should be matched with the first number from the second {} and the first number from the third {}; similarly, the second number from the first {} should be matched with the second number from the second {} and the second number from the third {},

c) so the output should look like:
4812 4814 8116
4009 4010 9362
and so on...

Thanks,
Kajolo
# 2  
Old 06-12-2009
nawk -f kaj.awk myFile

kaj.awk:
Code:
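BEGIN {
  FS="[{}]"      # split each line on braces; the groups land in the even-numbered fields
  SEPlist=" "    # a single space makes split() break on runs of whitespace
}
{
  split("",a)    # clear the output rows built for the previous line
  min=0          # reset the smallest group size for every line
  for (i=2; i<=NF; i=i+2) {
    n=split($i, list, SEPlist)
    min=(min==0) ? n : (n<min)?n:min
    # append the j-th number of this group to output row j
    for(j=1; j<=n; j++)
      a[j]=(j in a) ? a[j] OFS list[j] : list[j]
  }
  # print only the rows to which every group contributed a number
  for(i=1; i<=min; i++)
     print a[i]
}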

# 3  
Old 06-12-2009
Hi!

Works perfectly!!
Thank you!

By the way, I was trying to count the number of occurrences of each record, for example:
4812 4814 8116 : 8. However, when I do: uniq -c input > out it doesn't work. Instead,
it prints: 1: 4812 4814 8116, then, let's say 10 lines below, when it finds the same record again:
1: 4812 4814 8116, and so on.

Thanks again,
Kajolo
# 4  
Old 06-12-2009
'uniq' assumes sorted file.
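
'uniq' only collapses identical lines that sit next to each other, so the same record appearing 10 lines apart is counted as 1 each time. A minimal sketch of two ways around that, assuming one record per line; the function names count_records and count_records_awk are made up for this example:

```shell
# count_records FILE: sort first so identical records become adjacent,
# then let uniq -c count each run. (Hypothetical helper name.)
count_records() {
  sort "$1" | uniq -c
}

# count_records_awk FILE: same counts with awk alone, no sorting needed.
# The output order of "for (k in c)" is unspecified. (Hypothetical helper name.)
count_records_awk() {
  awk '{ c[$0]++ } END { for (k in c) print c[k], k }' "$1"
}
```

Either function can be called on the output file of the awk script, for example: count_records out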
# 5  
Old 06-12-2009
OK - I found it.
awk ' { print $0 }' input |sort |uniq -c

Thanks again for help.
Kajolo

-----Post Update-----

Hello again,

I still have some problems.
The input file has 21564 words (counted by wc -w input).
However, the output contains only 6207 words. It seems that the AWK script analyzes and prints only the first
few hundred lines and then stops?

Thanks,
Kajolo

# 6  
Old 06-12-2009
well....
here's a quote from your original post:
Code:
{4812 4009 1602 2756 306} {4814 4010 1603 2757 309} {8116 9362 10779 }
{10779 10121 9193 10963 10908} {1602 2756 306 957 1025} {1603 2757 307}

The code has been written in such a way that it figures out the MINIMUM number of elements per group per record/line. Any elements that go beyond that 'minimum' are dropped from the output. In the quoted sample the third group of each line has only three elements, so only the first three elements of each group are printed.

This is the algorithm I've inferred from your data sample and the desired output.

If that's not what you want, provide a better desired output for the sample in the original post.
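
To see where elements get dropped, it can help to print the size of every group next to the per-line minimum. A rough sketch, assuming the groups are delimited by {} as in the sample; group_sizes is a hypothetical helper name, not part of kaj.awk:

```shell
# group_sizes FILE: for each non-empty line, print the word count of every
# {} group and the smallest of them. (Hypothetical helper name.)
group_sizes() {
  awk -F'[{}]' 'NF > 1 {
    min = -1; sizes = ""
    for (i = 2; i <= NF; i += 2) {          # groups are the even-numbered fields
      n = split($i, list, " ")              # count the words in this group
      sizes = sizes (sizes == "" ? "" : " ") n
      if (min < 0 || n < min) min = n
    }
    print "line " NR ": sizes " sizes ", min " min
  }' "$1"
}
```

For the two quoted lines this reports sizes 5 5 3 and a minimum of 3, which is why only three rows per line come out.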
# 7  
Old 06-12-2009
Sorry - I wasn't precise.
1) The input file contains 1005 lines,
2) The number of words differs from line to line. A word is defined as a single number,
3) We have three strings in each line, delimited by the first {}, second {} and third {},
4) The important thing is that, within any given row, the number of words in the first {}, second {} and third {}
is exactly the same,
5) And that's why I would like to pair ALL elements with each other in the way I described before,

6) Here is a fragment of the original input (I am not sure how it will be displayed here, but each three {} are on a single line),

{4812 4009 2357 1602 2756 1025 3199 951 957 0 99} {4814 4010 2358 1603 2758 1028 3200 952 958 1 100} {8116 9362 10121 10779 10120 10908 9274 10962 10963 10564 10602}

{4812 4009 2357 1602 957 951 1025 99} {4814 4010 2358 1603 958 952 1028 100} {8116 9362 10121 10779 10963 10962 10908 10602}

{4812 4009 2357 1602 1025 901 957 951 99} {4814 4010 2358 1603 1028 902 958 952 100} {8116 9362 10120 10779 10908 11012 10963 10962 10602}

{10121 10779 10120 10908 9274 11012 10962 10963 10564 10602} {2357 1602 2756 1025 3199 901 951 957 0 99} {2358 1603 2757 1028 3200 902 952 958 1 100}

{4812 1602 2756 951 957 99} {4814 1603 2757 952 958 100} {8116 10779 10120 10962 10963 10602}

{4009 1602 2756 2357 2357 99 719} {4010 1603 2758 2358 2358 100 720} {9362 10779 10120 10120 10121 10602 10375}

{4812 2756 2357 1025 3199 901 99} {4814 2759 2358 1028 3200 902 100} {8116 10120 10120 10908 9274 11012 10602}

{4812 1602 2756 2357 1025 3199 951 0 99 719} {4814 1603 2757 2358 1028 3200 952 1 100 720} {8116 10779 10120 10120 10909 9274 10962 10564 10602 10375}

{4812 3680 1602 2756 2357 3199 957 951 99 719} {4814 3682 1603 2757 2358 3200 958 952 100 720} {8116 9352 10779 10120 10121 9274 10963 10962 10602 10375}

In the OUTPUT I would like (based on the first line of input):

4812 4814 8116
4009 4010 9362
2357 2358 10121
1602 1603 10779 ... so the last record should be:
99 100 10602
and so on for all lines.


Regards,
Kajolo