The UNIX and Linux Forums  

#1  04-15-2008
Registered User (Join Date: Apr 2008, Posts: 2)

AWK - printing certain fields when field order changes in data file

I'm hoping someone can help me on this. I have a data file that greatly simplified might look like this:

Code:
sec;src;dst;proto
421;10.10.10.1;10.10.10.2;tcp
426;10.10.10.3;10.10.10.4;udp
442;10.10.10.5;10.10.10.6;tcp
sec;src;fac;dst;proto
521;10.10.10.1;ab;10.10.10.2;tcp
525;10.10.10.5;ac;10.10.10.6;tcp
522;10.10.10.3;ab;10.10.10.4;udp
535;10.10.10.5;ac;10.10.10.6;tcp
...
Header lines appear periodically throughout the file, and the lines beneath each one are the actual data, with fields that correspond to that header. However, the order of the fields sometimes changes, and different fields are sometimes used; any such change is always marked by a new header line. A new header line can appear anywhere in the data file. Fields might be next to each other at one point in the file, then separated in a later part of the file.

I'd like to be able to produce a simplified output of just a few of the fields in the data file. For instance, I'd like to extract the src, dst, and proto from the example above. src is normally field 2, but dst is field 3 and then changes to field 4. My desired output would look something like this:

Code:
src;dst;proto
10.10.10.1;10.10.10.2;tcp
10.10.10.3;10.10.10.4;udp
10.10.10.5;10.10.10.6;tcp
10.10.10.1;10.10.10.2;tcp
10.10.10.5;10.10.10.6;tcp
10.10.10.3;10.10.10.4;udp
10.10.10.5;10.10.10.6;tcp
I've worked with AWK quite a bit and know how to work with field numbers, if/then, etc, but I can't figure out how to change a field number to a new value as directed by the header line.

Can anyone help me? I'd sure appreciate any advice. Is AWK the right tool to do this with?

#2  04-15-2008
Registered User (Join Date: Apr 2008, Posts: 5)

Hi eric, I think AWK is the right tool for this. Test some fields (or the entire line) to detect the header lines, so you know that the lines that follow use that layout until a new header appears. At least the headers are static, aren't they?

Good luck!
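
A minimal sketch of that header test, assuming a header is any line in which some field is exactly "src" (that rule is my assumption; the real data may need a different test). The function name classify_lines is made up for illustration. It only labels each line, so you can check the detection before building anything on top of it:

```shell
#!/bin/sh
# classify_lines: hypothetical helper that labels each input line.
# Assumption: a line is a header when one of its fields is exactly "src".
classify_lines() {
  awk -F';' '
    {
      is_header = 0
      # Compare whole fields, so "src" inside a longer value does not count.
      for (i = 1; i <= NF; i++)
        if ($i == "src") is_header = 1
      label = is_header ? "HEADER" : "DATA"
      print label ": " $0
    }
  ' "$@"
}

printf '%s\n' 'sec;src;dst;proto' '421;10.10.10.1;10.10.10.2;tcp' | classify_lines
# HEADER: sec;src;dst;proto
# DATA: 421;10.10.10.1;10.10.10.2;tcp
```

Comparing whole fields rather than matching a regex against the line avoids treating a data line as a header just because some value happens to contain the text "src".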

#3  04-15-2008
Registered User (Join Date: Apr 2008, Posts: 2)

I think I figured it out:

Code:
awk '
BEGIN { FS = ";" }

# Header line: record the current position of each wanted field.
/;src;/ {
    for (num = 1; num <= NF; num++) {
        if ($num == "src")   fieldsrc = num
        if ($num == "dst")   fielddst = num
        if ($num == "proto") fieldproto = num
    }
    next
}

# Data line: print the wanted fields by their current positions.
{ print $fieldsrc ";" $fielddst ";" $fieldproto }
'
It seems to work for this example:

10.10.10.1;10.10.10.2;tcp
10.10.10.3;10.10.10.4;udp
10.10.10.5;10.10.10.6;tcp
10.10.10.1;10.10.10.2;tcp
10.10.10.5;10.10.10.6;tcp
10.10.10.3;10.10.10.4;udp
10.10.10.5;10.10.10.6;tcp

Now if I can figure it out for the real world data ...
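
One thing that may matter for real-world data: if a header ever omits one of the wanted fields, the saved field number is either stale or unset, and an unset number makes awk print $0 (the whole line). Here is a rough sketch that resets the positions at every header and prints an empty value for a missing field. The function name extract_fields and the empty-value behaviour are my own assumptions, not something from the data description:

```shell
#!/bin/sh
# extract_fields: hypothetical helper; prints src;dst;proto for each data line.
# Assumption: a field missing from the current header is printed as empty.
extract_fields() {
  awk -F';' '
    # Header line: some field is exactly "src".
    /(^|;)src(;|$)/ {
      split("", col)                  # forget positions from the old header
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    # Data line: look each name up, falling back to an empty string.
    {
      src   = ("src"   in col) ? $(col["src"])   : ""
      dst   = ("dst"   in col) ? $(col["dst"])   : ""
      proto = ("proto" in col) ? $(col["proto"]) : ""
      print src ";" dst ";" proto
    }
  ' "$@"
}

extract_fields <<'EOF'
sec;src;dst;proto
1;a;b;tcp
sec;dst;src
2;d;c
EOF
# a;b;tcp
# c;d;
```

The second header has no proto column, so the last output line ends with an empty field instead of repeating a value from the wrong column.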

#4  04-15-2008
vgersh99, Moderator (Join Date: Feb 2005, Location: Boston, MA, Posts: 3,002)

# default fields: 'src;dst;proto'
nawk -f eric.awk myDataFile.txt

# fields order : 'proto;sec;src'
nawk -v fields='proto;sec;src' -f eric.awk myDataFile.txt

eric.awk:
Code:
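BEGIN {
  FS = OFS = ";"

  # Fields to print, in output order; override with -v fields='...'
  if (fields == "") fields = "src;dst;proto"

  n = split(fields, fieldsA, FS)

  # A header line is any line with a field that is exactly "src"
  PATheader = "(^|;)src(;|$)"
}

# Print the output header once per input file
FNR == 1 { print fields }

# Header line: rebuild the name-to-position map
$0 ~ PATheader {
   split("", header)   # clear positions left over from the previous header
   for (i = 1; i <= NF; i++)
      header[$i] = i
   next
}

# Data line: print the requested fields by their current positions
{
   for (i = 1; i <= n; i++)
     printf("%s%s", $header[fieldsA[i]], (i == n) ? ORS : OFS)
}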
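
For a quick test without creating a separate script file, the same idea can be inlined in a shell function. This is only a sketch adapted from the script above: the name pick_fields is made up, the header test assumes a header always has a field that is exactly "src", and the sample input is a shortened version of the data from the first post.

```shell
#!/bin/sh
# pick_fields: hypothetical wrapper; first argument is the field list,
# remaining arguments are input files (stdin if none are given).
pick_fields() {
  fields=$1; shift
  awk -v fields="$fields" '
    BEGIN {
      FS = OFS = ";"
      n = split(fields, want, FS)     # wanted names, in output order
    }
    FNR == 1 { print fields }         # output header, once per input
    /(^|;)src(;|$)/ {                 # header line: rebuild name -> position
      split("", pos)
      for (i = 1; i <= NF; i++) pos[$i] = i
      next
    }
    {                                 # data line: print wanted fields
      for (i = 1; i <= n; i++)
        printf "%s%s", $(pos[want[i]]), (i == n ? ORS : OFS)
    }
  ' "$@"
}

pick_fields 'proto;sec;src' <<'EOF'
sec;src;dst;proto
421;10.10.10.1;10.10.10.2;tcp
sec;src;fac;dst;proto
521;10.10.10.1;ab;10.10.10.2;tcp
EOF
# proto;sec;src
# tcp;421;10.10.10.1
# tcp;521;10.10.10.1
```

Because the positions are looked up by name on every data line, the output order depends only on the field list, not on where the columns sit in the input.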