Formatting and combining fields of the input file


 
# 1  
Old 03-05-2012

Hi,

I have a file of the following format:

Code:
AV 103
AV 104
AV 105
AV 308
AV 517
BN 210
BN 211
BN 212
BN 218

and the desired output is :

Code:
AV 103-105 3
AV 308 1
AV 517 1
BN 210-212 3
BN 218 1

In the output, the first field is the name string. The second field is the ID range, covering a run of consecutive numbers: in the first case the AV IDs run continuously from 103 to 105, while in the next two cases there is only a single number because no consecutive sequence exists. In the fourth line of output a consecutive run exists again, from 210 to 212.
The third field is the count of IDs merged into that range; for example, 103-105 has a count of 3.

Can anyone help me convert my input file into the desired output?
Thanks in advance.

Last edited by joeyg; 03-05-2012 at 11:07 AM.. Reason: Please wrap sample data and scripts in CodeTags
# 2  
Old 03-05-2012
Code:
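# Same name and the ID is exactly one more than the last one seen:
# extend the current run and skip to the next input line.
(NAME==$1) && (LV==($2-1)) {    LV++;   next    }

# Otherwise a run has ended: print it, then start a new run at this line.
{
        if(NAME)        # nothing to print before the first line
        {
                if(LV == FIRST) print NAME, FIRST, 1
                else            print NAME, FIRST "-" LV, (LV-FIRST)+1;
        }

        NAME=$1;        FIRST=$2; LV=$2
}

# Print the final run once the input is exhausted.
END {
        if(NAME)
        {
                if(LV == FIRST) print NAME, FIRST, 1
                else            print NAME, FIRST "-" LV, (LV-FIRST)+1;
        }
}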

Code:
$ awk -f contig.awk data

AV 103-105 3
AV 308 1
AV 517 1
BN 210-212 3
BN 218 1

$
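
For readers who want the logic spelled out, here is a minimal Python sketch of the same run-collapsing idea. It is not the awk program above: the function name collapse_runs and the choice of Python are assumptions made purely for illustration. IDs are parsed with int, which is arbitrary precision in Python.

```python
def collapse_runs(lines):
    """Collapse consecutive IDs per name into 'NAME FIRST-LAST COUNT' lines.

    Hypothetical helper for illustration: each input line is 'NAME ID',
    and a run continues while the name matches and the ID grows by 1.
    """
    out = []
    name = first = last = None

    def flush():
        # Emit the run collected so far, if there is one.
        if name is None:
            return
        span = str(first) if first == last else f"{first}-{last}"
        out.append(f"{name} {span} {last - first + 1}")

    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        cur_name, cur_id = fields[0], int(fields[1])
        if cur_name == name and cur_id == last + 1:
            last = cur_id  # extend the current run
            continue
        flush()  # close the previous run
        name, first, last = cur_name, cur_id, cur_id
    flush()  # close the final run
    return out


if __name__ == "__main__":
    sample = ["AV 103", "AV 104", "AV 105", "AV 308", "AV 517",
              "BN 210", "BN 211", "BN 212", "BN 218"]
    for row in collapse_runs(sample):
        print(row)
```

Like the awk version, this assumes the input is already grouped by name and sorted by ID; it only merges neighbouring lines.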

# 3  
Old 03-07-2012
Hi Corona,

First of all, thanks for the prompt reply. I tried the script you provided. It works for most of the input file but fails in certain cases. I have extracted a part of the input where it fails.

The input was:
Code:
root@s2a>cat test10
AF01_0 9999998899999097
AF01_0 9999998899999098
AF01_0 9999998899999099
AF01_0 9999999999999000
AF01_0 9999999999999001
AF01_0 9999999999999002
AF01_0 9999999999999003
AF01_0 9999999999999004
AF01_0 9999999999999005
AF01_0 9999999999999006
root@s2a>

and the output the script gave me was:
Code:
root@s2a>awk -f fn.awk test10
HN01_0 9999998899999097 1
HN01_0 9999998899999099 1
HN01_0 9999999999999000 1
HN01_0 9999999999999003 1


The desired output should have been:
Code:
root@s2a>awk -f fn.awk test10
AF01_0 9999998899999097-9999998899999099 3
AF01_0 9999999999999000-9999999999999006 7

I can't work out why the script fails in this situation. If you could help, that would be great.

Last edited by Franklin52; 03-07-2012 at 05:24 AM.. Reason: Please use code tags for code and data samples, thank you
# 4  
Old 03-20-2012
Hi,

Can anyone help me resolve the above issue?

BR//
# 5  
Old 03-20-2012
Hi.

Translating awk code from Corona688 to perl (via a2p):
Code:
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
  if $running_under_some_shell;

# @(#) p3	Demonstrate perl version of awk code.

# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)

eval '$' . $1 . '$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;

# process any FOO=bar switches

$, = ' ';     # set output field separator
$\ = "\n";    # set output record separator

line: while (<>) {
  ( $Fld1, $Fld2 ) = split( ' ', $_, -1 );
  if ( ( $NAME eq $Fld1 ) && ( $LV == ( $Fld2 - 1 ) ) ) {    #???
    $LV++;
    next line;
  }

  if ($NAME) {
    if ( $LV == $FIRST ) {                                   #???
      print $NAME, $FIRST, 1;
    }
    else {
      print $NAME, $FIRST . '-' . $LV, ( $LV - $FIRST ) + 1;
    }
  }

  $NAME  = $Fld1;
  $FIRST = $Fld2;
  $LV    = $Fld2;
}

if ($NAME) {
  if ( $LV == $FIRST ) {    #???
    print $NAME, $FIRST, 1;
  }
  else {
    print $NAME, $FIRST . '-' . $LV, ( $LV - $FIRST ) + 1;
  }
}

When run as ./p3 data2, it produces:
Code:
./p3 data2
AF01_0 9999998899999097-9999998899999099 3
AF01_0 9999999999999000-9999999999999006 7

For context:
Code:
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
awk GNU Awk 3.1.5
perl 5.10.0

Best wishes ... cheers, drl
# 6  
Old 03-20-2012
Code:
perl -lane '
  sub pr{$,=" "; print $p, ($c>1?$n1."-".$n2:$n1), $c; $c=0; $n1=$F[1]}
  if($p ne $F[0] || $F[1]-1 != $n2){ &pr() } $p=$F[0]; $n2=$F[1]; $c++;
  END{&pr()}
' infile

Slightly more readable:
Code:
perl -lane '
  sub pr{
    $,=" ";
    print $p, ($c>1?$n1."-".$n2:$n1), $c;
    $c=0; $n1=$F[1];
  }
   
  if($p ne $F[0] || $F[1]-1 != $n2){
    &pr() 
  } 
  $p=$F[0]; 
  $n2=$F[1];
  $c++;
  
  END{
    &pr()
  }
' infile

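Apparently awk can't muster the required precision, whereas perl can. Awk typically holds numbers as double-precision floats, which are exact for integers only up to 2^53 (9007199254740992), and these IDs are larger; the 64-bit perl build used above presumably keeps them as exact integers.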

Last edited by Scrutinizer; 03-20-2012 at 11:40 AM..