The UNIX and Linux Forums  

#4
11-09-2005
Perderabo (Forum Staff, Unix Daemon)
Join Date: Aug 2001
Location: Ashburn, Virginia
Posts: 9,131

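This is harder than it looks because fields are defined by both syntax and position: a protein line that carries only one name/percentage pair belongs to the first or the second species depending on whether it starts at the left margin or is indented. Here is a ksh script that works with your sample data, but any surprises in your real data could break it.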

Code:
#! /usr/bin/ksh

IFS=""
while read line ; do
    line=${line##+(_)}
    ((${#line})) ||  continue
    if [[ "$line" != "Group of orthologs"* ]] ; then
        echo "error looking for start of record" 1>&2
        echo "$line"  1>&2
        exit 1
    fi
    line=${line#"Group of orthologs #"}
    Group_number=${line%%\.*}
    line=${line#*"Best score "}
    Best_Score=${line%" "*}
    read line
    if [[ $line != "Score difference with "* ]] ; then
        echo "error stepping over 2nd line of group $Group_number" 1>&2
        echo "$line"  1>&2
        exit 1
    fi
    ProteinLines=1
    while ((ProteinLines)) ; do
        if read line ; then
            line=${line##+(_)}
            if ((!${#line})) ; then
                ProteinLines=0
            else
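                # IFS is empty, so eval is what re-splits the line on whitespace.
                # Caution: eval also executes any shell metacharacters in the data.
                eval set -- $line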
                firstchar="${line%${line#?}}"
                if [[ $# -eq 4 ]] ; then
                    S_one=$1
                    P_one=$2
                    S_two=$3
                    P_two=$4
                else
                    if [[ $firstchar = [a-zA-Z0-9] ]] ; then
                        S_one=$1
                        P_one=$2
                        S_two=""
                        P_two=""
                    else
                        S_one=""
                        P_one=""
                        S_two=$1
                        P_two=$2
                    fi
                fi
                echo "${Group_number};${Best_Score};${S_one};${P_one};${S_two};${P_two};"
            fi
        else
            ProteinLines=0
        fi
    done
done
exit 0
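
To make the header parsing easier to follow, here is a minimal sketch of just the parameter expansions that pull out the group number and the best score. The header line below is an assumption inferred from the patterns the script matches, not a line copied from the real data.

```shell
# Hypothetical header line; the exact wording is assumed from the script's patterns.
line='Group of orthologs #5. Best score 1855 bits'

line=${line#"Group of orthologs #"}   # drop the fixed prefix -> '5. Best score 1855 bits'
Group_number=${line%%.*}              # drop everything from the first '.' -> '5'
line=${line#*"Best score "}           # drop up to and including 'Best score ' -> '1855 bits'
Best_Score=${line%" "*}               # drop the last space and what follows -> '1855'

echo "${Group_number};${Best_Score}"  # prints 5;1855
```

If the score were not followed by a space and a trailing word, the last expansion would leave the line unchanged, which is one of the ways real data could surprise the script.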


Code:
$
$ ./pro < data
1;3010;YHR165C;100.00%;PRP8_HUMAN;100.00%;
2;2100;YLR106C;100.00%;MDN1_HUMAN;100.00%;
3;2082;YJL130C;100.00%;PYR1_HUMAN;100.00%;
4;1959;YKR054C;100.00%;DYHC_HUMAN;100.00%;
5;1855;YNR016C;100.00%;Q6KE87_HUMAN;100.00%;
5;1855;YMR207C;19.86%;COA2_HUMAN;90.52%;
5;1855;;;COA1_HUMAN;53.30%;
6;1838;YDL140C;100.00%;RPB1_HUMAN;100.00%;
7;1768;YJR066W;100.00%;Q4LE76_HUMAN;100.00%;
7;1768;YKL203C;49.22%;;;
$
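
The rows for groups 5 and 7 show the positional rule at work. As a rough sketch, the column decision can be isolated like this. The function name format_protein_line is hypothetical, and unlike the script above it splits with the default IFS and set -f instead of eval, so treat it as an illustration rather than a drop-in replacement. The amount of indentation in the second call is also an assumption.

```shell
# format_protein_line is a hypothetical helper, not part of the original script.
# It prints "S_one;P_one;S_two;P_two;" for one protein line.
format_protein_line() {
    line=$1
    firstchar=${line%"${line#?}"}   # first character of the line
    set -f                          # no filename expansion while splitting
    set -- $line                    # split on the default IFS (space, tab, newline)
    set +f
    if [ $# -eq 4 ]; then
        echo "$1;$2;$3;$4;"         # both species present
    else
        case $firstchar in
            [a-zA-Z0-9]) echo "$1;$2;;;" ;;   # starts at the margin: first species only
            *)           echo ";;$1;$2;" ;;   # indented: second species only
        esac
    fi
}

format_protein_line 'YMR207C 19.86% COA2_HUMAN 90.52%'   # YMR207C;19.86%;COA2_HUMAN;90.52%;
format_protein_line '            COA1_HUMAN 53.30%'      # ;;COA1_HUMAN;53.30%;
format_protein_line 'YKL203C 49.22%'                     # YKL203C;49.22%;;;
```

Splitting this way avoids running the data through eval, at the cost of no longer honoring any shell quoting that might appear in a line.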