#1
Can this be solved with awk and sed?
Hi Masters,
Code:
___________________________________________________________________________________
Group of orthologs #1. Best score 3010 bits
Score difference with first non-orthologous sequence - yeast:3010 human:2754
YHR165C 100.00% PRP8_HUMAN 100.00%
___________________________________________________________________________________
Group of orthologs #2. Best score 2100 bits
Score difference with first non-orthologous sequence - yeast:2033 human:1978
YLR106C 100.00% MDN1_HUMAN 100.00%
___________________________________________________________________________________
Group of orthologs #3. Best score 2082 bits
Score difference with first non-orthologous sequence - yeast:997 human:593
YJL130C 100.00% PYR1_HUMAN 100.00%
___________________________________________________________________________________
Group of orthologs #4. Best score 1959 bits
Score difference with first non-orthologous sequence - yeast:1959 human:1007
YKR054C 100.00% DYHC_HUMAN 100.00%
___________________________________________________________________________________
Group of orthologs #5. Best score 1855 bits
Score difference with first non-orthologous sequence - yeast:1855 human:1022
YNR016C 100.00% Q6KE87_HUMAN 100.00%
YMR207C 19.86% COA2_HUMAN 90.52%
               COA1_HUMAN 53.30%
___________________________________________________________________________________
Group of orthologs #6. Best score 1838 bits
Score difference with first non-orthologous sequence - yeast:1748 human:1767
YDL140C 100.00% RPB1_HUMAN 100.00%
___________________________________________________________________________________
Group of orthologs #7. Best score 1768 bits
Score difference with first non-orthologous sequence - yeast:1768 human:1636
YJR066W 100.00% Q4LE76_HUMAN 100.00%
YKL203C 49.22%
Code:
Group_number; Best_Score; S_one; P_one; S_two; P_two
5;1855;YNR016C;100.00%;Q6KE87_HUMAN;100.00%
5;1855;YMR207C;19.86%;COA2_HUMAN;90.52%
5;1855;;;COA1_HUMAN;53.30%
7;1768;YJR066W;100.00%;Q4LE76_HUMAN;100.00%
7;1768;YKL203C;49%;;
Thanks in Advance!
Last edited by Perderabo; 11-08-2005 at 08:41 AM. Reason: Add code tags and disable smilies for readability
#2
Look at the example given:
If the last line of group 5 is displayed as "5;1855;;;COA1_HUMAN;53.30%", shouldn't the last line of group 7 be displayed as "7;1768;;;YKL203C;49%" instead of "7;1768;YKL203C;49%;;"?
#3
thx
No. In the original file, the last line of group #5 is laid out as
empty empty record record
and the last line of group #7 as
record record empty empty
When I posted the records, the empty spaces were lost. Each empty position should still be extracted as an empty field. Thanks again.
#4
This is harder than it looks because fields are defined both by syntax and position. Here is a ksh script that works with your sample data. But any surprises in your real data could break it.
Code:
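#! /usr/bin/ksh
# An empty IFS keeps leading blanks on each line that is read;
# they tell us which pair of columns is empty.
IFS=""
while read line ; do
    # Drop the underscore separator and skip lines that are then empty.
    line=${line##+(_)}
    ((${#line})) || continue
    if [[ "$line" != "Group of orthologs"* ]] ; then
        echo error looking for start of record 1>&2
        echo "$line" 1>&2
        exit 1
    fi
    # "Group of orthologs #5. Best score 1855 bits" -> 5 and 1855
    line=${line#"Group of orthologs #"}
    Group_number=${line%%\.*}
    line=${line#*"Best score "}
    Best_Score=${line%" "*}
    # The second line of each group is read and ignored.
    read line
    if [[ $line != "Score difference with "* ]] ; then
        echo "error stepping over 2nd line of group $Group_number" 1>&2
        echo "$line" 1>&2
        exit 1
    fi
    # Protein lines run until the next separator or end of file.
    ProteinLines=1
    while ((ProteinLines)) ; do
        if read line ; then
            line=${line##+(_)}
            if ((!${#line})) ; then
                ProteinLines=0
            else
                # IFS is empty, so eval re-parses the line to split it on blanks.
                eval set -- $line
                firstchar="${line%${line#?}}"
                if [[ $# -eq 4 ]] ; then
                    S_one=$1
                    P_one=$2
                    S_two=$3
                    P_two=$4
                else
                    # Two fields only: a non-blank first character means
                    # the left pair is present, otherwise the right pair.
                    if [[ $firstchar = [a-zA-Z0-9] ]] ; then
                        S_one=$1
                        P_one=$2
                        S_two=""
                        P_two=""
                    else
                        S_one=""
                        P_one=""
                        S_two=$1
                        P_two=$2
                    fi
                fi
                echo "${Group_number};${Best_Score};${S_one};${P_one};${S_two};${P_two};"
            fi
        else
            ProteinLines=0
        fi
    done
done
exit 0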
Code:
$ ./pro < data
1;3010;YHR165C;100.00%;PRP8_HUMAN;100.00%;
2;2100;YLR106C;100.00%;MDN1_HUMAN;100.00%;
3;2082;YJL130C;100.00%;PYR1_HUMAN;100.00%;
4;1959;YKR054C;100.00%;DYHC_HUMAN;100.00%;
5;1855;YNR016C;100.00%;Q6KE87_HUMAN;100.00%;
5;1855;YMR207C;19.86%;COA2_HUMAN;90.52%;
5;1855;;;COA1_HUMAN;53.30%;
6;1838;YDL140C;100.00%;RPB1_HUMAN;100.00%;
7;1768;YJR066W;100.00%;Q4LE76_HUMAN;100.00%;
7;1768;YKL203C;49.22%;;;
$
#5
Here it is with command-line Perl:
$ perl -ne 'chomp; @_ = split;
> if($_[0] eq "Group")
> { $group=substr($_[3],1,length($_[3])-2);$score=$_[6];}
> else{
> if($_ !~ /^[\s_]*$/&&$_[0] ne "Score")
> { if(@_==2){push(@_,"","");}
> if(@_==3){unshift(@_,"");}
> $string=join(";",@_);
> print "$group;$score;$string\n";}}' file_name
Assumption(s): Your records can have at most 4 elements. That is,
record/blank record/blank record/blank record/blank
If you can tell me whether these are tab separated, I can help with more robust code.
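If the columns do turn out to be tab separated (nobody has confirmed that in this thread, so treat it as an assumption), the empty positions would survive as empty fields and no guessing from position would be needed. A rough sketch using awk from the shell; the function name protein_line_to_fields is made up for illustration and it only handles the protein lines, not the group header:

```shell
# Sketch only: assumes protein lines hold up to four TAB-separated columns
# (S_one, P_one, S_two, P_two), with empty columns left as empty strings.
# The function name is hypothetical.
protein_line_to_fields() {
  # With a single tab as field separator, awk keeps empty fields,
  # so "\t\tCOA1_HUMAN\t53.30%" still has four fields.
  awk -F '\t' 'NF >= 2 { print $1 ";" $2 ";" $3 ";" $4 }'
}
```

Fields that are missing at the end of a line (for example when trailing tabs were trimmed) simply come out empty, so a left-only line still yields four columns.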
#6
And as Perderabo says, any real surprises in the data could break it!
(Note that Perderabo's code generates trailing semicolons, which you probably do not need.)
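If changing the script is not convenient, one trailing semicolon per line could also be removed afterwards with sed. A small sketch; the function name strip_trailing_semicolon is just an illustration:

```shell
# Sketch: remove exactly one semicolon at the end of each line.
# Earlier semicolons (the empty fields) are left untouched.
# The function name is hypothetical.
strip_trailing_semicolon() {
  sed 's/;$//'
}
```

It would be used as a filter, for example `./pro < data | strip_trailing_semicolon`.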
#7
Quote:
echo "${Group_number};${Best_Score};${S_one};${P_one};${S_two};${P_two}"
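Since the first post asked about awk, here is a minimal sketch of the same idea in awk, wrapped in a shell function. It is not anyone's posted solution, only an illustration. It assumes, like the ksh script, that a line with only the right-hand pair starts with a blank and a line with only the left-hand pair does not. The function name orthologs_to_csv is made up.

```shell
# Sketch only: reads the ortholog report on stdin, writes one
# semicolon-separated line per protein line, without a trailing semicolon.
# Assumption: right-hand-only lines begin with a space or tab.
orthologs_to_csv() {
  awk '
    /^_+$/ || /^[ \t]*$/ { next }            # separators and blank lines
    /^Group of orthologs/ {
      group = $4; gsub(/[#.]/, "", group)    # "#5." becomes "5"
      score = $7                             # number after "Best score"
      next
    }
    /^Score difference/ { next }             # second line of each group
    NF == 4             { print group ";" score ";" $1 ";" $2 ";" $3 ";" $4; next }
    NF == 2 && /^[ \t]/ { print group ";" score ";;;" $1 ";" $2; next }
    NF == 2             { print group ";" score ";" $1 ";" $2 ";;"; next }
    { print "unexpected line " NR ": " $0 | "cat 1>&2" }
  '
}
```

It reads standard input, so it would be run as `orthologs_to_csv < data`. As with the other scripts here, any line that does not match one of these shapes is only reported on stderr, so real data should be checked before trusting the output.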