The UNIX and Linux Forums  
Hello and Welcome from United States to the UNIX and Linux Forums! Thank You for Visiting and Joining Our Global Community.

Go Back   The UNIX and Linux Forums > Top Forums > Shell Programming and Scripting
.
google unix.com



Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.

More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
Domain not solved from script Sergiu-IT IP Networking 6 04-11-2008 05:52 AM
Kudda has successfully solved the downloading problems for numerous video web angelstar UNIX and Linux Applications 0 04-10-2008 05:41 AM
Xdmcp, dns, exceed broadcast solved BUT kymberm IP Networking 3 02-25-2003 10:47 PM

Closed Thread
English Japanese Spanish French German Portuguese Italian Dutch Swedish Russian Norwegian Hungarian Hebrew Danish
 
LinkBack Thread Tools Search this Thread Rate Thread Display Modes
  #1 (permalink)  
Old 11-07-2005
mskcc mskcc is offline
Registered User
  
 

Join Date: Jul 2005
Posts: 37
can this been solved with awk and sed?

Hi Masters,

Code:
___________________________________________________________________________________
Group of orthologs #1. Best score 3010 bits
Score difference with first non-orthologous sequence - yeast:3010   human:2754
YHR165C             	100.00%		PRP8_HUMAN          	100.00%
___________________________________________________________________________________
Group of orthologs #2. Best score 2100 bits
Score difference with first non-orthologous sequence - yeast:2033   human:1978
YLR106C             	100.00%		MDN1_HUMAN          	100.00%
___________________________________________________________________________________
Group of orthologs #3. Best score 2082 bits
Score difference with first non-orthologous sequence - yeast:997   human:593
YJL130C             	100.00%		PYR1_HUMAN          	100.00%
___________________________________________________________________________________
Group of orthologs #4. Best score 1959 bits
Score difference with first non-orthologous sequence - yeast:1959   human:1007
YKR054C             	100.00%		DYHC_HUMAN          	100.00%
___________________________________________________________________________________
Group of orthologs #5. Best score 1855 bits
Score difference with first non-orthologous sequence - yeast:1855   human:1022
YNR016C             	100.00%		Q6KE87_HUMAN        	100.00%
YMR207C             	19.86%		COA2_HUMAN          	90.52%
                    	       		COA1_HUMAN          	53.30%
___________________________________________________________________________________
Group of orthologs #6. Best score 1838 bits
Score difference with first non-orthologous sequence - yeast:1748   human:1767
YDL140C             	100.00%		RPB1_HUMAN          	100.00%
___________________________________________________________________________________
Group of orthologs #7. Best score 1768 bits
Score difference with first non-orthologous sequence - yeast:1768   human:1636
YJR066W             	100.00%		Q4LE76_HUMAN        	100.00%
YKL203C             	49.22%
Above records are part of a file. What I need to do is to extract the information from this file and put them into a speadsheet format, Like this:(examples from #5 and #7 above)

Group_number; Best_Score; S_one; P_one; S_two; P_two
5;1855;YNR016C;100.00%;Q6KE87_HUMAN;100.00%
5;1855;YMR207C;19.86%;COA2_HUMAN;90.52%
5;1855;;;COA1_HUMAN;53.30%
7;1768;YJR066W;100.00%;Q4LE76_HUMAN;100.00%
7;1768;YKL203C;49%;;

Thanks in Advance!

Last edited by Perderabo; 11-08-2005 at 11:41 AM.. Reason: Add code tags and disable smilies for readability
  #2 (permalink)  
Old 11-08-2005
Abhishek Ghose Abhishek Ghose is offline
Registered User
  
 

Join Date: Sep 2005
Location: Chennai
Posts: 81
Look at the example given:
if the last line of 5 is displayed as "5;1855;;;COA1_HUMAN;53.30%"
shouldnt the last line of 7 be displayed as "7;1768;;;YKL203C;49%" instead of "7;1768;YKL203C;49%;;" ?
  #3 (permalink)  
Old 11-08-2005
mskcc mskcc is offline
Registered User
  
 

Join Date: Jul 2005
Posts: 37
thx

No. The original file was,

empty empty record record for #5
record record empty empty for #7.

When I posted the records, the empty spaces were missed. But it should be extracted as a empty space. Thanks again.
  #4 (permalink)  
Old 11-09-2005
Perderabo's Avatar
Perderabo Perderabo is offline Forum Staff  
Unix Daemon
  
 

Join Date: Aug 2001
Location: Ashburn, Virginia
Posts: 9,100
This is harder than it looks because fields are defined both by syntax and position. Here is a ksh script that works with your sample data. But any surprises in your real data could break it.
Code:
#! /usr/bin/ksh

IFS=""
while read line ; do
    line=${line##+(_)}
    ((${#line})) ||  continue
    if [[ "$line" != "Group of orthologs"* ]] ; then
        echo error looking for start of record 1>&2
        echo $line  1>&2
        exit 1
    fi
    line=${line#"Group of orthologs #"}
    Group_number=${line%%\.*}
    line=${line#*"Best score "}
    Best_Score=${line%" "*}
    read line
    if [[ $line != "Score difference with "* ]] ; then
        echo "error stepping over 2nd line of group $Group_number" 1>&2
        echo $line  1>&2
        exit 1
    fi
    ProteinLines=1
    while ((ProteinLines)) ; do
        if read line ; then
            line=${line##+(_)}
            if ((!${#line})) ; then
                ProteinLines=0
            else
                eval set $line
                firstchar="${line%${line#?}}"
                if [[ $# -eq 4 ]] ; then
                    S_one=$1
                    P_one=$2
                    S_two=$3
                    P_two=$4
                else
                    if [[ $firstchar = [a-zA-Z0-9] ]] ; then
                        S_one=$1
                        P_one=$2
                        S_two=""
                        P_two=""
                    else
                        S_one=""
                        P_one=""
                        S_two=$1
                        P_two=$2
                    fi
                fi
                echo "${Group_number};${Best_Score};${S_one};${P_one};${S_two};${P_two};"
            fi
        else
            ProteinLines=0
        fi
    done
done
exit 0
Code:
$
$ ./pro < data
1;3010;YHR165C;100.00%;PRP8_HUMAN;100.00%;
2;2100;YLR106C;100.00%;MDN1_HUMAN;100.00%;
3;2082;YJL130C;100.00%;PYR1_HUMAN;100.00%;
4;1959;YKR054C;100.00%;DYHC_HUMAN;100.00%;
5;1855;YNR016C;100.00%;Q6KE87_HUMAN;100.00%;
5;1855;YMR207C;19.86%;COA2_HUMAN;90.52%;
5;1855;;;COA1_HUMAN;53.30%;
6;1838;YDL140C;100.00%;RPB1_HUMAN;100.00%;
7;1768;YJR066W;100.00%;Q4LE76_HUMAN;100.00%;
7;1768;YKL203C;49.22%;;;
$
  #5 (permalink)  
Old 11-09-2005
Abhishek Ghose Abhishek Ghose is offline
Registered User
  
 

Join Date: Sep 2005
Location: Chennai
Posts: 81
Heres with commandline PERL:

$ perl -ne 'chop; split;
> if($_[0] eq "Group")
> { $group=substr($_[3],1,length($_[3])-2);$score=$_[6];}
> else{
> if($_ !~ /^\s*$/&&$_[0] ne "Score")
> { if(@_==2){push(@_,"","");}
> if(@_==3){unshift(@_,"");}
> $string=join(";",@_);
> print ("\n$group;$score;$string");}}' file_name


Assumption(s):
Your records can have only 4 elements at the maximum.
That is ,
record/blank record/blank record/blank record/blank
If you can tell me whether these are tab separated, I can help with a more robust code.
  #6 (permalink)  
Old 11-09-2005
Abhishek Ghose Abhishek Ghose is offline
Registered User
  
 

Join Date: Sep 2005
Location: Chennai
Posts: 81
And as Perderabo says, any real surprises in the data could break it!
(Note that Perderabos' code generates trailing semi-colons which probably you do not need)
  #7 (permalink)  
Old 11-09-2005
Perderabo's Avatar
Perderabo Perderabo is offline Forum Staff  
Unix Daemon
  
 

Join Date: Aug 2001
Location: Ashburn, Virginia
Posts: 9,100
Quote:
Originally Posted by Abhishek Ghose
(Note that Perderabos' code generates trailing semi-colons which probably you do not need)
Opps... to lose that trailing semicolon, change the echo statement...
echo "${Group_number};${Best_Score};${S_one};${P_one};${S_two};${P_two}"
Sponsored Links
Closed Thread

Bookmarks

Tags
linux

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT -4. The time now is 04:48 PM.


Powered by: vBulletin, Copyright ©2000 - 2006, Jelsoft Enterprises Limited. Language translation by Google.
vBCredits v1.4 Copyright ©2007 - 2008, PixelFX Studios
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.Ad Management by RedTyger

Content Relevant URLs by vBSEO 3.2.0