Please help me write an executable script to extract some parts of a file


 
# 1  
05-13-2009

Hi All,
I am very new to programming and need some help.
I have an input file like this:

Code:
Number of disabled taxa: 9
Loading mapping file: ncbi.map
Load mapping:
taxId2TaxLevel: 469951
--- Subsample reads (20%): 66680 of 334386
Processing: tree-from-summary
Running tree-from-summary algorithm
Taxonomy:
          Gammaproteobacteria: 2767
       Alphaproteobacteria: 4123
         Deltaproteobacteria: 1343
                         Epsilonproteobacteria: 26
     Betaproteobacteria: 397
                        unclassified Proteobacteria: 48
                  Spirochaetes (class): 15
        Nitrospira (class): 1
        Bacilli: 25
  Not assigned: 1445
  No hits: 220253
+++++++++++End of summary for file: B-Red-sum.txt
--- Subsample reads (20%): 67037 of 334386
Processing: tree-from-summary
Running tree-from-summary algorithm
Taxonomy:
    Gammaproteobacteria: 2809
                Alphaproteobacteria: 4001
       Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890
+++++++++++End of summary for file: B-Red-sum.txt

::::: and so on

I want to create output like this:
outfile1.txt should contain the lines from the line after "Taxonomy:" up to (but not including) the "+++++++++++End" line, with no spaces at the start of each line; outfile2.txt should contain the next block, and so on.

So the desired output will be (with no spaces in front of the names):
outfile1.txt
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Betaproteobacteria: 397
unclassified Proteobacteria: 48
Spirochaetes (class): 15
Nitrospira (class): 1
Bacilli: 25
Not assigned: 1445
No hits: 220253

outfile2.txt
Gammaproteobacteria: 2809
Alphaproteobacteria: 4001
Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890

and so on.
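
The extraction described above amounts to a small state machine: a "Taxonomy:" line opens a block, a "+++++++++++End" line closes it, and every line in between is copied with its leading spaces removed. A minimal Python 3 sketch, assuming each header line is exactly "Taxonomy:" once whitespace is stripped; the names extract_blocks and write_blocks are hypothetical:

```python
def extract_blocks(lines):
    """Collect the lines between each "Taxonomy:" header and the next
    "+++++++++++End" marker, with surrounding whitespace removed."""
    blocks = []
    current = None                      # None: not inside a block
    for raw in lines:
        line = raw.strip()              # drops leading spaces and the newline
        if line == "Taxonomy:":
            current = []                # header starts a block, is not kept
            blocks.append(current)
        elif line.startswith("+++++++++++End"):
            current = None              # end marker closes it, is not kept
        elif current is not None:
            current.append(line)
    return blocks


def write_blocks(blocks, prefix="outfile"):
    """Write block n to <prefix>n.txt (outfile1.txt, outfile2.txt, ...)."""
    for n, block in enumerate(blocks, start=1):
        with open(f"{prefix}{n}.txt", "w") as out:
            out.write("\n".join(block) + "\n")
```

Collecting all blocks first and then writing each one inside a `with` block means every output file is closed automatically. As a script, the last step would be `with open(sys.argv[1]) as f: write_blocks(extract_blocks(f))`.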

I have already received a lot of help from this forum; thanks to durden_tyler, ghostdog74 and summer_cherry.
But there is still a small problem, so I am posting again.
Can anybody please help me with this?

The Perl code below works (without the marked line), but the output files keep the spaces at the start of each line. I tried to get rid of the spaces but couldn't.


--------------------------------------------------------------------------
Code:
#!/usr/bin/perl -w

$#ARGV==0 or die "Usage: 2ndprocess-megan-script 1st-output-file-as-inputfile\n";

$i=1;
while (<>){
chomp;
   
 if (/Taxonomy:/ ) { 
     $x = $1; $x =~ s/^\s+|\s+$//g;    ##for this line there is the error, without this line it works
     open(OUT,">>","output_".$i++) or die "Cannot open for writing:$!\n";
     $f=1; next;
 }
 
 if (/\+*End of summary for file/ ){
    $f=0;close(OUT);next;
 }
 if ($f) { print OUT $_."\n";}
}

--------------------------------------------------------
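
On the marked line, `$1` is presumably the culprit: it holds the first capturing group of the last successful match, and `/Taxonomy:/` has no capturing group, so `$1` is undefined (under `-w` Perl should warn about an uninitialized value). `$x` is also never printed, so that line could not change the output anyway. The substitution itself is fine; it just has to run on each data line, for example `s/^\s+//;` on `$_` inside the `if ($f)` branch before the print. For comparison, a minimal Python 3 sketch of the same pattern applied per line; `perl_style_trim` and `trim_all` are hypothetical names:

```python
import re

# Same pattern as the marked Perl line: s/^\s+|\s+$//g
TRIM_RE = re.compile(r"^\s+|\s+$")


def perl_style_trim(line):
    # Strip whitespace at both ends of ONE data line; in the Perl script
    # this belongs in the `if ($f)` branch, applied to $_.
    return TRIM_RE.sub("", line)


def trim_all(lines):
    # Apply the trim to every data line, not to a captured group.
    return [perl_style_trim(line) for line in lines]
```

For single lines this gives the same result as Python's built-in `line.strip()`.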

Another snippet, by durden_tyler, works perfectly, but only when typed in the terminal; I failed to turn it into an executable file.
----------------------------------------------------------
Code:
perl -ne '{$/=""; $i=1;
  while (/^Taxonomy:.(.*?)\+{11}/msgi) {
    $x = $1; $x =~ s/(^|\n)\s+/\1/g;
    open(OUT,">outfile".$i++.".txt"); print OUT $x; close(OUT);
  }}' input.txt

I tried it this way (below), but couldn't make it work.
Code:
#!/usr/bin/perl -w

$#ARGV==0 or die "Usage: 2ndprocess-script 1st-output-file-as-inputfile\n";

$input=shift;

perl -ne '{$/=""; $i=1;
while (/^Taxonomy:.(.*?)\+{11}/msgi) {
$x = $1; $x =~ s/(^|\n)\s+/\1/g;
open(OUT,">outfile".$i++.".txt"); print OUT $x; close(OUT);
}}' $1;

-------------------------------------------------------------------
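
The one-liner works differently from the line-by-line scripts: it reads the input in paragraph mode (`$/=""`), so the regex sees many lines at once, pulls out everything between `Taxonomy:` and the run of eleven `+` signs, and then deletes whitespace at the start of every line. The attempt above fails because `perl -ne '...'` is a shell command, not Perl code; saving the one-liner unchanged in a file whose first line is `#!/bin/sh`, with `input.txt` replaced by `"$1"`, should work as an executable. The same two regexes in Python 3, as a sketch (the function name `extract_blocks_regex` is hypothetical; Python's `re.M`, `re.S` and `re.I` correspond to Perl's `/m`, `/s` and `/i`):

```python
import re

# Same pattern as the one-liner: /^Taxonomy:.(.*?)\+{11}/msgi
BLOCK_RE = re.compile(r"^Taxonomy:.(.*?)\+{11}", re.M | re.S | re.I)

# Same substitution as s/(^|\n)\s+/\1/g: keep the newline (or the start
# of the string) and drop the whitespace that follows it.
INDENT_RE = re.compile(r"(^|\n)\s+")


def extract_blocks_regex(text):
    """Return the cleaned body of every Taxonomy block in text."""
    return [INDENT_RE.sub(r"\1", m.group(1)) for m in BLOCK_RE.finditer(text)]
```

Writing result n to outfile{n}.txt reproduces the one-liner's output files.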

With the Python help, I couldn't make it work properly either:
% code.py
Usage: code.py <input file>
%

---------------------------------------------------------
Code:
#! /usr/bin/python

f=0;i=0
for line in open(input file):
    line=line.strip()
    if line.startswith("+++++++++++"): 
        f=0
        o.close()
    if "Taxonomy:" in line: 
        f=1;i=i+1
        o=open("out_"+str(i)+".txt","w")
    if f:
        print >>o, line

Please help me with this. I want to prepare one executable script (Perl, Python or bash).
Many thanks in advance...
Please help.
Best wishes,
Mitra
# 2  
05-13-2009
Quote:
Originally Posted by iammitra
Code:
#! /usr/bin/python

f=0;i=0
for line in open(input file):
    line=line.strip()
    if line.startswith("+++++++++++"): 
        f=0
        o.close()
    if "Taxonomy:" in line: 
        f=1;i=i+1
        o=open("out_"+str(i)+".txt","w")
    if f:
        print >>o, line

First, where did you define "input file"? It has to be defined,
e.g.
for line in open("input file"): <------ this opens a file literally named "input file"

Also, the output of the script does have the leading spaces removed:
Code:
# ./test.py
# more out_2.txt
Taxonomy:
Gammaproteobacteria: 2809
Alphaproteobacteria: 4001
Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890

# more out_1.txt
Taxonomy:
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Betaproteobacteria: 397
unclassified Proteobacteria: 48
Spirochaetes (class): 15
Nitrospira (class): 1
Bacilli: 25
Not assigned: 1445
No hits: 220253

so I don't see why it doesn't work for you.
# 3  
05-13-2009
Hello,
Thanks for your reply.
I am very new to programming; that is probably why I couldn't get it to work.
My attempt was:
Code:
#! /usr/bin/python
input=$1
inputfile="`pwd`/$name"
f=0;i=0
for line in open(inputfile):
    line=line.strip()
    if line.startswith("+++++++++++"): 
        f=0
        o.close()
    if "Taxonomy:" in line: 
        f=1;i=i+1
        o=open("out_"+str(i)+".txt","w")
    if f:
        print >>o, line

and I tried to run it with:
./code.py filename

Please help. I am really trying to learn.
Thanks a lot,
Mitra.
# 4  
05-13-2009
you are mixing shell syntax with Python.
Code:
#! /usr/bin/python
import sys
inputfile=sys.argv[1]
f=0;i=0
for line in open(inputfile):
    line=line.strip()
    if line.startswith("+++++++++++"): 
        f=0
        o.close()
    if "Taxonomy:" in line: 
        f=1;i=i+1
        o=open("out_"+str(i)+".txt","w")
    if f:
        print >>o, line

On the command line, just run: python myscript.py inputfile

If you want to use the script, at least read up on Python and how to use it. The same goes for Perl: if you want to use the Perl script, read the documentation.
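
In Python, `sys.argv` plays the role of the shell's `$1`: `sys.argv[0]` is the script name and `sys.argv[1]` the first argument. The Perl scripts in the first post begin with a usage check (`$#ARGV==0 or die ...`), and the `Usage: code.py <input file>` message shown there suggests the same was intended for the Python version. A minimal Python 3 sketch of that check; `get_input_path` and `USAGE` are hypothetical names:

```python
import sys

USAGE = "Usage: code.py <input file>"


def get_input_path(argv):
    """Return the single input path from argv, or exit with a usage message.

    Mirrors the Perl check `$#ARGV==0 or die "Usage: ..."`.
    """
    if len(argv) != 2:      # argv[0] is the script name itself
        sys.exit(USAGE)     # prints USAGE to stderr, exit status 1
    return argv[1]
```

With a shebang pointing at a Python 3 interpreter (for example `#!/usr/bin/env python3`) and `chmod +x code.py`, the script can be started as `./code.py filename`; `inputfile = get_input_path(sys.argv)` would then replace `inputfile=sys.argv[1]`.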
# 5  
05-13-2009
Hello ghostdog74,
Thank you very much for your help. I am trying to learn from the tutorials and documentation, but being new, I keep mixing the two up. Sorry for that.
Thank you very much once again.
Best,
Mitra
# 6  
05-13-2009
Hello ghostdog74,
Sorry if I am making a mistake again. I used the code as you said, but there is still a problem with o.close().

Code:
#! /usr/bin/python
import sys
inputfile=sys.argv[1]
f=0;i=0
for line in open(inputfile):
    line=line.strip()
    if line.startswith("+++++++++++"): 
        f=0
        o.close()
    if "Taxonomy:" in line: 
        f=1;i=i+1
        o=open("out_"+str(i)+".txt","w")
    if f:
        print >>o, line

The error is:
Code:
mitra:testNextPart mitra$ ./2ndprocess-2.py 1st-output.txt 
Traceback (most recent call last):
  File "./2ndprocess-2.py", line 9, in <module>
    o.close()
NameError: name 'o' is not defined

If I add a dummy line (o=...) before the loop, as below:
Code:
#! /usr/bin/python
import sys
inputfile=sys.argv[1]
f=0;i=0
o=open("out_"+str(i)+".txt","w")

for line in open(inputfile):
    line=line.strip()
    if line.startswith("+++++++++++"): 
        f=0
        o.close()
    if "Taxonomy:" in line: 
        f=1;i=i+1
        o=open("out_"+str(i)+".txt","w")
    if f:
        print >>o, line

Then it works perfectly (it only creates one extra empty file, out_0.txt).
But can you please tell me what the problem is? Sorry for bothering you so much, but I am really trying to learn.
Best regards,
Mitra.
# 7  
05-13-2009
remove o.close()
Code:
#! /usr/bin/python
import sys
inputfile=sys.argv[1]
f=0;i=0
for line in open(inputfile):
    line=line.strip()
    if line.startswith("+++++++++++"): 
        f=0
    if "Taxonomy:" in line: 
        f=1;i=i+1
        o=open("out_"+str(i)+".txt","w")
    if f:
        print >>o, line
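
The NameError has a simple cause: `o` is only assigned when a line containing "Taxonomy:" is seen, so a "+++++++++++" line that comes before the first "Taxonomy:" line makes `o.close()` refer to a name that does not exist yet. The sample at the top of the thread has "Taxonomy:" before its first end marker, so the real 1st-output.txt presumably differs there. Removing `o.close()` avoids the error but leaves each file to be closed implicitly when `o` is reassigned or the script exits (CPython usually does this promptly, but the language does not guarantee it), and each output file still starts with the "Taxonomy:" line. A sketch that keeps the same streaming structure but only closes a file that is actually open, written for Python 3 (`print >>o, line` is Python 2 syntax) and assuming the header line is exactly "Taxonomy:"; `split_to_files` and its `prefix` parameter are hypothetical:

```python
def split_to_files(lines, prefix="out_"):
    """Stream each Taxonomy block into <prefix>1.txt, <prefix>2.txt, ...

    Returns the list of file names that were written.
    """
    out = None                          # no output file is open yet
    names = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("+++++++++++"):
            if out is not None:         # close only a file that was opened
                out.close()
                out = None
        elif line == "Taxonomy:":
            if out is not None:         # previous block had no end marker
                out.close()
            names.append(f"{prefix}{len(names) + 1}.txt")
            out = open(names[-1], "w")  # header line itself is not copied
        elif out is not None:
            out.write(line + "\n")
    if out is not None:                 # input ended inside a block
        out.close()
    return names
```

Called as `split_to_files(open(sys.argv[1]))`, this writes out_1.txt, out_2.txt, ... without the stray out_0.txt that the dummy `o=` line produced.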
