How to extract some parts of a file to create some outfile


 
# 1  
05-08-2009

Hi All,
I am very new to programming and need some help.
I have an input file like this:

Number of disabled taxa: 9
Loading mapping file: ncbi.map
Load mapping:
taxId2TaxLevel: 469951
--- Subsample reads (20%): 66680 of 334386
Processing: tree-from-summary
Running tree-from-summary algorithm
Taxonomy:
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Not assigned: 1445
No hits: 220253
+++++++++++End of summary for file: B-Red-sum.txt
--- Subsample reads (20%): 67037 of 334386
Processing: tree-from-summary
Running tree-from-summary algorithm
Taxonomy:
Gammaproteobacteria: 2809
Alphaproteobacteria: 4001
Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890
+++++++++++End of summary for file: B-Red-sum.txt

::::: and so on

I want to create one output file per block: outfile1.txt should contain the lines from the one after "Taxonomy:" up to (but not including) the "+++++++++++End" line, with no leading spaces, and so on for each following block.

So the desired output is:
outfile1.txt
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Not assigned: 1445
No hits: 220253

outfile2.txt
Gammaproteobacteria: 2809
Alphaproteobacteria: 4001
Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890

and so on.

Can anybody please help me in this matter?

I tried some code like this, but it didn't work:
--------------------------------------------------------------------------
#!/bin/tcsh
if $#argv != "1" then
echo "Usage: process-file-script 1st-output-file-as-inputfile"
exit 0
endif

FIL_NM=$1

str=""
cat $FIL_NM | while read LINE
do
if [ "`echo $LINE | awk '{print $1}'`" = "+++++++++++Begin" ] ; then
n=1
c=1
fi
if [ "`echo $LINE |grep Gamma`"] ; then
NEW_FIL_NM=$FIL_NM"_"$n.txt"
fi

fi
if [ "`echo $LINE | awk '{print $1}'`" = "+++++++++++End" ] ; then
n=0
fi
done
--------------------------------------------------------
Please help...
Many thanks in advance...
Best wishes,
Mitra
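The tcsh attempt above mixes tcsh syntax (`if ... then ... endif`) with Bourne-shell syntax (`FIL_NM=$1`, `while read ... do ... done`), so neither shell can run it as written, and it never writes to an output file. Its underlying idea, reading the input line by line and switching a "copying" flag on at `Taxonomy:` and off at the `End` marker, can be sketched in Python. This is a minimal sketch, not taken from any reply in the thread; the function names `split_blocks` and `write_blocks` are hypothetical, and the `outfileN.txt` naming follows the desired output above.

```python
import sys
from typing import Iterable, List

# Hypothetical marker constants: eleven "+" followed by "End", as in the sample input.
HEADER = "Taxonomy:"
END_MARKER = "+" * 11 + "End"


def split_blocks(lines: Iterable[str]) -> List[List[str]]:
    """Return the lines between each HEADER and the next END_MARKER.

    Surrounding whitespace is stripped from every line, and a block that
    never reaches an END_MARKER is dropped.
    """
    blocks: List[List[str]] = []
    current: List[str] = []
    copying = False
    for raw in lines:
        line = raw.strip()
        if line.startswith(END_MARKER):
            if copying:
                blocks.append(current)
            copying = False
        elif line.startswith(HEADER):
            copying = True  # the header line itself is not copied
            current = []
        elif copying:
            current.append(line)
    return blocks


def write_blocks(blocks: List[List[str]], prefix: str = "outfile") -> None:
    """Write block N to <prefix>N.txt (outfile1.txt, outfile2.txt, ...)."""
    for n, block in enumerate(blocks, start=1):
        with open(f"{prefix}{n}.txt", "w") as out:
            out.write("\n".join(block) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as infile:
        write_blocks(split_blocks(infile))
```

Saved under any name (for example a hypothetical split_taxonomy.py), it would be run as `python3 split_taxonomy.py input.txt`. Because it stops at the marker rather than after a fixed count, the block length does not matter.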
# 2  
05-08-2009
Code:
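nawk '
    # at a "Taxonomy" header: close the previous file, open outputN.txt,
    # and arm a counter for the next 6 lines (the header itself is skipped)
    /^Taxonomy/ {p=6; close(out); out="output" ++cnt ".txt"; next}
    # print while the counter is positive, decrementing it on each line
    p && p-- { print > out }' myInputFile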
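The answer above relies on a counter rather than on the end marker: each `Taxonomy` line sets `p=6`, and `p && p--` prints a line only while the counter is still positive. The same counter logic in Python, written as a hypothetical `fixed_count_blocks` helper purely to make the fixed-length assumption visible, shows what happens when a block is not exactly six lines long:

```python
from typing import Iterable, List


def fixed_count_blocks(lines: Iterable[str], count: int = 6) -> List[List[str]]:
    """Mimic the nawk counter: after each "Taxonomy" line, keep the next
    `count` lines and ignore everything else (end markers included)."""
    blocks: List[List[str]] = []
    remaining = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("Taxonomy"):
            blocks.append([])  # corresponds to opening the next outputN.txt
            remaining = count
        elif remaining > 0:
            blocks[-1].append(line)
            remaining -= 1
    return blocks
```

With a block longer than `count` the output is cut short; with a shorter block the counter runs past the end marker and copies it too. That is the limitation the asker raises later in the thread, where each block has 100 to 200 lines.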

# 3  
05-08-2009
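If you have Python 3, here's an alternative solution:
Code:
f = 0; i = 0; o = None
for line in open("file"):
    line = line.strip()                    # drop leading/trailing spaces
    if line.startswith("+++++++++++") and o:
        f = 0                              # end marker: stop copying
        o.close()
        o = None
    if f:
        print(line, file=o)                # copy the line into the current file
    if "Taxonomy:" in line:
        f = 1; i = i + 1                   # header: start a new block (header not copied)
        o = open("out_" + str(i) + ".txt", "w")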

# 4  
05-08-2009
Hello ghostdog74,
Thanks for your reply, but I'm sorry, I forgot to mention that the blocks in my input file do not always have exactly 6 lines; I only copied a few lines here. Each block has between 100 and 200 lines, so the program needs to read up to the +++++++++End line.

Thanks a lot,
Mitra.
# 5  
05-08-2009
And here's a Perl solution:

Code:
$
$
$ cat input.txt
Number of disabled taxa: 9
Loading mapping file: ncbi.map
Load mapping:
taxId2TaxLevel: 469951
--- Subsample reads (20%): 66680 of 334386
Processing: tree-from-summary
Running tree-from-summary algorithm
Taxonomy:
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Not assigned: 1445
No hits: 220253
+++++++++++End of summary for file: B-Red-sum.txt
--- Subsample reads (20%): 67037 of 334386
Processing: tree-from-summary
Running tree-from-summary algorithm
Taxonomy:
Gammaproteobacteria: 2809
Alphaproteobacteria: 4001
Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890
+++++++++++End of summary for file: B-Red-sum.txt
::::: and so on
$
$
$
$ perl -0777 -ne '$i=1;
>   while (/^Taxonomy:.(.*?)\+{11}/msg) {
>     open(OUT, ">", "outfile".$i++.".txt") or die "cannot open outfile: $!"; print OUT $1; close(OUT);
>   }' input.txt
$
$
$ cat outfile1.txt
Gammaproteobacteria: 2767
Alphaproteobacteria: 4123
Deltaproteobacteria: 1343
Epsilonproteobacteria: 26
Not assigned: 1445
No hits: 220253
$
$
$ cat outfile2.txt
Gammaproteobacteria: 2809
Alphaproteobacteria: 4001
Deltaproteobacteria: 1208
Epsilonproteobacteria: 15
Not assigned: 299
No hits: 461890
$
$

tyler_durden
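The Perl answer treats the input as one string and extracts every block with the non-greedy pattern `^Taxonomy:.(.*?)\+{11}`: the `m` modifier lets `^` match at the start of any line, `s` lets `.` match newlines, and `.*?` stops at the first run of eleven `+` signs. Python's `re` module accepts the same pattern with `re.MULTILINE | re.DOTALL`. The translation below is an assumed equivalent rather than part of the original answer, and `regex_blocks` / `write_regex_blocks` are hypothetical names:

```python
import re
from typing import List

# MULTILINE: "^" matches at each line start; DOTALL: "." also matches "\n".
# The non-greedy (.*?) stops at the first run of eleven "+" signs.
BLOCK_RE = re.compile(r"^Taxonomy:.(.*?)\+{11}", re.MULTILINE | re.DOTALL)


def regex_blocks(text: str) -> List[str]:
    """Return the body of every block, trailing newlines included."""
    return BLOCK_RE.findall(text)


def write_regex_blocks(text: str, prefix: str = "outfile") -> None:
    """Write block N to <prefix>N.txt, like the Perl loop."""
    for n, body in enumerate(regex_blocks(text), start=1):
        with open(f"{prefix}{n}.txt", "w") as out:
            out.write(body)
```

Like the Perl version, this keeps each line exactly as it appears in the input, so any leading spaces would have to be stripped separately.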
# 6  
05-08-2009
Code:
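nawk '
   # at a "Taxonomy" header: start copying and open the next outputN.txt
   /^Taxonomy/ {p++; close(out); out="output" ++cnt ".txt"; next}
   # at the "+++...End" marker: stop copying (the marker is not printed)
   /^[+]+End/ { p=0 }
   # copy every line while inside a block
   p { print > out }' myInputFile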

# 7  
05-08-2009
Quote:
Originally Posted by iammitra
... So it is necessary for the program to read +++++++++End.
Well, I'm not sure I understand you, but I see the other solutions also match "End". So if a line of plus signs might appear somewhere other than the end marker, you can add "End" to the check:
Code:
....
if line.startswith("+++++++++++End"): 
....
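Whether the extra `End` matters depends on whether other lines in the real file also begin with a run of plus signs. The asker's tcsh attempt tests for a `+++++++++++Begin` line, so such lines may exist, although the sample input does not show one. A small sketch of the two checks side by side; the helper names are hypothetical:

```python
# Eleven "+" followed by "End", as in the sample input's end-of-summary line.
END_MARKER = "+" * 11 + "End"


def is_plus_line(line: str) -> bool:
    """Looser test, as in the Python answer: any line starting with eleven "+"."""
    return line.lstrip().startswith("+" * 11)


def is_end_marker(line: str) -> bool:
    """Stricter test: only the "+++...End" line, ignoring leading spaces."""
    return line.lstrip().startswith(END_MARKER)
```

Using `is_end_marker(line)` in place of the looser `startswith` test means a `Begin` line no longer closes the current output file.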
