Removing a portion of data in a file


 
# 1  
Old 09-16-2010

Hi,
I have a folder that contains many files:

1.fasta
2.fasta
3.fasta
4.fasta
5.fasta
...
(hundreds of files)

Each file has data in the following format, for example:
vi 1.fasta


Code:
>AB_1
MLKKPIIIGVTGGSGGGKTSVSRAILDSFPNARIAMIQHDSYYKDQSHMSFEERVKTNYDHPLAFDTDFM
IQQLKELLAGRPVDIPIYDYKKHTRSNTTFRQDPQDVIIVEGILVLEDERLRDLMDIKLFVDTDDDIRII
RRIKRDMMERGRSLESIIDQYTSVVKPMYHQFIEPSKRYADIVIPEGVSNVVAIDVINSKIASILGEV
>AB_2
MRARLIYNPTSGQELMRKSVPEVLDILEGFGYETSAFQTTAKKNSALNEARRAAKAGFDLLIAAGGDGTI
NEVVNGIAPLKKRPKMAIIPTGTTNDFARALKVPRGNPSQAAKLIGKNQTIQMDIGRAKKDTYFINIAAA
GSLTELTYSVPSQLKTMFGYLAYLAKGVELLPRVSNVPVKITHDKGVFEGQVSMIFAAITNSVGGFEMIA
PDAKLDDGMFTLILIKTANLFEIVHLLRLILDGGKHITDRRVEYIKTSKIVIEPQCGKRMMINLDGEYGG
DAPITLENLKNHITFFADTDLISDDALVLDQDELEIEEIVKKFAHEVEDLEQELEE
>AB_3
MTGYDDFNYALSALKLGADDYLLKPFSKADVEDMLGKLRKKLELSKKTETIQELVEQPQKEVSAIAMAIH
ERLADSDLTLKSLAQQLGFSPNYLSVLIKKELGMPFQDYLVQERLKKAKLFLLTSNLKIYEIAEQVGFED
MNYFSQRFKQLVGVTPSQYKKGGQA
>AB_4 
MRARLIYNPTSGQELMRKSVPEVLDILEGFGYETSAFQTTAKKNSALNEARRAAKAGFDLLIAAGGDGTI
NEVVNGIAPLKKRPKMAIIPTGTTNDFARALKVPRGNPSQAAKLIGKNQTIQMDIGRAKKDTYFINIAAA
GSLTELTYSVPSQLKTMFGYLAYLAKGVELLPRVSNVPVKITHDKGVFEGQVSMIFAAITNSVGGFEMIA
PDAKLDDGMFTLILIKTANLFEIVHLLRLILDGGKHITDRRVEYIKTSKIVIEPQCGKRMMINLDGEYGG
DAPITLENLKNHITFFADTDLISDDALVLDQDELEIEEIVKKFAHEVEDLEQELEE
>AB_5  
MRARLIYNPTSGQELMRKSVPEVLDILEGFGYETSAFQTTAKKNSALNEARRAAKAGFDLLIAAGGDGTI
NEVVNGIAPLKKRPKMAIIPTGTTNDFARALKVPRGNPSQAAKLIGKNQTIQMDIGRAKKDTYFINIAAA
GSLTELTYSVPSQLKTMFGYLAYLAKGVELLPRVSNVPVKITHDKGVFEGQVSMIFAAITNSVGGFEMIA
PDAKLDDGMFTLILIKTANLFEIVHLLRLILDGGKHITDRRVEYIKTSKIVIEPQCGKRMMINLDGEYGG
DAPITLENLKNHITFFADTDLISDDALVLDQDELEIEEIVKKFAHEVEDLEQELEE

I would like to edit these files in such a way that the data under
>AB_1 is removed (including the header), giving an output file like this:

Code:
>AB_2
MRARLIYNPTSGQELMRKSVPEVLDILEGFGYETSAFQTTAKKNSALNEARRAAKAGFDLLIAAGGDGTI
NEVVNGIAPLKKRPKMAIIPTGTTNDFARALKVPRGNPSQAAKLIGKNQTIQMDIGRAKKDTYFINIAAA
GSLTELTYSVPSQLKTMFGYLAYLAKGVELLPRVSNVPVKITHDKGVFEGQVSMIFAAITNSVGGFEMIA
PDAKLDDGMFTLILIKTANLFEIVHLLRLILDGGKHITDRRVEYIKTSKIVIEPQCGKRMMINLDGEYGG
DAPITLENLKNHITFFADTDLISDDALVLDQDELEIEEIVKKFAHEVEDLEQELEE
>AB_3
MTGYDDFNYALSALKLGADDYLLKPFSKADVEDMLGKLRKKLELSKKTETIQELVEQPQKEVSAIAMAIH
ERLADSDLTLKSLAQQLGFSPNYLSVLIKKELGMPFQDYLVQERLKKAKLFLLTSNLKIYEIAEQVGFED
MNYFSQRFKQLVGVTPSQYKKGGQA
>AB_4 
MRARLIYNPTSGQELMRKSVPEVLDILEGFGYETSAFQTTAKKNSALNEARRAAKAGFDLLIAAGGDGTI
NEVVNGIAPLKKRPKMAIIPTGTTNDFARALKVPRGNPSQAAKLIGKNQTIQMDIGRAKKDTYFINIAAA
GSLTELTYSVPSQLKTMFGYLAYLAKGVELLPRVSNVPVKITHDKGVFEGQVSMIFAAITNSVGGFEMIA
PDAKLDDGMFTLILIKTANLFEIVHLLRLILDGGKHITDRRVEYIKTSKIVIEPQCGKRMMINLDGEYGG
DAPITLENLKNHITFFADTDLISDDALVLDQDELEIEEIVKKFAHEVEDLEQELEE
>AB_5  
MRARLIYNPTSGQELMRKSVPEVLDILEGFGYETSAFQTTAKKNSALNEARRAAKAGFDLLIAAGGDGTI
NEVVNGIAPLKKRPKMAIIPTGTTNDFARALKVPRGNPSQAAKLIGKNQTIQMDIGRAKKDTYFINIAAA
GSLTELTYSVPSQLKTMFGYLAYLAKGVELLPRVSNVPVKITHDKGVFEGQVSMIFAAITNSVGGFEMIA
PDAKLDDGMFTLILIKTANLFEIVHLLRLILDGGKHITDRRVEYIKTSKIVIEPQCGKRMMINLDGEYGG
DAPITLENLKNHITFFADTDLISDDALVLDQDELEIEEIVKKFAHEVEDLEQELEE

Likewise, I would like to do this for all the files in the folder.
Please let me know the best way to do it in awk or sed.
LA
# 2  
Old 09-16-2010
This is just an observation, but why do most of your questions look the same?

Did you learn nothing from previous answers?

Like this one, which seems to me to be the same question...

https://www.unix.com/shell-programmin...ther-file.html
# 3  
Old 09-16-2010
A Perl one-liner (test it with backups!):
Code:
# Reset the flag $a at the start of each file; print only from the >AB_2 header onwards.
perl -ni -e '$a = 0 unless $ARGV eq $lf; $lf = $ARGV; $a = 1 if /^>AB_2/; print if $a;' *fasta


Last edited by Klashxx; 09-16-2010 at 01:04 PM.. Reason: Sorry Scottn , i didn't see your comment
# 4  
Old 09-16-2010
Dear Scottn,
Sorry for the confusion. I am using the same kind of example data, but this time it is a different question.
LA

Moderator's Comments:
I apologise... I saw the question and the input, and jumped to a hasty (wrong) conclusion.


---------- Post updated at 12:45 PM ---------- Previous update was at 12:44 PM ----------

Sorry, the perl code didn't work.

Last edited by Scott; 09-16-2010 at 02:23 PM..
# 5  
Old 09-16-2010
The perl code works for the example provided.
# 6  
Old 09-16-2010
Dear Klashxx,

It worked. Sorry, it was my mistake: I didn't realize I had to target the second header.
LA
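
For what it's worth, the same result can probably be had without naming the second header at all, by counting header lines instead. A minimal awk sketch, assuming every record starts with a line beginning with ">" (as in the sample data); drop_first_record is a made-up helper name, and mktemp is assumed to be available on the system:

```shell
# Sketch: drop the first FASTA record from a file, whatever its header is called.
# drop_first_record is a hypothetical helper name, not something from this thread.
drop_first_record() {
    file=$1
    tmp=$(mktemp) || return 1
    # n counts header lines (lines starting with ">"); the bare pattern
    # "n >= 2" prints every line from the second header onwards.
    if awk '/^>/ { n++ } n >= 2' "$file" > "$tmp"; then
        # Copy the result back over the original so the file keeps its permissions.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# Usage, after backing up the folder:
#   for f in *.fasta; do drop_first_record "$f"; done
```

Because it counts headers rather than matching >AB_2, this should also cope with files whose second record has some other name. Note that a file containing only one record ends up empty.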
# 7  
Old 09-16-2010
Code:
for f in *.fasta; do
    printf '/^>/;/^>/-d\nw\nq\n' | ed -s "$f"
done
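
A note on how the ed script above works: ed starts at the last line, so the first /^>/ wraps round and finds the first header; the ";" makes that the current line; "/^>/-" is then the line just before the next header, and "d" deletes that whole range before "w" writes the file and "q" quits. Since the question asked for sed, here is a rough sed equivalent. It is only a sketch and assumes, as in the sample data, that line 1 of every file is the first header; strip_first_record is a hypothetical name, and it prints to standard output because the -i option is not part of POSIX sed:

```shell
# Sketch: print a FASTA file without its first record.
# Assumes line 1 is the first header; strip_first_record is a hypothetical name.
strip_first_record() {
    # 1d        delete line 1 (the first header)
    # /^>/,$!d  delete every remaining line that is NOT inside the range
    #           running from the next header to the end of the file
    sed -e '1d' -e '/^>/,$!d' "$1"
}

# Usage, after backing up the folder:
#   for f in *.fasta; do
#       strip_first_record "$f" > "$f.new" && mv "$f.new" "$f"
#   done
```

Unlike the ed version, this never modifies the input itself, so it is easy to inspect the output before replacing anything.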
