Stripping characters from a file and reformatting according to another one


 
# 1  
Old 07-21-2011

Dear experts,
My problem is pretty tricky.
I want to change one file (see the attached input.txt) according to another file (help.txt). The desired output is in output.txt. The example is attached.
Note that:
- Dashes should not be treated specially; they count as normal characters, too.
- Whatever is stripped from the first entry of input.txt according to help.txt should also be stripped at the identical positions in the second entry (entries are separated by >).
I would really appreciate any help!
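One possible reading of the request, sketched in Python: find which positions of the first entry survive in help.txt, then keep exactly those positions in the second entry. This is only a sketch under assumptions that the post does not confirm: the help sequence is taken to be the first entry with some characters deleted, both entries are taken to have the same length, and the function names `kept_positions` and `strip_like` are made up for illustration. It uses `difflib.SequenceMatcher`, which favours long contiguous matches but is a heuristic, so the code raises an error instead of guessing when it cannot place every character. Where the same letter repeats next to a deleted stretch, more than one choice of positions reproduces the same reduced sequence, so the second entry can differ from a hand-made result.

```python
from difflib import SequenceMatcher


def kept_positions(full, reduced):
    """Return the indices of `full` whose characters survive in `reduced`.

    Assumes `reduced` is `full` with some characters deleted. Dashes, "/" and
    "*" get no special treatment; they are ordinary characters here.
    """
    matcher = SequenceMatcher(None, full, reduced, autojunk=False)
    keep = []
    for a_start, _b_start, size in matcher.get_matching_blocks():
        keep.extend(range(a_start, a_start + size))
    if len(keep) != len(reduced):
        # The heuristic did not account for every character of `reduced`.
        raise ValueError("could not place every character of the reduced sequence")
    return keep


def strip_like(reference, reduced, other):
    """Delete from `other` the same positions that were deleted from `reference`."""
    if len(other) != len(reference):
        raise ValueError("entries must have the same length")
    return "".join(other[i] for i in kept_positions(reference, reduced))


if __name__ == "__main__":
    # Toy data cut from the start of the two entries and of help.txt.
    print(strip_like("MAAQGEPQVQFKL", "QVQFKL", "----AEPDVQFKL"))  # DVQFKL
```

In real use, `reference` would be the joined sequence of the first entry, `reduced` the joined sequence from help.txt, and `other` the joined sequence of the second entry.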
# 2  
Old 07-21-2011
You will have a better response if you display a small sample in the post.

Some people do not or will not take the time to download your file and unzip it.
# 3  
Old 07-21-2011
OK, you are right.
The files look like this:

input.txt:

>P1;baDHFR
structureX:baDHFR: 1: A:999 : B::::
MAAQGEPQVQFKLVLVGDGGTGKTTFVKRHLTGEFEKKYVATLGVEVHPLVFHTNRGPIKFNVWDTAGQEKFGGLRD
GYYIQAQCAIIMFDVTSRVTYKNVPNWHRDLVRVCENIPIVLCGNKVDIKDRKVKAKSIVFHRKKNLQYYDISAKSN
YNFEKPFLWLARKLIGDPNLEFVAMPALAPPEVVMDPALAAQYEHDLEVAQTTALPDEDDDL
/
SKKVKVSHRSHSTEPGLVLTLGQGDVGQLGLGENVMERKKPALVSIPEDVVQAEAGGMHTVCLSKSGQVYSFGCNDE
GALGRDTSVEGSEMVPGKVELQEKVVQVSAGDSHTAALTDDGRVFLWGSFRDNNGVIGLLEPMKKSMVPVQVQLDVP
VVKVASGNDHLVMLTADGDLYTLGCGEQGQLGRVPELFANRGGRQGLERLLVPKCVMLKSRGSRGHVRFQDAFCGAY
FTFAISHEGHVYGFGLSNYHQLGTPGTESCFIPQNLTSFKNSTKSWVGFSGGQHHTVCMDSEGKAYSLGRAEYGRLG
LGEGAEEKSIPTLISRLPAVSSVACGASVGYAVTKDGRVFAWGMGTNYQLGTGQDEDAWSPVEMMGKQLENRVVLSV
SSGGQHTVLLVKDKEQS*

>P1;hvDHFR
sequence:hvDHFR: 1: A: 999: B: :::
----AEPDVQFKLVLCGDGGTGKTTFVKRHLTGEFEKKYVATLGVEVHPLVFHTTRGTIKYNVWDTAGQEKFGGLRD
GYYIQAQCAIIMFDVTSRVTYKNVPNWHRDLVRVCENIPIVLCGNKVDIKDRKVKAKSIVFHRKKNLQYYDISAKSN
YNFEKPFLWLARKLVGDPNLEFVEMPALAPPEVVMDASLAAQYENDLKVAAETALPDEDDDL
/
-KKVKVSHSSHGQEKGLVLVLGQGDVGQLGLGEDIMERKRPALVTLPEGVVQVAAGGMHTVCLSDTGNIYTFGCNDE
GALGRETTEEGSEMVPGKVSLDERVVQVSAGDSHTAALTDDGAVYIWGSFRDNSGVIGLLEPMKKVTVPVKVPMKGP
VMKIASGNDHLVMLTTSGDLYTSGCGEQGQLGRVPELFANRGGRKGLLRLLIPQIVKVQSRGK---VHFTDAFCGAY
MTIAVSKEGHVYGFGLSNYHQLGTKLINTCFVPIKLTTFKNSTINWIGFSGGQHHTVCLDSAGKVYSLGRAEYGRLG
LGQGAEEKSEPTPVEGLDVAQVVACGASVSYAVTKQGSVYAWGMGTNLQLGTGEEDDEWSPVEMTGKQLENRIVLMV
ASGGQHTVLLVKDKQE-*


help.txt:

>P1;1I2M
structureX:1I2M: 1 :A:+553 :B:::-1.00:-1.00
QVQFKLVLVGDGGTGKTTFVKRHLKKYVATLGVEVHPLVFHTNRGPIKFNVWDTAGQEKFGGLRDGYYIQAQCAI
IMFDVTSRVTYKNVPNWHRDLVRVCENIPIVLCGNKVDIKDRKVKAKSIVFHRKKNLQYYDISAKSNYNFEKPFL
WLARKLIGDPNLEFV/KVSHRSHSTEPGLVLTLGQGDVGQLGLGENVMERKKPALVSIPEDVVQAEAGGMHTVCL
SKSGQVYSFGCNDEGALGRDTSVEGSEMVPGKVELQEKVVQVSAGDSHTAALTDDGRVFLWGSFRDNNGVIGLLE
PMKKSMVPVQVQLDVPVVKVASGNDHLVMLTADGDLYTLGCGEQGQLGRVPELFANRGGRQGLERLLVPKCVMLK
HVRFQDAFCGAYFTFAISHEGHVYGFGLSNYHQLGTPGTESCFIPQNLTSFKNSTKSWVGFSGGQHHTVCMDSEG
KAYSLGRAEYGRLGLGEGAEEKSIPTLISRLPAVSSVACGASVGYAVTKDGRVFAWGMGTNYQLGTGQDEDAWSP
VEMMGKQLENRVVLSVSSGGQHTVLLVKD*


output.txt:

>P1;baDHFR
structureX:baDHFR: 1: A:999 : B::::
QVQFKLVLVGDGGTGKTTFVKRHLKKYVATLGVEVHPLVFHTNRGPIKFNVWDTAGQEKFGGLRDGYYIQAQCAI
IMFDVTSRVTYKNVPNWHRDLVRVCENIPIVLCGNKVDIKDRKVKAKSIVFHRKKNLQYYDISAKSNYNFEKPFL
WLARKLIGDPNLEF/VKVSHRSHSTEPGLVLTLGQGDVGQLGLGENVMERKKPALVSIPEDVVQAEAGGMHTVCL
SKSGQVYSFGCNDEGALGRDTSVEGSEMVPGKVELQEKVVQVSAGDSHTAALTDDGRVFLWGSFRDNNGVIGLLE
PMKKSMVPVQVQLDVPVVKVASGNDHLVMLTADGDLYTLGCGEQGQLGRVPELFANRGGRQGLERLLVPKCVMLK
HVRFQDAFCGAYFTFAISHEGHVYGFGLSNYHQLGTPGTESCFIPQNLTSFKNSTKSWVGFSGGQHHTVCMDSEG
KAYSLGRAEYGRLGLGEGAEEKSIPTLISRLPAVSSVACGASVGYAVTKDGRVFAWGMGTNYQLGTGQDEDAWSP
VEMMGKQLENRVVLSVSSGGQHTVLLVKD*

>P1;hvDHFR
sequence:hvDHFR: 1: A: 999: B: :::
DVQFKLVLCGDGGTGKTTFVKRHLKKYVATLGVEVHPLVFHTTRGTIKYNVWDTAGQEKFGGLRDGYYIQAQCAI
IMFDVTSRVTYKNVPNWHRDLVRVCENIPIVLCGNKVDIKDRKVKAKSIVFHRKKNLQYYDISAKSNYNFEKPFL
WLARKLVGDPNLEF/VKVSHSSHGQEKGLVLVLGQGDVGQLGLGEDIMERKRPALVTLPEGVVQVAAGGMHTVCL
SDTGNIYTFGCNDEGALGRETTEEGSEMVPGKVSLDERVVQVSAGDSHTAALTDDGAVYIWGSFRDNSGVIGLLE
PMKKVTVPVKVPMKGPVMKIASGNDHLVMLTTSGDLYTSGCGEQGQLGRVPELFANRGGRKGLLRLLIPQIVKVQ
-VHFTDAFCGAYMTIAVSKEGHVYGFGLSNYHQLGTKLINTCFVPIKLTTFKNSTINWIGFSGGQHHTVCLDSAG
KVYSLGRAEYGRLGLGQGAEEKSEPTPVEGLDVAQVVACGASVSYAVTKQGSVYAWGMGTNLQLGTGEEDDEWSP
VEMTGKQLENRIVLMVASGGQHTVLLVKD*


---------- Post updated at 12:13 PM ---------- Previous update was at 12:07 PM ----------

Note that in the input file,
MAAQGEPQVQ....EDDDL
SKKVKVS....DKEQS*
----A...EDDD
-KK...KDKQE-*

are each on a single line with no spaces; pasting into the forum caused the misformatting. I fixed it by editing the post.

Last edited by TheTransporter; 07-21-2011 at 02:27 PM.. Reason: formating
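
Because line breaks inside a sequence carry no meaning here, it may help to join each entry's sequence lines into one string before comparing positions, and to re-wrap only when writing the result. Below is a minimal Python sketch of that step. The record layout is an assumption read off the samples (a header line starting with `>`, one description line, then sequence lines), the names `read_entries` and `format_entry` are hypothetical, and the default width of 75 columns is a guess based on the line length in the posted help.txt and output.txt.

```python
def read_entries(text):
    """Split the text into (header, description, sequence) tuples.

    Assumed layout: a header line starting with ">", one description line,
    then any number of sequence lines, which are joined into one string.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate entries
        if line.startswith(">"):
            entries.append([line, None, ""])
        elif entries and entries[-1][1] is None:
            entries[-1][1] = line  # first line after the header
        elif entries:
            entries[-1][2] += line  # sequence, including "/", "-" and "*"
    return [tuple(entry) for entry in entries]


def format_entry(header, description, sequence, width=75):
    """Re-wrap a sequence at a fixed width; no character is treated specially."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, description] + chunks)


if __name__ == "__main__":
    sample = (
        ">P1;a\nstructureX:a: 1: A:999 : B::::\nMAAQ\nGEP\n/\nSKK*\n\n"
        ">P1;b\nsequence:b: 1: A: 999: B: :::\n----AEP\n/\n-KK*\n"
    )
    for entry in read_entries(sample):
        print(format_entry(*entry, width=5))
        print()
```

Plain slicing is used for wrapping because a word-oriented wrapper could break lines at the dashes, which the post says must be treated as normal characters.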