Deleting words between every appearance of two words


 
# 1  
Old 09-24-2014

Hi there, newbie here. I've been browsing the forums hoping to find a solution to a problem similar to mine, but haven't had much luck. Any help would be greatly appreciated. Thanks!

I need to delete a bunch of text between every appearance of two words in a really large file, preferably using sed.

Eg:

Code:
>gi |544340954|emb|HG428755.1|:3360404-3364966 Fesd FMG
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>gi |307625127|gb|CP002167.1|:319702-324264  Fesd GFT
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

and this goes on and on. Basically, I want to delete everything from 'gi' through 'Fesd', without affecting anything else, throughout the entire file, to get something like this:

Code:
>FMG
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>GFT
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Can anyone help me out with this? Thanks heaps!

Moderator's Comments:
Mod Comment edit by bakunin: these two pairs of shining new CODE-tags were devoted to your cause courtesy of the moderation team.

Last edited by bakunin; 09-24-2014 at 01:53 PM..
# 2  
Old 09-24-2014
Hello lendl,

Welcome to the forum. Please use code tags when posting code and commands, as per the forum rules. Here is a solution:

Code:
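# On header lines, strip the letters from field 1 (">gi" becomes ">") and print it joined to the last field
awk '/gi.*Fesd/ {gsub(/[[:alpha:]]/,"",$1); print $1 $NF; next} 1' Input_file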

Thanks,
R. Singh

Last edited by RavinderSingh13; 09-24-2014 at 03:11 AM..
# 3  
Old 09-24-2014
Taken literally, your requirement (delete everything from 'gi' through 'Fesd') leaves a space between > and FMG or GFT.

The sed command below does exactly that:

Code:
sed 's/\(^.*\)gi.*Fesd\(.*$\)/\1\2/' file

To remove that space as well and get the output you have shown, amend the command as follows:

Code:
sed 's/\(^.*\)gi.*Fesd[[:space:]]\(.*$\)/\1\2/' file
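If sequence lines could ever happen to contain both "gi" and "Fesd", it may be safer to restrict the substitution to header lines. The following is only a sketch under that assumption: the function name strip_gi_fesd is made up for illustration, and the sample data is shortened from the first post.

```shell
# Hypothetical helper (name invented for this example): on lines that
# start with ">", delete everything from "gi" through "Fesd" plus any
# whitespace that follows. All other lines pass through unchanged.
strip_gi_fesd() {
    sed '/^>/ s/gi.*Fesd[[:space:]]*//' "$@"
}

# Feed a small sample in the same shape as the original data via stdin.
printf '%s\n' \
    '>gi |544340954|emb|HG428755.1|:3360404-3364966 Fesd FMG' \
    'XXXXXXXXXXXXXXXX' \
    '>gi |307625127|gb|CP002167.1|:319702-324264  Fesd GFT' \
    'XXXXXXXXXXXXXXXX' | strip_gi_fesd
```

Called with a file name instead (strip_gi_fesd file), it reads that file and writes the result to standard output, so the original stays untouched unless you redirect over it yourself.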

# 4  
Old 09-24-2014
Thank you very much for all your help! Much appreciated.

Also noted on the usage of code tags.