Delete strings in file1 based on the list of strings in file2


 
# 1  
Old 01-23-2011

Hello guys,

This should be a very easy question for you:

I need to delete strings in file1 based on the list of strings in file2.

For example, file2:
word1_word2_
word3_word5_
word3_word4_
word6_word7_

file1:
word1_word2_otherwords..,word3_word5_others
word3_word4_otherwords....,word6_word7_others

...and the output would be:

otherwords..,others
otherwords....,others

etc.

I tried it with "for" and "while read" loops and sed, but got no result. I guess the problems might be: 1. when reading a whole line ($line), the line ending gets in the way; 2. when reading word by word ($word), the "_" probably does.


Thanks in advance.
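
For reference, a "while read" loop of the kind described above can be made to work by turning each line of file2 into one sed substitute command. This is only a sketch: the scratch directory, the name delete.sed, and the `tr -d '\r'` step are assumptions (the post does not say what the line-ending problem was; stray carriage returns are one common cause), and it assumes the strings contain no "/" or regex metacharacters.

```shell
# Recreate the sample files from the post in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' word1_word2_ word3_word5_ word3_word4_ word6_word7_ > "$workdir/file2"
printf '%s\n' 'word1_word2_otherwords..,word3_word5_others' \
              'word3_word4_otherwords....,word6_word7_others' > "$workdir/file1"

# Turn every non-empty line of file2 into a sed command "s/STRING//g".
# IFS= and -r keep the line intact; tr drops carriage returns, if any.
tr -d '\r' < "$workdir/file2" | while IFS= read -r pattern; do
    if [ -n "$pattern" ]; then
        printf 's/%s//g\n' "$pattern"
    fi
done > "$workdir/delete.sed"

# Apply all deletions to file1 in a single sed run.
loop_result=$(sed -f "$workdir/delete.sed" "$workdir/file1")
printf '%s\n' "$loop_result"
```

Building the sed script once and running sed a single time avoids re-reading file1 for every string in file2.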
# 2  
Old 01-23-2011
See if this works:
Code:
awk -F'[_,]' 'NR==FNR{A[$1,$2]=1;next}{for (i=1;i<=NF;i++) if(A[$i,$(i+1)])sub($i FS $(i+1) FS,x)}1' file2 file1

# 3  
Old 01-23-2011
Thanks a lot, but this awk solution deletes only some of the strings listed in file2 from file1. Why would it not recognize the rest?
I'm attaching a small example of files 1 and 2 and the awk output; maybe that will make things clearer.

Thanks again
# 4  
Old 01-23-2011
Hi, that is because that solution was tailored to entries in file2 with exactly two underscores, like in the sample. Try this instead:
Code:
awk 'NR==FNR{A[$1];next}{for(i in A)gsub(i,x)}1' file2 file1
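
Note that gsub() treats each stored string as a regular expression, so a "." in a file2 entry would match any character. If the entries are meant literally, a variant along these lines might be safer. It is a sketch, not the poster's method: it uses whole lines ($0) rather than the first field, and the array name del and the scratch directory are made up for the example.

```shell
# Recreate the sample files from the first post in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' word1_word2_ word3_word5_ word3_word4_ word6_word7_ > "$workdir/file2"
printf '%s\n' 'word1_word2_otherwords..,word3_word5_others' \
              'word3_word4_otherwords....,word6_word7_others' > "$workdir/file1"

literal_result=$(awk '
    # First file (file2): remember every non-empty line as a string to delete.
    NR == FNR { if ($0 != "") del[$0]; next }
    # Second file (file1): cut out each occurrence with index() and substr(),
    # so characters such as "." or "+" are matched literally, not as regex.
    {
        for (s in del)
            while ((pos = index($0, s)) > 0)
                $0 = substr($0, 1, pos - 1) substr($0, pos + length(s))
        print
    }
' "$workdir/file2" "$workdir/file1")
printf '%s\n' "$literal_result"
```

As with the one-liner above, `for (s in del)` visits the strings in an unspecified order, so this variant does not address strings that contain one another.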

# 5  
Old 01-23-2011
Thanks for the help: it works fine!
Except in cases where one string in file2 is a substring of another (e.g., BACILLUS_Bacilli_ is a substring of LACTOBACILLUS_Bacilli_), in which case there are leftovers after the deletion.
Is there a way to make awk find and delete only strings corresponding to entire lines in file2?

Last edited by roussine; 01-23-2011 at 06:17 PM..
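
The leftover effect can be reproduced with a small made-up example. Only the two taxon strings come from the post; the file1 line (with the suffixes seqA and seqB) and the scratch directory are invented for illustration. The strings are stored in a numbered array so that the deletion order is fixed, with the shorter string applied first.

```shell
workdir=$(mktemp -d)
# Hypothetical sample data: only the two taxon strings come from the post.
printf '%s\n' BACILLUS_Bacilli_ LACTOBACILLUS_Bacilli_ > "$workdir/file2"
printf '%s\n' 'LACTOBACILLUS_Bacilli_seqA,BACILLUS_Bacilli_seqB' > "$workdir/file1"

# Apply the deletions in file2 order, i.e. the shorter string first.
leftover_result=$(awk '
    NR == FNR { pat[++n] = $1; next }
    { for (i = 1; i <= n; i++) gsub(pat[i], ""); print }
' "$workdir/file2" "$workdir/file1")
printf '%s\n' "$leftover_result"    # prints: LACTOseqA,seqB
```

Deleting BACILLUS_Bacilli_ first removes the tail of LACTOBACILLUS_Bacilli_ as well, so the longer string no longer matches and "LACTO" is left behind.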
# 6  
Old 01-23-2011
You could sort file2 by length in descending order first, so that longer strings are deleted before any shorter strings they contain:
Code:
awk '{print length($1),$1}' file2 | sort -rn | awk '{print $2}' > file2.sort

and then:
Code:
awk 'NR==FNR{A[++n]=$1;next}{for(i=1;i<=n;i++)gsub(A[i],x)}1' file2.sort file1
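
The two steps can probably also be joined into one pipeline without the intermediate file2.sort, by passing "-" (standard input) as the first file operand of the second awk. A sketch with made-up sample data (only the two taxon strings come from the thread; seqA, seqB and the scratch directory are assumptions):

```shell
workdir=$(mktemp -d)
# Hypothetical sample data: only the two taxon strings come from the thread.
printf '%s\n' BACILLUS_Bacilli_ LACTOBACILLUS_Bacilli_ > "$workdir/file2"
printf '%s\n' 'LACTOBACILLUS_Bacilli_seqA,BACILLUS_Bacilli_seqB' > "$workdir/file1"

# 1. Prefix every string with its length.
# 2. Sort numerically, longest first.
# 3. Read the sorted list from stdin ("-"), keep the string in column 2,
#    then delete the strings from file1 in that order.
sorted_result=$(awk '{ print length($1), $1 }' "$workdir/file2" |
    sort -rn |
    awk 'NR == FNR { pat[++n] = $2; next }
         { for (i = 1; i <= n; i++) gsub(pat[i], ""); print }' - "$workdir/file1")
printf '%s\n' "$sorted_result"    # prints: seqA,seqB
```

If file2 is empty, the NR == FNR test would treat file1 as the pattern list, so the sketch assumes file2 has at least one line.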

# 7  
Old 01-23-2011
Thanks indeed - you helped a lot. Cheers
 