Remove or rename based on contents of file


 
# 1  
Old 05-09-2015

I am trying to use the two files shown below to either remove or rename entries in one of them. If $5 in file1.txt matches $5 in file2.txt and the value in $1 of file1.txt is not "No Match", then that $1 value replaces every occurrence of the matched value in $5 and $1 of file2.txt. If, however, the value in $1 of file1.txt is "No Match", then the row in file2.txt containing the matched value and the row below it are removed. Thank you.

Contents of file1.txt
Code:
file1.txt
No Match	chr1	35696	36106	DTE3504500000004
PXL-A0000005	chr1	69066	69311	DTE3504500000005

Contents of file2.txt
Code:
RefPrimer	ref	antiref	omosome	PrimerSet	SeqRxn
AntirefPrimer	antiref	ref	omosome		
DTE3504500000001ref	34529	35031	1	DTE3504500000001	SeqRxn4
DTE3504500000001antiref	35031	34529	1		
DTE3504500000002ref	35032	35283	1	DTE3504500000002	SeqRxn4
DTE3504500000002antiref	35283	35032	1		
DTE3504500000003ref	35284	35506	1	DTE3504500000003	SeqRxn4
DTE3504500000003antiref	35506	35284	1		
DTE3504500000004ref	35696	36106	1	DTE3504500000004	SeqRxn4
DTE3504500000004antiref	36106	35696	1		
DTE3504500000004ref	69066	69311	1	DTE3504500000004	SeqRxn4
DTE3504500000004antiref	69311	69066	1

For example,
"DTE3504500000004" is the value of $5 in file1.txt, and that matches $5 in row 3 of file2.txt. Since the value in $1 of file1.txt is "No Match", rows 3 and 4 are removed from file2.txt.

"DTE3504500000005" is the value of $5 in file1.txt, and that matches $5 in row 9 of file2.txt. Since the value in $1 of file1.txt is not "No Match" but "PXL-A0000005", that new value replaces all occurrences of the old value.

Desired output:
Code:
RefPrimer	ref	antiref	omosome	PrimerSet	SeqRxn
AntirefPrimer	antiref	ref	omosome		
	(rows 3 and 4 removed)
DTE3504500000002ref	35032	35283	1	DTE3504500000002	SeqRxn4
DTE3504500000002antiref	35283	35032	1		
DTE3504500000003ref	35284	35506	1	DTE3504500000003	SeqRxn4
DTE3504500000003antiref	35506	35284	1		
DTE3504500000004ref	35696	36106	1	DTE3504500000004	SeqRxn4
PXL-A0000005ref	69066	69311	1	PXL-A0000005	SeqRxn4
PXL-A0000005antiref	69311	69066	1

# 2  
Old 05-10-2015
I'm lost.

The first line in file1.txt has DTE3504500000004 in field 5. From your description (with the 1st field on that line being No Match), the last four lines of file2.txt should have been removed; not the 3rd and 4th lines.

The second line in file1.txt has DTE3504500000005 in field 5. Since that string does not appear in file2.txt, why should anything in file2.txt be changed because of that line?
# 3  
Old 05-11-2015
I hope this is more clear:
I am trying to use the two files shown below to either remove or rename entries in one of them. If $5 in combine.txt matches $5 in output.txt and the value in $1 of combine.txt is not "No Match", then that $1 value replaces every occurrence of the matched value in $5 and $1 of output.txt. If, however, the value in $1 of combine.txt is "No Match", then the row in output.txt with that $5 value and the row below it are removed. Thank you.

For example,
"DTE3504500000004" is the value of $5 in combine.txt, and that matches $5 in row 9 of output.txt. Since the value in $1 of combine.txt is "No Match", rows 9 and 10 are removed from output.txt.

"DTE3504500000005" is the value of $5 in combine.txt, and that matches $5 in row 11 of output.txt. Since the value in $1 of combine.txt is not "No Match" but "PXL-A0000005", that new value replaces all occurrences of the old value in output.txt.

Code:
file1.txt
No Match    chr1    35696    36106    DTE3504500000004
PXL-A0000005    chr1    69066    69311    DTE3504500000005

Code:
Initial output.txt:
RefPrimer    ref    antiref    omosome    PrimerSet    SeqRxn
AntirefPrimer    antiref    ref    omosome        
DTE3504500000001ref    34529    35031    1    DTE3504500000001    SeqRxn4
DTE3504500000001antiref    35031    34529    1        
DTE3504500000002ref    35032    35283    1    DTE3504500000002    SeqRxn4
DTE3504500000002antiref    35283    35032    1        
DTE3504500000003ref    35284    35506    1    DTE3504500000003    SeqRxn4
DTE3504500000003antiref    35506    35284    1        
DTE3504500000004ref    35696    36106    1    DTE3504500000004    SeqRxn4
DTE3504500000004antiref    36106    35696    1        
DTE3504500000005ref    69066    69311    1    DTE3504500000005    SeqRxn4
DTE3504500000005antiref    69311    69066    1

Code:
Desired output.txt:
RefPrimer    ref    antiref    omosome    PrimerSet    SeqRxn
AntirefPrimer    antiref    ref    omosome        
DTE3504500000001ref    34529    35031    1    DTE3504500000001    SeqRxn4
DTE3504500000001antiref    35031    34529    1        
DTE3504500000002ref    35032    35283    1    DTE3504500000002    SeqRxn4
DTE3504500000002antiref    35283    35032    1        
DTE3504500000003ref    35284    35506    1    DTE3504500000003    SeqRxn4
DTE3504500000003antiref    35506    35284    1        
PXL-A0000005ref    69066    69311    1    PXL-A0000005    SeqRxn4
PXL-A0000005antiref    69311    69066    1


Last edited by cmccabe; 05-11-2015 at 03:09 PM..
# 4  
Old 05-11-2015
Assuming that when you said:
Code:
file1.txt
No Match    chr1    35696    36106    DTE3504500000004
PXL-A0000005    chr1    69066    69311    DTE3504500000005

is what is in combine.txt, you really meant that the file you referred to as combine.txt is actually named file1.txt (rather than the first line of combine.txt containing the line file1.txt), then maybe something like:
Code:
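awk -F'\t' '
# 1st file (file1.txt): map each ID in field 5 to its replacement in field 1.
NR == FNR {
	r[$5] = $1
	next
}
# 2nd file: past the 2 header lines, a row whose field 5 is a known ID
# starts a 2-row group (the "ref" row and the "antiref" row below it).
FNR > 2 && m <= 0 && $5 in r {
	p = $5
	m = 2
}
# For each row in the current group: delete it or rename the ID.
m-- > 0 {
	if(r[p] == "No Match")
		next
	gsub(p, r[p])
}
# Print every row that was not deleted.
1' file1.txt output.txt > output.$$ && cp output.$$ output.txt && rm -f output.$$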

will do what you want.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.
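
To try the approach above without touching real files, a minimal sketch follows. It assumes tab-separated input (as -F'\t' requires) and a mktemp that supports -d; the apply_map function name and the shortened sample data are made up for illustration and are not from the posts above.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): apply_map MAPFILE DATAFILE
# prints the edited DATAFILE on stdout, using the same awk logic as above.
apply_map() {
	awk -F'\t' '
	NR == FNR { r[$5] = $1; next }                     # map file: old ID -> new name
	FNR > 2 && m <= 0 && ($5 in r) { p = $5; m = 2 }   # known ID starts a 2-row group
	m-- > 0 {
		if (r[p] == "No Match") next                   # delete both rows of the group
		gsub(p, r[p])                                  # otherwise rename the ID
	}
	1' "$1" "$2"
}

# Scratch directory for the sample files (mktemp -d is assumed to exist).
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Mapping file: 5 tab-separated fields per line.
printf '%s\t%s\t%s\t%s\t%s\n' \
	'No Match'     chr1 35696 36106 DTE3504500000004 \
	'PXL-A0000005' chr1 69066 69311 DTE3504500000005 > "$tmp/file1.txt"

# Data file: 6 tab-separated fields per line, trailing fields may be empty.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
	RefPrimer ref antiref omosome PrimerSet SeqRxn \
	AntirefPrimer antiref ref omosome '' '' \
	DTE3504500000003ref 35284 35506 1 DTE3504500000003 SeqRxn4 \
	DTE3504500000003antiref 35506 35284 1 '' '' \
	DTE3504500000004ref 35696 36106 1 DTE3504500000004 SeqRxn4 \
	DTE3504500000004antiref 36106 35696 1 '' '' \
	DTE3504500000005ref 69066 69311 1 DTE3504500000005 SeqRxn4 \
	DTE3504500000005antiref 69311 69066 1 '' '' > "$tmp/output.txt"

result=$(apply_map "$tmp/file1.txt" "$tmp/output.txt")
printf '%s\n' "$result"
```

With this sample data the script should print six lines: the two header rows, the untouched DTE3504500000003 pair, and the pair renamed to PXL-A0000005. Note that gsub treats the old ID as a regular expression, which is harmless for purely alphanumeric IDs like these.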
This User Gave Thanks to Don Cragun For This Post:
# 5  
Old 05-12-2015
Works great, thank you.
# 6  
Old 05-13-2015
Holy shit.

Moderator's Comments:
Mod Comment edit by bakunin: Please note that swearing is not allowed in our forum. I can understand that you are impressed by Don Cragun's awk skills (he manages to make me stand in awe every time he exerts them), but nevertheless I ask you to voice your admiration in a more family-compatible fashion.

Furthermore, contentless posts like yours are not welcome in our forums. Please consider other ways of showing appreciation, for instance the "thanks" feature on Don's post. Thank you for your consideration.

Last edited by bakunin; 05-13-2015 at 10:08 AM..