sed replace file contents by reading from another file


 
# 1  
Old 11-15-2016

Hello,

My input file1 is tab-delimited and looks like this:

Code:
chr1	mm10_knownGene	stop_codon	3216022	3216024	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	CDS	3216025	3216968	0.000000	-	2	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	exon	3214482	3216968	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	CDS	3421702	3421901	0.000000	-	1	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	exon	3421702	3421901	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	CDS	3670552	3671348	0.000000	-	0	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	start_codon	3671346	3671348	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	exon	3670552	3671498	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	start_codon	4857914	4857916	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4857914	4857976	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4857694	4857976	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4867470	4867532	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4867470	4867532	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4878027	4878132	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4878027	4878132	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4886744	4886831	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4886744	4886831	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4889460	4889602	0.000000	+	1	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4889460	4889602	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4890740	4890796	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4890740	4890796	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4891915	4892069	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4891915	4892069	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4893417	4893563	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4893417	4893563	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4894934	4895005	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4894934	4895005	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4896356	4896361	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	stop_codon	4896362	4896364	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4896356	4897909	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1";

My input file2 looks like this:
Code:
uc007aeu.1	Xkr4
uc011wht.1	Tcea1

Now I want to replace the contents of file1 (after gene_id and transcript_id) with the second-column value from file2. I tried splitting out the columns and joining on them, but join needs sorted input and I do NOT want the row order of file1 to change, so I am struggling to get the output. Any ideas are highly appreciated.

Please note that the input file row order should not be changed.

Thanks
# 2  
Old 11-15-2016
Not sure what you mean exactly. Perhaps something like this?
Code:
awk 'NR==FNR{A[$1]=$2; next} $2 in A{$2=$4=A[$2]}1' FS='\t' file2 FS=\" OFS=\" file1

Output:
Code:
chr1	mm10_knownGene	stop_codon	3216022	3216024	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	CDS	3216025	3216968	0.000000	-	2	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	exon	3214482	3216968	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	CDS	3421702	3421901	0.000000	-	1	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	exon	3421702	3421901	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	CDS	3670552	3671348	0.000000	-	0	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	start_codon	3671346	3671348	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	exon	3670552	3671498	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	start_codon	4857914	4857916	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4857914	4857976	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4857694	4857976	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4867470	4867532	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4867470	4867532	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4878027	4878132	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4878027	4878132	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4886744	4886831	0.000000	+	2	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4886744	4886831	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4889460	4889602	0.000000	+	1	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4889460	4889602	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4890740	4890796	0.000000	+	2	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4890740	4890796	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4891915	4892069	0.000000	+	2	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4891915	4892069	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4893417	4893563	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4893417	4893563	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4894934	4895005	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4894934	4895005	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4896356	4896361	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	stop_codon	4896362	4896364	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4896356	4897909	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1";
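
For readers who want to see how the one-liner above works, here is a commented sketch of the same lookup technique. The temporary directory, the two-line sample files, and the array name `name` are assumptions made for illustration; the logic follows the command as posted. Because file1 is read once from top to bottom, its row order is kept and no sort is needed.

```shell
# Sample data in a temporary directory (hypothetical paths, two rows only).
tmp=$(mktemp -d)
printf 'chr1\tmm10_knownGene\texon\t3214482\t3216968\t0.000000\t-\t.\tgene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; \n' > "$tmp/file1"
printf 'chr1\tmm10_knownGene\texon\t4857694\t4857976\t0.000000\t+\t.\tgene_id "uc011wht.1"; transcript_id "uc011wht.1"; \n' >> "$tmp/file1"
printf 'uc007aeu.1\tXkr4\nuc011wht.1\tTcea1\n' > "$tmp/file2"

result=$(awk '
    # First file (file2, tab-separated): build the lookup table, ID -> gene name.
    NR == FNR { name[$1] = $2; next }
    # Second file (file1, split on double quotes): $2 holds the gene_id value
    # and $4 the transcript_id value. Overwrite both when the ID is known.
    $2 in name { $2 = $4 = name[$2] }
    # Print every line, changed or not, in its original order.
    { print }
' FS='\t' "$tmp/file2" FS='"' OFS='"' "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The `FS=` and `OFS=` assignments placed between the file names take effect only for the files that follow them, which is what lets file2 be split on tabs and file1 on double quotes within one awk run.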

--
Or did you mean:
Code:
awk 'NR==FNR{A[$1]=$2; next} $2 in A{$0=$0 A[$2]}1' FS='\t' file2 FS=\" file1

Code:
chr1	mm10_knownGene	stop_codon	3216022	3216024	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	CDS	3216025	3216968	0.000000	-	2	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	exon	3214482	3216968	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	CDS	3421702	3421901	0.000000	-	1	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	exon	3421702	3421901	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	CDS	3670552	3671348	0.000000	-	0	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	start_codon	3671346	3671348	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	exon	3670552	3671498	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	start_codon	4857914	4857916	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4857914	4857976	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4857694	4857976	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4867470	4867532	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4867470	4867532	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4878027	4878132	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4878027	4878132	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4886744	4886831	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4886744	4886831	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4889460	4889602	0.000000	+	1	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4889460	4889602	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4890740	4890796	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4890740	4890796	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4891915	4892069	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4891915	4892069	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4893417	4893563	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4893417	4893563	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4894934	4895005	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4894934	4895005	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4896356	4896361	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	stop_codon	4896362	4896364	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4896356	4897909	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1";Tcea1


Last edited by Scrutinizer; 11-15-2016 at 03:59 PM..
# 3  
Old 11-15-2016
That specification is not too clear. If I interpreted it correctly, try
Code:
awk 'NR == FNR {T["\"" $1 "\";"] = $2; next} $12 in T {sub ($12 ".$", "& " T[$12])} 1' file2 file1
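
A commented sketch of the command above may help when adapting it. The temporary directory, the one-row sample files, and the variable name `appended` are assumptions for illustration. This version relies on default whitespace splitting, so the transcript_id value, including its quotes and semicolon, is field 12. Note that the regex needs one character after the final semicolon (the trailing space in the sample data), and the dots inside the ID are treated as regex wildcards.

```shell
# Sample data in a temporary directory (hypothetical paths, one row only).
tmp=$(mktemp -d)
printf 'chr1\tmm10_knownGene\tCDS\t3216025\t3216968\t0.000000\t-\t2\tgene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; \n' > "$tmp/file1"
printf 'uc007aeu.1\tXkr4\n' > "$tmp/file2"

appended=$(awk '
    # file2: store the name under a key shaped like field 12 of file1, i.e. "ID";
    NR == FNR { T["\"" $1 "\";"] = $2; next }
    # file1: match field 12 plus the last character of the line, then
    # re-insert the match (&) followed by a space and the gene name.
    $12 in T { sub($12 ".$", "& " T[$12]) }
    # Print every line in its original order.
    { print }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$appended"
rm -rf "$tmp"
```

Unlike the replacement variant, this keeps the original IDs and appends the gene name to the end of each matching row.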

This User Gave Thanks to RudiC For This Post:
# 4  
Old 11-15-2016
Exactly what I was looking for. Thank you @Scrutinizer