sed replace file contents by reading from another file

11-15-2016

Hello,

My input file1 is tab-delimited and looks like this:

Code:

chr1	mm10_knownGene	stop_codon	3216022	3216024	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	CDS	3216025	3216968	0.000000	-	2	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	exon	3214482	3216968	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	CDS	3421702	3421901	0.000000	-	1	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	exon	3421702	3421901	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	CDS	3670552	3671348	0.000000	-	0	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	start_codon	3671346	3671348	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	exon	3670552	3671498	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	start_codon	4857914	4857916	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4857914	4857976	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4857694	4857976	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4867470	4867532	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4867470	4867532	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4878027	4878132	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4878027	4878132	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4886744	4886831	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4886744	4886831	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4889460	4889602	0.000000	+	1	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4889460	4889602	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4890740	4890796	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4890740	4890796	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4891915	4892069	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4891915	4892069	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4893417	4893563	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4893417	4893563	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4894934	4895005	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4894934	4895005	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4896356	4896361	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	stop_codon	4896362	4896364	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4896356	4897909	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1";

My input file2 looks like this:

Code:

uc007aeu.1	Xkr4
uc011wht.1	Tcea1

Now I want to replace the values after gene_id and transcript_id in file1 with the matching second-column value from file2. I tried separating out the columns and joining on them, but join needs sorted input and I do NOT want the rows of the input file reordered, so I cannot get the output I need. Any ideas are highly appreciated.

Please note that the row order of the input file must not change.
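The join approach described above can still keep the original order if each line is numbered before sorting and re-sorted on that number afterwards (decorate, sort, undecorate). A minimal POSIX sh sketch, assuming tab-delimited input as posted and that the first double-quoted value on each file1 line is the transcript ID; the helper file names (file1.keyed, file2.sorted, file1.new) are hypothetical, and the two sample rows are written in reverse key order so the restore step is visible:

```shell
#!/bin/sh
# Sketch: keep using join, but restore file1's original row order afterwards.
# Assumes tab-delimited input and that the first "..." value on a file1 line is the ID.
set -e
LC_ALL=C; export LC_ALL                  # sort and join must agree on collation
tab=$(printf '\t')

# Sample input: file1 rows deliberately NOT in key order.
printf 'chr1\tmm10_knownGene\tCDS\t4857914\t4857976\t0.000000\t+\t0\tgene_id "uc011wht.1"; transcript_id "uc011wht.1"; \n' > file1
printf 'chr1\tmm10_knownGene\tCDS\t3216025\t3216968\t0.000000\t-\t2\tgene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; \n' >> file1
printf 'uc007aeu.1\tXkr4\nuc011wht.1\tTcea1\n' > file2

# 1. Decorate: prefix each file1 line with its key and its original line number.
awk -F '"' -v OFS='\t' '{ print $2, NR, $0 }' file1 | sort -t "$tab" -k1,1 > file1.keyed
sort -t "$tab" -k1,1 file2 > file2.sorted

# 2. Join on the key, 3. re-sort by line number, 4. strip the helper columns.
join -t "$tab" file1.keyed file2.sorted |
sort -t "$tab" -k2,2n |
awk -F '\t' -v OFS='\t' '{
    name = $NF                                 # gene name added by join (last field)
    line = $3                                  # original line = fields 3 .. NF-1
    for (i = 4; i < NF; i++) line = line OFS $i
    n = split(line, p, "\"")                   # p[2] and p[4] are the two IDs
    p[2] = p[4] = name
    out = p[1]
    for (i = 2; i <= n; i++) out = out "\"" p[i]
    print out
}' > file1.new

rm -f file1.keyed file2.sorted
cat file1.new
```

Note that join silently drops file1 rows whose ID is missing from file2; `join -a 1` would keep them, but the last awk step would then have to cope with the missing name field. A single-pass awk lookup avoids the sorting altogether.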

Thanks

jacobs.smith

11-15-2016

Not sure what you mean exactly. Perhaps something like this?

Code:

awk 'NR==FNR{A[$1]=$2; next} $2 in A{$2=$4=A[$2]}1' FS='\t' file2 FS=\" OFS=\" file1

Output:

Code:

chr1	mm10_knownGene	stop_codon	3216022	3216024	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	CDS	3216025	3216968	0.000000	-	2	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	exon	3214482	3216968	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	CDS	3421702	3421901	0.000000	-	1	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	exon	3421702	3421901	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	CDS	3670552	3671348	0.000000	-	0	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	start_codon	3671346	3671348	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	exon	3670552	3671498	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	start_codon	4857914	4857916	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4857914	4857976	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4857694	4857976	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4867470	4867532	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4867470	4867532	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4878027	4878132	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4878027	4878132	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4886744	4886831	0.000000	+	2	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4886744	4886831	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4889460	4889602	0.000000	+	1	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4889460	4889602	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4890740	4890796	0.000000	+	2	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4890740	4890796	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4891915	4892069	0.000000	+	2	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4891915	4892069	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4893417	4893563	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4893417	4893563	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4894934	4895005	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4894934	4895005	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4896356	4896361	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	stop_codon	4896362	4896364	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4896356	4897909	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1";
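The same one-liner written out with comments, followed by the usual way to replace file1 itself: awk has no portable in-place option, so this sketch writes to a temporary file and renames it (GNU awk 4.1 and later also offer `-i inplace`). The sample rows come from the thread, except the third, which uses a hypothetical ID `unknown.1` to show that unmatched lines pass through unchanged; `file1.tmp` is an arbitrary name.

```shell
#!/bin/sh
# Annotated version of the replace one-liner, writing the result back over file1.
set -e
printf 'chr1\tmm10_knownGene\texon\t3214482\t3216968\t0.000000\t-\t.\tgene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; \n' > file1
printf 'chr1\tmm10_knownGene\texon\t4857694\t4857976\t0.000000\t+\t.\tgene_id "uc011wht.1"; transcript_id "uc011wht.1"; \n' >> file1
# Hypothetical row whose ID is not in file2:
printf 'chr1\tmm10_knownGene\texon\t1\t2\t0.000000\t+\t.\tgene_id "unknown.1"; transcript_id "unknown.1"; \n' >> file1
printf 'uc007aeu.1\tXkr4\nuc011wht.1\tTcea1\n' > file2

awk '
    NR == FNR {            # first file (file2): build the lookup table
        A[$1] = $2         # A["uc007aeu.1"] = "Xkr4"
        next
    }
    $2 in A {              # file1, split on double quotes:
        $2 = $4 = A[$2]    # $2 = gene_id value, $4 = transcript_id value
    }
    1                      # print every line, changed or not
' FS='\t' file2 FS='"' OFS='"' file1 > file1.tmp && mv file1.tmp file1
```

Because the assignments `FS='"' OFS='"'` appear before file1 on the command line, they only take effect once awk starts reading file1, so file2 is still split on tabs.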

--
Or did you mean:

Code:

awk 'NR==FNR{A[$1]=$2; next} $2 in A{$0=$0 A[$2]}1' FS='\t' file2 FS=\" file1

Output:

chr1	mm10_knownGene	stop_codon	3216022	3216024	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	CDS	3216025	3216968	0.000000	-	2	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	exon	3214482	3216968	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	CDS	3421702	3421901	0.000000	-	1	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	exon	3421702	3421901	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	CDS	3670552	3671348	0.000000	-	0	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	start_codon	3671346	3671348	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	exon	3670552	3671498	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	start_codon	4857914	4857916	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4857914	4857976	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4857694	4857976	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4867470	4867532	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4867470	4867532	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4878027	4878132	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4878027	4878132	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4886744	4886831	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4886744	4886831	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4889460	4889602	0.000000	+	1	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4889460	4889602	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4890740	4890796	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4890740	4890796	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4891915	4892069	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4891915	4892069	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4893417	4893563	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4893417	4893563	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4894934	4895005	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4894934	4895005	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4896356	4896361	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	stop_codon	4896362	4896364	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4896356	4897909	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1";Tcea1
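In the output above the last line ends in `;Tcea1` because that input line has no trailing space, while the other lines already ended in a space before the name was appended. A sketch of the append variant that strips trailing spaces first, so each matched line gets exactly one space before the name (sample rows from the thread; `file1.appended` is an arbitrary output name):

```shell
#!/bin/sh
# Append variant that normalises trailing blanks before adding the gene name.
set -e
printf 'chr1\tmm10_knownGene\tCDS\t4896356\t4896361\t0.000000\t+\t0\tgene_id "uc011wht.1"; transcript_id "uc011wht.1"; \n' > file1
printf 'chr1\tmm10_knownGene\texon\t4896356\t4897909\t0.000000\t+\t.\tgene_id "uc011wht.1"; transcript_id "uc011wht.1";\n' >> file1
printf 'uc007aeu.1\tXkr4\nuc011wht.1\tTcea1\n' > file2

awk '
    NR == FNR { A[$1] = $2; next }     # file2: ID -> gene name
    $2 in A {                          # file1 split on double quotes; $2 = gene_id value
        name = A[$2]
        sub(/ +$/, "")                 # drop trailing spaces from the whole line
        $0 = $0 " " name               # then append exactly one space and the name
    }
    1
' FS='\t' file2 FS='"' file1 > file1.appended
cat file1.appended
```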

Last edited by Scrutinizer; 11-15-2016 at 04:59 PM.

Scrutinizer

11-15-2016

That specification is not entirely clear. If I interpreted it correctly, try:

Code:

awk 'NR == FNR {T["\"" $1 "\";"] = $2; next} $12 in T {sub (/ *$/, " " T[$12])} 1' file2 file1
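To see why the key is `$12`: with awk's default whitespace splitting, the 9th tab-delimited column of file1 breaks into four fields, and the value after transcript_id, including its quotes and semicolon, lands in `$12`. This small illustration prints those fields and the key built from a file2 line (sample lines from the thread):

```shell
#!/bin/sh
# Show how default (whitespace) splitting numbers a file1 line, and the table key
# that the one-liner builds from a file2 line.
fields=$(printf 'chr1\tmm10_knownGene\tCDS\t3216025\t3216968\t0.000000\t-\t2\tgene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; \n' |
    awk '{ for (i = 9; i <= NF; i++) print i ": " $i }')
key=$(printf 'uc007aeu.1\tXkr4\n' | awk '{ print "\"" $1 "\";" }')

printf '%s\n' "$fields"
printf 'key built from file2: %s\n' "$key"
```

It prints `9: gene_id` through `12: "uc007aeu.1";`, then the key `"uc007aeu.1";`. The last field and the key are the same string, so `$12 in T` matches. This relies on the first eight columns containing no spaces, which holds for the sample data.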

RudiC

11-15-2016

Exactly what I was looking for. Thank you, @Scrutinizer.

jacobs.smith
