sed replace file contents by reading from another file


# 1  
Old 11-15-2016

Hello,

My input file1 is tab-delimited and looks like this:

Code:
chr1	mm10_knownGene	stop_codon	3216022	3216024	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	CDS	3216025	3216968	0.000000	-	2	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	exon	3214482	3216968	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	CDS	3421702	3421901	0.000000	-	1	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	exon	3421702	3421901	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	CDS	3670552	3671348	0.000000	-	0	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	start_codon	3671346	3671348	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	exon	3670552	3671498	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; 
chr1	mm10_knownGene	start_codon	4857914	4857916	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4857914	4857976	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4857694	4857976	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4867470	4867532	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4867470	4867532	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4878027	4878132	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4878027	4878132	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4886744	4886831	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4886744	4886831	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4889460	4889602	0.000000	+	1	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4889460	4889602	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4890740	4890796	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4890740	4890796	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4891915	4892069	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4891915	4892069	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4893417	4893563	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4893417	4893563	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4894934	4895005	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4894934	4895005	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	CDS	4896356	4896361	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	stop_codon	4896362	4896364	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; 
chr1	mm10_knownGene	exon	4896356	4897909	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1";

My input file2 looks like this:
Code:
uc007aeu.1	Xkr4
uc011wht.1	Tcea1

Now I want to replace the values in inputfile1 that follow gene_id and transcript_id with the matching second-column value from inputfile2. I tried separating out the columns and joining on them, but join requires sorted input and I do NOT want the order of the input file to change, so I am having trouble getting the output. Any ideas are highly appreciated.

Please note that the input file row order should not be changed.

Thanks
# 2  
Old 11-15-2016
Not sure what you mean exactly. Perhaps something like this?
Code:
awk 'NR==FNR{A[$1]=$2; next} $2 in A{$2=$4=A[$2]}1' FS='\t' file2 FS=\" OFS=\" file1

Output:
Code:
chr1	mm10_knownGene	stop_codon	3216022	3216024	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	CDS	3216025	3216968	0.000000	-	2	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	exon	3214482	3216968	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	CDS	3421702	3421901	0.000000	-	1	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	exon	3421702	3421901	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	CDS	3670552	3671348	0.000000	-	0	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	start_codon	3671346	3671348	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	exon	3670552	3671498	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm10_knownGene	start_codon	4857914	4857916	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4857914	4857976	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4857694	4857976	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4867470	4867532	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4867470	4867532	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4878027	4878132	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4878027	4878132	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4886744	4886831	0.000000	+	2	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4886744	4886831	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4889460	4889602	0.000000	+	1	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4889460	4889602	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4890740	4890796	0.000000	+	2	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4890740	4890796	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4891915	4892069	0.000000	+	2	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4891915	4892069	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4893417	4893563	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4893417	4893563	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4894934	4895005	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4894934	4895005	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	CDS	4896356	4896361	0.000000	+	0	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	stop_codon	4896362	4896364	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1"; 
chr1	mm10_knownGene	exon	4896356	4897909	0.000000	+	.	gene_id "Tcea1"; transcript_id "Tcea1";
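
For readers who find the one-liner dense, here is the same command spelled out as a commented sketch. The function name replace_ids is only an illustrative wrapper, not something from the thread; it assumes the mapping file is tab-separated and that the two ids are the first two double-quoted strings on each line of the data file.

```shell
# Hypothetical wrapper: $1 = mapping file (id<TAB>name), $2 = GTF-like file.
replace_ids() {
  awk '
    # First file (mapping): read with FS = tab, remember id -> gene name.
    NR == FNR { name[$1] = $2; next }
    # Second file: read with FS = double quote, so $2 and $4 are the
    # quoted gene_id and transcript_id values. Replace both if $2 is known.
    $2 in name { $2 = $4 = name[$2] }
    # Print every line, changed or not, in its original order.
    { print }
  ' FS='\t' "$1" FS='"' OFS='"' "$2"
}
```

It would be called as replace_ids file2 file1 > file1.new. Nothing is sorted, so the row order of file1 is kept, and lines whose id is missing from file2 pass through unchanged.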

--
Or did you mean:
Code:
awk 'NR==FNR{A[$1]=$2; next} $2 in A{$0=$0 A[$2]}1' FS='\t' file2 FS=\" file1

Code:
chr1	mm10_knownGene	stop_codon	3216022	3216024	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	CDS	3216025	3216968	0.000000	-	2	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	exon	3214482	3216968	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	CDS	3421702	3421901	0.000000	-	1	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	exon	3421702	3421901	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	CDS	3670552	3671348	0.000000	-	0	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	start_codon	3671346	3671348	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	exon	3670552	3671498	0.000000	-	.	gene_id "uc007aeu.1"; transcript_id "uc007aeu.1"; Xkr4
chr1	mm10_knownGene	start_codon	4857914	4857916	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4857914	4857976	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4857694	4857976	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4867470	4867532	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4867470	4867532	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4878027	4878132	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4878027	4878132	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4886744	4886831	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4886744	4886831	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4889460	4889602	0.000000	+	1	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4889460	4889602	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4890740	4890796	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4890740	4890796	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4891915	4892069	0.000000	+	2	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4891915	4892069	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4893417	4893563	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4893417	4893563	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4894934	4895005	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4894934	4895005	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	CDS	4896356	4896361	0.000000	+	0	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	stop_codon	4896362	4896364	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1"; Tcea1
chr1	mm10_knownGene	exon	4896356	4897909	0.000000	+	.	gene_id "uc011wht.1"; transcript_id "uc011wht.1";Tcea1


Last edited by Scrutinizer; 11-15-2016 at 02:59 PM..
This User Gave Thanks to Scrutinizer For This Post:
jacobs.smith (11-15-2016)
# 3  
Old 11-15-2016
That specification is not too clear. If I interpreted it correctly, try
Code:
awk 'NR == FNR {T["\"" $1 "\";"] = $2; next} $12 in T {sub ($12 ".$", "& " T[$12])} 1' file2 file1
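
Note that the ".$" in the pattern above expects exactly one character after the id, so a line that ends directly at the semicolon (like the last line of the sample) would be left unchanged. A possible variant, offered only as a sketch: it assumes the quoted transcript_id value is always the last whitespace-separated field, and append_name is a made-up helper name.

```shell
# Hypothetical wrapper: $1 = mapping file (id name), $2 = GTF-like file.
append_name() {
  awk '
    # First file: key the lookup on the id exactly as it appears in the
    # data file, i.e. wrapped in double quotes and followed by a semicolon.
    NR == FNR { name["\"" $1 "\";"] = $2; next }
    # Second file: if the last field is a known id, drop any trailing
    # blanks and append the gene name after a single space.
    $NF in name { sub(/[ \t]+$/, ""); $0 = $0 " " name[$NF] }
    # Print every line in its original order.
    1
  ' "$1" "$2"
}
```

Called as append_name file2 file1, this appends the name whether or not the line has a trailing space.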

This User Gave Thanks to RudiC For This Post:
jacobs.smith (11-15-2016)