awk to ignore whitespace in field


 
# 1  
Old 08-28-2017

The awk below runs and updates the desired field. However, the white space between
"nonsynonymous" and "SNV" in $9 is turned into a tab in the output, and my attempt to correct this does not update the field
unless the white space is removed. I am not sure what I am doing wrong. Thank you.


file1
Code:
R_Index	Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	Inheritence	ExonicFunc.refGene	AAChange.refGene	avsnp147	PopFreqMax	1000G_ALL	1000G_AFR	1000G_AMR	1000G_EAS	1000G_EUR	1000G_SAS	ExAC_ALL	ExAC_AFR	ExAC_AMR	ExAC_EAS	ExAC_FIN	ExAC_NFE	ExAC_OTH	ExAC_SAS	ESP6500siv2_ALL	ESP6500siv2_AA	ESP6500siv2_EA	CG46	SIFT_score	SIFT_pred	Polyphen2_HDIV_score	Polyphen2_HDIV_pred	Polyphen2_HVAR_score	Polyphen2_HVAR_pred	LRT_score	LRT_pred	MutationTaster_score	MutationTaster_pred	MutationAssessor_score	MutationAssessor_pred	dpsi_max_tissue	dpsi_zscore	CLINSIG	CLNDBN	CLNACC	CLNDSDB	CLNDSDBID	Quality	Reads	Zygosity	Score	Classification	Rank	HGMD	Sanger
11	chr2	220494118	220494118	A	C	exonic	SLC4A3	.	.	nonsynonymous SNV	SLC4A3:NM_001326559:exon4:c.470A>C:p.H157P,SLC4A3:NM_005070:exon4:c.470A>C:p.H157P,SLC4A3:NM_201574:exon4:c.470A>C:p.H157P	rs597306	1.	0.95	0.84	0.98	1.	1.	1.	0.98	0.84	0.99	1.	1.	1.	0.99	1.	0.95	0.85	1.	0.84	1.0	T	0.0	B	0.0	B	0.013	N	1	P	-1.545	N	-0.0806	-0.387	.	.	.	.	.	GOOD	78	hom	22

file2
Code:
SLC4A3 unknown

current output
Code:
R_Index	Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	Inheritence	ExonicFunc.refGene	AAChange.refGene	avsnp147	PopFreqMax	1000G_ALL	1000G_AFR	1000G_AMR	1000G_EAS	1000G_EUR	1000G_SAS	ExAC_ALL	ExAC_AFR	ExAC_AMR	ExAC_EAS	ExAC_FIN	ExAC_NFE	ExAC_OTH	ExAC_SAS	ESP6500siv2_ALL	ESP6500siv2_AA	ESP6500siv2_EA	CG46	SIFT_score	SIFT_pred	Polyphen2_HDIV_score	Polyphen2_HDIV_pred	Polyphen2_HVAR_score	Polyphen2_HVAR_pred	LRT_score	LRT_pred	MutationTaster_score	MutationTaster_pred	MutationAssessor_score	MutationAssessor_pred	dpsi_max_tissue	dpsi_zscore	CLINSIG	CLNDBN	CLNACC	CLNDSDB	CLNDSDBID	Quality	Reads	Zygosity	Score	Classification	Rank	HGMD	Sanger
11	chr2	220494118	220494118	A	C	exonic	SLC4A3	.	unknown	nonsynonymous	SNV	SLC4A3:NM_001326559:exon4:c.470A>C:p.H157P,SLC4A3:NM_005070:exon4:c.470A>C:p.H157P,SLC4A3:NM_201574:exon4:c.470A>C:p.H157P	rs597306	1.	0.95	0.84	0.98	1.	1.	1.	0.98	0.84	0.99	1.	1.	1.	0.99	1.	0.95	0.85	1.	0.84	1.0	T	0.0	B	0.0	B	0.013	N	1	P	-1.545	N	-0.0806	-0.387	.	.	.	.	.	GOOD	78	hom	22

desired output ($10 updated to "unknown", and "nonsynonymous SNV" not split)
Code:
R_Index	Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	Inheritence	ExonicFunc.refGene	AAChange.refGene	avsnp147	PopFreqMax	1000G_ALL	1000G_AFR	1000G_AMR	1000G_EAS	1000G_EUR	1000G_SAS	ExAC_ALL	ExAC_AFR	ExAC_AMR	ExAC_EAS	ExAC_FIN	ExAC_NFE	ExAC_OTH	ExAC_SAS	ESP6500siv2_ALL	ESP6500siv2_AA	ESP6500siv2_EA	CG46	SIFT_score	SIFT_pred	Polyphen2_HDIV_score	Polyphen2_HDIV_pred	Polyphen2_HVAR_score	Polyphen2_HVAR_pred	LRT_score	LRT_pred	MutationTaster_score	MutationTaster_pred	MutationAssessor_score	MutationAssessor_pred	dpsi_max_tissue	dpsi_zscore	CLINSIG	CLNDBN	CLNACC	CLNDSDB	CLNDSDBID	Quality	Reads	Zygosity	Score	Classification	Rank	HGMD	Sanger
11	chr2	220494118	220494118	A	C	exonic	SLC4A3	.	unknown	nonsynonymous SNV	SLC4A3:NM_001326559:exon4:c.470A>C:p.H157P,SLC4A3:NM_005070:exon4:c.470A>C:p.H157P,SLC4A3:NM_201574:exon4:c.470A>C:p.H157P	rs597306	1.	0.95	0.84	0.98	1.	1.	1.	0.98	0.84	0.99	1.	1.	1.	0.99	1.	0.95	0.85	1.	0.84	1.0	T	0.0	B	0.0	B	0.013	N	1	P	-1.545	N	-0.0806	-0.387	.	.	.	.	.	GOOD	78	hom	22

awk
Code:
awk 'FNR==NR {a[$1]=$2; next} a[$8]{$10=a[$8]}1' OFS="\t" file2 file1 > output

To ignore the whitespace I tried:
Code:
awk -F '' 'FNR==NR {a[$1]=$2; next} a[$8]{$10=a[$8]}1' OFS="\t" file2 file1 > output
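A minimal sketch of what is going on, using a hypothetical four-column line rather than the real file1: with the default field separator, awk splits on spaces as well as tabs, and assigning to any field makes awk rebuild the record by joining the fields with OFS. That is how the space inside "nonsynonymous SNV" ends up as a tab. (An empty separator such as -F '' is not a way to turn splitting off; POSIX leaves its behavior unspecified.)

```shell
# Hypothetical sample line: tab-separated, with a space inside field 3.
line=$(printf 'SLC4A3\t.\tnonsynonymous SNV\trs597306')

# Default FS: the space inside field 3 is also a separator, so awk sees 5 fields.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# FS set to a single tab: the space stays inside field 3, so awk sees 4 fields.
tab_nf=$(printf '%s\n' "$line" | awk -F '\t' '{ print NF }')

# Assigning to a field rebuilds $0 from the split fields, joined with OFS.
rebuilt=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ $2 = "unknown" } 1')

printf 'default FS: %s fields\n' "$default_nf"
printf 'tab FS:     %s fields\n' "$tab_nf"
printf '%s\n' "$rebuilt"
```

In the rebuilt line, "nonsynonymous" and "SNV" are separated by a tab, which matches the current output shown above.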

# 2  
Old 08-28-2017
Try:

Code:
awk -F '\t' ...

This User Gave Thanks to Scrutinizer For This Post:
# 3  
Old 08-28-2017
I didn't know that -F'\t' could be used for setting the delimiter within each field as well (I thought it only set the delimiter between fields). Thank you very much.

---------- Post updated at 02:11 PM ---------- Previous update was at 01:22 PM ----------

I spoke too soon: $8 does not update, I think because file1 is space-delimited. If I add a tab in file1 I get the desired output. If a tab is not added to file1, is there a way to ignore the whitespace in $9 of the output? The space seems to be causing the issue, so maybe just removing it before processing would be best. Thank you.
# 4  
Old 08-28-2017
If you want different input field separators for your two input files, you need to change the value of FS when you switch input files. Try:
Code:
awk 'FNR==NR {a[$1]=$2; next} a[$8]{$10=a[$8]}1' OFS="\t" file2 FS='\t' file1 > output

which uses the default (strings of one or more <space> and/or <tab> characters) as the field separator when reading from file2 and uses a single <tab> character as the field separator when reading from file1.
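The difference can be checked with a small self-contained sketch. The two scratch files below are hypothetical stand-ins (one space-separated lookup line, one 11-column tab-separated line), not the real data. With -F '\t' applied to both files, the lookup line is a single field, so the stored value is empty and nothing is updated; with FS changed between the two file operands, the lookup works and the space in column 11 survives.

```shell
# Scratch directory with small stand-ins for file2 and file1 (assumed sample data).
tmp=$(mktemp -d)
printf 'SLC4A3 unknown\n' > "$tmp/file2"
# 11 tab-separated columns: column 8 is the gene, column 10 is the field to
# update, column 11 contains a space that must not be split.
printf '11\tchr2\t220494118\t220494118\tA\tC\texonic\tSLC4A3\t.\t.\tnonsynonymous SNV\n' > "$tmp/file1"

prog='FNR==NR {a[$1]=$2; next} a[$8]{$10=a[$8]}1'

# Tab as FS for both files: in file2, $1 is "SLC4A3 unknown" and $2 is empty,
# so a["SLC4A3"] is never set and file1 passes through unchanged.
tab_only=$(awk -F '\t' "$prog" OFS='\t' "$tmp/file2" "$tmp/file1")

# Default FS while reading file2; FS='\t' is applied just before file1 is opened.
per_file=$(awk "$prog" OFS='\t' "$tmp/file2" FS='\t' "$tmp/file1")

printf '%s\n' "$tab_only"
printf '%s\n' "$per_file"
rm -rf "$tmp"
```

Variable assignments placed among the file operands are processed in order, at the point where awk would otherwise open the next file, which is why the position of FS='\t' matters here.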
This User Gave Thanks to Don Cragun For This Post:
# 5  
Old 08-29-2017
Thank you very much.