awk help


 
# 1  
Old 01-18-2015
awk help

I am a newbie in awk and bioinformatics.

I have a tab-delimited text file. The format is as follows: 3 records, 4 fields separated by tabs.
Code:
SNP 100 101 102 103
M1 a:5 b:4 a:26 b:3
M2 a:4 b:45 a:6 b:18
M3 a:50 b:40 a:26 b:30

Please help me process the file based on the number after the colon [:].

1) If the number after the colon [:] is 5 or more, print just the letter before it ("a" or "b").
2) If the number after the colon [:] is less than 5, print "X".

# 2  
Old 01-18-2015
You say you have 3 input records and show us 4 input lines???

You say your input fields are tab delimited, but there are no tabs in your sample input???
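A quick way to check whether a file really is tab-delimited is to count the fields per line with a tab as the only separator. This is a minimal sketch; the two sample lines are made up for illustration (the first uses spaces, the second uses tabs), and in practice you would run the awk command on the real file.

```shell
# Count fields per line using tab as the only field separator.
# A line that contains no tabs is reported as a single field.
counts=$(printf 'M1 a:5 b:4\nM2\ta:4\tb:45\n' | awk -F'\t' '{ print NF }')
printf '%s\n' "$counts"
```

A count of 1 on every line would suggest the file is separated by spaces rather than tabs.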

The description isn't at all clear as to what your desired output should be. From your description it could easily be:
Code:
SNP 100 101 102 103
M1 a X a X
M2 X b a b
M3 a b a b

or:
Code:
a
X
a
X
X
b
a
b
a
b
a
b

or:
Code:
a X a X
X b a b
a b a b

Please give us a clear description of what you are trying to do.

Unless the output you want looks like the 2nd sample above, what delimiter should be used between fields in the output (or does it matter)?

Please show us what you have tried (even if it isn't working).
# 3  
Old 01-18-2015
Sorry. There are 4 input lines (and 5 fields), and I want to keep the header line (the first line). The output should look like:

Code:
SNP 100 101 102 103
M1 a X a X
M2 X b a b
M3 a b a b


I am still learning, so it's hard for me to write the code myself.

Please let me know the awk code for converting

Code:
SNP 100 101 102 103
M1 a:5 b:4 a:26 b:3
M2 a:4 b:45 a:6 b:18
M3 a:50 b:40 a:26 b:30


to

Code:
SNP 100 101 102 103 
M1 a X a X
M2 X b a b
M3 a b a b



Thanks

# 4  
Old 01-19-2015
Hello jaisask,

Please use code tags for code, input, and commands in your posts, as per the forum rules. The following may help you:
Code:
awk '{for(i=2;i<=NF;i++){if($i ~ /:/){A=B=$i;sub(/.*:/,"",A);sub(/:.*/,"",B);$i=(A+0>=5)?B:"X"}}} 1'  Input_file

Output will be as follows.
Code:
SNP 100 101 102 103
M1 a X a X
M2 X b a b
M3 a b a b


EDIT: Adding a multi-line form of the solution.
Code:
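awk '{
        for (i = 2; i <= NF; i++) {
                if ($i ~ /:/) {
                        A = B = $i
                        sub(/.*:/, "", A)        # A: the number after the colon
                        sub(/:.*/, "", B)        # B: the letter before the colon
                        $i = (A + 0 >= 5) ? B : "X"   # + 0 forces a numeric comparison
                }
        }
}
1' Input_file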
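An alternative sketch, not taken from the posts in this thread: split() can separate the letter and the number in one step. Adding 0 to the number forces a numeric comparison, which matters because a plain string such as "26" can compare below "5" when awk compares the two as text. The function name classify is hypothetical, and the sample lines are copied from the question.

```shell
# classify: hypothetical helper name, used here only for illustration.
# For every field after the first that has the form letter:number,
# keep the letter if the number is 5 or more, otherwise print X.
classify() {
    awk '
    {
        for (i = 2; i <= NF; i++)
            if (split($i, part, ":") == 2)    # part[1] = letter, part[2] = number
                $i = (part[2] + 0 >= 5) ? part[1] : "X"
    }
    1' "$@"
}

printf '%s\n' 'SNP 100 101 102 103' 'M1 a:5 b:4 a:26 b:3' 'M2 a:4 b:45 a:6 b:18' |
    classify
```

Header fields contain no colon, so split() returns 1 for them and the header line passes through unchanged.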


Thanks,
R. Singh

# 5  
Old 01-19-2015
Building on something that Jim McNamara posted earlier (and deleted when the output format was questioned), you could also try:
Code:
awk '
#BEGIN {	OFS = "\t"
#}
#NR == 1 {
#	$1 = $1		# reset field separators
#}
NR > 1 {for(i = 2; i <= NF; i++) { 
		p = index($i, ":")
		if(p > 0)
			if(substr($i, p + 1) +0 >= 5 )
				$i = substr($i, 1, 1)
			else	$i = "X"
	}
}
1' file

If you want tabs as output field separators instead of spaces (as originally indicated), remove the leading # from the five commented-out lines (the BEGIN and NR == 1 blocks).

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

As shown above, the output produced by your sample input is:
Code:
SNP 100 101 102 103
M1 a X a X
M2 X b a b
M3 a b a b

With the comments turned into live code, the output is:
Code:
SNP	100	101	102	103
M1	a	X	a	X
M2	X	b	a	b
M3	a	b	a	b
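The reason the header line needs the `$1 = $1` assignment is that awk only rebuilds a record with OFS when one of its fields is assigned; a line whose fields are never touched is printed with its original separators. A minimal sketch of that idiom on a single made-up line:

```shell
# Assigning any field (even to itself) makes awk rebuild $0 using OFS.
line=$(printf 'SNP 100 101\n' | awk 'BEGIN { OFS = "\t" } { $1 = $1 } 1')
printf '%s\n' "$line"
```

Without the `{ $1 = $1 }` block the line would be printed with its original spaces, even though OFS is set to a tab.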
