awk to combine matching lines in file


 
# 1  
Old 09-08-2016

I am trying to combine all matching lines in a tab-delimited file using awk. The command below runs but produces no output. Thank you :).

input
Code:
chrX    110925349    110925532    ALG13
chrX    110925349    110925532    ALG13
chrX    110925349    110925532    ALG13
chrX    47433390    47433999    SYN1
chrX    47433390    47433999    SYN1
chr18    53298518    53298629    TCF4
chr18    53298518    53298629    TCF4
chr18    53298640    53298695    TCF4
chr18    53298640    53298695    TCF4

desired output
Code:
chrX    110925349    110925532    ALG13
chrX    47433390    47433999    SYN1
chr18    53298518    53298629    TCF4
chr18    53298640    53298695    TCF4

Code:
awk '!(NR){print$0p}{p=$0}' input

# 2  
Old 09-08-2016
Code:
awk '!A[$0]++' file

These 2 Users Gave Thanks to Yoda For This Post:
# 3  
Old 09-08-2016
Hi cmccabe,
The code you were using:
Code:
awk '!(NR){print$0p}{p=$0}' input

only prints anything when the condition !(NR) evaluates to a non-zero value. But since the awk NR variable is set to one when awk reads the first record from your input and is incremented by 1 every time another input record is read, !NR ALWAYS evaluates to zero. Therefore, the above script is logically equivalent to:
Code:
awk '{p=$0}' input

which, as you said, produces no output.
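
If you want to see this for yourself, here is a minimal sketch. The three-line sample piped in by printf is made up for illustration and is not your data:

```shell
# Print NR next to !(NR) for each record; the second column is always 0.
printf 'a\na\nb\n' | awk '{ print NR, !(NR) }'

# Count the lines the original command prints for the same made-up input.
lines=$(printf 'a\na\nb\n' | awk '!(NR){print$0p}{p=$0}' | wc -l | tr -d ' ')
echo "lines printed by the original command: $lines"
```

The first command should show 0 in the second column on every line, and the count reported by the second should be 0.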

If you are just trying to remove duplicated adjacent lines in a file (and the first line in your file is never an empty line), you could try:
Code:
awk '$0 != p {print;p = $0}' input

If you could have an empty line as the first line in your file (and you want to keep that empty line in the output), you would need to make it a little more complicated:
Code:
awk '$0 != p || NR == 1 {print;p = $0}' input
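
As a rough check of the difference between those two commands, the sketch below feeds both of them a made-up input whose first line is empty (gen_sample is just a helper name invented for this example):

```shell
# Hypothetical input: an empty first line followed by adjacent duplicates.
gen_sample() { printf '\nx\nx\ny\n'; }

# Without the NR == 1 test the leading empty line is dropped, because p
# starts out as an empty string and compares equal to that line.
without=$(gen_sample | awk '$0 != p {print; p = $0}' | wc -l | tr -d ' ')

# With the NR == 1 test the first line is always printed.
with=$(gen_sample | awk '$0 != p || NR == 1 {print; p = $0}' | wc -l | tr -d ' ')

echo "without NR == 1: $without lines; with NR == 1: $with lines"
```

With this input the first form should print 2 lines and the second 3, the extra one being the empty first line.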

The code Yoda suggested removes duplicated lines even if they are not adjacent. If you only need to handle adjacent duplicates, Yoda's code does that as well, but it takes more time and memory to get the job done because it stores every distinct line in an array. For a small file like your sample, it doesn't matter. For a file with a huge number of lines with different contents, the code above should run considerably faster.
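
To make the difference in behaviour concrete, here is a small sketch on made-up input in which a line reappears after a different line (gen_dupes is a helper name invented for this example):

```shell
# Hypothetical input: "a" shows up again after an intervening "b".
gen_dupes() { printf 'a\na\nb\na\n'; }

# Adjacent-only comparison: the later "a" is kept, since the line
# before it was "b".
adjacent=$(gen_dupes | awk '$0 != p {print; p = $0}')

# Array-based version: every line seen so far is remembered in A,
# so the later "a" is dropped as well.
global=$(gen_dupes | awk '!A[$0]++')

printf 'adjacent only:\n%s\n\n' "$adjacent"
printf 'all duplicates:\n%s\n' "$global"
```

For sorted or grouped input like your sample, both approaches should give the same result; they only differ when equal lines are separated by other lines.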

Hope this helps.
This User Gave Thanks to Don Cragun For This Post:
# 4  
Old 09-10-2016
Thank you both very much :)