Merging non-repeating columns of lines


 
# 1  
Old 02-09-2010

Hello,
I have a file to work with. It has 5 columns. The first three together constitute the position. The 4th column contains values for downstream analysis, and the 5th column contains values that I want to append to the 4th column (only when the rows share the same position).

My file looks like this:
Code:
chr3    10163261        10163262        A>R_32_32_50_22 rs71760202
chr3    10163295        10163296        A>R_28_28_50_20 rs71757232
chr3    10163295        10163296        A>R_28_28_50_20 rs71760202
chr3    10163306        10163307        T>Y_34_34_50_20 rs71757232
chr3    10163306        10163307        T>Y_34_34_50_20 rs71760202
chr3    10163306        10163307        T>Y_34_34_50_20 rs5030624

And I am trying to make it look like this:
Code:
chr3   10163261    10163262  A>R_32_32_50_22>rs71760202
chr3   10163295    10163296  A>R_28_28_50_20>rs71757232, rs71760202
chr3   10163306    10163307  T>Y_34_34_50_20>rs71757232, rs71760202, rs5030624

Any help / recommendation / pointer would be appreciated.
Cheers

Last edited by Scott; 02-09-2010 at 10:59 AM.. Reason: Code tags
# 2  
Old 02-09-2010
With nawk:
Code:
nawk '{k=$1" "$2" "$3" "$4; if (k in a) a[k]=a[k]", "$5; else a[k]=$5}
END{for (k in a) print k">"a[k]}' infile.txt | sort -k1,1 -k2,2n > outfile.txt
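For anyone who wants the associative-array idea in the one-liner above spelled out, here is a commented sketch. It assumes whitespace-separated input with exactly five columns. The names `merge_ids`, `ids` and `order` are only illustrative and do not come from the one-liner. It also remembers the order in which positions first appear, so no `sort` is needed afterwards.

```shell
# merge_ids: collapse rows that share columns 1-4 and join their
# column-5 values with ", ". Reads the named files, or stdin if none.
merge_ids() {
  awk '
  {
    # Columns 1-4 together identify one group of rows.
    key = $1 "\t" $2 "\t" $3 "\t" $4
    if (!(key in ids)) {
      order[++n] = key              # remember first-seen order
      ids[key] = $5                 # start the list for this group
    } else {
      ids[key] = ids[key] ", " $5   # append to the existing list
    }
  }
  END {
    # Print the groups in the order they first appeared.
    for (i = 1; i <= n; i++)
      print order[i] ">" ids[order[i]]
  }' "$@"
}
```

Usage would be something like `merge_ids infile.txt > outfile.txt`. The whole table of keys is held in memory, which should be fine unless the file is very large.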


---------- Post updated at 17:37 ---------- Previous update was at 16:48 ----------

In Perl:

Code:
perl -lane 'push @{$h{"@F[0..3]"}}, $F[4];
END{ print "$_>", join(", ", @{$h{$_}}) for sort keys %h }' infile.txt


Piece of cake.
# 3  
Old 02-09-2010
Thanks a lot, it works like a charm.
# 4  
Old 02-09-2010
Which one do you like more, the nawk or the Perl code?
# 5  
Old 02-09-2010
I am a newbie in bash and every new thing seems like magic to me, so I liked the nawk version better, but I could use some explanation.
# 6  
Old 02-09-2010
or:
Code:
awk '{t=$5;$5="";if(p!=$0){if(p)print p s;p=$0;s=">"t}else s=s","t}END{print p s}' infile
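The one-liner above works in a single pass and only compares each row with the previous one, so it relies on rows for the same position being adjacent, as they are in the sample. Below is an expanded sketch of that streaming idea. The names `merge_adjacent`, `prev` and `list` are only illustrative, and the key is built from columns 1-4 explicitly instead of blanking `$5`.

```shell
# merge_adjacent: like the one-liner, but written out with comments.
# Assumes rows that belong together are already next to each other.
merge_adjacent() {
  awk '
  {
    key = $1 "\t" $2 "\t" $3 "\t" $4
    if (key != prev) {
      # A new group starts: flush the previous one, if there was one.
      if (NR > 1) print prev ">" list
      prev = key
      list = $5
    } else {
      # Same group as the previous row: extend the list.
      list = list ", " $5
    }
  }
  END {
    # Flush the last group (skipped when the input was empty).
    if (NR > 0) print prev ">" list
  }' "$@"
}
```

It is called the same way, for example `merge_adjacent infile`. Since only one group is kept in memory at a time, this variant suits large, already-sorted files; for unsorted input, run `sort` first or use an array-based approach.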
