Concatenate values in the first column based on the second column.


 
# 1  
Old 01-29-2016
[Solved] Concatenate values in the first column based on the second column.

I have a file (myfile.txt) with contents like this:

Code:
1.txt apple is
3.txt apple is
5.txt apple is
2.txt apple is a
7.txt apple is a
8.txt apple is a fruit
4.txt orange not a fruit
6.txt zero is

The above file is already sorted using this command:
Code:
sort -k2 myfile.txt

My objective is to get this:
Code:
1.txt_3.txt_5.txt apple is
2.txt_7.txt apple is a
8.txt apple is a fruit
4.txt orange not a fruit
6.txt zero is

Notice that when consecutive lines have the same text after the first column, the first-column values are joined with an underscore for as long as that text stays the same.

This is what I have tried, but it does not work correctly:
Code:
awk -F' ' 'NF>2{a[$2] = a[$2]"_"$1}END{for(i in a){print a[i]" "i}}' myfile.txt

The output that I get using the above command is this:
Code:
_4.txt orange
_1.txt_3.txt_5.txt_2.txt_7.txt_8.txt apple
_6.txt zero

Any help? I am using Linux with BASH.

Last edited by shoaibjameel123; 01-29-2016 at 09:54 AM.. Reason: Changed bigram.txt to myfile.txt for clarity
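A likely reason the attempt above merges every "apple" line: with whitespace field splitting, $2 is only the second word, not the whole remaining text, so "apple is", "apple is a" and "apple is a fruit" all share the key "apple". The leading underscore appears because the separator is prepended unconditionally. The sketch below only illustrates the key difference; show_keys is a hypothetical helper name, and it assumes fields are separated by spaces or tabs.

```shell
# Hypothetical helper: print the grouping key the attempt uses ($2)
# next to the key that is actually wanted (everything after $1).
show_keys() {
  awk '{
    rest = $0
    sub(/^[ \t]*[^ \t]+[ \t]*/, "", rest)   # drop the first field, keep the rest verbatim
    print "used=[" $2 "] wanted=[" rest "]"
  }' "$@"
}

printf '%s\n' '1.txt apple is' '2.txt apple is a' '8.txt apple is a fruit' | show_keys
```

All three lines report used=[apple], while the wanted keys differ.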
# 2  
Old 01-29-2016
Hello shoaibjameel123,

Could you please try the following and let me know if it helps?
Code:
awk 'FNR==NR{A=$1;$1="";array[$0]=array[$0]?array[$0] "_" A:A;next} {$1="";B=$0} (B in array){C=B;sub(/^[[:space:]]+/,X,B);print array[C] OFS B;delete array[C]}'   Input_file  Input_file

Output will be as follows.
Code:
1.txt_3.txt_5.txt apple is
2.txt_7.txt apple is a
8.txt apple is a fruit
4.txt orange not a fruit
6.txt zero is

Also, if you are not worried about the output order, the following may help too.
Code:
awk '{A=$1;$1="";array[$0]=array[$0]?array[$0] "_" A:A} END{for(i in array){j=i;sub(/^[[:space:]]+/,X,i);print array[j] OFS i}}'  Input_file

Output will be as follows.
Code:
8.txt apple is a fruit
2.txt_7.txt apple is a
6.txt zero is
1.txt_3.txt_5.txt apple is
4.txt orange not a fruit

Thanks,
R. Singh
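The first command above keeps the input order by reading the file twice, and the second one loses the order because awk does not define the traversal order of "for (i in array)". As a possible alternative (not the method from the post above), a single pass can keep the order by recording each key the first time it is seen. This is a minimal sketch; concat_keep_order is a hypothetical name, and it assumes fields are separated by spaces or tabs.

```shell
# Hypothetical helper: group by everything after the first field and
# print the groups in the order in which each key first appears.
concat_keep_order() {
  awk '
    {
      name = $1
      key = $0
      sub(/^[ \t]*[^ \t]+[ \t]*/, "", key)   # key = text after the first field
      if (key in names) {
        names[key] = names[key] "_" name     # append to an existing group
      } else {
        order[++n] = key                     # remember first-seen order
        names[key] = name
      }
    }
    END { for (i = 1; i <= n; i++) print names[order[i]], order[i] }
  ' "$@"
}

# Usage: concat_keep_order myfile.txt
```

Because it reads the input once, it also works on a pipe, and it merges equal keys even when they are not adjacent. Unlike the commands above, it compares the remaining text verbatim, so runs of multiple spaces inside the text are not normalized.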
# 3  
Old 01-29-2016
Thanks. Yes, both of them work. It solves my problem.
# 4  
Old 01-29-2016
How about
Code:
awk '
        {sub (" ", FS)
         $0=$0
         T[$2]=(T[$2]?T[$2]"_":"") $1
        }
END     {for (t in T) print T[t], t
        }
' FS="\001" file
8.txt apple is a fruit
1.txt_3.txt_5.txt apple is
2.txt_7.txt apple is a
4.txt orange not a fruit
6.txt zero is
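Since the file in post #1 is already sorted with sort -k2, equal keys sit on adjacent lines, so another option is to compare each line's key with the previous one and print a group as soon as the key changes. This keeps the sorted order and holds only one group in memory. It is a sketch under that sorted-input assumption; concat_sorted is a hypothetical name, and fields are assumed to be separated by spaces or tabs.

```shell
# Hypothetical helper: join first-column values of consecutive lines
# whose remaining text is identical. Assumes equal keys are adjacent,
# for example because the input was sorted with: sort -k2
concat_sorted() {
  awk '
    {
      name = $1
      key = $0
      sub(/^[ \t]*[^ \t]+[ \t]*/, "", key)   # key = text after the first field
      if (NR > 1 && key == prev) {
        names = names "_" name               # same group as the previous line
      } else {
        if (NR > 1) print names, prev        # key changed: flush the finished group
        names = name
        prev = key
      }
    }
    END { if (NR > 0) print names, prev }    # flush the last group
  ' "$@"
}

# Usage: sort -k2 myfile.txt | concat_sorted
```

On unsorted input, lines with the same text that are not adjacent end up in separate groups, so the sort step matters here.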
