Group by a column and first 2 characters of another


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Group by a column and first 2 characters of another
# 1  
Old 10-05-2015
Group by a column and first 2 characters of another

Hello all,

I want to consolidate my data group by cities into 4 rolled up code categories (first 2 characters of 2nd col), namely PR, GR, TY and others.

Code:
CHICAGO	PR1
CHICAGO	GR3
CHICAGO	GR4
CHICAGO	HT5
CHICAGO	HT6
CHICAGO	TY5
HOUSTON	PR3
HOUSTON	PR9
HOUSTON	TY4
HOUSTON	HJ5
DESMOINES	PR5
DESMOINES	PR6
DESMOINES	GR7
DESMOINES	HT7


Now I want to have

Code:
	PR	GR 	TY	OTHERS
CHICAGO	PR1	GR3|GR4	TY5	HT5|HT6
HOUSTON	PR3|PR9		TY4	HJ5
DESMOINES	PR5|PR6	GR7		HT7


I tried to achieve this by the following code

Code:
awk -F"\t" '{if(a[$1]){a[$1]=a[$1]"|"$2} else { a[$1]=$2}} END {for (i in a) {print i"\t"a[i]}}'

but I`m not sure how to include PR, GR , TY and all others separately. Please help.
# 2  
Old 10-05-2015
You could try something like:
Code:
awk '
BEGIN {	print "\tPR\tGR\tTY\tOTHERS"
	grp["PR"]; grp["GR"]; grp["TY"]
}
{	if(!((g = substr($2, 1, 2)) in grp))
		g = "OTHERS"
	if(!($1 in city))
		citylist[++cc] = city[$1] = $1
	data[$1, g] = data[$1, g] == "" ? $2 : data[$1, g] "|" $2
}
END {	for(i = 1; i <= cc; i++)
		printf("%s\t%s\t%s\t%s\t%s\n", citylist[i],
		    data[citylist[i], "PR"], data[citylist[i], "GR"],
		    data[citylist[i], "TY"], data[citylist[i], "OTHERS"])
}' file

which will work even if the input lines for a given city are not all contiguous. If the data for all cities are grouped on contiguous lines (as in your sample data), this script could be simplified.

This code produces the output you said you wanted except it does not put a space in the heading line after the GR before the tab field separator.

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
These 2 Users Gave Thanks to Don Cragun For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Group/concatenate certain column and basis on this do addition on other column

Hi Experts, Need your support I want to group/concatenate column 1,2,12 and 13 and if found duplicate then need to sum value of column 17,20,21 and column22. After concatenation if found unique then no action to be taken. Secondly want to make duplicate rows basis on grouping/concatenation of... (1 Reply)
Discussion started by: as7951
1 Replies

2. Shell Programming and Scripting

Group by and translate rows to column

I've a comma separated file with data below: 61401370587,505,1;0402686146,123;2387936.0;20170812 61401370587,505,2;0401296221,34;3.0;20170811 61401370587,505,5;0431169322,123;2387936.0;20170812 My requirement is to group by using 1st,2nd column . And translate the 3rd column's row data... (1 Reply)
Discussion started by: bhagat-reena
1 Replies

3. Shell Programming and Scripting

Remove the first character from the fourth column only if the column has four characters

I have a file as follows ATOM 5181 N AMET K 406 12.440 6.552 25.691 0.50 7.37 N ATOM 5182 CA AMET K 406 13.685 5.798 25.578 0.50 5.87 C ATOM 5183 C AMET K 406 14.045 5.179 26.909 0.50 5.07 C ATOM 5184 O MET K... (14 Replies)
Discussion started by: hasanabdulla
14 Replies

4. Shell Programming and Scripting

Group by column and concatenate

Hi, Can you please help in resolving below issue: I have input file as below. COL1 COL2 ABC 111asdf ABC 222dfgh ABC 333fghy ABC 4rtyu XYZ 1yuio XYZ 2ytre Lookg for the Output File as below COL1 COL2 ABC '111asdf','222dfgh','333fghy','4rtyu' XYZ ... (6 Replies)
Discussion started by: Phani_Raghava
6 Replies

5. UNIX for Dummies Questions & Answers

Split binary file every occurrence of a group of characters

Hello I am new to scripts, codes, bash, terminal, etc. I apologize this my be very scattered because I frankly don't have any idea where to begin and I have had trouble sleeping lately. I have several 2GB files I wish to split. This Code 00 00 01 BA ** ** ** ** ** ** ** ** C3 F8 00 00 01 BB 00... (17 Replies)
Discussion started by: PatrickE
17 Replies

6. Shell Programming and Scripting

sed or tr to remove specific group of special characters

Hi, I have a input of the form: ..., word1, word2, word3... I want out put of the form word1, word2, word3 I tried echo '..., word1, word2, word3...' | tr -d '...,' but that takes out the commas in the middle too so I get word1 word2 word3 but I want the commas in the middle. ... (3 Replies)
Discussion started by: forumbaba
3 Replies

7. Shell Programming and Scripting

Group by column from CSV file

Hello Friends, I need a shell script to extract a column in a csv file, group by the column, take the count and print the group with count more than one. Thanks (1 Reply)
Discussion started by: raka_rjit
1 Replies

8. Shell Programming and Scripting

Sum of column by group wise

Hello All , I have a problem with summing of column by group Input File - COL_1,COL_2,COL_3,COL_4,COL_5,COL_6,COL_7,COL_8,COL_9,COL_10,COL_11 3010,21,1923D ,6,0,0.26,0,0.26,-0.26,1,200807 3010,21,192BI ,6,24558.97,1943.94,0,1943.94,22615.03,1,200807 3010,21,192BI... (8 Replies)
Discussion started by: jambesh
8 Replies

9. Shell Programming and Scripting

Merge group numbers and add a column containing group names

Hi All I do have a file like this with 6 columns. Groups of data merge together and the group number is indicated above each group. 1 1 12 26 289 3.2e-027 GCGTATGGCGGC 2 12 26 215 6.7e+006 TTCCACCTTTTG 3 9 26 175 ... (1 Reply)
Discussion started by: Lucky Ali
1 Replies

10. Shell Programming and Scripting

Merge group numbers and add a column containing group names

I have a file in the following format. Groups of data merge together and the group number is indicated above each group. 1 adrf dfgr dfg 2 dfgr dfgr 3 dfef dfr fd 4 fgrt fgr fgg 5 fgrt fgr (3 Replies)
Discussion started by: Lucky Ali
3 Replies
Login or Register to Ask a Question