Sponsored Content
Top Forums Shell Programming and Scripting Group by a column and first 2 characters of another Post 302956976 by jalaj841 on Monday 5th of October 2015 04:14:46 PM
Old 10-05-2015
Group by a column and first 2 characters of another

Hello all,

I want to consolidate my data group by cities into 4 rolled up code categories (first 2 characters of 2nd col), namely PR, GR, TY and others.

Code:
CHICAGO	PR1
CHICAGO	GR3
CHICAGO	GR4
CHICAGO	HT5
CHICAGO	HT6
CHICAGO	TY5
HOUSTON	PR3
HOUSTON	PR9
HOUSTON	TY4
HOUSTON	HJ5
DESMOINES	PR5
DESMOINES	PR6
DESMOINES	GR7
DESMOINES	HT7


Now I want to have

Code:
	PR	GR 	TY	OTHERS
CHICAGO	PR1	GR3|GR4	TY5	HT5|HT6
HOUSTON	PR3|PR9		TY4	HJ5
DESMOINES	PR5|PR6	GR7		HT7


I tried to achieve this by the following code

Code:
awk -F"\t" '{if(a[$1]){a[$1]=a[$1]"|"$2} else { a[$1]=$2}} END {for (i in a) {print i"\t"a[i]}}'

but I`m not sure how to include PR, GR , TY and all others separately. Please help.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Merge group numbers and add a column containing group names

I have a file in the following format. Groups of data merge together and the group number is indicated above each group. 1 adrf dfgr dfg 2 dfgr dfgr 3 dfef dfr fd 4 fgrt fgr fgg 5 fgrt fgr (3 Replies)
Discussion started by: Lucky Ali
3 Replies

2. Shell Programming and Scripting

Merge group numbers and add a column containing group names

Hi All I do have a file like this with 6 columns. Groups of data merge together and the group number is indicated above each group. 1 1 12 26 289 3.2e-027 GCGTATGGCGGC 2 12 26 215 6.7e+006 TTCCACCTTTTG 3 9 26 175 ... (1 Reply)
Discussion started by: Lucky Ali
1 Replies

3. Shell Programming and Scripting

Sum of column by group wise

Hello All , I have a problem with summing of column by group Input File - COL_1,COL_2,COL_3,COL_4,COL_5,COL_6,COL_7,COL_8,COL_9,COL_10,COL_11 3010,21,1923D ,6,0,0.26,0,0.26,-0.26,1,200807 3010,21,192BI ,6,24558.97,1943.94,0,1943.94,22615.03,1,200807 3010,21,192BI... (8 Replies)
Discussion started by: jambesh
8 Replies

4. Shell Programming and Scripting

Group by column from CSV file

Hello Friends, I need a shell script to extract a column in a csv file, group by the column, take the count and print the group with count more than one. Thanks (1 Reply)
Discussion started by: raka_rjit
1 Replies

5. Shell Programming and Scripting

sed or tr to remove specific group of special characters

Hi, I have a input of the form: ..., word1, word2, word3... I want out put of the form word1, word2, word3 I tried echo '..., word1, word2, word3...' | tr -d '...,' but that takes out the commas in the middle too so I get word1 word2 word3 but I want the commas in the middle. ... (3 Replies)
Discussion started by: forumbaba
3 Replies

6. UNIX for Dummies Questions & Answers

Split binary file every occurrence of a group of characters

Hello I am new to scripts, codes, bash, terminal, etc. I apologize this my be very scattered because I frankly don't have any idea where to begin and I have had trouble sleeping lately. I have several 2GB files I wish to split. This Code 00 00 01 BA ** ** ** ** ** ** ** ** C3 F8 00 00 01 BB 00... (17 Replies)
Discussion started by: PatrickE
17 Replies

7. Shell Programming and Scripting

Group by column and concatenate

Hi, Can you please help in resolving below issue: I have input file as below. COL1 COL2 ABC 111asdf ABC 222dfgh ABC 333fghy ABC 4rtyu XYZ 1yuio XYZ 2ytre Lookg for the Output File as below COL1 COL2 ABC '111asdf','222dfgh','333fghy','4rtyu' XYZ ... (6 Replies)
Discussion started by: Phani_Raghava
6 Replies

8. Shell Programming and Scripting

Remove the first character from the fourth column only if the column has four characters

I have a file as follows ATOM 5181 N AMET K 406 12.440 6.552 25.691 0.50 7.37 N ATOM 5182 CA AMET K 406 13.685 5.798 25.578 0.50 5.87 C ATOM 5183 C AMET K 406 14.045 5.179 26.909 0.50 5.07 C ATOM 5184 O MET K... (14 Replies)
Discussion started by: hasanabdulla
14 Replies

9. Shell Programming and Scripting

Group by and translate rows to column

I've a comma separated file with data below: 61401370587,505,1;0402686146,123;2387936.0;20170812 61401370587,505,2;0401296221,34;3.0;20170811 61401370587,505,5;0431169322,123;2387936.0;20170812 My requirement is to group by using 1st,2nd column . And translate the 3rd column's row data... (1 Reply)
Discussion started by: bhagat-reena
1 Replies

10. Shell Programming and Scripting

Group/concatenate certain column and basis on this do addition on other column

Hi Experts, Need your support I want to group/concatenate column 1,2,12 and 13 and if found duplicate then need to sum value of column 17,20,21 and column22. After concatenation if found unique then no action to be taken. Secondly want to make duplicate rows basis on grouping/concatenation of... (1 Reply)
Discussion started by: as7951
1 Replies
col(1)								   User Commands							    col(1)

NAME
col - reverse line-feeds filter SYNOPSIS
col [-bfpx] DESCRIPTION
The col utility reads from the standard input and writes to the standard output. It performs the line overlays implied by reverse line- feeds, and by forward and reverse half-line-feeds. Unless -x is used, all blank characters in the input will be converted to tab characters wherever possible. col is particularly useful for filtering multi-column output made with the .rt command of nroff(1) and output resulting from use of the tbl(1) preprocessor. The ASCII control characters SO and SI are assumed by col to start and end text in an alternative character set. The character set to which each input character belongs is remembered, and on output SI and SO characters are generated as appropriate to ensure that each character is written in the correct character set. On input, the only control characters accepted are space, backspace, tab, carriage-return and newline characters, SI, SO, VT, reverse line- feed, forward half-line-feed and reverse half-line-feed. The VT character is an alternative form of full reverse line-feed, included for compatibility with some earlier programs of this type. The only other characters to be copied to the output are those that are printable. The ASCII codes for the control functions and line-motion sequences mentioned above are as given in the table below. ESC stands for the ASCII escape character, with the octal code 033; ESC- means a sequence of two characters, ESC followed by the character x. reverse line-feed ESC-7 reverse half-line-feed ESC-8 forward half-line-feed ESC-9 vertical-tab (VT) 013 start-of-text (SO) 016 end-of-text (SI) 017 OPTIONS
-b Assume that the output device in use is not capable of backspacing. In this case, if two or more characters are to appear in the same place, only the last one read will be output. -f Although col accepts half-line motions in its input, it normally does not emit them on output. Instead, text that would appear between lines is moved to the next lower full-line boundary. This treatment can be suppressed by the -f (fine) option; in this case, the output from col may contain forward half-line-feeds (ESC-9), but will still never contain either kind of reverse line motion. -p Normally, col will ignore any escape sequences unknown to it that are found in its input; the -p option may be used to cause col to output these sequences as regular characters, subject to overprinting from reverse line motions. The use of this option is highly discouraged unless the user is fully aware of the textual position of the escape sequences. -x Prevent col from converting blank characters to tab characters on output wherever possible. Tab stops are considered to be at each column position n such that n modulo 8 equals 1. ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of col: LC_CTYPE, LC_MESSAGES, and NLSPATH. EXIT STATUS
The following error values are returned: 0 Successful completion. >0 An error occurred. ATTRIBUTES
See attributes(5) for descriptions of the following attributes: +-----------------------------+-----------------------------+ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | +-----------------------------+-----------------------------+ |Availability |SUNWesu | |CSI |enabled | +-----------------------------+-----------------------------+ SEE ALSO
nroff(1), tbl(1), ascii(5), attributes(5), environ(5) NOTES
The input format accepted by col matches the output produced by nroff with either the -T37 or -Tlp options. Use -T37 (and the -f option of col) if the ultimate disposition of the output of col will be a device that can interpret half-line motions, and -Tlp otherwise. col cannot back up more than 128 lines or handle more than 800 characters per line. Local vertical motions that would result in backing up over the first line of the document are ignored. As a result, the first line must not have any superscripts. SunOS 5.11 1 Feb 1995 col(1)
All times are GMT -4. The time now is 04:41 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy