Deleting repeated strings in column 2


 
# 1  
Old 05-24-2009

Hi to all,

I have a file whose subject line contains either "Summarized Availability Report" or just "Summarized Report".
If the subject is "Summarized Availability Report" I want to apply Script1 to it, and if the subject is "Summarized Report"
I want to apply Script2.

1-) I would like help choosing Script1 when the Subject contains "Summarized Availability Report".
2-) I would like help developing part of Script1.

In the input file, the strings in $2 have 2 or 3 digits between "_M" and a final letter in the range X-Z.

Input file example when the Subject contains the string "Availability":
Code:
Subject: Summarized Availability Report
Comment             GHH_M55X            May 21 2009 4:45PM 
Comment             GHH_M55Y            May 21 2009 4:45PM
Comment             GHH_M55Z            May 21 2009 4:45PM
Comment             YUP_M19Y            May 18 2009 7:45PM
Comment             YUP_M19Y            May 18 2009 7:45PM
Comment             WON_M123X           May 17 2009 11:22AM
Comment             CET_M123X           May 15 2009 9:12AM

Desired output:
(Script1, part 1: after the line containing "Subject:...", delete the last letter of the strings in $2.)
With my knowledge I got this far:
Code:
awk -F"[X-Z] " '/M[0-9][0-9]|[0-9][X-Z]/ {print $1" "$2}'

Code:
 
Subject: Summarized Report
 
Comment             GHH_M55            May 21 2009 4:45PM 
Comment             GHH_M55            May 21 2009 4:45PM 
Comment             GHH_M55            May 21 2009 4:45PM 
Comment             YUP_M19            May 18 2009 7:45PM 
Comment             YUP_M19            May 18 2009 7:45PM 
Comment             WON_M123           May 17 2009 11:22AM
Comment             CET_M123           May 15 2009 9:12AM

(Script1, part 2: after the line containing "Subject:...", delete lines with repeated elements in $2.)
(This is the part I need help with; I don't know how to eliminate repeated strings in column 2.)
Code:
Subject: Summarized Report
Comment             GHH_M55            May 21 2009 4:45PM 
Comment             YUP_M19            May 18 2009 7:45PM
Comment             WON_M123           May 17 2009 11:22AM
Comment             CET_M123           May 15 2009 9:12AM

(Script1, part 3: after the line containing "Subject:...", delete $1 and join the lines with their Subject line.)
Final result:
Code:
Subject: Summarized Report->GHH_M55 May 21 2009 4:45PM, YUP_M19 May 18 2009 7:45PM, WON_M123 May 17 2009 11:22AM, CET_M123 May 15 2009 9:12AM

Thanks in advance for any help
# 2  
Old 05-25-2009
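To remove the repeated lines and print one copy of each:

Code:
# Strip the last letter of $2, then collapse each run of adjacent duplicate lines.
# Plain uniq is used here: GNU uniq prints nothing when -u and -d are combined.
awk '/^Comment/ { print $1,substr($2,1,length($2)-1),$3,$4,$5,$6 }' inputfile.txt | uniq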
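
Note that uniq only compares adjacent lines, so the pipeline above depends on the duplicates sitting next to each other. If that is not guaranteed, a "seen" array inside awk is one alternative. This is only a sketch, assuming the six whitespace-separated fields shown in the first post and exactly one trailing letter on $2; the function name strip_and_dedupe is made up for illustration.

```shell
# strip_and_dedupe FILE
# Hypothetical helper: drops the trailing letter of $2 on "Comment" lines
# and prints only the first line seen for each resulting key.
strip_and_dedupe() {
    awk '/^Comment/ {
        key = substr($2, 1, length($2) - 1)   # e.g. GHH_M55X -> GHH_M55
        if (!seen[key]++)                     # true only the first time a key appears
            print $1, key, $3, $4, $5, $6
    }' "$1"
}
```

It would be called as strip_and_dedupe inputfile.txt. Like the pipeline above, it drops the Subject line, because only lines starting with "Comment" are printed.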

# 3  
Old 05-25-2009
Code:
awk 'NR==1{printf("%s-->",$0)}/^Comment/{a[$2]=$2" "$3" "$4" "$5" "$6}END{for (i in a) printf("%s%s", a[i],OFS)}' OFS="," filename


-Devaraj Takhellambam
# 4  
Old 05-26-2009
Hey guys, thanks for your help. I tested both solutions, but I would like to
combine them.

With panyam's solution I get unique lines, but not joined like this:

Code:
 
Subject: Summarized Report->GHH_M55 May 21 2009 4:45PM, YUP_M19 May 18 2009 7:45PM, WON_M123 May 17 2009 11:22AM, CET_M123 May 15 2009 9:12AM

and with devtakh's solution I get the result as a joined sentence, but including repeated items.

In your code I replaced the part
Code:
a[$2]=$2" "$3...

to
Code:
a[$2]=substr($2,1,length($2)-1)" "$3...

But from here I'm not sure how to present the unique lines in a joined sentence.

One more thing:

Assuming I have 2 scripts, how do I choose Script1 if the "Subject" line contains "Summarized Availability Report"?

Thanks again,

Best regards
# 5  
Old 05-26-2009
Regarding "But from here I'm not sure how to present the unique lines in a joined sentence": pipe the output through
Code:
uniq

to get a single copy of each run of repeated lines. (With GNU uniq, combining -u and -d as in "uniq -ud" prints nothing.)

Regarding "Assuming I have 2 scripts, how do I choose Script1 if "Subject" contains "Summarized Availability Report"?":
that can be done with a conditional check on the Subject line.
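
One way such a conditional check might look, sketched with grep -q. Script1 and Script2 are the placeholder names from the question, and the function name choose_script is an assumption made up for this example.

```shell
# choose_script FILE
# Hypothetical helper: prints the name of the script that should process FILE,
# based on the text of its "Subject:" line.
choose_script() {
    if grep -q '^Subject:.*Summarized Availability Report' "$1"; then
        echo Script1
    elif grep -q '^Subject:.*Summarized Report' "$1"; then
        echo Script2
    else
        # Neither subject was found; report it and signal failure.
        echo "no matching Subject line in $1" >&2
        return 1
    fi
}
```

Assuming both scripts are executable files in the current directory, it could be used as: script=$(choose_script inputfile.txt) && "./$script" inputfile.txt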
# 6  
Old 05-26-2009
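Try devtakh's solution with a small modification: index the array by $2 without its last letter, and store the trimmed string in the value as well.
Code:
awk 'NR==1{printf("%s-->",$0)}/^Comment/{k=substr($2,1,length($2)-1); a[k]=k" "$3" "$4" "$5" "$6}END{for (i in a) printf("%s%s", a[i],OFS); print ""}' OFS="," filename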
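
The one-liner above walks the array with for (i in a), and awk leaves the order of that loop unspecified, so the items may not come out in input order; it also prints a separator after the last item. A sketch that keeps the first occurrence of each key in input order and joins with ", " as in the desired output could look like this. The function name summarize is an assumption, and the Subject line is printed exactly as it appears in the input.

```shell
# summarize FILE
# Hypothetical helper combining the three parts of Script1.
summarize() {
    awk '
    /^Subject:/ { subject = $0; next }       # remember the Subject line
    /^Comment/ {
        key = substr($2, 1, length($2) - 1)  # part 1: drop the trailing letter
        if (!(key in seen)) {                # part 2: keep the first occurrence only
            seen[key] = 1
            item[++n] = key " " $3 " " $4 " " $5 " " $6
        }
    }
    END {                                    # part 3: join the items in input order
        printf "%s->", subject
        for (i = 1; i <= n; i++)
            printf "%s%s", (i > 1 ? ", " : ""), item[i]
        print ""
    }' "$1"
}
```

Called as summarize inputfile.txt, it prints a single line: the Subject, then "->", then the de-duplicated entries.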
