sed/awk: Delete matching words leaving only the first instance

01-19-2010

I have an input text that looks like this (comes already sorted):

Code:

on Caturday 22 at 10:15, some event
on Caturday 22 at 10:15, some other event
on Caturday 22 at 21:30, even more events
on Funday 23 at 11:00, yet another event

I need to delete the leading words that each line shares with the line before it, leaving only the first instance of each date and time.
To clarify, I need to turn it into something like this:

Code:

on Caturday 22 at 10:15, some event
                         some other event
               at 21:30, even more events
on Funday 23 at 11:00, yet another event

So then I could format it like this to make it shorter, which is what I'm after:

Code:

on Caturday 22 at 10:15, some event; some other event; at 21:30, even more events
on Funday 23 at 11:00, yet another event

Is there a way to do something like this with sed and awk?

GrinningArmor

01-19-2010

Code:

$ cat data
on Caturday 22 at 10:15, some event
on Caturday 22 at 10:15, some other, comma event
on Caturday 22 at 21:30, even more events
on Funday 23 at 11:00, yet another event

$ awk -F' at |, ' 'd!=$1 {if(s)print s; s=$0; d=$1; t=$2; next} t!=$2 {t=$2; s=s"; at "$2","substr($0,index($0,",")+1); next} {s=s";"substr($0,index($0,",")+1)} END {print s}' data
on Caturday 22 at 10:15, some event; some other, comma event; at 21:30, even more events
on Funday 23 at 11:00, yet another event

d=day
t=time
s=string being built for printing
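
For readers who find the one-liner dense, here is the same logic spread over several lines with comments. This is only a sketch: `merge_events` is a made-up wrapper name, it reads from stdin instead of the `data` file, and the `if (s)` guard in `END` is an addition so that empty input prints nothing.

```shell
# merge_events: hypothetical wrapper; expects sorted event lines on stdin.
merge_events() {
  awk -F' at |, ' '
    # New day: flush the line built so far and start a new one.
    d != $1 {
      if (s) print s
      s = $0; d = $1; t = $2
      next
    }
    # Same day, new time: append the time plus everything after the first comma.
    t != $2 {
      t = $2
      s = s "; at " $2 "," substr($0, index($0, ",") + 1)
      next
    }
    # Same day and same time: append only the event text.
    {
      s = s ";" substr($0, index($0, ",") + 1)
    }
    # Flush the last line (guard added here: skip if there was no input).
    END { if (s) print s }
  '
}

printf '%s\n' \
  'on Caturday 22 at 10:15, some event' \
  'on Caturday 22 at 10:15, some other, comma event' \
  'on Caturday 22 at 21:30, even more events' \
  'on Funday 23 at 11:00, yet another event' | merge_events
```

Because the event text is taken with `substr` from the first comma onward rather than from `$3`, commas inside the event text survive.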

Alister

---------- Post updated at 05:14 PM ---------- Previous update was at 04:40 PM ----------

A bit shorter, if not clearer

Code:

awk -F' at |, ' 'd!=$1 {if(s)print s; s=$0; d=$1; t=$2; next} (e=substr($0,index($0,",")+1)) && t!=$2 {t=$2; s=s"; at "$2","e; next} {s=s";"e} END {print s}' data

e=event text

---------- Post updated at 05:21 PM ---------- Previous update was at 05:14 PM ----------

If you are certain that ", " (comma-space) and " at " (space-a-t-space) sequences will not appear in the event text, then this simpler code will do:

Code:

awk -F' at |, ' 'd!=$1 {if(s)print s; s=$0; d=$1; t=$2; next} t!=$2 {t=$2; s=s"; at "$2", "$3; next} {s=s"; "$3} END {print s}' data
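
To see why that caveat matters, this small sketch feeds the simpler version a line whose event text contains ", ". The wrapper name `simple_merge` is made up for the demonstration; the awk program is the one above, reading stdin instead of `data`.

```shell
# simple_merge: hypothetical wrapper around the simpler one-liner.
simple_merge() {
  awk -F' at |, ' 'd!=$1 {if(s)print s; s=$0; d=$1; t=$2; next} t!=$2 {t=$2; s=s"; at "$2", "$3; next} {s=s"; "$3} END {print s}'
}

# The second line has a comma-space inside the event text, so awk splits
# it into $3="some other" and $4="comma event"; only $3 is appended.
printf '%s\n' \
  'on Caturday 22 at 10:15, some event' \
  'on Caturday 22 at 10:15, some other, comma event' | simple_merge
# prints: on Caturday 22 at 10:15, some event; some other
```

The trailing "comma event" is silently dropped, which is the case the `substr`/`index` versions are meant to handle.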

alister

01-20-2010

In Perl:

Code:

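use strict;
use warnings;

my ( $prev_day, $prev_time ) = ( '', '' );

while (<>)  {
        chomp;

        # split the line into words: on, day name, day number, "at", time, event text...
        my @word = split /\s+/, $_;

        # lines too short to hold a date and a time are printed unchanged.
        if ( @word < 5 )  { print "$_\n"; next; }

        # if the current line's date equals the previous one, substitute it with spaces.
        # (compare strings with eq; == would compare only numeric values)
        if ( "$word[1] $word[2]" eq $prev_day )  {
                if ( $word[4] eq $prev_time )  {
                        # same time as well: blank out both the date and the time.
                        $word[$_] =~ s/./ /g for ( 0 .. 4 );
                }  else  {
                        # new time: store it and blank out only the date.
                        $prev_time = $word[4];
                        $word[$_] =~ s/./ /g for ( 0 .. 2 );
                }
        }  else  {
                # new date: store the day and time, leave the line as it is.
                $prev_day = "$word[1] $word[2]";
                $prev_time = $word[4];
        }
        print "@word\n";
}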

Code:

$ perl t.pl t
on Caturday 22 at 10:15, some event
                         some other event
               at 21:30, even more events
on Funday 23 at 11:00, yet another event
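
Since the question asked for sed/awk, here is a rough awk sketch that produces the same blanked-out form. It is not from the posts above: `blank_repeats` is a made-up name, and it assumes every line looks like `<date> at <time>, <event>` with no ", " inside the date part.

```shell
# blank_repeats: hypothetical helper; expects sorted event lines on stdin.
blank_repeats() {
  awk '
    {
      a = index($0, " at ")             # position of the space before "at"
      c = index($0, ", ")               # position of the comma after the time
      if (!a || c < a) { print; next }  # not the expected shape: pass through
      date = substr($0, 1, a - 1)       # e.g. "on Caturday 22"
      when = substr($0, a + 1, c - a)   # e.g. "at 10:15,"

      if (date != pd) {                 # new date: print the line unchanged
        pd = date; pt = when
        print
      } else if (when != pt) {          # same date, new time: blank the date
        pt = when
        print sprintf("%" a "s", "") substr($0, a + 1)
      } else {                          # same date and time: blank both
        print sprintf("%" (c + 1) "s", "") substr($0, c + 2)
      }
    }
  '
}

printf '%s\n' \
  'on Caturday 22 at 10:15, some event' \
  'on Caturday 22 at 10:15, some other event' \
  'on Caturday 22 at 21:30, even more events' \
  'on Funday 23 at 11:00, yet another event' | blank_repeats
```

Unlike the Perl version, this pads by character position rather than rebuilding the line from split words, so any extra spacing inside the event text is left as it was.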

thegeek
