

sed/awk: Delete matching words leaving only the first instance


 
# 1  

I have input text that looks like this (it arrives already sorted):
Code:
on Caturday 22 at 10:15, some event
on Caturday 22 at 10:15, some other event
on Caturday 22 at 21:30, even more events
on Funday 23 at 11:00, yet another event

I need to delete the leading words that each line shares with the line before it, so that only the first instance of each date and time remains.
To clarify, I need to turn it into something like this:
Code:
on Caturday 22 at 10:15, some event
                         some other event
               at 21:30, even more events
on Funday 23 at 11:00, yet another event

Then I could join the lines like this to make the output shorter, which is what I'm really after:
Code:
on Caturday 22 at 10:15, some event; some other event; at 21:30, even more events
on Funday 23 at 11:00, yet another event

Is there a way to do something like this with sed and awk?
# 2  
Code:
$ cat data
on Caturday 22 at 10:15, some event
on Caturday 22 at 10:15, some other, comma event
on Caturday 22 at 21:30, even more events
on Funday 23 at 11:00, yet another event

$ awk -F' at |, ' 'd!=$1 {if(s)print s; s=$0; d=$1; t=$2; next} t!=$2 {t=$2; s=s"; at "$2","substr($0,index($0,",")+1); next} {s=s";"substr($0,index($0,",")+1)} END {print s}' data
on Caturday 22 at 10:15, some event; some other, comma event; at 21:30, even more events
on Funday 23 at 11:00, yet another event

d=day
t=time
s=string being built for printing
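
For readability, the same logic can be spread over several lines with comments. This is only a sketch of the one-liner above: the function name merge_events is made up for illustration, and e (the event text after the first comma) is an extra variable not used in the original command. It also skips the trailing print when the input is empty.

```shell
# Hypothetical wrapper; reads the sorted event list from stdin or from file arguments.
merge_events() {
    awk -F' at |, ' '
    # New date: flush the line built so far and start a new one.
    d != $1 {
        if (s != "") print s
        s = $0; d = $1; t = $2
        next
    }
    # Same date from here on. e = everything after the first comma,
    # so commas inside the event text are preserved.
    { e = substr($0, index($0, ",") + 1) }
    # Same date, new time: append "; at TIME," and the event text.
    t != $2 { t = $2; s = s "; at " $2 "," e; next }
    # Same date and same time: append only the event text.
    { s = s ";" e }
    END { if (s != "") print s }
    ' "$@"
}
```

Usage would be either `merge_events data` or `some_command | merge_events`.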

Alister

---------- Post updated at 05:14 PM ---------- Previous update was at 04:40 PM ----------

A bit shorter, if not clearer :)

Code:
awk -F' at |, ' 'd!=$1 {if(s)print s; s=$0; d=$1; t=$2; next} (e=substr($0,index($0,",")+1)) && t!=$2 {t=$2; s=s"; at "$2","e; next} {s=s";"e} END {print s}' data

e=event text

---------- Post updated at 05:21 PM ---------- Previous update was at 05:14 PM ----------

If you are certain that ", " (comma-space) and " at " (space-a-t-space) sequences will not appear in the event text, then this simpler code will do:

Code:
awk -F' at |, ' 'd!=$1 {if(s)print s; s=$0; d=$1; t=$2; next} t!=$2 {t=$2; s=s"; at "$2", "$3; next} {s=s"; "$3} END {print s}' data
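
To see why that caveat matters, here is a quick check of how the field separator splits a line whose event text contains a comma-space. The helper name show_fields is made up for this demonstration.

```shell
# Hypothetical helper: print the field count and the third field of each input line.
show_fields() {
    awk -F' at |, ' '{ print NF ": " $3 }' "$@"
}

printf '%s\n' 'on Caturday 22 at 10:15, some other, comma event' | show_fields
# prints: 4: some other
```

The line splits into four fields, so $3 holds only "some other" and the rest of the event text would be lost. The substr/index versions above avoid this by taking everything after the first comma.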


Last edited by alister; 01-19-2010 at 06:14 PM.. Reason: Added code tags
# 3  
In Perl:

Code:
while(<>)  {
        chomp;

        # parse the required words
        @word = split /\s+/, $_;

        #print "$prev_time : $word[4]: $prev_day : $word[2] \n";

        # If this line has the same day as the previous line, blank out the repeated words.
        # (String comparison, so a non-numeric day field is compared correctly too.)
        if ( $word[2] eq $prev_day )  {
                if ( $word[4] eq $prev_time )  {
                        # Same time as the previous line too: blank out the day and the time.
                        $word[$_] =~ s/./ /g for ( 0 .. 4 );
                }  else  {
                        # store previous time.
                        $prev_time = $word[4];
                        $word[$_] =~ s/./ /g for ( 0 .. 2 );
                }
        }  else  {
                # store the previous day & time.
                $prev_day = $word[2];
                $prev_time = $word[4];
        }
        print "@word\n";
}

Code:
$ perl t.pl t
on Caturday 22 at 10:15, some event
                         some other event
               at 21:30, even more events
on Funday 23 at 11:00, yet another event
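
The same column-aligned output could probably be produced in awk as well, since the original question asked about sed and awk. A rough sketch, assuming every line has the layout "on DAY NN at HH:MM, text"; the names blank_repeats, pad, prev_date and prev_time are made up for illustration. Lines that do not fit the layout are printed unchanged.

```shell
# Hypothetical wrapper; reads the sorted event list from stdin or from file arguments.
blank_repeats() {
    awk '
    function pad(n) { return sprintf("%" n "s", "") }   # string of n spaces
    {
        a = index($0, " at ")     # position of the space before "at"
        c = index($0, ", ")       # position of the comma that ends the time
        if (a == 0 || c < a) {    # unexpected layout: print unchanged, reset state
            print; prev_date = prev_time = ""; next
        }
        d = substr($0, 1, a - 1)          # date part, e.g. "on Caturday 22"
        t = substr($0, a + 1, c - a)      # time part, e.g. "at 10:15,"
        if (d == prev_date && t == prev_time)
            print pad(c + 1) substr($0, c + 2)    # keep only the event text
        else if (d == prev_date)
            print pad(a) substr($0, a + 1)        # keep the time and the event text
        else
            print                                 # first line for this date
        prev_date = d; prev_time = t
    }' "$@"
}
```

Unlike the Perl version, this compares the whole date part ("on Caturday 22") rather than only the day number, and it leaves the spacing inside the event text as it was.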

