

sed newbie scripting assistance


 
# 1  
Old 08-11-2012

Howdy folks,
I'm trying to craft a log file summarisation tool for an application that creates a lot of duplicate entries with only a different suffix to indicate point of execution. I thought I'd gotten close but I'm clearly missing something.

Here's a genericized version: a text file (infile_grocery.txt) with these contents.
Code:
milk skim fruit apple banana
milk skim fruit orange
milk skim fruit mango
milk skim fruit pomegranate
milk 2 percent fruit cherry tomato
milk 2 percent fruit peach
milk whole fruit pineapple
milk skim fruit strawberry raspberry
milk skim fruit strawberry rhubarb
milk whole fruit pineapple

What I'm hoping to get is:
Code:
milk skim fruit apple banana, orange, mango, pomegranate
milk 2 percent fruit cherry tomato, peach
milk whole fruit pineapple
milk skim fruit strawberry raspberry, strawberry rhubarb
milk whole fruit pineapple

The command line I've cooked up is:
Code:
sed -rn "{H;x;s|^(.+) fruit ([^\n]+)\n(.*)\1 fruit (.+)$|\1 fruit \2, \4|;x}; ${x;s/^\n//;p}" infile_grocery.txt

But the results I'm getting are:
Code:
milk skim fruit apple banana, mango, strawberry raspberry
milk skim fruit strawberry rhubarb
milk whole fruit pineapple

Clearly I'm skipping chunks of lines somehow but I've been staring at this too long and I can't see it. Anyone have any suggestions for me?
# 2  
Old 08-12-2012
If you aren't absolutely set on using sed, I think it's easier with awk:


Code:
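awk '
    # Print the cached prefix with its accumulated item list, then reset the list.
    function printlist()
    {
        sub( ", $", "", list );          # drop the trailing separator
        printf( "%sfruit %s\n", last, list );
        list = "";
    }

    {
        x = $0;
        sub( "fruit.*", "", x );         # x = prefix: everything before "fruit"
        gsub( ".*fruit ", "", $0 );      # $0 = item(s): everything after "fruit "
        if( list && x != last )          # prefix changed: flush the previous group
            printlist();
        list = list $0 ", ";
        last = x;
    }
    END {
        if( list )                       # flush the final group
            printlist();
    }
' input-file >output-file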

# 3  
Old 08-13-2012
Won't this awk solution have a problem with very large files? To my eye, it looks like it's trying to load the whole thing into memory first...

If that's the case, couldn't it be problematic since the log files have already gotten to 500 MB in a half day? (partly why I'm looking to summarise duplicate content)

Personally, I was leaning more toward sed because it's a lighter-weight install on the PC platform and can hopefully be invoked with a single command line (I'm aiming to shell out of Notepad++, modify the buffer, and reload).

Last edited by mthespian; 08-13-2012 at 03:42 PM.. Reason: adding info
# 4  
Old 08-13-2012
It's not trying to load the whole file into memory. It caches one copy of the leading part of each new line (everything up to "fruit") and the list of items that follow. When the leading part changes, the record, with its summary of items, is written and the list starts anew (list = ""). So the only significant amount of data ever held in memory is the list of items for the current group.


Now, if that list is huge, the programme could be amended to write the items out as it finds them. My assumption was that the list wasn't going to be more than 1 or 2 K.

---------- Post updated at 21:55 ---------- Previous update was at 21:42 ----------

Turns out that printing the list as we go is a simpler programme; just didn't see it that way the other night.


Code:
awk '
    {
        x = $0;
        sub( "fruit.*", "", x );        # x = prefix: everything before "fruit"
        gsub( ".*fruit ", "", $0 );     # $0 = item(s): everything after "fruit "
        if( x != last )                 # prefix changed: end the previous line (if any), start a new one
            printf( "%s%sfruit %s", (NR > 1 ? "\n" : ""), x, $0 );
        else                            # same prefix: print just what is after fruit
            printf( "%s%s", (NR > 1 ? ", " : ""), $0 );
        last = x;
    }
    END { if( NR ) printf( "\n" ); }    # final newline, unless the input was empty
' input-file
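
For the single-command-line use mentioned in post #3, the streaming idea above can probably be condensed further, and a sed equivalent is possible with the usual N/P/D loop. Neither command below comes from the thread. They are sketches under these assumptions: the function names are made up for illustration, every line contains the literal text " fruit " exactly once, and the quoting is for a POSIX shell (it would need adapting for a Windows command prompt).

Code:
```shell
# Sketch only: function names are hypothetical, and every input line is
# assumed to contain the literal text " fruit " exactly once.

# awk: split each line on " fruit " and stream the output, remembering only
# the previous prefix (no per-group list is kept).
summarise_awk() {
    awk -F ' fruit ' '
        $1 != last { printf "%s%s fruit %s", sep, $1, $2; sep = "\n"; last = $1; next }
                   { printf ", %s", $2 }
        END        { if (NR) print "" }
    ' "$@"
}

# sed: N appends the next input line. If both lines share the text up to and
# including " fruit ", fold the second line into the first and loop again.
# Otherwise P prints the finished first line and D restarts the cycle with
# the line that did not match.
summarise_sed() {
    sed -E -e ':a' -e '$!N' \
        -e 's/^(.* fruit )(.*)\n\1(.*)$/\1\2, \3/' \
        -e 'ta' -e 'P' -e 'D' "$@"
}

# Sample data from post #1
cat > infile_grocery.txt <<'EOF'
milk skim fruit apple banana
milk skim fruit orange
milk skim fruit mango
milk skim fruit pomegranate
milk 2 percent fruit cherry tomato
milk 2 percent fruit peach
milk whole fruit pineapple
milk skim fruit strawberry raspberry
milk skim fruit strawberry rhubarb
milk whole fruit pineapple
EOF

summarise_awk infile_grocery.txt
summarise_sed infile_grocery.txt
```

On the sample file, both commands should reproduce the five-line summary asked for in post #1. One caveat on the sed variant: it keeps the whole current group in the pattern space and re-scans it on every merge, so for very long runs of duplicates the awk version is likely to be the cheaper of the two.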

