Using sed's hold-space to filter file contents


 
# 1  
Old 08-02-2017

I wrote an awk script to filter "uninteresting" commands from my ~/.bash_history. (I know about HISTIGNORE, but I don't want to exclude these commands from my current session's history; I just want to avoid persisting them across sessions.)

The history file can contain multi-line entries with embedded newlines, and entries are separated by timestamps. Given an input file like:
Code:
#1501304269
git stash
#1501304270
ls
#1501304318
ls | while IFS= read line; do
echo 'line is: ' $line
done

the script filters out single-line ls, man, and cat commands, producing:
Code:
#1501304269
git stash
#1501304318
ls | while IFS= read line; do
echo 'line is: ' $line
done

Notice that multi-line entries are unfiltered -- I figure if they're interesting enough to warrant multiple lines, they're worth remembering.

I've been reading about sed's multiline capabilities and I'm curious how its hold-space and pattern-space might be manipulated to achieve the same filtering as my awk script. Rather than use GNU sed's -z flag to treat the whole file as a single massive pattern space, I'm looking for a solution that uses commands such as h, H, x, G, and N to accumulate lines in the hold space and swap or delete lines as necessary.
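
For readers less familiar with these commands, here is a minimal sketch of hold-space accumulation, unrelated to the history filter itself. It is the well-known line-reversal one-liner: G appends the hold space to the pattern space, h copies the pattern space back into the hold space, and the result is printed only on the last line. The function name reverse_lines is made up for illustration.

```shell
# Reverse the order of input lines using only the hold space.
# 1!G  on every line but the first, append the hold space to the pattern space
# h    copy the pattern space (current line + all earlier ones) to the hold space
# $p   on the last line, print the accumulated result (-n suppresses autoprint)
reverse_lines() {
  sed -n '1!G;h;$p' "$@"
}

printf '%s\n' one two three | reverse_lines
```

The sample call prints three, two, one, each on its own line.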

Here's the Awk script:
Code:
/^#[[:digit:]]{10}$/ {
  timestamp = $0
  histentry = ""
  next
}
$1 ~ /^(ls?|man|cat)$/ {
  if (! timestamp) {
    print
  } else {
    histentry = $0
  }
  next
}
timestamp {
  print timestamp
  timestamp = ""
}
histentry {
  print histentry
  histentry = ""
}
{ print }


Last edited by ivanbrennan; 08-02-2017 at 12:31 AM.. Reason: adjust line spacing
# 2  
Old 08-02-2017
Since you are only excluding single-line commands, you could just peek ahead one line using the N command and only leave out those entries:
Code:
sed '/^#[[:digit:]]\{10\}$/{N; /\nls$/d; /\nman$/d; /\ncat$/d;}' file

or
with GNU sed or BSD sed:
Code:
sed -E '/^#[[:digit:]]{10}$/{N; /\n(ls|man|cat)$/d;}' file

# 3  
Old 08-02-2017
Hm... peeking ahead one line won't let me distinguish a single-line command (which should be excluded if it contains ls|cat|man) from the beginning of a multiline command (which should be kept even if it contains ls|cat|man).

For example, if the exclusion pattern was "xxx", the following input,
Code:
#0000000001
aaa
#0000000002
xxx
bbb
#0000000003
ccc

would result in this output:
Code:
#0000000001
aaa
bbb
#0000000003
ccc

The second record should have passed through unmodified since it has multiple lines, but instead its head was removed and the rest got tacked onto the previous record.

I was thinking something like, when you reach a timestamp, exchange pattern-space with hold-space (x). Now hold-space is ready to start accumulating the oncoming entry and pattern-space holds whichever entry was previously accumulated. I should be able to perform whatever substitution is necessary on pattern-space now to filter out commands I'm not interested in, since I have the full entry. That gets complicated a bit trying to correctly handle the first and last lines of the file.

My latest failed attempt:
Code:
1,/^#[[:digit:]]{10}$/ {
  /^#[[:digit:]]{10}$/! {
    p
    d
  }
}

/^#[[:digit:]]{10}$/ {
  x
  /^$/ d
  /\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
  p
}

/^#[[:digit:]]{10}$/ !{
  H
  d
}

$ {
  x
  /\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
  p
}

# 4  
Old 08-02-2017
Hi,
Your awk script doesn't work here as written.
I had to change the first line to:
Code:
/^#[0-9][0-9]*$/ {

My system and awk version:
Code:
uname -a
Linux debian-linux 4.11.0-1-amd64 #1 SMP Debian 4.11.6-1 (2017-06-19) x86_64 GNU/Linux
awk -Wv
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan
compiled limits:
max NF             32767
sprintf buffer      2040

You can try this with sed; I think it works.
Code:
sed -n '/^#[0-9]\{10\}$/{:A;/\(ls *$\)\|\(\ncat \)\|\(\nman \)/b;$p;N;/\n#[0-9]\{10\}$/!bA;h;s/\(^.*\)\(\n.*$\)/\1/;p;x;s/.*\n//;bA}' lefile

This matches cat and man only when they are followed by a space (i.e. cat lefile or man tr).
ls is harder to handle.
# 5  
Old 08-02-2017
Apparently mawk doesn't support regex repetitions, and maybe not POSIX character classes either.
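
A possible workaround, sketched here under the assumption that the awk in use handles + and plain bracket ranges but not {10} or [[:digit:]]: check the length separately instead of counting repetitions in the regex. For ASCII digits this accepts the same lines as /^#[[:digit:]]{10}$/. The function name print_timestamps is made up for illustration, and this has not been verified against every mawk build.

```shell
# Print only lines that look like a history timestamp: "#" plus ten digits.
# length($0) == 11 replaces the {10} interval expression;
# [0-9]+ replaces [[:digit:]].
print_timestamps() {
  awk 'length($0) == 11 && /^#[0-9]+$/' "$@"
}

printf '%s\n' '#1501304269' 'ls' '#42' | print_timestamps
```

The same condition could replace the first pattern of the awk script above; spelling out [0-9] ten times would be another option.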

I couldn't get the desired results from your sed snippet. Not sure why though.

---------- Post updated at 08:20 PM ---------- Previous update was at 08:12 PM ----------

I finally came up with something that works. It's nasty, and I don't doubt there's a better way, but it was satisfying to at least get something working.
Code:
$ {
  1 h
  1!H
  x
  /^#[[:digit:]]{10}\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
  p
}

/^#[[:digit:]]{10}$/ !{
  1 h
  1!H
  d
}

/^#[[:digit:]]{10}$/ {
  x
  /^$/ d
  /^#[[:digit:]]{10}$/ d
  /^#[[:digit:]]{10}\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
}

I benchmarked it against my original awk script, as well as against the following gsed script:
Code:
gsed -z -E 's/(#[0-9]{10}\n(cat|ls?|man)([^[:alnum:]][^\n]*)?\n)+(#[0-9]{10}\n|$)/\4/g' histfile

Run on a ~50,000 line file, I get the following results:
  • sed: 80 milliseconds
  • awk: 70 milliseconds
  • gsed: 60 milliseconds
# 6  
Old 08-03-2017
Quote:
Originally Posted by ivanbrennan
Apparently mawk doesn't support regex repetitions, and maybe not POSIX character classes either.
[..]
Indeed, the mawk version that gets installed by distributions supports neither. I think the latest version does, but you would need to get the source and compile it yourself.

--
Your approach also seems to leave out one-line commands that do not contain ls, man, or cat.

Last edited by Scrutinizer; 08-03-2017 at 03:56 AM..
# 7  
Old 08-03-2017
Because d jumps straight to the next cycle, and the input line is not modified inside the condition block, the commands after that block do not need to be wrapped in the complementary condition:
Code:
/^#[[:digit:]]{10}$/ !{
  1 h
  1!H
  d
}

x
/^$/ d
/^#[[:digit:]]{10}$/ d
/^#[[:digit:]]{10}\n(ls?|cat|man)([^[:alnum:]][[:print:]]*)?$/ d
