I inherited a script that contains the following sed command:
What I'm wondering is whether ABCD has a special pattern matching value in sed, such as a character class similar or identical to [A-Z].
I'm thinking they were intended to be literal values.
Thanks in advance!
From what I can tell, the whole string (ABCD|/p) is a literal, with the wildcard ('*'), beginning of line ('^'), and end of line ('$') being the only regex.
This basically does these 4 operations:
1. Match any character from beginning of line up to and including 'ABCD|/p' and only print the matching lines.
2. Take the output from the previous sed command, then match from the beginning of the line any characters up to and including the string literal 'ABCD|', and remove the matching string from the output.
3. Take the output from the previous sed command, then match the string literal '|ABCD' if it is at the end of the line, and remove it.
4. Output the results to a file named ${fileName}.tmp
Example:
File (test.txt) Contents:
Command (output to screen and not a file):
Results:
From what I can tell the whole string (ABCD|/p) is a literal.
Mmmmm... no.
The string 'ABCD|' is a literal, the rest '/p' is the end of regex ('/') and print command ('p'). So the first sed command
is an instruction to print only lines containing 'ABCD|'. There is some redundancy there; the following would do the same:
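A minimal sketch of that redundancy, using made-up sample lines (the data below is hypothetical, not from the original poster's file). It assumes a POSIX-style sed where an unescaped | in a basic regex is a literal pipe:

```shell
# Hypothetical sample data; the field values are invented for illustration.
sample='ABCD|one|two
three|ABCD|four
no marker here
five|ABCD'

# Original form: the leading ^.* adds nothing to a match that is not anchored anyway.
with_prefix=$(printf '%s\n' "$sample" | sed -n -e '/^.*ABCD|/p')

# Shorter form: an unanchored pattern already finds ABCD| anywhere in the line.
without_prefix=$(printf '%s\n' "$sample" | sed -n -e '/ABCD|/p')

printf '%s\n' "$with_prefix"
printf '%s\n' "$without_prefix"
```

Both variables should hold the same two lines. The last sample line is not selected, because it contains '|ABCD' but not 'ABCD|'.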
The second sed removes everything from the beginning of the line up to and including ABCD|. Note that matching here is greedy, so if you have multiple instances of ABCD| there, the pattern is gonna match the longest possible substring. E.g.:
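A small illustration of the greedy behaviour, with an invented input line (not taken from the poster's data):

```shell
# Hypothetical line containing ABCD| twice.
line='x|ABCD|y|ABCD|z'

# .* is greedy, so the match runs through the LAST ABCD| on the line.
greedy=$(printf '%s\n' "$line" | sed -e 's/^.*ABCD|//')

# For contrast: without .* the pattern only matches ABCD| at the very start,
# so this line is left unchanged.
leading_only=$(printf '%s\n' "$line" | sed -e 's/^ABCD|//')

echo "$greedy"        # z
echo "$leading_only"  # x|ABCD|y|ABCD|z
```

So everything before the final 'ABCD|' is discarded, not just the first field.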
The third one removes a trailing '|ABCD'.
I oddly missed that '/' was the delimiting character for the regex pattern and therefore the p on the other side was actually a sed command.
I guess the '|/' threw me off...
Thank you both. I too feel the ABCD is literal. I think mirni is correct about the overall behavior (as were you ddreggors with the exception of the print).
The twist in all of this is that this sed command has been used at the end of an extraction process, supposedly to remove lines that contain a leading ABCD followed by a pipe, and a trailing ABCD preceded by a pipe, and to print (p) everything else to a .tmp file. Then a mv command was used to overwrite the original file with the tmp file.
But when I run this command against a sample file created with the first 1000 lines of one of our prod files, the tmp file is empty each time. However when it runs in production the original file has the same byte count as the processed and renamed output file (as measured before the mv command is carried out). So print is working in prod.
The files are too big to diff, though I compared the first 100K rows of the before and after to each other, and then the last 100K rows, and came up with no differences each time. So in production it is writing every line to the tmp file, but when I run it, it writes no lines to the tmp file.
My original intent was to change this over to a Perl pattern match and replace in the hopes of speeding up the process, but I wanted to understand the sed statement first. Now it's looking like at best sed is doing nothing, given that the before and after files are identical. But I still need to figure out why my tmp file is empty when I use the same command from the command line, while in prod the tmp file is the same size as the original file (when run from a script).
You are not gonna get much speed-up, if any at all, by using Perl.
If you could post a sample input, we might be able to help you out.
Can you post output of:
Also, you have the -n switch there, so sed will not print unless explicitly instructed (the 'p' command), which means the output of this sed filter should be smaller than the original input.
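To make the effect of -n concrete, here is a rough sketch on invented input (the sample lines are hypothetical):

```shell
# Hypothetical two-line input: one line contains ABCD|, one does not.
sample='keep ABCD|this
drop this'

# With -n, only lines explicitly printed by p reach the output.
quiet=$(printf '%s\n' "$sample" | sed -n -e '/ABCD|/p')

# Without -n, every line is auto-printed and the matching line appears twice.
default=$(printf '%s\n' "$sample" | sed -e '/ABCD|/p')

# If no line contains ABCD| at all, -n with p produces nothing.
none=$(printf '%s\n' 'a|b|c' 'd|e|f' | sed -n -e '/ABCD|/p')

printf 'quiet:\n%s\n' "$quiet"
printf 'default:\n%s\n' "$default"
printf 'none is empty: %s\n' "$([ -z "$none" ] && echo yes || echo no)"
```

The last case would be consistent with an empty tmp file when the sample contains no 'ABCD|' anywhere, though that is only a guess without seeing the real data.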
I can't post a sample because the data is sensitive.
I have yet to actually find any records that begin with .*ABCD| or end with |ABCD
The p command is in fact explicitly included because the command used is always:
sed -n -e '/^.*ABCD|/p' $fileName | sed -e 's/^.*ABCD|//' | sed -e 's/|ABCD$//' > ${fileName}.tmp
One other thing that is confusing me is that the pattern before the print matches the one after the print. The only difference seems to be that the first is being fed to print, while the second occurrence is being targeted for removal. I am not quite sure what the developer's intent was there.
As for sed itself, would it be safe to say that the result of this specific command is that any entire line which begins with any one or more characters followed by a literal ABCD and a | would be removed, and that any line ending with a pipe followed by a literal ABCD and end of line would be removed?
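One way to check this is to run the exact pipeline on a few invented lines (the sample data below is hypothetical). As a sketch, it suggests the opposite of line removal: lines containing 'ABCD|' are the only ones kept, and they are trimmed rather than deleted.

```shell
# Hypothetical sample covering the interesting cases.
sample='ABCD|one|two
head|ABCD|body|ABCD
plain|line
tail|ABCD'

# The pipeline from the script, reading from stdin instead of $fileName.
result=$(printf '%s\n' "$sample" \
  | sed -n -e '/^.*ABCD|/p' \
  | sed -e 's/^.*ABCD|//' \
  | sed -e 's/|ABCD$//')

# The same steps in a single sed process: select, trim both ends, print.
combined=$(printf '%s\n' "$sample" \
  | sed -n -e '/ABCD|/{' -e 's/^.*ABCD|//' -e 's/|ABCD$//' -e 'p' -e '}')

printf '%s\n' "$result"
# one|two
# body
```

Here 'plain|line' and 'tail|ABCD' are dropped by the first stage because neither contains 'ABCD|', while the two surviving lines lose their prefix through 'ABCD|' and any trailing '|ABCD'.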