Shell Scripting help

06-18-2019

Registered User

9, 0

Join Date: Jun 2019

Last Activity: 16 July 2019, 8:13 AM EDT

Posts: 9

Thanks Given: 3

Thanked 0 Times in 0 Posts

Quote:

Originally Posted by bakunin

Yes, it doesn't work, because your search string is "<mis>" and what the file contains is "<val:mis>" (and "<seg>" instead of "<val:seg>", etc.). It is rather obvious that you find only what you search for, nothing else. No?

But isn't it obvious how the command above must be changed to reflect the changes in your input? I am convinced that a brilliant young man like you can do that, can't you? Just show us what you tried.

bakunin

Hi Bakunin,

I was using this command and getting the expected output, but then I found that in a few lines the tags also carry a prefix.

Code:

 grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

In those lines the prefix differs (like <s:mis> or <t:mis>), but the element name I want to match is always the same (mis or seg). I was trying to make the prefix optional with an asterisk, like this:

Code:

 ( grep 'pName="vin'  temp.txt | sed -n 's:.*<*mis>\(.*\)</*mis>.*<*seg>\(.*\)</*seg>.*:\1\,\2:p' )

but it seems an asterisk can't be used inside HTML tags to make something optional, or I don't know how.

--- Post updated at 03:47 AM ---

Quote:

Originally Posted by RudiC

Different approach, including your val: case:

Code:

sed -n 's/" pName="vin">//; T; s/^.*dateTime="//; s/<[^>]*>/,/g; s/[ ,]\{2,\}/,/gp' file
2019-06-14 08:30,11111111,Pit,
2019-06-14 10:30,333333,zit,
2019-06-14 08:30,11111111,Pit,

Thanks RudiC, this works perfectly, but if the element order changes or a new element appears in a line, that gets printed too. I was thinking of extracting only specific values (like the mis and seg values), and for this I was using the command below:

Code:

grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

To allow a prefix before mis or seg, I tried an asterisk to make it optional, but the way I am using it seems to be wrong:

Code:

grep 'pName="vin'  temp.txt | sed -n 's:.*<*mis>\(.*\)</*mis>.*<*seg>\(.*\)</*seg>.*:\1\,\2:p'

The order of the elements or the prefix may vary like this:

Code:

<l:ev dateTime="2019-06-14 08:30" pName="vin"> <mis>11111111</mis><seg>Pit</seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:xyz>4444</val:xyz><val:seg>sit</val:seg><val:mis>222</val:mis> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <n:mis>222</n:mis><n:seg>sit</n:seg> </l:ev>

nit42


06-19-2019

Registered User

6,384, 2,214

Join Date: May 2005

Last Activity: 28 October 2019, 4:59 PM EDT

Location: In the leftmost byte of /dev/kmem

Posts: 6,384

Thanks Given: 143

Thanked 2,214 Times in 1,548 Posts

Quote:

Originally Posted by nit42

Code:

 grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

In those lines the prefix differs (like <s:mis> or <t:mis>), but the element name I want to match is always the same (mis or seg). I was trying to make the prefix optional with an asterisk, like this:

Code:

 ( grep 'pName="vin'  temp.txt | sed -n 's:.*<*mis>\(.*\)</*mis>.*<*seg>\(.*\)</*seg>.*:\1\,\2:p' )

but it seems an asterisk can't be used inside HTML tags to make something optional, or I don't know how.

That is not the reason at all. In fact you should read about (POSIX basic) regular expressions, because you obviously don't correctly understand how they work:

The asterisk ("*") makes the previous expression optional, but it doesn't match anything in itself. It means "zero or more occurrences of what comes before". Here is an example:

The regular expression "abcd" matches a fixed string: "a", followed by "b", followed by "c", followed by "d". If you change it to "abc*d", its meaning changes to: "a", followed by "b", followed by zero or more occurrences of "c", followed by "d". Here is a list of example strings that this expression would match:

"abd"
"abcd"
"abccd"
"abccccccd"
etc.
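A quick way to try this out, assuming any POSIX grep: -x makes grep match whole lines only, and "abxd" is an extra string added here as a counter-example that should not match.

```shell
# Feed some test strings to grep; -x anchors the BRE to the whole line.
out=$(printf '%s\n' abd abcd abccd abccccccd abxd | grep -x 'abc*d')
printf '%s\n' "$out"
# Only the first four strings are printed: "c*" allows zero or more "c",
# but no other character may stand between "b" and "d".
```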

Now, in light of this, read your regexp again:

Code:

s:.*<*mis>...

What you did by inserting the "*" after the "<" was to make the "<" optional: instead of exactly one "<" you now match any number of "<", including zero (that is what makes it optional). But what you want is to match the "<", then anything that might precede a ":", including the ":" itself. To phrase it differently: a "<", then zero or more occurrences of "something, followed by a colon", then what you already matched.
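To see this on data, the sketch below writes the three sample lines from the question to a temporary file and runs the asterisk attempt on them (the grep is dropped because every sample line contains pName="vin", and \, is written as a plain ,). With a POSIX sed only the first line should print. On the opening tag the leading .* simply swallows a prefix like "n:", so that part still matches, but </*mis> needs "<" followed directly by zero or more "/" and then "mis>", which never happens in </n:mis>. The second line also fails because seg comes before mis there.

```shell
tmp=$(mktemp)
# The three sample lines from the question.
cat > "$tmp" <<'EOF'
<l:ev dateTime="2019-06-14 08:30" pName="vin"> <mis>11111111</mis><seg>Pit</seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:xyz>4444</val:xyz><val:seg>sit</val:seg><val:mis>222</val:mis> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <n:mis>222</n:mis><n:seg>sit</n:seg> </l:ev>
EOF
# "<*" means "zero or more <" and "</*" means "< followed by zero or more /":
# neither of them can skip over a prefix such as "n:".
out=$(sed -n 's:.*<*mis>\(.*\)</*mis>.*<*seg>\(.*\)</*seg>.*:\1,\2:p' "$tmp")
printf '%s\n' "$out"    # only the unprefixed first line produces output
rm -f "$tmp"
```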

So, let us take your original regexp:

Code:

<mis>

and change it according to the specification above. First, the "something, followed by a colon" part, or, more robustly, any number of characters other than ":", followed by a ":", is:

Code:

[^:]*:

Let us put that in:

Code:

<[^:]*:mis>

Next, we need "zero or more" occurrences of this whole group, so we first have to group it with \( and \) to be able to apply a single asterisk to all of it, hence:

Code:

<\([^:]*:\)*mis>

Note that groups are numbered automatically, in the order of their opening \(, so after adding such groups you may need to renumber the "\1", "\2", etc. in your replacement string.
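Putting this together for both elements, here is a sketch (checked only against the sample lines from the question) that applies the same optional-prefix group to the opening and closing mis and seg tags. Each \([^:]*:\) adds a group, so the values move from \1 and \2 to \2 and \5. The s delimiter is switched from ":" to "|" because the pattern itself now contains ":" inside [^:], and at least GNU sed would otherwise end the pattern there. It still assumes mis comes before seg on the line, so the second sample line, where seg comes first, prints nothing.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<l:ev dateTime="2019-06-14 08:30" pName="vin"> <mis>11111111</mis><seg>Pit</seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:xyz>4444</val:xyz><val:seg>sit</val:seg><val:mis>222</val:mis> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <n:mis>222</n:mis><n:seg>sit</n:seg> </l:ev>
EOF
# Groups: \1 prefix of <mis>, \2 mis value, \3 prefix of </mis>,
#         \4 prefix of <seg>, \5 seg value, \6 prefix of </seg>.
# "|" is the s delimiter because the pattern now contains ":".
out=$(sed -n '/pName="vin/ s|.*<\([^:]*:\)*mis>\(.*\)</\([^:]*:\)*mis>.*<\([^:]*:\)*seg>\(.*\)</\([^:]*:\)*seg>.*|\2,\5|p' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

With the sample lines this should print 11111111,Pit and 222,sit.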

On a side note: you don't need the grep at all because sed can do that itself:

Change:

Code:

 grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

to

Code:

sed -n '/pName="vin/ s:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'
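A small check that both forms print the same thing, assuming a temporary file with the first sample line from the thread plus one hypothetical line with a different pName value (pName="other", made up for this test), which both versions should skip:

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<l:ev dateTime="2019-06-14 08:30" pName="vin"> <mis>11111111</mis><seg>Pit</seg> </l:ev>
<l:ev dateTime="2019-06-14 08:45" pName="other"> <mis>999</mis><seg>Xit</seg> </l:ev>
EOF
# The pipeline from the question ...
a=$(grep 'pName="vin' "$tmp" | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1,\2:p')
# ... and the same filter done by a sed address instead of grep:
# the s command is applied only to lines matching /pName="vin/.
b=$(sed -n '/pName="vin/ s:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1,\2:p' "$tmp")
printf '%s\n%s\n' "$a" "$b"
rm -f "$tmp"
```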

I hope this helps.

bakunin


06-19-2019



Thank you, bakunin, for such a great explanation. This is really going to help me learn shell.

One thing: if I use

Code:

<\([^:]*:\)*mis>

then the result excludes the lines that don't have a colon (<mis>11111111</mis>). Can you help me get both kinds of lines, and show how to use this in the command?

I was trying this, but I'm not sure how to use it in a group:

Code:

sed -n 's/<[^:>]*:mis>/,/g; s/[,]\{1,\}/,/gp' temp2.txt

nit42


06-19-2019


Quote:

Originally Posted by nit42

One thing: if I use

Code:

<\([^:]*:\)*mis>

then the result excludes the lines that don't have a colon (<mis>11111111</mis>)

Actually, no. Analyse the regexp carefully; I will put in a few extra spaces for emphasis:

Code:

<    \([^:]*:\)*  mis>

So you have "<", which is simply a fixed string of one character, and at the end "mis>", which is also a fixed string. Together this matches <{something}mis>, yes?

Now let us get to the interesting part, the middle expression, which will match the {something}. Inside the grouping we have [^:]*:, which means: zero or more non-":" characters, followed by a ":". So it would match (list of examples):

Code:

:
t:
bla:
something:
a list of words:
etc....

Now, as we have grouped that and put an asterisk after it, such an expression can occur zero, one or more times before the "mis". Hence we match (putting it all together):

Code:

<mis>              # in this case the expression \([^:]*:\)* occurs simply zero times - not at all
<t:mis>            # [^:]* covers the "t", the ":" covers the ":" and the whole \([^:]*:\) occurs one time
<bla-foo:mis>      # [^:]* covers the "bla-foo", the ":" covers the ":" and the whole \([^:]*:\) occurs one time
<bla:foo:mis>      # [^:]* covers the "bla" (first) and "foo" (second), the ":" covers each ":" and the whole \([^:]*:\) occurs two times
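A quick way to verify this table, assuming a POSIX grep, is to feed each tag on its own line to grep -x, which requires the whole line to match. The "<t:seg>" line is an extra counter-example added here.

```shell
# Each candidate tag on its own line; -x means the BRE must match the whole line.
out=$(printf '%s\n' '<mis>' '<t:mis>' '<bla-foo:mis>' '<bla:foo:mis>' '<t:seg>' |
      grep -x '<\([^:]*:\)*mis>')
printf '%s\n' "$out"
# "<t:seg>" is rejected; all four mis forms, including "<bla:foo:mis>", pass.
```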

You can see from the last example that there is still room for making the regexp more specific, but I didn't want to confuse you with too much information at once. Maybe this is all the precision you need anyway; only you know your data. If you need the additional precision to exclude the last example, you can do this:

Code:

<\([^:]*:\)\{0,1\}mis>

The \{0,1\} works similarly to the asterisk, but instead of "zero or more" occurrences it specifies "zero or one" occurrence. (This may sound more complicated than necessary, but you can change the numbers to allow other ranges: \{m,n\} means at least m and at most n occurrences.)
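The same grep -x check with the interval version (again a sketch, assuming a POSIX grep) should now drop "<bla:foo:mis>" while keeping the other three forms, because [^:]* cannot cross a ":" and the group may occur at most once:

```shell
# \{0,1\} allows the prefix group at most once.
out=$(printf '%s\n' '<mis>' '<t:mis>' '<bla-foo:mis>' '<bla:foo:mis>' |
      grep -x '<\([^:]*:\)\{0,1\}mis>')
printf '%s\n' "$out"    # "<bla:foo:mis>" is no longer matched
```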

I hope this helps.

bakunin


06-19-2019



Thank you so much. Such a nice explanation; I am really learning from these detailed explanations.

How can this regular expression be used in a sed command to fetch the values? Earlier I was using this command, and when I change it to use the new regex I get syntax errors:

Code:

sed -n '/pName="vin/ s:.*<mis>\(.*\)</mis>..*:\1:p' temp2.txt
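A likely cause of the syntax errors (an assumption, since the failing command isn't shown): the s command above uses ":" as its delimiter, and GNU sed ends the pattern at the first unescaped ":", including the one inside [^:]. A sketch with "|" as the delimiter instead, run on the three sample lines from the question: the prefix group is now group 1, so the value becomes \2, and the closing tag gets its own copy of the group after the "</". The value is still captured with the greedy .*, so this only behaves as expected with one mis element per line.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<l:ev dateTime="2019-06-14 08:30" pName="vin"> <mis>11111111</mis><seg>Pit</seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:xyz>4444</val:xyz><val:seg>sit</val:seg><val:mis>222</val:mis> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <n:mis>222</n:mis><n:seg>sit</n:seg> </l:ev>
EOF
# \1 = optional prefix of <mis>, \2 = the value, \3 = optional prefix of </mis>
out=$(sed -n '/pName="vin/ s|.*<\([^:]*:\)*mis>\(.*\)</\([^:]*:\)*mis>.*|\2|p' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

On the sample lines this should print 11111111, 222 and 222, one per line.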

One more thing: if a single line contains the same element twice, or an unknown number of times, how can I get all the values, separated by some delimiter?

Code:

<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:mis>222</val:mis><val:seg>sit</val:seg> <val:mis>333</val:mis> </l:ev>
Expected output:
222-333
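A single s command with capture groups can only pull out a fixed number of values, so for an unknown number of mis elements a loop is simpler. As one possible approach (awk is a different tool from the ones used in the thread so far), here is a POSIX awk sketch that splits each line at "<", so that every field starts with a tag name, and collects the text after each opening mis tag, with or without a prefix. It assumes the values themselves contain no "<".

```shell
line='<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:mis>222</val:mis><val:seg>sit</val:seg> <val:mis>333</val:mis> </l:ev>'
out=$(printf '%s\n' "$line" | awk -F'<' '/pName="vin"/ {
  vals = ""
  for (i = 2; i <= NF; i++) {
    # an opening mis tag: "mis>..." or "prefix:mis>..." (closing tags start with "/")
    if ($i ~ "^([^:>/]*:)?mis>") {
      v = $i
      sub(/^[^>]*>/, "", v)               # keep only the text after the ">"
      vals = (vals == "" ? v : vals "-" v) # join the values with "-"
    }
  }
  if (vals != "") print vals
}')
printf '%s\n' "$out"    # 222-333
```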

nit42


06-19-2019

Registered User

5,091, 1,931

Join Date: May 2012

Last Activity: 15 July 2020, 4:46 AM EDT

Location: Simplicity

Posts: 5,091

Thanks Given: 565

Thanked 1,931 Times in 1,668 Posts

How about perl?

Code:

perl -lne 'm#pName="vin"# and m#dateTime="(.*?)".*?<(.*?:)?mis>(.*?)</# and print "$1 $3"' temp.txt

The .*? is a minimal (non-greedy) match, as opposed to the greedy .* match.
The m (match) operator lets you choose the delimiter, here #. /expr/ is the default form, i.e. the same as m/expr/.
Grouping works with ( ) in ERE style (as in egrep or grep -E). The groups can be referred to as $1, $2, ...
(.*?:)? is an optional prefix. Since it is group 2, the mis value ends up in $3.

MadeInGermany


06-23-2019

Registered User

9, 0

Join Date: Jun 2019

Last Activity: 16 July 2019, 8:13 AM EDT

Posts: 9

Thanks Given: 3

Thanked 0 Times in 0 Posts


Your perl one-liner is really good, but I can't work out one more case.

If a single line contains the same element twice or more (the number is not known), how can I get all values separated by a delimiter?
For example:

Code:

<l:ev dateTime="2019-06-14 09:30" pName="vin"> <nim:mis>222</nim:mis><nim:seg>sit</nim:seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:mis>4444</val:mis><val:seg>sit</val:seg> <val:mis>333</val:mis> </l:ev>

Output using the perl command:

Code:

perl -lne 'm#pName="vin"# and m#dateTime="(.*?)".*?<(.*?:)?mis>(.*?)</# and print "$1 $3"' temp2.txt
2019-06-14 09:30 222
2019-06-14 09:30 4444

Expected Output:

Code:

2019-06-14 09:30,222
2019-06-14 09:30,4444 & 333

nit42

