Shell Scripting help


# 8  
Quote:
Originally Posted by bakunin
Yes, it doesn't work, because your search string is "<mis>" and what the file contains is "<val:mis>" (and "<seg>" instead of "<val:seg>", etc.). It is rather obvious that you find only what you search for, nothing else. No?

But isn't it obvious how the command above must be changed to reflect the changes in your input? I am convinced that a brilliant young man like you can do that, can't you? Just show us what you tried.

bakunin
Hi Bakunin,

I was using this command and was getting the expected output, but then I found that in a few lines the tags also contain a prefix.

Code:
 grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

So the prefix in a few lines was different (like <s:mis> or <t:mis>), but the element name to search for is always the same (mis or seg). I was trying to make the prefix optional by using an asterisk in the command, like this:
Code:
 ( grep 'pName="vin'  temp.txt | sed -n 's:.*<*mis>\(.*\)</*mis>.*<*seg>\(.*\)</*seg>.*:\1\,\2:p' )

but it seems an asterisk can't be used inside HTML tags to make something optional, or there is a way I am not aware of.

--- Post updated at 03:47 AM ---

Quote:
Originally Posted by RudiC
Different approach, including your val: case:
Code:
sed -n 's/" pName="vin">//; T; s/^.*dateTime="//; s/<[^>]*>/,/g; s/[ ,]\{2,\}/,/gp' file
2019-06-14 08:30,11111111,Pit,
2019-06-14 10:30,333333,zit,
2019-06-14 08:30,11111111,Pit,

Thanks RudiC. This works perfectly, but if the sequence changes or a new element appears in a line, that also gets printed. I was thinking of extracting only specific element values (like the mis and seg values), and for this I was using the command below:
Code:
grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

To also handle prefixes before mis or seg, I was trying an asterisk to make the prefix optional, but the way I am using it does not seem to be correct.
Code:
grep 'pName="vin'  temp.txt | sed -n 's:.*<*mis>\(.*\)</*mis>.*<*seg>\(.*\)</*seg>.*:\1\,\2:p'

The sequence or prefix may change like this:
Code:
<l:ev dateTime="2019-06-14 08:30" pName="vin"> <mis>11111111</mis><seg>Pit</seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:xyz>4444</val:xyz><val:seg>sit</val:seg><val:mis>222</val:mis>< </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <n:mis>222</n:mis><n:seg>sit</n:seg> </l:ev>

# 9  
Quote:
Originally Posted by nit42
Code:
 grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

So the prefix in a few lines was different (like <s:mis> or <t:mis>), but the element name to search for is always the same (mis or seg). I was trying to make the prefix optional by using an asterisk in the command, like this:
Code:
 ( grep 'pName="vin'  temp.txt | sed -n 's:.*<*mis>\(.*\)</*mis>.*<*seg>\(.*\)</*seg>.*:\1\,\2:p' )

but it seems an asterisk can't be used inside HTML tags to make something optional, or there is a way I am not aware of.
That is not the reason at all. In fact you should read about (POSIX basic) regular expressions, because you obviously don't correctly understand how they work:

The asterisk ("*") makes the previous expression optional, but it doesn't match anything in itself. It means "zero or more occurrences of what comes before". Here is an example:

The regular expression "abcd" matches a fixed string, "a", followed by "b", followed by "c", followed by "d". Now, if you change it to "abc*d" its meaning changes to: "a", followed by "b" followed by zero or more occurrences of "c", followed by "d". Here is an example list of strings that would be matched by this expression:

"abd"
"abcd"
"abccd"
"abccccccd"
etc.
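
To try the "abc*d" example yourself, here is a small sketch using grep. The ^ and $ anchors and the two non-matching strings ("abxd" and "acd") are my additions for illustration, not part of the list above.

```shell
# Feed candidate strings to grep, one per line, and keep only the lines
# that match the basic regular expression abc*d from start (^) to end ($).
matched=$(printf '%s\n' abd abcd abccd abccccccd abxd acd | grep '^abc*d$')

# Show what survived the filter.
printf '%s\n' "$matched"
```

"abxd" is dropped because "x" is neither a "c" nor the "d" that must follow, and "acd" is dropped because the "b" is not optional; only the "c" carries the asterisk.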

Now, in light of this, read your regexp again:

Code:
s:.*<*mis>...

What you did by inserting the "*" after the "<" was to make the "<" optional. Instead of exactly one "<" you now match any number of "<", including zero (that makes it optional). But what you want is to match the "<", then anything that might precede a ":" including the ":" itself. To phrase it differently: a "<", then zero or more occurrences of "something followed by a ":"", then what you already matched.

So, let us take your original regexp:

Code:
<mis>

and change it to the specification above. First: something, followed by a ":" - or, more robustly, any number of any character save for a ":", followed by a ":" - is:

Code:
[^:]*:

Let us put that in:

Code:
<[^:]*:mis>

Next, we need "zero or more" occurrences of this whole group, and therefore we first need to group it to be able to address it with a single asterisk, hence:

Code:
<\([^:]*:\)*mis>

Note that groups are numbered automatically, from left to right by their opening "\(", so you will need to adjust "\1", "\2", etc. in your replacement string to the new group numbers.
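
As a rough sketch of how the grouped prefix could be dropped into the full command: every tag gets its own prefix group, so the values move from \1 and \2 to \2 and \5. One assumption to point out: the commands in this thread use ":" as the delimiter of the s command, and that clashes with the ":" inside the new expression (which may well be the source of syntax errors), so this sketch uses "#" as the delimiter instead. The input is fed from a shell variable rather than temp.txt, and mis is still expected to come before seg.

```shell
# Two sample lines from this thread, one without and one with a prefix.
input='<l:ev dateTime="2019-06-14 08:30" pName="vin"> <mis>11111111</mis><seg>Pit</seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <n:mis>222</n:mis><n:seg>sit</n:seg> </l:ev>'

# Group numbers, counted by opening \( from the left:
#   \1 prefix of the opening mis tag   \2 the mis value
#   \3 prefix of the closing mis tag   \4 prefix of the opening seg tag
#   \5 the seg value                   \6 prefix of the closing seg tag
# "#" is the s delimiter here because the regexp itself contains ":".
result=$(printf '%s\n' "$input" |
  sed -n '/pName="vin/ s#.*<\([^:]*:\)*mis>\(.*\)</\([^:]*:\)*mis>.*<\([^:]*:\)*seg>\(.*\)</\([^:]*:\)*seg>.*#\2,\5#p')

printf '%s\n' "$result"
```

With these two lines the sketch prints "11111111,Pit" and "222,sit".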

On a side note: you don't need the grep at all because sed can do that itself:

Change:
Code:
 grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

to
Code:
sed -n '/pName="vin/ s:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'
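
A quick way to convince yourself that both forms behave the same is to run them on the same input and compare. This is only a sketch: the second input line with pName="abc" is made up for the test, the input comes from a shell variable instead of temp.txt, and the replacement is written as \1,\2 without the backslash before the comma.

```shell
# One line that should be selected and one (hypothetical) line that should not.
input='<l:ev dateTime="2019-06-14 08:30" pName="vin"> <mis>11111111</mis><seg>Pit</seg> </l:ev>
<l:ev dateTime="2019-06-14 08:45" pName="abc"> <mis>999</mis><seg>Zap</seg> </l:ev>'

# Variant 1: grep selects the lines, sed does the substitution.
with_grep=$(printf '%s\n' "$input" | grep 'pName="vin' |
  sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1,\2:p')

# Variant 2: the /pName="vin/ address makes sed apply the s command
# only to lines matching that expression, so no grep is needed.
with_addr=$(printf '%s\n' "$input" |
  sed -n '/pName="vin/ s:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1,\2:p')

printf 'grep + sed: %s\nsed only:   %s\n' "$with_grep" "$with_addr"
```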

I hope this helps.

bakunin
# 10  
Thank you bakunin for such a great explanation. This is really going to help me in learning shell.

One thing: if I use
Code:
<\([^:]*:\)*mis>

then the result will exclude the lines which don't have a colon (<mis>11111111</mis>). So can you help me get both results, and show me how I can use this in the command?

I was trying this, but I am not sure how to use it in a group:
Code:
sed -n 's/<[^:>]*:mis>/,/g; s/[,]\{1,\}/,/gp' temp2.txt

# 11  
Quote:
Originally Posted by nit42

One thing: if I use
Code:
<\([^:]*:\)*mis>

then the result will exclude the lines which don't have a colon (<mis>11111111</mis>)
Actually: no. Analyse the regexp carefully; I will put in a few extra spaces for emphasis:
Code:
<    \([^:]*:\)*  mis>

So you have: <, which is simply a (fixed string of one) character and at the end mis>, which is also a fixed string. This matches <{something}mis>, yes?

Now, let us get to the interesting part, the middle expression, which will match the {something}: inside the grouping we have [^:]*:. That means: zero or more non-":" characters, followed by a ":". So, it would match (list of examples):

Code:
:
t:
bla:
something:
a list of words:
etc....

Now, as we have grouped that and put an asterisk at the end, we can have OR can not have such an expression before the "mis". Hence we match (putting it all together):

Code:
<mis>              # in this case the expression \([^:]*:\)* occurs simply zero times - not at all
<t:mis>            # [^:]* covers the "t", the ":" covers the ":" and the whole \([^:]*:\) occurs one time
<bla-foo:mis>      # [^:]* covers the "bla-foo", the ":" covers the ":" and the whole \([^:]*:\) occurs one time
<bla:foo:mis>      # [^:]* covers the "bla" (first) and "foo" (second), the ":" covers the ":" and the whole \([^:]*:\) occurs two times

You see from the last example that there is still room for making the regexp more specific, but I didn't want to confuse you with too much information at once. Maybe this is all the precision you need anyway - only you know your data and can know that. If you need the additional precision to not match the last example, you can do this:
Code:
<\([^:]*:\)\{0,1\}mis>

The \{0,1\} works similarly to the asterisk, but instead of zero or more occurrences it specifies at least zero and at most one occurrence (this sounds like I'm phrasing it in a more complicated way than necessary, but you can change the numbers to require other ranges of allowed occurrences).
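
Here is a small sketch that runs both variants against the four example tags from above, so you can see the difference directly. The -x option of grep (match the whole line) is my choice for the test, not something the expressions require.

```shell
# The four example tags from the list above, one per line.
tags=$(printf '%s\n' '<mis>' '<t:mis>' '<bla-foo:mis>' '<bla:foo:mis>')

# Zero or more "something:" prefixes: accepts all four tags.
any_prefix=$(printf '%s\n' "$tags" | grep -x '<\([^:]*:\)*mis>')

# Zero or one "something:" prefix: rejects <bla:foo:mis>.
one_prefix=$(printf '%s\n' "$tags" | grep -x '<\([^:]*:\)\{0,1\}mis>')

printf 'with *:\n%s\n\nwith \\{0,1\\}:\n%s\n' "$any_prefix" "$one_prefix"
```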

I hope this helps.

bakunin
# 12  
Thank you so much. Such a nice explanation, and I am really learning from these detailed explanations.

How can this regular expression be used in a sed command to fetch the values? Earlier I was using this command, and when I change it to use the new regexp I get some syntax errors:
Code:
sed -n '/pName="vin/ s:.*<mis>\(.*\)</mis>..*:\1:p' temp2.txt

Also, one more thing: if a single line contains the same element twice or any (unknown) number of times, how can I get all the values separated by a delimiter?

Code:
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:mis>222</val:mis><val:seg>sit</val:seg> <val:mis>333</val:mis> </l:ev>
output
222-333

# 13  
How about perl?
Code:
perl -lne 'm#pName="vin"# and m#dateTime="(.*?)".*?<(.*?:)?mis>(.*?)</# and print "$1 $3"' temp.txt

The .*? is a minimum match, as opposed to the .* greedy match.
The m (match) operator lets you set the delimiter, here #. /expr/ is default i.e. like m/expr/.
Grouping works with ( ) in ERE style (like egrep or grep -E). Each group can be referred to as $1, $2, ...
(.*?:)? is an optional prefix.
# 14  
This is really good, but I am not able to get the following with it.

If a single line contains the same element twice or more (the number is not known), how can I get all the values separated by a delimiter?
For example:
Code:
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <nim:mis>222</nim:mis><nim:seg>sit</nim:seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:mis>4444</val:mis><val:seg>sit</val:seg> <val:mis>333</val:mis> </l:ev>

Output using the perl command:
Code:
perl -lne 'm#pName="vin"# and m#dateTime="(.*?)".*?<(.*?:)?mis>(.*?)</# and print "$1 $3"' temp2.txt
2019-06-14 09:30 222
2019-06-14 09:30 4444

Expected Output:
Code:
2019-06-14 09:30,222
2019-06-14 09:30,4444 & 333
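
One possible way to get the expected output above, sketched with awk instead of the sed and perl commands discussed so far. Everything here is an assumption about the data rather than a tested solution for the real file: the input comes from a shell variable instead of temp2.txt, values are assumed not to contain "<", and a prefix is assumed not to contain "/", "<" or ">".

```shell
# The two sample lines from the example above.
input='<l:ev dateTime="2019-06-14 09:30" pName="vin"> <nim:mis>222</nim:mis><nim:seg>sit</nim:seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:mis>4444</val:mis><val:seg>sit</val:seg> <val:mis>333</val:mis> </l:ev>'

all_mis=$(printf '%s\n' "$input" | awk '
/pName="vin"/ {
    # Pull out the dateTime attribute value (the text between the quotes).
    date = ""
    if (match($0, "dateTime=\"[^\"]*\""))
        date = substr($0, RSTART + 10, RLENGTH - 11)

    # Repeatedly find the next opening mis tag (with an optional prefix)
    # plus its text, then continue searching in the rest of the line.
    vals = ""
    rest = $0
    while (match(rest, "<([^:<>/]*:)?mis>[^<]*")) {
        val = substr(rest, RSTART, RLENGTH)
        sub("^<[^>]*>", "", val)          # drop the tag, keep the value
        vals = (vals == "" ? val : vals " & " val)
        rest = substr(rest, RSTART + RLENGTH)
    }
    if (vals != "") print date "," vals
}')

printf '%s\n' "$all_mis"
```

The loop is what handles the unknown number of occurrences: each pass takes one mis value and then searches only the remainder of the line, so the closing tags (which start with "</") are never mistaken for another opening tag.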
