Shell Scripting help


 
# 8  
Old 06-18-2019
Quote:
Originally Posted by bakunin
Yes, it doesn't work, because your search string is "<mis>" and what the file contains is "<val:mis>" (and "<seg>" instead of "<val:seg>", etc.). It is rather obvious that you find only what you search for, nothing else. No?

But isn't it obvious how the command above must be changed to reflect the changes in your input? I am convinced that a brilliant young man like you can do that, can't you? Just show us what you tried.

bakunin
Hi Bakunin,

I was using this command and was getting the expected output, but then I found that in a few lines the tags carry a prefix as well.

Code:
 grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

So the prefix differs in a few lines (like <s:mis> or <t:mis>), but the element name to search for is always the same (mis or seg). I was trying to make the prefix optional by using an asterisk in the command, like this:
Code:
 ( grep 'pName="vin'  temp.txt | sed -n 's:.*<*mis>\(.*\)</*mis>.*<*seg>\(.*\)</*seg>.*:\1\,\2:p' )

but it seems an asterisk can't be used inside tags to make something optional, or I am not aware of how to do it.

--- Post updated at 03:47 AM ---

Quote:
Originally Posted by RudiC
Different approach, including your val: case:
Code:
sed -n 's/" pName="vin">//; T; s/^.*dateTime="//; s/<[^>]*>/,/g; s/[ ,]\{2,\}/,/gp' file
2019-06-14 08:30,11111111,Pit,
2019-06-14 10:30,333333,zit,
2019-06-14 08:30,11111111,Pit,

Thanks RudiC. This works perfectly, but if the sequence changes or a new element appears in a line, that also gets printed. I was thinking of extracting only specific element values (like the mis and seg values), and for this I was using the command below:
Code:
grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

To allow for prefixes before mis or seg I was trying an asterisk to make them optional, but it seems the way I am using it is not correct.
Code:
grep 'pName="vin'  temp.txt | sed -n 's:.*<*mis>\(.*\)</*mis>.*<*seg>\(.*\)</*seg>.*:\1\,\2:p'

The sequence or prefix may change like this:
Code:
<l:ev dateTime="2019-06-14 08:30" pName="vin"> <mis>11111111</mis><seg>Pit</seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:xyz>4444</val:xyz><val:seg>sit</val:seg><val:mis>222</val:mis>< </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <n:mis>222</n:mis><n:seg>sit</n:seg> </l:ev>

# 9  
Old 06-19-2019
Quote:
Originally Posted by nit42
Code:
 grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

So the prefix differs in a few lines (like <s:mis> or <t:mis>), but the element name to search for is always the same (mis or seg). I was trying to make the prefix optional by using an asterisk in the command, like this:
Code:
 ( grep 'pName="vin'  temp.txt | sed -n 's:.*<*mis>\(.*\)</*mis>.*<*seg>\(.*\)</*seg>.*:\1\,\2:p' )

but it seems an asterisk can't be used inside tags to make something optional, or I am not aware of how to do it.
That is not the reason at all. In fact you should read about (POSIX basic) regular expressions, because you obviously don't correctly understand how they work:

The asterisk ("*") makes the previous expression optional, but it doesn't match anything in itself. It means "zero or more occurrences of what comes before". Here is an example:

The regular expression "abcd" matches a fixed string, "a", followed by "b", followed by "c", followed by "d". Now, if you change it to "abc*d" its meaning changes to: "a", followed by "b" followed by zero or more occurrences of "c", followed by "d". Here is an example list of strings that would be matched by this expression:

"abd"
"abcd"
"abccd"
"abccccccd"
etc.
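A quick way to check this at the prompt is to feed a few candidate strings to grep. This is only a minimal sketch: the strings "abxd" and "acd" are added here as examples that should not match, and the variable name is arbitrary.

```shell
# Candidate strings, one per line; grep -x keeps only lines the pattern matches in full.
matched=$(printf '%s\n' abd abcd abccd abccccccd abxd acd | grep -x 'abc*d')
printf '%s\n' "$matched"
```

The four strings from the list above are printed; "abxd" and "acd" are dropped because "c*" can only stand for a run of "c" characters, and the "b" is still mandatory.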

Now, in light of this, read your regexp again:

Code:
s:.*<*mis>...

What you did by inserting the "*" after the "<" was to make the "<" optional. Instead of exactly one "<" you now match any number of "<", including zero (that makes it optional). But what you want is to match the "<", then anything that might precede a ":", including the ":" itself. To phrase it differently: a "<", then zero or more occurrences of "something, followed by a ":"", then what you already matched.
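To see what "<*mis>" really covers, one can let sed put brackets around whatever the pattern matches. A small sketch with made-up test strings (they are not taken from the data file):

```shell
# "&" in the replacement stands for the matched text, so the brackets show its extent.
marked=$(printf '%s\n' 'mis>' '<mis>' '<<<mis>' '<val:mis>' | sed 's/<*mis>/[&]/')
printf '%s\n' "$marked"
```

The last string comes out as "<val:[mis>]": the pattern skipped the "<val:" part and matched only "mis>" with zero "<" characters, so the prefix was never part of the match.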

So, let us take your original regexp:

Code:
<mis>

and change it to the specification above. First: something, followed by a ":" - or, more robustly, any number of any character save for a ":", followed by a ":" - is:

Code:
[^:]*:

Let us put that in:

Code:
<[^:]*:mis>

Next, we need "zero or more" occurrences of this whole group, and therefore we first need to group it so that a single asterisk applies to all of it, hence:

Code:
<\([^:]*:\)*mis>

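Note that groups are numbered automatically, in the order of their opening "\(", so the new prefix groups shift the existing ones: you may need to replace "\1", "\2", etc. in your replacement string with other numbers. Also, since the pattern now contains ":", the s command should no longer use ":" as its delimiter, because a ":" in the pattern would be taken as the end of the pattern; pick a character that does not occur in the pattern, such as "|".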
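Putting the pieces together, the complete command might look like the sketch below. Several things in it are assumptions rather than part of the advice above: "|" is used as the s delimiter because the pattern contains ":", the prefix is tightened to "[^/:>]*:" so that it cannot run across a closing tag, the values are captured with "[^<]*" instead of ".*", and mis is expected to come before seg in the line. The third sample line (pName="other") is invented to show the address filter.

```shell
# Sample lines in the shape shown in the thread (unprefixed and prefixed tags).
sample='<l:ev dateTime="2019-06-14 08:30" pName="vin"> <mis>11111111</mis><seg>Pit</seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <n:mis>222</n:mis><n:seg>sit</n:seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="other"> <mis>999</mis><seg>zzz</seg> </l:ev>'

# Groups 1 and 3 are the optional prefixes; groups 2 and 4 hold the values,
# which is why the replacement uses \2 and \4 instead of \1 and \2.
pairs=$(printf '%s\n' "$sample" |
  sed -n '/pName="vin"/ s|.*<\([^/:>]*:\)\{0,1\}mis>\([^<]*\)</.*<\([^/:>]*:\)\{0,1\}seg>\([^<]*\)</.*|\2,\4|p')
printf '%s\n' "$pairs"
```

This prints "11111111,Pit" and "222,sit". A line where seg comes before mis would not match this pattern and would need a different approach.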

On a side note: you don't need the grep at all because sed can do that itself:

Change:
Code:
 grep 'pName="vin'  temp.txt | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p'

to
Code:
sed -n '/pName="vin/ s:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1\,\2:p' temp.txt
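The two forms can be compared side by side. This is a sketch only: it writes two invented sample lines to a scratch file created with mktemp (standing in for temp.txt), and it writes the comma in the replacement without a backslash.

```shell
# Scratch file with one line that should be selected and one that should not.
tmp=$(mktemp)
printf '%s\n' \
  '<l:ev dateTime="2019-06-14 08:30" pName="vin"> <mis>11111111</mis><seg>Pit</seg> </l:ev>' \
  '<l:ev dateTime="2019-06-14 08:45" pName="abc"> <mis>5</mis><seg>x</seg> </l:ev>' > "$tmp"

# Variant 1: grep selects the lines, sed rewrites them.
with_grep=$(grep 'pName="vin' "$tmp" | sed -n 's:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1,\2:p')
# Variant 2: the /regex/ address makes sed apply the s command only to matching lines.
sed_only=$(sed -n '/pName="vin/ s:.*<mis>\(.*\)</mis>.*<seg>\(.*\)</seg>.*:\1,\2:p' "$tmp")

rm -f "$tmp"
printf '%s\n%s\n' "$with_grep" "$sed_only"
```

Both variants print the same single line, "11111111,Pit".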

I hope this helps.

bakunin
# 10  
Old 06-19-2019
Thank you bakunin for such a great explanation. This is really going to help me in learning shell.

One thing: if I use
Code:
<\([^:]*:\)*mis>

then the result will exclude the lines which don't have a colon (<mis>11111111</mis>). So can you help me get both results, and show how I can use this in the command?

I was trying this, but I am not sure how to use it in a group:
Code:
sed -n 's/<[^:>]*:mis>/,/g; s/[,]\{1,\}/,/gp' temp2.txt

# 11  
Old 06-19-2019
Quote:
Originally Posted by nit42

One thing: if I use
Code:
<\([^:]*:\)*mis>

then the result will exclude the lines which don't have a colon (<mis>11111111</mis>)
Actually: no. Analyse the regexp carefully; I will put in a few extra spaces for emphasis:
Code:
<    \([^:]*:\)*  mis>

So you have: <, which is simply a (fixed string of one) character and at the end mis>, which is also a fixed string. This matches <{something}mis>, yes?

Now, let us get to the interesting part, the middle expression, which will match the {something}: inside the grouping we have [^:]*:. That means: zero or more non-":" characters, followed by a ":". So, it would match (list of examples):

Code:
:
t:
bla:
something:
a list of words:
etc....

Now, as we have grouped that and put an asterisk after the group, such an expression may or may not occur before the "mis". Hence we match (putting it all together):

Code:
<mis>              # in this case the expression \([^:]*:\)* occurs simply zero times - not at all
<t:mis>            # [^:]* covers the "t", the ":" covers the ":" and the whole \([^:]*:\) occurs one time
<bla-foo:mis>      # [^:]* covers the "bla-foo", the ":" covers the ":" and the whole \([^:]*:\) occurs one time
<bla:foo:mis>      # [^:]* covers the "bla" (first) and "foo" (second), the ":" covers the ":" and the whole \([^:]*:\) occurs two times

You see from the last example that there is still room for making the regexp more specific, but I didn't want to confuse you with too much information at once. Maybe this is all the precision you need anyway - only you know your data. If you need the additional precision to not match the last example, you can do this:
Code:
<\([^:]*:\)\{0,1\}mis>

The \{0,1\} works like the asterisk, but instead of zero or more occurrences it specifies at least zero and at most one occurrence (this may sound more complicated than necessary, but you can change the two numbers to require other ranges of occurrences).
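The difference between the two quantifiers can be checked with grep on the four example tags from above (a minimal sketch; the variable names are arbitrary):

```shell
# The four example tags, one per line.
tags=$(printf '%s\n' '<mis>' '<t:mis>' '<bla-foo:mis>' '<bla:foo:mis>')

# grep -x prints only lines that the pattern matches in full.
zero_or_more=$(printf '%s\n' "$tags" | grep -x '<\([^:]*:\)*mis>')
at_most_one=$(printf '%s\n' "$tags" | grep -x '<\([^:]*:\)\{0,1\}mis>')

printf 'with *:\n%s\n' "$zero_or_more"
printf 'with \\{0,1\\}:\n%s\n' "$at_most_one"
```

With "*" all four tags are printed; with "\{0,1\}" the tag "<bla:foo:mis>" is missing, because it would need the group to occur twice.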

I hope this helps.

bakunin
# 12  
Old 06-19-2019

Thank you so much. Such a nice explanation, and I am really learning from these detailed explanations.

How can this regular expression be used in the sed command to fetch the values? Earlier I was using the command below, and when I change it to use the new regexp I get some syntax errors:
Code:
sed -n '/pName="vin/ s:.*<mis>\(.*\)</mis>..*:\1:p' temp2.txt

Also, one more thing: if a single line contains the same element twice or more (the count is not known), how can I get all the values separated by some delimiter?

Code:
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:mis>222</val:mis><val:seg>sit</val:seg> <val:mis>333</val:mis> </l:ev>
output
222-333

# 13  
Old 06-19-2019
How about perl?
Code:
perl -lne 'm#pName="vin"# and m#dateTime="(.*?)".*?<(.*?:)?mis>(.*?)</# and print "$1 $3"' temp.txt

The .*? is a minimal (non-greedy) match, as opposed to the greedy .* match.
The m (match) operator lets you set the delimiter, here #. The default delimiter is /, so /expr/ is the same as m/expr/.
Grouping works with ( ) in ERE style (like egrep or grep -E). Each group can be referred to as $1, $2, ...
(.*?:)? is an optional prefix.
# 14  
Old 06-23-2019
This is really good, but I am not able to get this:

If a single line contains the same element two or more times (the count is not known), how can I get all the values separated by some delimiter?
For example:
Code:
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <nim:mis>222</nim:mis><nim:seg>sit</nim:seg> </l:ev>
<l:ev dateTime="2019-06-14 09:30" pName="vin"> <val:mis>4444</val:mis><val:seg>sit</val:seg> <val:mis>333</val:mis> </l:ev>

Output using the perl command:
Code:
perl -lne 'm#pName="vin"# and m#dateTime="(.*?)".*?<(.*?:)?mis>(.*?)</# and print "$1 $3"' temp2.txt
2019-06-14 09:30 222
2019-06-14 09:30 4444

Expected Output:
Code:
2019-06-14 09:30,222
2019-06-14 09:30,4444 & 333
