sed with pattern using variable


 
# 1  
01-11-2018

Dear Community,

I have a long XML file (100k+ lines) with blocks like the ones below:

Code:
<OfferDefinition Id="123">
        <Type>Timer</Type>
        <Description>Test Text1</Description>
        <MajorPriority>95</MajorPriority>
        <SelectableInPolicy>0</SelectableInPolicy>
    </OfferDefinition>
    <OfferDefinition Id="456">
        <Type>Timer</Type>
        <Description>Test Text2</Description>
        <EnableAtProvisioning>0</EnableAtProvisioning>
        <EndOfProvisioning>0</EndOfProvisioning>
        <SelectableInPolicy>0</SelectableInPolicy>
    </OfferDefinition>

I need to extract the Id value of each OfferDefinition block and prepend it to that block's Description.

For example, the Id value is 456 in the line below:
Code:
<OfferDefinition Id="456">

Desired output:
Code:
<OfferDefinition Id="123">
        <Type>Timer</Type>
        <Description>123_Test Text1</Description>
        <MajorPriority>95</MajorPriority>
        <SelectableInPolicy>0</SelectableInPolicy>
    </OfferDefinition>
    <OfferDefinition Id="456">
        <Type>Timer</Type>
        <Description>456_Test Text2</Description>
        <EnableAtProvisioning>0</EnableAtProvisioning>
        <EndOfProvisioning>0</EndOfProvisioning>
        <SelectableInPolicy>0</SelectableInPolicy>
    </OfferDefinition>

I have tried the commands below, but they do not work:

Code:
 var=`sed -n -e '/OfferDefinition Id="/ s/.*\=" *//; s/">//p' file.txt`; 
 sed -n '/<OfferDefinition Id/,/OfferDefinition>/ H;/OfferDefinition>/ {g;s/<Description>/a "$var"/p;x;}' file.txt

Instead of using a variable outside sed, I was trying to extract the Id within the same sed command and insert it into the Description line, but so far no luck!

Thanks for any help/suggestions on this.
# 2  
01-11-2018
This appends the Id value (preceded by a space) to the end of the Description text:
Code:
awk '
/<\/OfferDefinition>/ {id=""}
/<OfferDefinition Id=/ {id=$0; sub(".*Id=\"", "", id); sub("\".*", "", id);}
/<Description>.*<\/Description>/ && length(id) {sub("</", " " id "</")}
{print $0}
' file.txt
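The awk above puts the Id at the end of the text (e.g. "Test Text1 123"), while the question asks for a "123_" prefix. A minimal sketch of the same approach producing the requested format, assuming each Description sits on one line; the function name prefix_ids is hypothetical:

```shell
# prefix_ids is a hypothetical helper name; the awk logic follows the post above
prefix_ids() {
  awk '
    /<\/OfferDefinition>/ { id = "" }     # block ends: forget the Id
    /<OfferDefinition Id=/ {              # block starts: extract the Id
      id = $0
      sub(/.*Id="/, "", id)               # drop everything up to Id="
      sub(/".*/, "", id)                  # drop the closing quote and the rest
    }
    /<Description>.*<\/Description>/ && length(id) {
      sub(/<Description>/, "&" id "_")    # & stands for the matched tag
    }
    { print }
  ' "$@"
}

# Usage: prefix_ids file.txt > file.new
```

Resetting the Id at the closing tag means a Description outside any OfferDefinition block is left alone.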

This User Gave Thanks to rdrtx1 For This Post:
# 3  
01-11-2018
Try also
Code:
awk -F\" '/<OfferDefinition Id=/ {ID = $2} /Description>/ {sub (/>/, "&" ID "_")} 1' file

This User Gave Thanks to RudiC For This Post:
# 4  
01-11-2018
With sed
Code:
sed -E '/Offer/{h;s/.*"(.*)">/\1/;x;};/Description/G;s/(.*>)(.*>)\n(.*)/\1\3_\2/' infile
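The one-liner above packs several steps together. Here is the same sequence of sed commands spread over several lines with comments, purely as a reading aid. GNU sed is assumed (for -E and for \n inside the regex), and the function name annotate_ids is hypothetical:

```shell
# annotate_ids is a hypothetical name; the sed commands are those of the one-liner above
annotate_ids() {
  sed -E '
    # Lines containing "Offer": copy the line to hold space, reduce the
    # pattern space to the value between the quotes, then swap, so the
    # unmodified line is printed and the Id remains in hold space.
    # A closing tag has no quotes, so the substitution fails and the
    # whole closing line ends up in hold space instead.
    /Offer/ {
      h
      s/.*"(.*)">/\1/
      x
    }
    # Description lines: append a newline plus the hold space content.
    /Description/ G
    # Turn "<tag>text</tag>\nId" into "<tag>Id_text</tag>".
    # Lines without an appended newline do not match and stay unchanged.
    s/(.*>)(.*>)\n(.*)/\1\3_\2/
  ' "$@"
}
```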

# 5  
01-11-2018
Also doable with sed, building on your attempt:
Code:
sed -e '/<OfferDefinition Id="/ {h; s/.*=" *//; s/ *">.*//; x; n;}' -e '/<Description>/ {H; x; s/\(.*\)\n\(.*<Description>\)/\2\1_/;}' file.txt

More readable on separate lines:
Code:
sed '
  /<OfferDefinition Id="/ {h; s/.*=" *//; s/" *>.*//; x; n;}
  /<Description>/ {H; x; s/\(.*\)\n\(.*<Description>\)/\2\1_/;}
' file.txt

Of course digging out the saved value from the hold space is a bit of a hack (also in the previous post).
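One way around that hack is to append the hold space with G instead of shuffling it with H and x. The Id then stays in hold space, so several Description lines in the same block all get it. This is a sketch of that variation, not something posted in the thread. It assumes GNU sed, it does not reset the Id at </OfferDefinition> (the later posts deal with that), and the function name prefix_desc is hypothetical:

```shell
# prefix_desc is a hypothetical name; this G-based variant is a sketch, not from the thread
prefix_desc() {
  sed '
    # opening tag: reduce a copy to the Id, print the line, keep the Id in hold
    /<OfferDefinition Id="/ {
      h
      s/.*Id="//
      s/".*//
      x
    }
    # Description: append "\nId" with G (hold space is not consumed)
    /<Description>/ {
      G
      # move a non-empty Id behind <Description> and add "_"
      s/\(<Description>\)\(.*\)\n\(..*\)$/\1\3_\2/
      # if nothing was moved (no Id held yet), drop what G appended
      s/\n.*//
    }
  ' "$@"
}
```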

Last edited by MadeInGermany; 01-11-2018 at 01:44 PM..
This User Gave Thanks to MadeInGermany For This Post:
# 6  
01-11-2018
Quote:
Originally Posted by mystition
I have tried below command but it does not work:

Code:
 var=`sed -n -e '/OfferDefinition Id="/ s/.*\=" *//; s/">//p' file.txt`; 
 sed -n '/<OfferDefinition Id/,/OfferDefinition>/ H;/OfferDefinition>/ {g;s/<Description>/a "$var"/p;x;}' file.txt

Instead of using a variable outside sed, I was trying to extract the Id within the same sed command and insert it into the Description line, but so far no luck!
Standard disclaimer: to "understand" XML, a program needs to work context-sensitively. For this you need a (recursive) parser. Because regexp machines (like sed or awk) aren't parsers, whatever you create with them will always retain some uncertainty; in other words, it will always be possible to trick them into doing something they shouldn't by crafting the input accordingly.

Having said this: there is nothing wrong with a "best-effort" solution as long as you are aware that it is exactly this.

Your sed script was already quite close; here is how it goes:

First, you need to define what happens with each type of line:

1) In a line of the form <OfferDefinition Id=...>, we need to extract the Id value and store it somewhere.

2) In a line of the form </OfferDefinition>, the block within which the Id makes sense ends, and we have to drop the stored value there.

3) In a line of the form <Description>....</Description>, we need to insert the stored value if there is one.

Notice that I assume the lines to be "well-behaved". A tag split across lines like this:

Code:
<Description>
....
</Description>

would still be valid inside the definition, but it would confuse the regexp as written. You would have to extend the script if you want to cover that case too. The same goes for some other quirks; this is what I was talking about above.

Now let us implement the three rules. Note that the explanations (the comments) are NOT part of the script. Also notice (the last line) that after G the pattern space contains a line break between the original line and the appended hold-space content, which we have to remove. This is one of the trickier things when you work with multiline patterns:

Code:
sed '/<OfferDefinition Id=.*>/ {                # rule 1-lines
          p                                     # print, so that the unaltered line is in the output
          s/.*Id="//                            # remove everything up to Id="
          s/">.*//                              # remove the trailing part, isolating the value
          h                                     # move that to the hold space
          d                                     # and delete from pattern space
     }
     /<\/OfferDefinition>/ {                    # rule 2-lines
          p                                     # print unaltered line
          d                                     # delete pattern space
          x                                     # exchange hold/pattern (= clear hold)
          d                                     # and delete pattern again
     }
     /<Description>.*<\/Description>/ {         # rule 3-lines
          s/[   ]*$//                           # clear trailing whitespace
          G                                     # append hold space content to pattern space
          s/\(<Description>\)\(.*\)\(<\/Description>\)\(.*\)/\1\4_\2\3/
                                                # rearrange contents:
                                                # from: <Desc>content</Desc>\nval
                                                # to:   <Desc>\nval_content</Desc>
          s/\n//                                # remove extra line breaks
     }' /path/to/input

I hope this helps.

bakunin
These 3 Users Gave Thanks to bakunin For This Post:
# 7  
01-11-2018
Good idea to clear the hold buffer, so that a <Description> outside an OfferDefinition block will not be altered.
But in my tests
Code:
/<\/OfferDefinition>/ {p; d; x; d;}

and
Code:
/<\/OfferDefinition>/ {p; d; h;}

failed(?), but
Code:
/<\/OfferDefinition>/ {x; s/.*//; x;}

worked.
Ah, of course: the d command deletes the pattern space and starts the next cycle, so the commands after it are never run.
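Putting it together, here is a sketch of bakunin's three rules with the rule-2 block replaced by the working {x; s/.*//; x;} from above. One extra assumption, not from the thread: with an empty hold space, bakunin's rule 3 still turns "<Description>x</Description>" into "<Description>_x</Description>", so here rule 3 only rearranges when an Id is actually held and otherwise drops what G appended. GNU sed is assumed, and the function name add_ids is hypothetical:

```shell
# add_ids is a hypothetical name; rules follow bakunin's script, rule 2 uses the fix above
add_ids() {
  sed '
    # rule 1: print the opening tag, keep only the Id in hold space
    /<OfferDefinition Id=.*>/ {
      p
      s/.*Id="//
      s/">.*//
      h
      d
    }
    # rule 2: clear the hold space; the closing tag itself is printed normally
    /<\/OfferDefinition>/ {
      x
      s/.*//
      x
    }
    # rule 3: move a held Id behind <Description> and add "_"
    /<Description>.*<\/Description>/ {
      s/[[:blank:]]*$//
      G
      s/\(<Description>\)\(.*\)\(<\/Description>\)\n\(..*\)$/\1\4_\2\3/
      # if the rearrangement did not apply, remove what G appended
      s/\n.*//
    }
  ' "$@"
}
```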

Last edited by MadeInGermany; 01-11-2018 at 02:45 PM..