Extract fragments from file


 
# 1  
03-27-2014
[Solved] Extract fragments from file

I have an .xml file that looks something like this:
Code:
<measInfo>
.........
string1
.........
</measInfo>

<measInfo>
......
string2
........
</measInfo>

I want to extract only the 'chunk of file' from '<measInfo>' to '</measInfo>' that contains string1 (or another string that I choose, so the extraction command should take the string from a variable). I'm not very good with awk, so I need your help.
# 2  
03-27-2014
What have you tried so far?
# 3  
03-27-2014
Well, from my point of view it's not really relevant. I tried combining grep -n (to get the line numbers) with a cascade of head | tail commands, which delivers the result, but I need it all done in awk.

In awk, all I have tried so far is:
Code:
awk '/<measInfo>/,/<\/measInfo>/' file

This gives me all the chunks from <measInfo> to </measInfo> in the target file, but I need to keep only the chunk that contains the string I am looking for. Any idea how to store each chunk in a variable and then search each one in turn for my string, sort of a looping thing?
I know it can be done with awk, but I don't have the chops for it.



# 4  
03-27-2014
You can try this in awk:
Code:
awk '/measInfo/ || s {s=s?s"\n"$0:$0} /<\/measInfo>/{if (s ~ /string1/) print s; s=""}' file

It accumulates each XML block in the variable s and, when the closing tag is reached, prints the block if it contains "string1", then clears s for the next block.
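The same program spread over several lines with comments may be easier to adapt. This is a sketch: the file name sample.xml, its contents, and the helper name extract_block are made up for illustration; the awk logic is unchanged from the one-liner.

```shell
#!/bin/sh
# Commented, multi-line form of the awk one-liner above.
# Assumption: sample.xml and its contents are invented test data.
cat > sample.xml <<'EOF'
<measInfo>
a
string1
b
</measInfo>

<measInfo>
c
string2
d
</measInfo>
EOF

# extract_block is a hypothetical helper name; $1 is the input file.
extract_block() {
  awk '
    # Start buffering on any line that mentions measInfo, and keep
    # buffering while s is non-empty (we are inside a block).
    /measInfo/ || s { s = s ? s "\n" $0 : $0 }

    # At the closing tag the whole block is in s: print it if it
    # contains string1, then clear s for the next block.
    /<\/measInfo>/ { if (s ~ /string1/) print s; s = "" }
  ' "$1"
}

extract_block sample.xml
```

Note that /measInfo/ matches the closing tag too, so the first rule appends </measInfo> to s before the second rule tests and prints it.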

Alternatively, you can use perl to undefine the input record separator (so the whole file is read at once) and use a multiline regex:
Code:
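perl -le 'undef $/; $_=<>; print $1 if /(<measInfo>(?:(?!<\/measInfo>).)*?string1.*?<\/measInfo>)/s' file

The (?:(?!<\/measInfo>).)*? part stops the match from running across a closing tag, so only the block that actually contains string1 is printed, even when it is not the first block in the file.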

# 5  
03-27-2014
Superb!
Thank you very much.
# 6  
03-30-2014
One more question:
I tried to use the awk snippet with a variable in place of string1, and it didn't work:

Code:
-bash-4.1$ awk '/measInfo/ || s {s=s?s"\n"$0:$0} /<\/measInfo>/{if (s ~ /string1/) print s; s=""}' asd | wc -l
38
-bash-4.1$
-bash-4.1$ targetItem=string1
-bash-4.1$ awk "/measInfo/ || s {s=s?s\"\n\"$0:$0} /<\/measInfo>/{if (s ~ /$targetItem/) print s; s=\"\"}" asd | wc -l
0

I replaced the single quotes with double quotes and added "\" before the other double quotes inside the snippet.
But it doesn't work. It doesn't give me a syntax error, but it doesn't return anything either.
I think the variable substitution is not done.
Any idea how I can accomplish this, or what I am doing wrong?
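What the shell actually hands to awk can be checked by building the same double-quoted string and printing it. Inside double quotes the shell expands $targetItem as intended, but it also expands $0, which to the shell means its own name (-bash in an interactive login shell like the one in the prompt above, or the script path in a script). So awk sees something like s=s?s"\n"-bash:-bash, the input lines never end up in s, and s ~ /string1/ never matches. A minimal demonstration:

```shell
#!/bin/sh
# Rebuild the double-quoted program from the question and print what awk
# would actually receive. Inside double quotes the shell expands BOTH
# $targetItem (as intended) and $0 (the name of the shell or script).
targetItem=string1
prog="/measInfo/ || s {s=s?s\"\n\"$0:$0} /<\/measInfo>/{if (s ~ /$targetItem/) print s; s=\"\"}"
printf '%s\n' "$prog"
```

The printed program contains /string1/ (so the substitution of targetItem did happen), while every $0 has been replaced by the shell's name.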
# 7  
03-30-2014
You could try:
Code:
awk '... if (s ~ k) ... }'  k="string1" asd ...
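Combining that idea with the earlier program, a complete command could look like the sketch below. Keeping the awk program in single quotes means the shell leaves $0 alone, and the search string reaches awk only through the awk variable k. The helper name extract_blocks is hypothetical, and it uses awk's -v option, which behaves the same as the k="string1" operand form here because the program has no BEGIN block.

```shell
#!/bin/sh
# Single-quoted awk program; the search string is passed in as awk variable k.
# extract_blocks is a hypothetical helper name.
extract_blocks() {
  # $1 = search string (used as an awk regular expression), $2 = input file
  awk -v k="$1" '
    /measInfo/ || s { s = s ? s "\n" $0 : $0 }
    /<\/measInfo>/  { if (s ~ k) print s; s = "" }
  ' "$2"
}

# Example: targetItem=string1; extract_blocks "$targetItem" asd | wc -l
```

Because k is used as a dynamic regular expression, characters such as . or [ in the search string have their regex meaning; for a plain substring match, the test could be written as index(s, k) instead of s ~ k.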


--
If there is always an empty line between those XML segments (and only then), you could use:
Code:
awk '$0~k' k="string1" RS= asd
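This works because an empty RS puts awk in paragraph mode: each run of lines separated by one or more blank lines becomes a single record, so a whole <measInfo>...</measInfo> block is one $0 and $0~k prints the blocks that match. A sketch of the same idea with the settings written out (extract_paragraphs is a hypothetical helper name; it assumes no block contains a blank line of its own):

```shell
#!/bin/sh
# Paragraph mode: with RS = "" each blank-line-separated block is one record.
# extract_paragraphs is a hypothetical helper name.
extract_paragraphs() {
  # $1 = search string (awk regex), $2 = input file
  awk -v k="$1" 'BEGIN { RS = "" } $0 ~ k' "$2"
}
```

Matching blocks are printed with the default ORS (a single newline), so consecutive matches are no longer separated by a blank line; setting ORS to "\n\n" as well would restore that spacing.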
