Data extraction from .xml file


 
# 1  
Old 05-10-2016

Hello,
I'm attempting to extract 13-digit numbers beginning with 978 from a data file with the following command:

Code:
awk '{ for(i=1;i<=NF;i++) if($i ~ /^978/) print $i; }' datafile > outfile

This typically works. However, the new data file is an .xml file, and I imagine that is why the command no longer works.

How can I either modify this command or convert the file so that the command will function?

Thanks so much!

Last edited by Don Cragun; 05-10-2016 at 09:37 PM.. Reason: Add CODE tags.
# 2  
Old 05-10-2016
Without a representative sample of the contents of datafile and a clear statement of where in the file 978 followed by ten other decimal digits is supposed to be matched, we can only make wild guesses at what might meet your requirements...
# 3  
Old 05-10-2016
Sample from the .xml file:

Code:
<PriceAmount>42.97</PriceAmount>
<CurrencyCode>USD</CurrencyCode>
</Price>
</SupplyDetail>
</Product>
<Product>
<RecordReference>9780028608129</RecordReference>
<NotificationType>03</NotificationType>
<RecordSourceType>04</RecordSourceType>
<ProductIdentifier>
<ProductIDType>15</ProductIDType>
<IDTypeName>ISBN-13</IDTypeName>
<IDValue>9780028608129</IDValue>
</ProductIdentifier>
<ProductIdentifier>
<ProductIDType>14</ProductIDType>
<IDTypeName>GTIN-14</IDTypeName>

Desired output:

Code:
9780028608129
9780028608129

Thanks again!
# 4  
Old 05-11-2016
Are you only looking for values found between <RecordReference> tags and between <IDValue> tags, or are you looking for values between any kind of tags?

What operating system are you using?

Does the grep utility on your system have a -o option?
# 5  
Old 05-11-2016
I wish to extract *all* such numbers (beginning with 978) from the file, irrespective of the tags.

Mac OS - El Capitan, XQuartz 2.7.8

Yes, it appears that grep has the -o option.

Thanks!
# 6  
Old 05-11-2016
Try:
Code:
grep -Eo '978[0-9]{10}' datafile
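
Why this works where the awk command did not: awk splits each line on whitespace by default, so a line such as <IDValue>9780028608129</IDValue> is a single field that begins with "<", not with 978. grep -o prints only the part of each line that matches, regardless of the surrounding tags. The sketch below runs the command against a small sample and shows one possible awk equivalent. The file names sample.xml, isbn_grep.txt and isbn_awk.txt are made up for illustration, and the awk loop is only one way to do it, not something posted in this thread.

```shell
# Hypothetical sample file built from lines of the excerpt posted above.
cat > sample.xml <<'EOF'
<RecordReference>9780028608129</RecordReference>
<NotificationType>03</NotificationType>
<IDTypeName>ISBN-13</IDTypeName>
<IDValue>9780028608129</IDValue>
EOF

# -E selects extended regular expressions; -o prints each match on its own line.
grep -Eo '978[0-9]{10}' sample.xml > isbn_grep.txt

# awk alternative: repeatedly match inside the line instead of testing whole fields.
# The ten digits are spelled out because some awk versions lack {10} intervals.
awk '{
    s = $0
    while (match(s, /978[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/)) {
        print substr(s, RSTART, RLENGTH)   # the matched 13 digits
        s = substr(s, RSTART + RLENGTH)    # continue after this match
    }
}' sample.xml > isbn_awk.txt

cat isbn_grep.txt
```

Note that the pattern is not anchored, so it also matches 978 plus ten digits inside a longer run of digits (a 14-digit GTIN value, for example). If that matters, the match would need to be tied to the surrounding tags.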

# 7  
Old 05-11-2016
Perfect!