Need help in getting count from xml file


Post #8 (10-18-2017)
The standards say that grep and other text processing utilities produce unspecified behavior when an input file is not a text file. By definition, a text file cannot contain any line longer than the LINE_MAX limit on your system. (On most systems, LINE_MAX is 2048 bytes, including the <newline> line terminator; the command getconf LINE_MAX reports the value on yours.) Your sample file includes lines that are more than 6950 bytes long. Unless the grep man page on your system says that it can process text files with unlimited line lengths (or at least with lines as long as the longest line in your files), any results you get from a script using:
Code:
grep -hc customer-no file.xml

cannot be trusted.
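To check whether your files actually exceed that limit, something like the following sketch could help. check_line_max is a hypothetical helper name, and the sketch assumes that the awk you run it with can itself read lines that long (gawk, for example, has no fixed line-length limit). That is exactly the kind of assumption warned about above, so treat its answer as a hint rather than proof.

```shell
# Hypothetical helper: report each file containing a line of LINE_MAX bytes
# or more. LINE_MAX counts the trailing newline, so such a line is too long.
# Assumes this awk copes with long lines itself (gawk has no fixed limit).
check_line_max() {
    limit=$(getconf LINE_MAX)
    for f in "$@"; do
        # LC_ALL=C makes length() count bytes rather than multibyte characters.
        longest=$(LC_ALL=C awk '
            length($0) > max { max = length($0) }
            END { print max + 0 }' "$f")
        if [ "$longest" -ge "$limit" ]; then
            echo "$f: longest line is $longest bytes (LINE_MAX is $limit)"
        fi
    done
}
```

For example, check_line_max *.xml prints nothing if every line fits within LINE_MAX.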

If you are trying to count unique customer numbers, you'll need something more powerful than grep. If we make the very wild assumption that the <customer customer-no="xxxxxxxxxxxxxxxxx"> tag is the first tag on any line in which it appears and that awk on your system (another text processing utility) supports line lengths at least as long as the longest lines in your XML files, you could try:
Code:
awk -F'"' '
/customer customer-no/ && !($2 in cust) {
	cust[$2]
	n++
	# print $2	# uncomment this line to list unique customer numbers
}
END {	print n + 0	# n + 0 prints 0 instead of an empty line when nothing matched
}' *.xml
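If the customer tag is not reliably the first tag on its line, one way to relax that assumption is to let awk find every customer-no attribute on each line with match(), as in this sketch. The function name count_unique_customers is hypothetical; it assumes the attribute is always written as customer-no="..." in double quotes, and it still needs an awk that can handle your long lines.

```shell
# Hypothetical helper: count distinct customer-no values wherever they
# appear on a line. Assumes double-quoted attribute values and an awk
# that can read lines as long as those in your files.
count_unique_customers() {
    awk '
    {
        rest = $0
        # Find each customer-no="..." attribute on the line in turn.
        while (match(rest, /customer-no="[^"]*"/)) {
            # Skip the 13 characters of customer-no=" and the closing quote.
            id = substr(rest, RSTART + 13, RLENGTH - 14)
            if (!(id in cust)) { cust[id]; n++ }
            rest = substr(rest, RSTART + RLENGTH)
        }
    }
    END { print n + 0 }' "$@"
}
```

Usage would be count_unique_customers *.xml; with no arguments it reads standard input.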

If awk on your system can't handle lines that long, but the <customer customer-no="xxxxxxxxxxxxxxxxx"> tag always appears at the very start of any line that contains it, you could try the following. It uses cut to keep only the first 50 characters of each line, which is enough for that 42-character tag:
Code:
cut -c1-50 *.xml | awk -F'"' '
/customer customer-no/ && !($2 in cust) {
	cust[$2]
	n++
	# print $2	# uncomment this line to list unique customer numbers
}
END {	print n + 0	# n + 0 prints 0 instead of an empty line when nothing matched
}'
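A different technique, not one suggested in this thread: tr works on individual bytes rather than lines, so on typical implementations it can break the input at every > regardless of line length, after which each tag starts on a short line that awk handles easily. The helper name count_unique_customers_tr is hypothetical, and the sketch assumes that customer-no is the first attribute of each <customer> tag (so that $2 is its value) and that no attribute value contains a literal >.

```shell
# Hypothetical helper: count distinct customer numbers after splitting
# the input at every ">", so no line reaching awk is long.
# Assumes customer-no is the first attribute of each <customer> tag.
count_unique_customers_tr() {
    cat "$@" | tr '>' '\n' | awk -F'"' '
        /<customer customer-no=/ && !($2 in cust) { cust[$2]; n++ }
        END { print n + 0 }'
}
```

Because the split happens before awk sees the data, this variant does not need the tag to be at the start of a line.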

Post #9 (10-19-2017)
Using other standard utilities, could you try something like this:
Code:
egrep -h "^<customer customer-no=\"" *.xml \
 | cut -f1 -d">" \
 | sort \
 | uniq -c

It might be pretty heavy on processing, but it seems to work for me. If egrep is unreliable with your long lines, try putting the cut first instead. cut would then have more lines to process, but egrep would only have to consider the short fields that cut leaves behind. I'm not sure which will perform better.
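Running cut first, as suggested, might look like the sketch below, wrapped in a hypothetical shell function customer_tally. Because cut does not prefix its output with file names, identical customer numbers from different files are tallied together. Plain grep stands in for egrep here since the pattern needs no extended regular expression features.

```shell
# Hypothetical helper: cut runs first, so grep only sees the text
# before the first ">" on each line; then tally each customer number.
customer_tally() {
    cut -f1 -d'>' "$@" |
    grep '^<customer customer-no="' |
    sort |
    uniq -c
}
```

Piping its output through wc -l, as in customer_tally *.xml | wc -l, would give the number of distinct customers rather than the per-customer counts.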



I hope that this helps,
Robin