Need help in getting count from xml file

10-18-2017


The standards say that grep and other text processing utilities produce unspecified behavior when an input file is not a text file. By definition, a text file cannot contain any line longer than LINE_MAX bytes, including the <newline> line terminator. (On most systems, LINE_MAX is 2048 bytes.) Your sample file includes lines that are more than 6950 bytes long. Unless the grep man page on your system says that it can process text files with unlimited line lengths (or at least lines longer than the longest line in your files), any results you get from a script using:

Code:

grep -hc customer-no file.xml

cannot be trusted.
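
To see whether a given file actually exceeds the limit, something like the following sketch may help. It relies on cut rather than grep or awk to look at the long lines, since POSIX specifies that cut accepts input lines of unlimited length. The function name count_overlong_lines and the fallback value of 2048 (used only if getconf is unavailable) are assumptions made for this illustration.

```shell
# count_overlong_lines FILE...
# Print how many input lines are longer than LINE_MAX bytes
# (the limit includes the trailing newline).
count_overlong_lines() {
    # Ask the system for its limit; 2048 is an assumed fallback value.
    limit=$(getconf LINE_MAX 2>/dev/null || echo 2048)
    # cut keeps only byte number $limit of each line. A line has such a
    # byte only if it holds at least $limit bytes before its newline,
    # which makes it too long. Shorter lines come out empty.
    # grep -c . then counts the non-empty (and now very short) lines.
    LC_ALL=C cut -b "$limit" "$@" | LC_ALL=C grep -c .
}
```

Running count_overlong_lines file.xml and getting anything other than 0 means the file is not a text file as far as the standard utilities are concerned.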

If you are trying to count unique customer numbers, you'll need something more powerful than grep. If we make the very wild assumption that the <customer customer-no="xxxxxxxxxxxxxxxxx"> tag is the first tag on any line in which it appears (the script below takes the text between the first pair of double quotes on the line as the customer number), and that awk on your system (another text processing utility) supports lines at least as long as the longest lines in your XML files, you could try:

Code:

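awk -F'"' '
# With " as the field separator, $2 is the value of the first quoted attribute.
/customer customer-no/ && !($2 in cust) {
	cust[$2]	# remember this customer number
	n++
	# print $2	# uncomment this line to list unique customer numbers
}
END {
	print n + 0	# n + 0 prints 0 instead of an empty line when nothing matched
}' *.xml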

If awk can't handle lines that long on your system, and the <customer customer-no="xxxxxxxxxxxxxxxxx"> tag always appears at the very start of any line that contains it, you could use cut to truncate each line to its first 50 characters before awk sees it (POSIX requires cut to accept lines of unlimited length):

Code:

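cut -c1-50 *.xml | awk -F'"' '
# With " as the field separator, $2 is the value of the first quoted attribute.
/customer customer-no/ && !($2 in cust) {
	cust[$2]	# remember this customer number
	n++
	# print $2	# uncomment this line to list unique customer numbers
}
END {
	print n + 0	# n + 0 prints 0 instead of an empty line when nothing matched
}'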
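
Both scripts above depend on where the tag sits on the line. As a variation that is not part of the original suggestion, one could first break the input at every < character with tr, so that each tag starts its own (much shorter) line. This is only a sketch: the function name count_unique_customers is made up for the example, and it assumes the tag is always written exactly as customer customer-no="..." with a single space and double quotes. It is not a real XML parser and knows nothing about comments or CDATA sections.

```shell
# count_unique_customers FILE...
# Print the number of distinct customer-no values found in the files.
count_unique_customers() {
    # tr turns every "<" into a newline, so each tag begins a new line
    # and the customer tag no longer has to be first on its original line.
    cat -- "$@" | tr '<' '\n' | awk -F'"' '
    # After the split, a customer tag line starts with: customer customer-no=
    # and $2 is the quoted customer number.
    /^customer customer-no=/ && !($2 in cust) {
        cust[$2]
        n++
    }
    END { print n + 0 }'
}
```

It would be called as count_unique_customers *.xml. Lines can still be long if the text between two tags is long, so the earlier caveat about awk and line lengths still applies in that case.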

Don Cragun


10-19-2017


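Using other standard utilities, could you try something like this (the -h stops egrep from prefixing each match with its file name when *.xml expands to more than one file, which would otherwise make the same customer in two files count as two different lines):

Code:

egrep -h "^<customer customer-no=\"" *.xml \
 | cut -f1 -d">" \
 | sort \
 | uniq -c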

It might be pretty heavy on processing, but it seems to work for me. If egrep is being unpredictable, try putting the cut first instead. That would give cut more lines to process, but I suppose egrep then has shorter lines to consider. I'm not sure which will perform better.
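
For reference, the cut-first ordering mentioned above might look like the sketch below. The function name count_per_customer is only there for illustration. As with the pipeline above, it assumes the tag starts at the beginning of the line, and it prints one count per customer number rather than a single total.

```shell
# count_per_customer FILE...
# Print each distinct customer tag prefix with the number of lines it starts.
count_per_customer() {
    # cut runs first and keeps only the text before the first ">" on each
    # line, so grep only ever sees short lines. cut does not add file name
    # prefixes, so no -h option is needed on grep here.
    cut -d'>' -f1 -- "$@" \
     | grep '^<customer customer-no="' \
     | sort \
     | uniq -c
}
```

Piping the result through wc -l would give the number of distinct customer numbers.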

I hope that this helps,
Robin

rbatte1

