A sample of one of the many files whose words I want to count is below (the other files are like this one):
Code:
<intervention id='in16'>
<speaker>
<name>O'Brien, Bill</name>
<birth_date>19290125</birth_date>
<birth_place>UK</birth_place>
<status>Mr</status>
<gender>male</gender>
<institution>
<ni country="UK">HC</ni>
</institution>
<constituency country="UK" region="Normanton"/>
<affiliation>
<hc group="NA"/>
<national_party>Lab</national_party>
</affiliation>
</speaker>
<speech id='sp16' language="EN">When considering local government expenditure and finance, will my right hon. Friend
examine the major problem that is developing in many areas because health and social care is funded by the Department
of Health and local government? Will he take into consideration the need for local authorities properly to fund health
and social care?</speech>
</intervention>
<intervention id='in17'>
<speaker>
<name>Raynsford, Nick</name>
<status>Rt Hon</status>
<gender>male</gender>
<institution>
<ni country="UK">HC</ni>
</institution>
<constituency country="UK" region="Greenwich and Woolwich"/>
<affiliation>
<hc group="NA"/>
<national_party>Lab</national_party>
</affiliation>
<post>The Minister for Local and Regional Government</post>
</speaker>
<speech id='sp17' language="EN">My hon. Friend makes a fair point, but he will be aware that under recent
settlements there has been a sustained increase in local government funding, with a 33 per cent. increase
in real terms since 1997. Specifically, the funding that is targeted on social care has increased above the
average, so the Government are well aware of the need and are putting money into local government to ensure
that the needs of communities are met without imposing unreasonable council tax increases.</speech>
</intervention>
<intervention id='in18'>
<speaker>
<name>Spelman, Caroline</name>
<status>Mrs</status>
<gender>female</gender>
<institution>
<ni country="UK">HC</ni>
</institution>
<constituency country="UK" region="Meriden"/>
<affiliation>
<hc group="NA"/>
<national_party>Con</national_party>
</affiliation>
</speaker>
<speech id='sp18' language="EN">Since 1997, council tax has risen by 70 per cent. and average bills are set
to top £1,000--the highest ever. At the same time, council tax receipts to the Treasury have soared by 80
per cent. Does the Minister accept that the Office of the Deputy Prime Minister has been filling the Chancellor's
coffers by stealth and that the sooner it is gone, the sooner we can restore fairness and accountability to local
government? <omit>12 Jan 2005 : Column 284</omit></speech>
</intervention>
<intervention id='in19'>
<speaker>
<name>Raynsford, Nick</name>
<status>Rt Hon</status>
<gender>male</gender>
<institution>
<ni country="UK">HC</ni>
</institution>
<constituency country="UK" region="Greenwich and Woolwich"/>
<affiliation>
<hc group="NA"/>
<national_party>Lab</national_party>
</affiliation>
<post>The Minister for Local and Regional Government</post>
</speaker>
<speech id='sp19' language="EN">In terms of fairness and accountability, when the hon. Lady's party was in power
grants to local government were cut year after year and local authorities were faced with the real problem of
trying to meet local needs without adequate finance. Since this Government have been in power, the grant to local
government has increased by 33 per cent. in real terms, which has enabled councils to budget prudently. If she
were really worried about council tax, she would be talking to Conservative councils, because they had the
unenviable record last year of setting larger increases than Labour councils--5.4 per cent. compared with 4.7
per cent. Labour is leading the way on keeping council tax down.</speech>
</intervention>
I just want to get the number of words in this file (and in many other files like it), EXCLUDING the XML (<*?>) TAGS.
Please help!!
mc
Last edited by Scott; 03-30-2011 at 09:25 AM..
Reason: Please use code tags
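One possible approach, as a minimal sketch rather than a definitive answer: replace every tag with a space and let awk add up the remaining fields. The function name `count_words` is made up for illustration. The sketch assumes each tag opens and closes on the same line and that no attribute value contains a `>`, which holds for the sample above. Text inside `<omit>` is outside the tags themselves, so it is still counted.

```shell
# count_words (hypothetical name): print the number of words outside XML tags.
# Reads the files given as arguments, or standard input if none are given.
# Assumption: no tag spans more than one line.
count_words() {
    awk '{
        gsub(/<[^>]*>/, " ")   # turn every <...> into a space so words do not fuse
        n += NF                # NF = number of whitespace-separated words left
    }
    END { print n + 0 }' "$@"
}

# Example: one count per file in the current directory
# for f in *.xml; do printf '%s: %s\n' "$f" "$(count_words "$f")"; done
```

Passing several files in one call prints a single combined total; the loop in the comment gives one count per file instead.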
Hello, I want to write a script to find all the files that contain 3 specific patterns. Example: show the files containing any line that contains pattern1, pattern2 and pattern3; the patterns can be in any order as long as they all exist in the same line.
Can I do that with grep?
Thank you (1 Reply)
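A minimal sketch of one way to do this with plain grep, assuming the three patterns are basic regular expressions and the files are passed by name. The function name `files_with_all_three` is hypothetical. Chaining three greps keeps only the lines that match all patterns, whatever their order on the line.

```shell
# files_with_all_three (hypothetical name):
#   files_with_all_three PATTERN1 PATTERN2 PATTERN3 FILE...
# Prints each FILE that has at least one line matching all three patterns.
files_with_all_three() {
    p1=$1 p2=$2 p3=$3
    shift 3
    for f in "$@"; do
        # Each grep narrows the lines further; -q only reports success or failure.
        if grep -e "$p1" -- "$f" | grep -e "$p2" | grep -q -e "$p3"; then
            printf '%s\n' "$f"
        fi
    done
}
```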
Hello experts,
I want to get the value between 2 patterns.
Ex. get hello in <line>hello</line>
Any suggestions?
Any sed, grep, or awk commands? (11 Replies)
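A minimal sed sketch for this, assuming the opening and closing tags sit on the same line and the tag carries no attributes. The function name `between_tags` is hypothetical. If a line holds several such pairs, only one value is printed for that line.

```shell
# between_tags (hypothetical name):
#   between_tags TAG [FILE...]
# Prints the text found between <TAG> and </TAG> on each line that has both.
# Reads standard input when no FILE is given.
between_tags() {
    tag=$1
    shift
    # -n with the p flag prints only lines where the substitution succeeded.
    sed -n "s:.*<$tag>\\(.*\\)</$tag>.*:\\1:p" "$@"
}
```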
Hi,
I need to create a script that does the following:
1. Read the file for the occurrences of "EXECUTE" and "END" strings.
There will be several occurrences of EXECUTE and END strings on the file.
2. The resulting lines from #1 need to be searched for the word... (11 Replies)
Gurus,
If this is my file
<PRODUCT_TYPE>DN</PRODUCT_TYPE><SERVER_NAME>testserver1</SERVER_NAME><FLAVOR>Windows</FLAVOR><OS>Windows NT</OS><CPU>4</CPU>
<PRODUCT_TYPE>PN</PRODUCT_TYPE><SERVER_NAME>testserver2</SERVER_NAME><FLAVOR>Windows</FLAVOR><OS>Windows NT</OS><CPU>3</CPU>
... (6 Replies)
Hi Gurus,
I have a file, say file1, which has 3500 lines in it, each a different account number, and another file (file2) which has 230000 lines in it. I want to read all the lines in file1 and delete from file2 all the lines that contain the same pattern as a line in file1. I am not quite... (4 Replies)
Hi,
I have a directory /u02. I have 2 files in it, like abc1.gz and abc2.gz. I want to store the file pattern in a variable like
f1="abc?"
I don't want to include .gz in the variable; rather, I want .gz appended when I need to unzip the file, like
gunzip $f1
Can you please help me how to... (3 Replies)
Hi,
I have the following lines of code, which are working properly.
CAT1="${InputFile}CAT_*0?????"
CAT2="${InputFile}CAT_*0?????"
CountRecords(){
integer i=1
while ]; do
print P$i `nawk 'END {print NR}' $1 ` >> ${OutputPath}result.txt &
i=i+1
shift
done
}
CountRecords "$CAT1"... (8 Replies)
Hi All,
I've been trying to solve this with a simple command but not having much luck. I have a file like this:
Line 1: random_description 123/alert/high random_description2 356/alert/slow
Line 2: random_description3 654/alert/medium
Line 3: random_description4 234/alert/critical
I'm... (7 Replies)
Hi,
I am trying to extract some patterns from a line. The input file is space delimited, and I could not use column positions to get the value after the "IN" or "OUT" patterns, as there could be multiple white spaces before the next digits that I need to print in the output file. I need to print 3 patterns in a... (3 Replies)
Hello.
For a given folder, I want to select any files find $PATH1 -f \( -name "*" but omit any files like pattern name ! -iname "*.jpg" ! -iname "*.xsession*" ..... \) and also omit any subfolder like pattern name -type d \( -name "/etc/gconf/gconf.*" -o -name "*cache*" -o -name "*Cache*" -o... (2 Replies)