I am trying to extract every element of the form <w c5=".*" hw=".*" pos=".*?">.*</w> from A00.xml.
I use the following command:
egrep '<w c5=".*" hw=".*" pos=".*?">.*</w>' A00.xml
The result is:
<s n="396"><w c5="PNP" hw="we" pos="PRON">We </w><w c5="VVB" hw="make" pos="VERB">make </w><w c5="AT0" hw="the" pos="ART">the </w><w c5="DT0" hw="most" pos="ADJ">most </w><w c5="PRF" hw="of" pos="PREP">of </w>
</s>
First, the output contains an unexpected part, <s n=...>, which I did not ask for.
Second, the matches are not listed one per line, like this:
<w c5="PNP" hw="we" pos="PRON">We </w>
<w c5="VVB" hw="make" pos="VERB">make </w>
<w c5="AT0" hw="the" pos="ART">the </w>
<w c5="DT0" hw="most" pos="ADJ">most </w>
<w c5="PRF" hw="of" pos="PREP">of </w>
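
For reference, here is a minimal sketch of one way to get the one-per-line output above. It assumes a grep that supports -o (GNU and BSD grep both do) and that a <w> element never contains a nested tag. The file names sample.xml and words.txt are made up for the example. By default grep prints the whole matching line, which is why the <s n=...> part shows up; -o prints only the matched text, one match per line. The greedy .* is replaced with [^"]* and [^<]* so that a match cannot run past the end of the current element.

```shell
# Hypothetical one-line sample standing in for a line of A00.xml
printf '%s\n' '<s n="396"><w c5="PNP" hw="we" pos="PRON">We </w><w c5="VVB" hw="make" pos="VERB">make </w></s>' > sample.xml

# -o: print only the matched part, each match on its own line
# -E: extended regular expressions (what egrep uses)
# [^"]* stays inside one attribute value; [^<]* stays inside one element
grep -oE '<w c5="[^"]*" hw="[^"]*" pos="[^"]*">[^<]*</w>' sample.xml > words.txt

cat words.txt
```

On the sample line this should print the two <w> elements on separate lines, without the surrounding <s n="396"> and </s>. For anything beyond flat elements like these, a real XML parser is probably a safer choice than a regular expression.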