Hi All,
I have been working on a problem that doesn't seem to have a clear regex solution, and I wanted to run it by everyone for some insight into how to approach it.
I have a flat text file that contains billing records for users. The records are stored as XML, with each record starting at <record> and ending at </record>.
What I am trying to do is search for a user's ID and extract the complete record for that user.
Sample Data
Quote:
<record>
<recId>xxxxxxxxxxxxxxx</recId>
<created>Wed Dec 17 06:00:16 2008</created>
<userid>jondoe</userid>
<domain>xxxxxxxxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxxx</radIP>
<userIP>0.0.0.0</userIP>
<delta>7598</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>3159</bytesIn>
<bytesOut>563</bytesOut>
<packetsIn>52</packetsIn>
<packetsOut>19</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>xxxxxxxxxxxxxxx</proxyAcctIPAddr>
<proxyAcctAck>1</proxyAcctAck>
<termCause>17</termCause>
<clientIPAddr>xxxxxxxxxxxxxxx</clientIPAddr>
<entityID>955</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>L</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxxx</clientID>
<sessionID>xxxxxxxxxxxxxxxxxxxxxx</sessionID>
<nasID>xxxxx</nasID>
<nasVendor>xxxxxx</nasVendor>
<nasModel>xxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxx</nasPort>
<billingID></billingID>
<startDate>2008/12/17 03:57:06</startDate>
<callingNumber>xxxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>xxxxxxxxxxxxxxxx</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxxxxxxxxxxxx</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record><record>
<recId>xxxxxxxxxxxxxxx</recId>
<created>Wed Dec 17 06:00:16 2008</created>
<userid>janedoe</userid>
<domain>xxxxxxxxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxxx</radIP>
<userIP>0.0.0.0</userIP>
<delta>7598</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>3159</bytesIn>
<bytesOut>563</bytesOut>
<packetsIn>52</packetsIn>
<packetsOut>19</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>xxxxxxxxxxxxxxx</proxyAcctIPAddr>
<proxyAcctAck>1</proxyAcctAck>
<termCause>17</termCause>
<clientIPAddr>xxxxxxxxxxxxxxx</clientIPAddr>
<entityID>955</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>L</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxxx</clientID>
<sessionID>xxxxxxxxxxxxxxxxxxxxxx</sessionID>
<nasID>xxxxx</nasID>
<nasVendor>xxxxxx</nasVendor>
<nasModel>xxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxx</nasPort>
<billingID></billingID>
<startDate>2008/12/17 03:57:06</startDate>
<callingNumber>xxxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>xxxxxxxxxxxxxxxx</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxxxxxxxxxxxx</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record><record>
What I would like to be able to do is search for jondoe and have it spit out all records for jondoe.
The output would be the following. There could be multiple records in the file for this user, so it would need to write each record to a text file or standard output as it finds it.
Quote:
<record>
<recId>xxxxxxxxxxxxxxx</recId>
<created>Wed Dec 17 06:00:16 2008</created>
<userid>jondoe</userid>
<domain>xxxxxxxxxxxxxxxxxxxx</domain>
<type>260</type>
<nasIP>xxxxxxxxxxxxxxxx</nasIP>
<portType>18</portType>
<radIP>xxxxxxxxxxxxxxx</radIP>
<userIP>0.0.0.0</userIP>
<delta>7598</delta>
<gmtOffset>0</gmtOffset>
<bytesIn>3159</bytesIn>
<bytesOut>563</bytesOut>
<packetsIn>52</packetsIn>
<packetsOut>19</packetsOut>
<proxyAuthIPAddr>0</proxyAuthIPAddr>
<proxyAcctIPAddr>xxxxxxxxxxxxxxx</proxyAcctIPAddr>
<proxyAcctAck>1</proxyAcctAck>
<termCause>17</termCause>
<clientIPAddr>xxxxxxxxxxxxxxx</clientIPAddr>
<entityID>955</entityID>
<entityCtxt>1</entityCtxt>
<backupMethod>L</backupMethod>
<sessionCountInfo></sessionCountInfo>
<clientID>xxxxxxxxxxxxxxx</clientID>
<sessionID>xxxxxxxxxxxxxxxxxxxxxx</sessionID>
<nasID>xxxxx</nasID>
<nasVendor>xxxxxx</nasVendor>
<nasModel>xxxxxxxxxxxx</nasModel>
<nasPort>xxxxxxxx</nasPort>
<billingID></billingID>
<startDate>2008/12/17 03:57:06</startDate>
<callingNumber>xxxxxxxxxxxxxxx</callingNumber>
<calledNumber></calledNumber>
<radiusAttr>xxxxxxxxxxxxxxxx</radiusAttr>
<startAttr></startAttr>
<auditID>xxxxxxxxxxxxxxxxxxxxxxxx</auditID>
<seqNum>0</seqNum>
<accountName></accountName>
</record>
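To make the desired behaviour concrete, here is a minimal sketch of one way it could be done without regex at all. It assumes Python 3 is available on the machine (not something stated above), and the function name matching_records and the cut-down sample lines are made up for illustration. It collects lines until it sees </record>, prints the collected record if it contains the wanted <userid>, and then starts over:

```python
def matching_records(lines, userid):
    """Yield each complete <record>...</record> block whose <userid> is userid."""
    needle = "<userid>" + userid + "</userid>"
    buf = []  # lines of the record currently being collected
    for raw in lines:
        rest = raw.rstrip("\n")
        # A line like "</record><record>" closes one record and opens the next,
        # so cut at every closing tag found on the line.
        while "</record>" in rest:
            head, _, rest = rest.partition("</record>")
            buf.append(head + "</record>")
            if any(needle in part for part in buf):
                yield "\n".join(buf)
            buf = []  # start fresh, so no record is emitted twice
        if rest.strip():
            buf.append(rest)


# Cut-down version of the sample data (most fields omitted for brevity).
sample_lines = [
    "<record>\n",
    "<recId>xxxxxxxxxxxxxxx</recId>\n",
    "<userid>jondoe</userid>\n",
    "</record><record>\n",
    "<recId>xxxxxxxxxxxxxxx</recId>\n",
    "<userid>janedoe</userid>\n",
    "</record><record>\n",
]

found = list(matching_records(sample_lines, "jondoe"))
for record in found:
    print(record)
```

Because the function only looks at one line at a time, passing an open file object instead of the list should let it work through a large billing file without loading it all into memory.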
I started with a regex that tries to grab <record>, then jondoe, then </record>:
<record>(\s|\S)+jondoe(\s|\S)+</record>
However, this selects all of the records rather than just the matching one. Even if I could extract only the portion I want, I am not sure how to have it remember where it left off and keep working through the file without creating duplicates.
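The over-matching can be reproduced in a few lines. The sketch below again assumes Python 3 and uses a cut-down copy of the sample data; the names RECORD_RE and records_for are hypothetical. It first runs the pattern above, then shows one possible alternative: match each record separately with a non-greedy pattern and filter on the user ID afterwards. re.finditer resumes after the end of the previous match, which is what keeps it from returning the same record twice.

```python
import re

# Cut-down version of the sample data, including the trailing "<record>".
text = (
    "<record>\n<userid>jondoe</userid>\n</record><record>\n"
    "<userid>janedoe</userid>\n</record><record>\n"
)

# The original pattern: each (\s|\S)+ is greedy, so the match runs on to the
# last </record> in the text and swallows the janedoe record as well.
greedy = re.search(r"<record>(\s|\S)+jondoe(\s|\S)+</record>", text)
print(greedy.group(0).count("</record>"))

# Non-greedy alternative: .*? stops at the first </record> it can reach,
# so every match is exactly one record.
RECORD_RE = re.compile(r"<record>.*?</record>", re.DOTALL)


def records_for(text, userid):
    """Return every complete record whose <userid> element equals userid."""
    needle = "<userid>" + userid + "</userid>"
    return [m.group(0) for m in RECORD_RE.finditer(text) if needle in m.group(0)]


matches = records_for(text, "jondoe")
for record in matches:
    print(record)
```

Filtering after the match, instead of putting the user ID inside the pattern, is deliberate: a pattern like <record>.*?jondoe.*?</record> could still start at an earlier record and run across its </record> to reach a later jondoe.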
Since this is being performed on Solaris 10, I wasn't able to use some of the more advanced grep features such as grep -B(x) -A(x).
Thanks in advance for any help you can provide.