Parsing data


 
# 1  
Old 03-30-2010
Parsing data

Hi all, I have a file of billing CDR records that I need to parse (one record per row). The purpose is to compare the full content. The example below is a single-line record with two portions: (1) the part that starts with "!" and ends at "1.2.1.8", and (2) the part where each value is enclosed by a pair of tags (i.e. <?> and </?>).

Code:
! TICKET NBR : 4 ! GSI : 101 ! 1.2.1.5 04/03/2010 12:33:33 ! 3100.2.104.4 acc_22640827031004 ! 449032008 labs101 ! 3100.2.22.1 CYCL_ST ! 3100.2.20.1 2898381018 ! 1.2.1.8 <A>00092898381018</A><B>Mobility</B><P>10.00</P><P1>0.50</P1><Q></Q><S>Inactive</S><Z></Z><PA></PA><SA></SA><SB></SB><SP>10.00</SP><SQ>0.50</SQ><MS>302612898381018</MS>

Please note I put "-" to separate the output.

Part 1: The fields are separated by "!", and I need only the value that follows the x.x.x.x parameter.
Code:
4 - 101 - 04/03/2010 12:33:33 - acc_22640827031004 ….

Part 2:
I need just the values printed between the two fields (i.e.: <?> and </?>)

Code:
  00092898381018 – Mobility - 10.00 -  - 0.50 - -  ….

(Note: if the value is empty, we leave a space.)

At the end I would like to have both parts printed together on one line. Any help will be greatly appreciated.

Thanks
Gamini

More examples:

Code:
! TICKET NBR : 1 ! GSI : 101 ! 1.2.1.5 04/03/2010 14:08:24 ! 3100.2.104.4 acc_22640827031004 ! 449032008 labs101 ! 3100.2.22.1 CYCL_ST ! 3100.2.20.1 2898381018 ! 1.2.1.8 <A>00092898381018</A><B>Mobility</B><P>10.00</P><P1>0.50</P1><Q></Q><S>Active</S><Z></Z><PA></PA><SA></SA><SB></SB><SP>10.00</SP><SQ>0.50</SQ><MS>302612898381018</MS> ! 3100.2.984.45 4 !
! TICKET NBR : 2 ! GSI : 101 ! 1.2.1.5 04/03/2010 15:59:59 ! 3100.2.104.4 acc_1234567890 ! 449032008 labs101 ! 3100.2.22.1 CYCL_ST ! 3100.2.20.1 7805250631 ! 1.2.1.8 <A>007805250631</A><B>Mobility</B><P>0.00</P><P1>0.00</P1><Q></Q><S>Active</S><Z></Z><PA></PA><SA></SA><SB></SB><SP>0.00</SP><SQ>0.00</SQ><MS>000007805250631</MS> ! 3100.2.984.45 4 !
! TICKET NBR : 3 ! GSI : 101 ! 1.2.1.5 04/03/2010 15:26:22 ! 3100.2.104.4 acc_1234567890 ! 449032008 labs101 ! 3100.2.22.1 CYCL_ST ! 3100.2.20.1 7805250631 ! 1.2.1.8 <A>007805250631</A><B>Mobility</B><P>-10.00</P><P1>0.00</P1><Q></Q><S>Inactive</S><Z></Z><PA></PA><SA></SA><SB></SB><SP>-10.00</SP><SQ>0.00</SQ><MS>000007805250631</MS> ! 3100.2.984.45 4 !
! TICKET NBR : 4 ! GSI : 101 ! 1.2.1.5 04/03/2010 12:33:33 ! 3100.2.104.4 acc_22640827031004 ! 449032008 labs101 ! 3100.2.22.1 CYCL_ST ! 3100.2.20.1 2898381018 ! 1.2.1.8 <A>00092898381018</A><B>Mobility</B><P>10.00</P><P1>0.50</P1><Q></Q><S>Inactive</S><Z></Z><PA></PA><SA></SA><SB></SB><SP>10.00</SP><SQ>0.50</SQ><MS>302612898381018</MS> ! 3100.2.984.45 4 !


Last edited by vgersh99; 03-30-2010 at 05:36 PM.. Reason: code tags, PLEASE!
# 2  
Old 03-30-2010
Not simple, but interesting. I want to proceed in stages.
Here is the first one: split each line on the '!' delimiter and display the fields.
That is more readable for the next stage, and it will give you an idea of how to go on (I've put your data in 'infile'):
Code:
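#!/bin/bash
# Split each record on '!' and print every field with its index.
# IFS is set only for the read command, so nothing needs restoring afterwards.
while IFS='!' read -r -a L
do
    # L[0] is the empty text before the leading '!', so start at 1.
    for ((i=1; i<${#L[@]}; i++))
    do
        echo "$i - ${L[$i]}"
    done
    echo "------------------"
done < infile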

See what the result looks like; it's a start...
# 3  
Old 03-30-2010
Something to start with:
nawk -f jay.awk myFile

jay.awk:
Code:
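BEGIN {
  FS="</*[^>][^>]*>"   # any opening or closing tag acts as a field separator
  OFS="-"
}
{
  # Even-numbered fields hold the tag values; odd ones are the text outside the tags.
  # Compare against "" so that a value such as 0.00 is not mistaken for an empty one.
  for(i=2;i<=NF; i+=2)
    printf("%s%s", ($i != "") ? $i : " ", (i==NF-1)?ORS:OFS)
}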
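
The same idea can also be written without relying on field positions. The following is only a sketch, not the script from the post above: `extract_tag_values` is a made-up helper name, and it assumes that tags are never nested and that every opening tag has its closing tag on the same line. It walks the record tag by tag and joins the values with " - ", printing a single space for an empty value.

```shell
#!/bin/bash
# extract_tag_values: hypothetical helper, not part of the original thread.
# Reads records on stdin and prints, for each one, the text found between
# every <TAG> and </TAG> pair, joined by " - ". An empty value becomes " ".
extract_tag_values() {
    awk '{
        out = ""; n = 0; rest = $0
        # Find the next opening tag such as <A> or <P1>.
        while (match(rest, /<[A-Za-z0-9]+>/)) {
            rest = substr(rest, RSTART + RLENGTH)   # text after the opening tag
            stop = index(rest, "</")                # where the closing tag starts
            if (stop == 0) break                    # no closing tag: give up
            val = substr(rest, 1, stop - 1)
            if (val == "") val = " "
            sep = (n++ ? " - " : "")
            out = out sep val
            rest = substr(rest, stop)               # now begins with the closing tag
            gt = index(rest, ">")
            if (gt == 0) break
            rest = substr(rest, gt + 1)             # skip past the closing tag
        }
        print out
    }'
}
```

For example, `extract_tag_values < myFile` would print one joined line per record.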

# 4  
Old 03-30-2010
Does this do what's wanted?
Code:
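#!/bin/bash
OLDIFS=$IFS; IFS='!'
while read -r -a L
do
    LINE=""
    # Stop before the tagged field and the trailing one.
    for ((i=1; i<${#L[@]}-2; i++))
    do
        if ((i<3))
        then    # "TICKET NBR : n" and "GSI : n": keep what follows the colon
                LINE+="$(echo "${L[$i]}" | cut -d':' -f2)- "
        else    # " x.x.x.x value": drop the leading blank and the parameter
                LINE+="$(echo "${L[$i]}" | cut -d' ' -f3-)- "
        fi
    done
    echo "Part 1 : $LINE"
    echo -n "Part 2 : "
    # Adjacent closing/opening tags become " - "; the remaining tags are removed.
    echo "${L[8]}" | cut -d' ' -f3- | sed -e 's/<[^<]*><[^<]*>/ - /g' -e 's/<[^<]*>//g'
    echo "------------------"
done < infile
IFS=$OLDIFS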

Result:
Code:
Part 1 :  1 -  101 - 04/03/2010 14:08:24 - acc_22640827031004 - labs101 - CYCL_ST - 2898381018 - 
Part 2 : 00092898381018 - Mobility - 10.00 - 0.50 -  - Active -  -  -  -  - 10.00 - 0.50 - 302612898381018  
------------------
Part 1 :  2 -  101 - 04/03/2010 15:59:59 - acc_1234567890 - labs101 - CYCL_ST - 7805250631 - 
Part 2 : 007805250631 - Mobility - 0.00 - 0.00 -  - Active -  -  -  -  - 0.00 - 0.00 - 000007805250631 
------------------
Part 1 :  3 -  101 - 04/03/2010 15:26:22 - acc_1234567890 - labs101 - CYCL_ST - 7805250631 - 
Part 2 : 007805250631 - Mobility - -10.00 - 0.00 -  - Inactive -  -  -  -  - -10.00 - 0.00 - 000007805250631 
------------------
Part 1 :  4 -  101 - 04/03/2010 12:33:33 - acc_22640827031004 - labs101 - CYCL_ST - 2898381018 - 
Part 2 : 00092898381018 - Mobility - 10.00 - 0.50 -  - Inactive -  -  -  -  - 10.00 - 0.50 - 302612898381018 
------------------
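
The script above takes the tagged data from a fixed position (`${L[8]}`). If records can carry more or fewer "!"-separated fields, that index shifts. As a hedged sketch only, the field could instead be picked by content; `tagged_field` is an invented name, and the test "contains something that looks like a tag" is an assumption about the data.

```shell
#!/bin/bash
# tagged_field: hypothetical helper, not part of the original thread.
# Prints the first "!"-separated field of each record that contains a tag,
# with surrounding spaces trimmed. Records without any tag print nothing.
tagged_field() {
    awk -F'!' '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /<[^>]+>/) {
                f = $i
                gsub(/^ +| +$/, "", f)   # trim leading and trailing spaces
                print f
                next                     # only the first tagged field is wanted
            }
    }'
}
```

It would be used as `tagged_field < infile`, and its output could then be fed to the same `cut` and `sed` pipeline.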

# 5  
Old 03-30-2010
Hello, jaygimini:

No offense, but you were a bit lazy with your help request. There are five lines of sample input and not a single complete example of desired output. It would have been much better if you had, for each input sample, provided the corresponding sample output exactly as it is desired. At least, I, personally, think it would have been very helpful. (For example, I'm not sure if the "-" in the output is desired in the actual output or if you merely used it in the example to make it more readable.)

For the first part of the problem, you state that you only want the data in an exclamation point-delimited field when it follows a word of the form x.x.x.x. Are fields 1 and 2 an exception to that rule, because 4 and 101 are not preceded by an x.x.x.x word?

Unlike your first line of sample data, the four lines at the end of your post have an additional exclamation point-delimited field. Should it be treated like the first part of the line, that is, included in the output if it begins with an x.x.x.x word?

Any other special cases that may have been overlooked?
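
To make the questions above concrete, here is one possible reading of the rule, sketched as a shell function around awk. The name `part1_values` is invented for illustration, and the handling is an assumption rather than the poster's confirmed intent: "LABEL : value" fields (TICKET NBR, GSI) keep the value, fields that begin with an x.x.x.x word keep the rest of the field, the tagged field is skipped as part 2, and anything else (such as "449032008 labs101") is dropped.

```shell
#!/bin/bash
# part1_values: hypothetical helper showing ONE interpretation of the rule.
# Reads records on stdin and prints the kept values joined by " - ".
part1_values() {
    awk -F'!' '{
        out = ""; n = 0
        for (i = 1; i <= NF; i++) {
            f = $i
            gsub(/^ +| +$/, "", f)                     # trim surrounding spaces
            if (f == "" || f ~ /</) continue           # empty field, or the tagged part
            if (f ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ /)
                sub(/^[0-9.]+ +/, "", f)               # drop the x.x.x.x word
            else if (f ~ / : /)
                sub(/^.* : /, "", f)                   # keep what follows " : "
            else
                continue                               # no x.x.x.x word: skip
            sep = (n++ ? " - " : "")
            out = out sep f
        }
        print out
    }'
}
```

With this reading, a trailing field such as "3100.2.984.45 4" contributes its value, while the record that lacks it simply ends one value earlier.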

Regards,
Alister

---------- Post updated at 06:27 PM ---------- Previous update was at 05:18 PM ----------

In case it's of any help:
Code:
sed 's/! TICKET NBR : //; s/! GSI : //; s/! \([0-9]\{1,\}\.\)\{3\}[0-9]\{1,\} *\([^!]*\)/\2/g;
s/ *!\( *[^ ]*\)\{0,2\}//g; s/<\([^>]*\)>\([^<]*\)<\/\1> */\2 /g' data

This solution assumes that every exclamation point that appears in the line serves as a delimiter.

The test run below used a data file of 5 lines: the same 5 lines you provided in your original post, in the same order (the one line near the beginning and the four at the end). I originally included the contents of the data file, but they broke the forum layout at my current display resolution. Speaking of being lazy, I suppose I could've attached it :)
Code:
4 101 04/03/2010 12:33:33 acc_22640827031004 CYCL_ST 2898381018 00092898381018 Mobility 10.00 0.50  Inactive     10.00 0.50 302612898381018 
1 101 04/03/2010 14:08:24 acc_22640827031004 CYCL_ST 2898381018 00092898381018 Mobility 10.00 0.50  Active     10.00 0.50 302612898381018 4
2 101 04/03/2010 15:59:59 acc_1234567890 CYCL_ST 7805250631 007805250631 Mobility 0.00 0.00  Active     0.00 0.00 000007805250631 4
3 101 04/03/2010 15:26:22 acc_1234567890 CYCL_ST 7805250631 007805250631 Mobility -10.00 0.00  Inactive     -10.00 0.00 000007805250631 4
4 101 04/03/2010 12:33:33 acc_22640827031004 CYCL_ST 2898381018 00092898381018 Mobility 10.00 0.50  Inactive     10.00 0.50 302612898381018 4

Alister

Last edited by alister; 03-30-2010 at 08:07 PM..
# 6  
Old 03-31-2010
Awesome, thanks a lot for the help.