Difficult transposing of data from profiles blocks


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Difficult transposing of data from profiles blocks
# 1  
Old 09-11-2012
Difficult transposing of data from profiles blocks

Hello to all,

I really hope some expert or awk guru could help me with this. I don't have how to begin and hope is not so difficult for somebody. I'll expecting how someone could resolve this problem I have to parse this.

I have blocks of parameters for each MSISDN and I would like to extract them and put for each MSISDN all the parameters in a single line with different columns.

- Each block begins with <HGSDP and ends with END.
- I need the value below NAM
- I need the values below "PERMANENT SUBSCRIBER DATA" where in some blocks appear more parameters than within others.
- I need the values below "VLR ADDRESS" and below "SGSN NUMBER" if its present.

The input is:
Code:
<HGSDP:MSISDN=99946129512,ALL;
HLR SUBSCRIBER DATA

SUBSCRIBER IDENTITY
MSISDN           IMSI             STATE          AUTHD
99946129512      123450016625444  CONNECTED      AVAILABLE

NAM
0

PERMANENT SUBSCRIBER DATA
SUD
CAT-10       DBSG-1       TSMO-0       STYPE-13
BS26-1       TS11-1       TS21-1       TS22-1
SOSDCF-7     HOLD-1       MPTY-1       CLIP-1
SCHAR-0-0

AMSISDN            BS       BC
NONE   

LOCATION DATA
VLR ADDRESS       MSRN            MSC NUMBER          LMSID
4-99945269537                     99945269537        

SGSN NUMBER
UNKNOWN

PACKET DATA PROTOCOL CONTEXT DATA
APNID   PDPADD                EQOSID  VPAA  PDPCH    PDPTY  PDPID
    1                          537    YES            IPV4    1
    2                           24    YES            IPV4    2
    3                           23    YES            IPV4    3
                               537    YES            IPV4   10

END
<HGSDP:MSISDN=99949091700,ALL;
HLR SUBSCRIBER DATA

SUBSCRIBER IDENTITY
MSISDN           IMSI             STATE          AUTHD
99949091700      123450011753067  CONNECTED      AVAILABLE

NAM
1

PERMANENT SUBSCRIBER DATA
SUD
CAT-15       DBSG-4       STYPE-21     BS26-1
TS11-1       TS21-1       TS22-3       BS3G-1
SOSDCF-5     HOLD-1       SCPU-3       CLIP-1

AMSISDN            BS       BC
NONE   

LOCATION DATA
VLR ADDRESS       MSRN            MSC NUMBER          LMSID
4-99945269547                     99945269547        
PURGED

PACKET DATA PROTOCOL CONTEXT DATA
NONE

END
<HGSDP:MSISDN=99948748519,ALL;
HLR SUBSCRIBER DATA

SUBSCRIBER IDENTITY
MSISDN           IMSI             STATE          AUTHD
99948748519      123450019654001  CONNECTED      AVAILABLE

NAM
0

PERMANENT SUBSCRIBER DATA
SUD
BS26-1       TS11-1       TS21-1       TS22-1
PTR-78       CLIP-1       SCHAR-2-4

AMSISDN            BS       BC
NONE   

LOCATION DATA
VLR ADDRESS       MSRN            MSC NUMBER          LMSID
4-99945269545                     99945269545        

SGSN NUMBER
4-99945269564    

PACKET DATA PROTOCOL CONTEXT DATA
APNID   PDPADD                EQOSID  VPAA  PDPCH    PDPTY  PDPID
    1                          537    YES            IPV4    1
    2                           24    YES            IPV4    2
                               537    YES            IPV4   10

END

and the output desired is:
Code:
MSISDN,IMSI,STATE,NAM,BS26,BS3G,CAT,CLIP,DBSG,HOLD,MPTY,PTR,SCHAR,SCPU,SOSDCF,STYPE,TS11,TS21,TS22,TSMO,VLR ADDRESS,STATE,SGSN NUMBER,STATE
99946129512,123450016625444,CONNECTED,0,1,,10,1,1,1,1,,0-0,,7,13,1,1,1,0,4-99945269537,,UNKNOWN,
99949091700,123450011753067,CONNECTED,1,1,1,15,1,4,1,,,,3,5,21,1,1,3,,4-99945269547,PURGED,,
99948748519,123450019654001,CONNECTED,0,1,,,1,,,,78,2-4,,,,1,1,1,,4-99945269545,,4-99945269564,

In the output all the parameters are as headers ordered alphabetically, and if some value is not present in some block, should appear empty in the corresponding column.

Many thanks in advance for help on this guys.
# 2  
Old 09-11-2012
How about this:

Code:
awk '
    FNR!=NR&&FNR==1 {
        printf "MSISDN,IMSI,STATE,NAM"
        for(i in tags) printf ",%s", i;
        printf ",VLR ADDRESS,MSC NUMBER,SGSN NUMBER\n"
    }
    /^<HGSDP/ {split("",psd);split("",ld);split("",si);nam="";vlr="";}
    $1=="NAM" {nam=$2}
    $1=="SGSN NUMBER" {sgsn=$2}
    /^SUBSCRIBER IDENTITY/ { split($3,si," "); }
    /^LOCATION DATA/ { split($3,ld," "); }
    /^PERMANENT SUBSCRIBER DATA/ {
        v=split($0,vals," ");
        for(i=3;i<=v;i++) {
           p=index(vals[i],"-")
           if(p) psd[substr(vals[i],1,p-1)]=substr(vals[i],p+1)
           if(p&&FNR==NR) tags[substr(vals[i],1,p-1)];
        }
    }
    /^END/ && FNR!=NR{
        printf "%s,%s,%s,%s",si[1],si[2],si[3],nam
        for(i in tags) printf ",%s", psd[i]
        printf ",%s,%s,%s\n", ld[1], ld[2], sgsn}
    ' FS="\n" RS="" infile infile

# 3  
Old 09-11-2012
Hello Chubler_XL,

That's so great, it's pretty close. Only it seems that when some parameter is not present
is putting the previous value in memory for that parameter and should leaving it in blank if it not present.

For example you could see in the output of your script that the 2nd line has values of the 1rst one and the
3rd line has values of the 2nd line.

PD: I forgot to say that as in my sample sometimes below the VLR ADDRESS or SGSN ADDRESS appear the number and below the word PURGED.
So, if PURGED appear for VLR ADDRESS or SGSN ADDRESS or both, I would like to have it mentioned in the output in one/two more columnn
like in the sample output that says STATE next to VLR ADDRESS or SGSN ADDRESS .

Thanks in advance for such help.

Last edited by Ophiuchus; 09-11-2012 at 02:57 AM..
# 4  
Old 09-11-2012
How about this update:

Code:
awk '
    FNR!=NR&&FNR==1 {
        printf "MSISDN,IMSI,STATE,NAM"
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for(i in tags) printf ",%s", i;
        printf ",VLR ADDRESS,STATE,SGSN NUMBER,STATE\n"
        split("",psd);split("",ld);split("",si);nam="";vs="";ss="";sgsn=""
    }
    $1=="NAM" {nam=$2}
    $1=="SGSN NUMBER" {gsub(/ *$/,"",$2);sgsn=$2;ss=$3}
    /^SUBSCRIBER IDENTITY/ { split($3,si," ")}
    /^LOCATION DATA/ { split($3,ld," "); vs=$4}
    /^PERMANENT SUBSCRIBER DATA/ {
        v=split($0,vals," ");
        for(i=3;i<=v;i++) {
           p=index(vals[i],"-")
           if(p) psd[substr(vals[i],1,p-1)]=substr(vals[i],p+1)
           if(p&&FNR==NR) tags[substr(vals[i],1,p-1)];
        }
    }
    /^END/ && FNR!=NR{
        printf "%s,%s,%s,%s",si[1],si[2],si[3],nam
        for(i in tags) printf ",%s", psd[i]
        printf ",%s,%s,%s,%s\n", ld[1], vs, sgsn, ss
        split("",psd);split("",ld);split("",si);nam="";vs="";ss="";sgsn=""}
    ' FS="\n" RS="" infile infile

Edit:
The line with PROCINFO is to sort columns by name
this will only work with gnu awk;
other awk implementations will have unsorted columns.

Last edited by Chubler_XL; 09-11-2012 at 03:41 PM..
These 2 Users Gave Thanks to Chubler_XL For This Post:
# 5  
Old 09-11-2012
Wow, a great effort chuber..it works great...and Ophiuchus :good effort in explaining the requirement too... Smilie
# 6  
Old 09-11-2012
Hello Chubler_XL,

Jajaja I only can be surprised how it works just just just perfect! You're really good with this!.

I hope you can help me in one more thing.

I would like to print the values below "PACKET DATA PROTOCOL CONTEXT DATA", printing 3 more
fields (APNID,EQOSID,PDPID) shown in blue at the end of the current output in the format below.

Code:
MSISDN,IMSI,STATE,NAM,BS26,BS3G,CAT,CLIP,DBSG,HOLD,MPTY,PTR,SCHAR,SCPU,SOSDCF,STYPE,TS11,TS21,TS22,TSMO,VLR ADDRESS,STATE,SGSN NUMBER,STATE,APNID,EQOSID,PDPID
99946129512,123450016625444,CONNECTED,0,1,,10,1,1,1,1,,0-0,,7,13,1,1,1,0,4-99945269537,,UNKNOWN,,1|2|3,537|24|23|537,1|2|3|10
99949091700,123450011753067,CONNECTED,1,1,1,15,1,4,1,,,,3,5,21,1,1,3,,4-99945269547,PURGED,,,,,
99948748519,123450019654001,CONNECTED,0,1,,,1,,,,78,02-abr,,,,1,1,1,,4-99945269545,,4-99945269564,,1|2,537|24|537,1|2|10

I mean, put the APNID, EQOSID and PSPID values in the last 3 fields, but if there are more than one APNID (like is the usual)
then separate them with "|" within the same field.

For example, for the first block, there are 3 APNIDs, 4 EQOSIDs and 4 PDPIDs, so the 3 last fields for the first block (first line) is:

Code:
APNID,EQOSID,PDPID
1|2|3,537|24|23|537,1|2|3|10

I hope make sense.

Thanks in advance again for your great help!
# 7  
Old 09-11-2012
Wow, getting a little more complex now...Try this:

Code:
awk '
    function addval(cur, line, start, len) {
       if(!start) return cur;
       val=substr(line,start,len)
       gsub(/ /,"",val)
       if(val=="") return cur;
       return (length(cur)?cur"|":"")val
    }
    FNR!=NR&&FNR==1 {
        printf "MSISDN,IMSI,STATE,NAM"
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for(i in tags) printf ",%s", i;
        printf ",VLR ADDRESS,STATE,SGSN NUMBER,STATE,APNID,EQOSID,PDPID\n"
        split("",psd);split("",ld);split("",si);nam="";vs="";ss="";sgsn=""
        apnid="";eqosid="";pdpid="";
    }
    $1=="NAM" {nam=$2}
    $1=="SGSN NUMBER" {gsub(/ *$/,"",$2);sgsn=$2;ss=$3}
    /^SUBSCRIBER IDENTITY/ { split($3,si," ")}
    /^LOCATION DATA/ { split($3,ld," "); vs=$4}
    /^PERMANENT SUBSCRIBER DATA/ {
        v=split($0,vals," ");
        for(i=3;i<=v;i++) {
           p=index(vals[i],"-")
           if(p) psd[substr(vals[i],1,p-1)]=substr(vals[i],p+1)
           if(p&&FNR==NR) tags[substr(vals[i],1,p-1)];
        }
    }
    $1~"PACKET DATA PROTOCOL" {
        apst=index($2,"APNID")
        eqst=index($2,"EQOSID")
        pdst=index($2,"PDPID")
        for(i=3;i<=NF;i++) {
            apnid=addval(apnid,$i,apst,5)
            eqosid=addval(eqosid,$i,eqst,6)
            pdpid=addval(pdpid,$i,pdst,5)
        }
    }
    /^END/ && FNR!=NR{
        printf "%s,%s,%s,%s",si[1],si[2],si[3],nam
        for(i in tags) printf ",%s", psd[i]
        printf ",%s,%s,%s,%s,%s,%s,%s\n", ld[1], vs, sgsn, ss, apnid, eqosid, pdpid
        apnid="";eqosid="";pdpid="";
        split("",psd);split("",ld);split("",si);nam="";vs="";ss="";sgsn=""}
    ' FS="\n" RS="" infile infile

This User Gave Thanks to Chubler_XL For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Transposing data based on 1st column

I do have a big tab delimited file of the following format aa 344 456 aa 34 67 bb 34 90 bb 23 100 bb 1 89 d 0 12 e 45 678 e 78 90 e 56 90 .... .... .... I would like to transpose the data based on the category on column one and get the output file in the following tab delimited... (8 Replies)
Discussion started by: Kanja
8 Replies

2. UNIX for Dummies Questions & Answers

Delete data blocks based on missing combinations

Hello masters, I am filtering data based on completeness. A (Name , Group) combination in File2 is only complete when it has data for all subgroups specified in File1. All incomplete (Name , Group) combinations do not appear in the output. So for example , Name1 Group 1 in File2 is... (6 Replies)
Discussion started by: senhia83
6 Replies

3. Shell Programming and Scripting

Transposing X and Y axis of CSV data

Hello list, I have a source CSV data file as follows: PC_NAME,MS11-040,MS11-039,MS11-038,MS11-035 abc123,Not Applicable,Not Applicable,Not Applicable,Not Applicable abc987,Not Applicable,Not Applicable,Not Applicable,Not Applicable tnt999,Not Applicable,Not Applicable,Applicable,Not... (2 Replies)
Discussion started by: landossa
2 Replies

4. Shell Programming and Scripting

Extracting data blocks from file

Hi all, I want to extract blocks of data from a file depending on the contents of that block. The input file(table) has several blocks each starting with 'gene' in the first column. I want to extract only those blocks which do not have the expression '_T02' in the second column. Input file ... (3 Replies)
Discussion started by: newbie83
3 Replies

5. Shell Programming and Scripting

Help for a Perl newcomer! Transposing data from columns to rows

I have to create a Perl script which will transpose the data output from my experiment, from columns to rows, in order for me to analyse the data. I am a complete Perl novice so any help would be greatly appreciated. The data as it stands looks like this: Subject Condition Fp1 ... (12 Replies)
Discussion started by: Sarah_W
12 Replies

6. Shell Programming and Scripting

transposing square matrixs or blocks in a big file

Hi I do have a big file of the following format a b c d e f g 2 3 5 6 6 6 7 3 4 5 6 7 9 0 4 5 7 8 9 9 0 1 2 4 5 6 7 8 3 5 6 7 2 3 4 5 6 7 4 3 2 4 5 4 5 6 3 5 5 r h i j k l m 2 3 4 5 6 7 8 4 5 7 8 9 9 0 3 5 6 7 2 3 4 2 3 5 6 6 6 7 5 5 7 8 9 2 3 1 2... (7 Replies)
Discussion started by: Lucky Ali
7 Replies

7. Shell Programming and Scripting

how to split this file into blocks and then send these blocks as input to the tool called Yices?

Hello, I have a file like this: FILE.TXT: (define argc :: int) (assert ( > argc 1)) (assert ( = argc 1)) <check> # (define c :: float) (assert ( > c 0)) (assert ( = c 0)) <check> # now, i want to separate each block('#' is the delimeter), make them separate files, and then send them as... (5 Replies)
Discussion started by: paramad
5 Replies

8. UNIX for Dummies Questions & Answers

Transposing data output

Hi, I've just created a shell script that produces the following output: hd1 hd3 hd9 /optnonaix/esp /optnonaix/app/oracle /u06 (564.67) (675.97) (678.90) I would like the output to be as hd1 /optnonaix/esp (564.67) hd3 /optnonaix/app/oracle (675.97) hd9 /u06 (678.90) Need some... (2 Replies)
Discussion started by: bazzabogan
2 Replies

9. Shell Programming and Scripting

Sorting blocks of data

Hello all, Below is what I am trying to accomplish: I have a file that looks like this /* ----------------- xxxx.y_abcd_00000050 ----------------- */ jdghjghkla sadgsdags asdgsdgasd asdgsagasdg /* ----------------- xxxx.y_abcd_00000055 ----------------- */ sdgsdg sdgxcvzxcbv... (8 Replies)
Discussion started by: alfredo123
8 Replies

10. Shell Programming and Scripting

Delete blocks with no data..

Hi, I tried this but could not get it... here is what I need I have an xml where I get all the data in blocks but some times I get empty blocks with no data...shown below..I need to delete only those blocks with no data, I tried couple of ways but could not do it..any help is appreciated...... (1 Reply)
Discussion started by: mgirinath
1 Replies
Login or Register to Ask a Question