Difficult transposing of data from profiles blocks


 
# 8  
Old 09-12-2012
Hello Chubler_XL,

All I can say is: awesome! What do you eat? Haha.

Things get a bit more complex when extracting 5 more parameters that I left out of the sample input. If you want a bigger challenge, I can show you the complete mapping; if not, that's OK, this is already much more help than I expected.

If you get a chance, could you please explain the parts of the script that you think are hardest to understand for newbies like me?

For example:

Why do you use the input file twice?
What is the logic of the script?
How do you separate and store each parameter for each block? Things like that.

I hope you can explain a little. Your solution is a complete awk class/treatise for me, and surely for many others!

Thanks again for helping others and for your great contributions.

---------- Post updated at 05:33 PM ---------- Previous update was at 01:32 AM ----------

Hello again Chubler_XL,

In addition to what I mentioned in my previous post, I've tested the script on real
files, and it works fine for files of up to 1 million lines (which took 3:58 minutes to process).

But when I used an input file with 4,252,871 lines, the processing took 1:31 hours.

I'm not sure why the processing time grew that much.

I don't know if something could be done to speed up the
processing of big files like that 4-million-line one.

Below are the times for different files:

Code:
With 100,000 lines (3 seconds)
$ . Script.sh 
Wed, Sep 12, 2012  1:13:43 PM
Wed, Sep 12, 2012  1:13:46 PM

With 500,000 lines (52 seconds)
$ . Script.sh
Wed, Sep 12, 2012  1:15:30 PM
Wed, Sep 12, 2012  1:16:22 PM

With 1,000,000 lines (3:58 minutes)
$ . Script.sh
Wed, Sep 12, 2012  1:27:37 PM
Wed, Sep 12, 2012  1:31:35 PM

With 4,252,871 lines (1:31 hours)
$ . Script.sh
Wed, Sep 12, 2012  1:38:32 PM
Wed, Sep 12, 2012  3:10:24 PM

Thanks for your help again.

Regards
# 9  
Old 09-13-2012
OK, I'll start with performance. It's probably related to GNU awk array sorting: try removing the PROCINFO["sorted_in"] = "@ind_str_asc" line and processing a mid-sized file.

If it's a lot quicker, we can try turning off array sorting until the processing is done.
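
As a rough sketch of what "turning off array sorting until the processing is done" could look like: the idea is to leave PROCINFO["sorted_in"] unset while records are being read and set it only in the END block. This assumes gawk 4.0 or later (other awks treat PROCINFO as an ordinary array and give no ordering guarantee), and the `count` array and the sample input are made up for illustration, not taken from the script.

```shell
# Hypothetical input: one key per line (not the thread's real data).
out=$(printf 'b\na\nc\na\n' | awk '
    { count[$1]++ }   # sorted_in is not set yet, so any for-in loop run per record stays unsorted
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"   # gawk only: for-in loops from here on walk keys in string order
        for (k in count) print k, count[k]
    }')
printf '%s\n' "$out"
```

Whether this helps depends on how often the real script loops over its arrays while reading input, so it is worth timing on a mid-sized file first.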

OK, now for a quick description of what this script is doing.

I'll start with the last line first:
Code:
' FS="\n" RS="" infile infile


As you said, the file is processed twice, and the reason for this is the heading line. It appears as the first output line, so by that point awk must already know all the heading strings that the document uses. We could read the whole document into memory and then output the headings and the data in the END block, but this would be a poor solution for big input files.



Pass 1 - Build tags array

To simplify the code, pass 1 does pretty much the same work as pass 2. Most of this is unnecessary and could be skipped by adding FNR!=NR to all the conditions except the one for "PERMANENT SUBSCRIBER DATA" (as this is where the tags array is built).
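
The two-pass idea can be sketched on a toy file. This is not the original script: the "name value" input, the `seen`/`tags` arrays, and the output format are assumptions for illustration, and it uses an NR==FNR rule with `next` rather than adding FNR!=NR to each condition (the effect is the same: the data rules run only on the second read).

```shell
# Hypothetical sample file: "name value" pairs, not the thread's real data.
infile=$(mktemp)
printf 'x 1\ny 2\nx 3\n' > "$infile"

out=$(awk '
    NR == FNR {                       # pass 1: NR equals FNR only while reading the first copy
        if (!($1 in seen)) { seen[$1] = 1; tags[++n] = $1 }
        next                          # skip the pass-2 rules below
    }
    FNR == 1 {                        # pass 2, first line: every heading is known by now
        hdr = tags[1]
        for (i = 2; i <= n; i++) hdr = hdr "," tags[i]
        print hdr
    }
    { print $1 "=" $2 }               # pass 2: emit the data
' "$infile" "$infile")
printf '%s\n' "$out"
rm -f "$infile"
```

Only the list of headings is held in memory between the passes, which is why this scales better than buffering the whole document for the END block.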



FS="\n" RS=""

RS is the record separator and FS is the field separator. When RS is empty, awk treats blank lines as the record separator, and with FS="\n" each line of a record becomes one field. So for:

Code:
SUBSCRIBER IDENTITY
MSISDN           IMSI             STATE          AUTHD
99949091700      123450011753067  CONNECTED      AVAILABLE

NAM
1

Awk passes one record with field 1 as "SUBSCRIBER IDENTITY", field 2 as "MSISDN IMSI STATE AUTHD" (the whole column-heading line), and field 3 with the data line.
Then a second record appears with field 1 as "NAM" and field 2 as "1".

As you can see, awk has already done a lot of the work for us, as each part of the document comes in as an individual record.
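
To see the record and field split for yourself, here is a small sketch that feeds a cut-down copy of the sample above to awk and prints what each record looks like. The printf format and the single-space column layout are only for illustration; the trailing `-` tells awk to read standard input after the FS and RS assignments, in the same position where the script has `infile infile`.

```shell
out=$(printf 'SUBSCRIBER IDENTITY\nMSISDN IMSI STATE AUTHD\n99949091700 123450011753067 CONNECTED AVAILABLE\n\nNAM\n1\n' |
    awk '{ printf "record %d: %d fields, $1=[%s]\n", NR, NF, $1 }' FS="\n" RS="" -)
printf '%s\n' "$out"
```

The first block arrives as one record with three fields (one per line) and the second as a record with two fields. Note that the separating line has to be truly empty for this to work.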
# 10  
Old 09-14-2012
Hello Chubler_XL,

Many thanks for your explanation. I'll study it in depth alongside your code.

Regarding the performance tests with the line PROCINFO["sorted_in"] = "@ind_str_asc" removed: the time was better with an input file of 1 million lines (1:20 minutes),
but with an input of 4 million lines the script got stuck again. After 30 minutes,
no output had been shown.

I'm not sure why, with a file 4 times bigger (4 million lines) than the other (1 million lines), the performance degrades that much.

Thanks in advance for your great help.