Data manipulation, Please help..


 
# 1  
Old 03-24-2016

Hello,

I have a huge set of data that needs to be reformatted.
Here is a simple example to explain the process.
I have a number n=5 and an input of many numbers separated by commas:

Code:
0.49876577,-0.03160753,0.60992502,0.00361167,0.01401017,0.54274066,-0.04881392,-0.01454749,0.02569629,0.04952051,-0.01458124,-0.14468019,0.13155655,0.01912278,0.15125587

After processing, the output should look like this:
Code:
   1          0.49876577
   2         -0.03160753          0.60992502
   3          0.00361167          0.01401017          0.54274066
   4         -0.04881392         -0.01454749          0.02569629          0.04952051
   5         -0.01458124         -0.14468019          0.13155655          0.01912278
   5          0.15125587

The row labels start at 1 and end at 5 (the maximum n), with at most 4 numbers on one line. Row k holds k numbers, so when k > 4 the row needs more than one line to print all of its numbers. Here, 5 is 4+1, so row 5 takes two lines; if n goes up to 9, row 9 takes three lines (4+4+1). Each of those lines begins with the row number, followed by the numbers from the input.

If n is small, this can be done by hand. However, if n goes up to 100, doing it manually is a nightmare.

Thanks so much for your kind help!

Zhen

Here are links to an input with n=45 and the corresponding output:
Code:
http://s000.tinyupload.com/?file_id=83062136554540326051
http://s000.tinyupload.com/?file_id=26820665341591171842

# 2  
Old 03-24-2016
Any attempts/ideas/thoughts from your side?
# 3  
Old 03-24-2016
Quote:
Originally Posted by RudiC
Any attempts/ideas/thoughts from your side?
Hi RudiC, thanks for the reply.
I think it could be done with a loop combined with awk.
However, I'm not a skilled awk user.
I have tried for several hours without any progress. The difficulty comes from the limit of 4 numbers per line when n > 4; I cannot figure out how to handle it. Perhaps FORTRAN or another more advanced language is a better choice for this.
Zhen
# 4  
Old 03-24-2016
Hello liuzhencc,

Could you please try the following and let me know if it helps? Let's say we have the following Input_file, to which I have added a few values to test it better.
Code:
 cat Input_file
 .49876577,-0.03160753,0.60992502,0.00361167,0.01401017,0.54274066,-0.04881392,-0.01454749,0.02569629,0.04952051,-0.01458124,-0.14468019,0.13155655,0.01912278,0.15125587,.49876577,-0.03160753,0.60992502,0.00361167

Then the following is the code for it.
Code:
awk -vn=5 -F"," '{num=split($0, A,",");for(i=1;i<=num;i++){for(k=1;k<=i;k++){q++;Q=Q?Q OFS A[q]:A[q]};if(Q){i=i>n?n:i;print i OFS Q;Q=""}}}'   Input_file

Output will be as follows.
Code:
1 .49876577
2 -0.03160753 0.60992502
3 0.00361167 0.01401017 0.54274066
4 -0.04881392 -0.01454749 0.02569629 0.04952051
5 -0.01458124 -0.14468019 0.13155655 0.01912278 0.15125587
5 .49876577 -0.03160753 0.60992502 0.00361167

If the above doesn't meet your requirements completely, please post a more meaningful Input_file with more specific requirement details.
EDIT: Adding a multi-line form of the solution below.
Code:
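awk -vn=5 -F"," '{
    num = split($0, A, ",")              # A[1..num] holds the input values
    for (i = 1; i <= num; i++) {
        for (k = 1; k <= i; k++) {       # collect the next i values into Q
            q++
            Q = Q ? Q OFS A[q] : A[q]
        }
        if (Q) {
            i = i > n ? n : i            # cap the row label (and the loop counter) at n
            print i OFS Q
            Q = ""
        }
    }
}' Input_file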

Thanks,
R. Singh

Last edited by RavinderSingh13; 03-24-2016 at 07:57 AM.. Reason: Added a little edited Input_file from me. Added a non-one liner form of solution now.
# 5  
Old 03-24-2016
Quote:
Originally Posted by RavinderSingh13
Hello liuzhencc,

Could you please try following and let me know if this helps you.
Code:
awk -vn=5 -F"," '{num=split($0, A,",");for(i=1;i<=num;i++){for(k=1;k<=i;k++){q++;Q=Q?Q OFS A[q]:A[q]};if(Q){i=i>n?n:i;print i OFS Q;Q=""}}}'   Input_file

Output will be as follows.
Code:
1 .49876577
2 -0.03160753 0.60992502
3 0.00361167 0.01401017 0.54274066
4 -0.04881392 -0.01454749 0.02569629 0.04952051
5 -0.01458124 -0.14468019 0.13155655 0.01912278 0.15125587.49876577
5 -0.03160753 0.60992502 0.00361167

If above doesn't meet your requirements completely then please post more meaningful Input_file with more specific requirement details on same too.

Thanks,
R. Singh
Hi Singh, thanks for the reply.
It does the job only partially, because the maximum number of values on one line must be limited to 4. That means row 1 has 1 number, row 2 has 2, row 3 has 3, and row 4 has 4; for any row beyond 4, multiple lines are printed with at most 4 numbers on each. For example, row 6 prints "6 x x x x" and then, on the next line, "6 x x".
I posted a large input and output on TinyUpload.com; there are too many numbers to paste here.
Thanks again for your help!
ZHen
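
To make the 4-per-line rule concrete, here is a small sketch that only counts how many output lines each row needs. The function name row_layout and its two arguments (n and the per-line limit, defaulting to 4) are made up for illustration; nothing here reads the real data.

```shell
# Sketch: row k holds k values, so it needs ceil(k / max) output lines.
# row_layout is a hypothetical helper; $1 = n, $2 = values per line (default 4).
row_layout() {
    awk -v n="$1" -v max="${2:-4}" 'BEGIN {
        for (k = 1; k <= n; k++) {
            lines = int((k + max - 1) / max)   # integer ceiling of k / max
            total += lines
            printf "row %d: %d value(s) on %d line(s)\n", k, k, lines
        }
        printf "total lines: %d\n", total
    }'
}

row_layout 6
```

For n=6 this reports one line each for rows 1 to 4 and two lines each for rows 5 and 6, which agrees with the "6 x x x x" / "6 x x" example.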
# 6  
Old 03-24-2016
I don't get the correlation between the number n, the field count per line, the number printed in front of every line, and the count of input elements. Please explain the logic.
# 7  
Old 03-24-2016
Quote:
Originally Posted by RudiC
I don't get the correlation between the n number, the field count per line, the number printed in front of every line, and the count of input elements. Please supply the logics.
Sorry for my English.
The logic is not complex. If we have a number n, then we have 1+2+3+...+n numbers in the input file, separated by commas. For n=10, the output is formatted as follows:
Code:
1 num1
2 num2 num3
3 num4 num5 num6
4 num7 num8 num9 num10
5 num11 num12 num13 num14
5 num15
6 num16 num17 num18 num19
6 num20 num21
7 num22 num23 num24 num25
7 num26 num27 num28
8 num29 num30 num31 num32
8 num33 num34 num35 num36
9 num37 num38 num39 num40
9 num41 num42 num43 num44
9 num45
10 num46 num47 num48 num49
10 num50 num51 num52 num53
10 num54 num55

because we have 1+2+3+4+5+6+7+8+9+10 == 55 numbers in the input.
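
Since the count must be 1+2+...+n, n can be recovered from the input itself instead of being passed in. The following is only a sketch under that assumption; triangle_n is a made-up helper name, and it expects one comma-separated line on standard input.

```shell
# Sketch: derive n from the number of comma-separated values,
# assuming the count is exactly 1 + 2 + ... + n.
# triangle_n is a hypothetical helper name.
triangle_n() {
    awk -F, '{
        n = 0; total = 0
        while (total < NF) { n++; total += n }   # add 1, 2, 3, ... until we reach NF
        if (total == NF) print n
        else print "not triangular: " NF " values"
    }'
}

printf '%s\n' 'a,b,c,d,e,f' | triangle_n        # prints 3
printf '%s\n' 'a,b,c,d,e,f,g' | triangle_n      # prints: not triangular: 7 values
```

A check like this would flag an input with a missing or extra value before any formatting is attempted.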

Input for n=10:
Code:
0.49876577,-0.03160753,0.60992502,0.00361167,0.01401017,0.54274066,-0.04881392,-0.01454749,0.02569629,0.04952051,-0.01458124,-0.14468019,0.13155655,0.01912278,0.15125587,0.02470641,0.13032155,-0.21400060,-0.02522136,-0.14202082,0.22603974,-0.10693609,0.10741174,0.02171361,0.00169569,-0.00450758,-0.00200352,0.11393663,0.11590056,-0.26243613,-0.04396195,0.00711393,-0.01215909,-0.00447005,-0.12119063,0.27560924,0.02263298,-0.04249325,-0.05785220,-0.01299558,0.02103291,0.00718894,-0.02543994,0.04643664,0.05459977,-0.09308653,-0.04780408,-0.08850503,0.00170852,0.00263105,0.00352394,0.00753284,0.00506342,0.01332940,0.09615459

Output for n=10:
Code:
   1          0.49876577
   2         -0.03160753          0.60992502
   3          0.00361167          0.01401017          0.54274066
   4         -0.04881392         -0.01454749          0.02569629          0.04952051
   5         -0.01458124         -0.14468019          0.13155655          0.01912278
   5          0.15125587
   6          0.02470641          0.13032155         -0.21400060         -0.02522136
   6         -0.14202082          0.22603974
   7         -0.10693609          0.10741174          0.02171361          0.00169569
   7         -0.00450758         -0.00200352          0.11393663
   8          0.11590056         -0.26243613         -0.04396195          0.00711393
   8         -0.01215909         -0.00447005         -0.12119063          0.27560924
   9          0.02263298         -0.04249325         -0.05785220         -0.01299558
   9          0.02103291          0.00718894         -0.02543994          0.04643664
   9          0.05459977
  10         -0.09308653         -0.04780408         -0.08850503          0.00170852
  10          0.00263105          0.00352394          0.00753284          0.00506342
  10          0.01332940          0.09615459
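
For reference, one way to produce this layout in awk is sketched below. It is an illustration, not a solution posted in the thread: the function name format_triangle and the max argument are made up, and it assumes the whole input is a single comma-separated line on standard input. Values are printed as strings with %20s after a %4d row label, which appears to match the spacing of the sample output.

```shell
# Sketch: print row k (holding k values) on as many lines as needed,
# at most `max` values per line, each line prefixed with the row number.
# format_triangle is a hypothetical helper; $1 = values per line (default 4).
format_triangle() {
    awk -F, -v max="${1:-4}" '{
        row = 1        # current row label
        in_row = 0     # values of this row consumed so far
        on_line = 0    # values printed on the current output line
        for (f = 1; f <= NF; f++) {
            if (on_line == 0) printf "%4d", row   # start of a new output line
            printf "%20s", $f
            in_row++; on_line++
            if (in_row == row) {                  # row complete: move to next row
                printf "\n"; row++; in_row = 0; on_line = 0
            } else if (on_line == max) {          # line full, same row continues
                printf "\n"; on_line = 0
            }
        }
        if (on_line > 0) printf "\n"              # input ended in the middle of a row
    }'
}

# The n=5 sample from the first post:
printf '%s\n' '0.49876577,-0.03160753,0.60992502,0.00361167,0.01401017,0.54274066,-0.04881392,-0.01454749,0.02569629,0.04952051,-0.01458124,-0.14468019,0.13155655,0.01912278,0.15125587' |
    format_triangle 4
```

Because n is never passed in, the row labels simply keep counting until the input runs out; for a file, `format_triangle 4 < Input_file` would be the equivalent call.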
