Help with Data Sorting


 
# 1  
Old 03-15-2011

Hi All,
I have a long list made of 4 columns containing entries such as the following example:
Code:
a    b    c    d
0    0    0    0
1    2    1    2
2    5    3    4
3    8    4    6
4    10   9    8
5    15   8    10

The top row is the header. I need to rearrange the data to show, using 1 for present and 0 for absent, whether each entry occurs in each column. The output should keep the header row, list each distinct entry once in a new first column, and put 1 or 0 under a, b, c and d depending on whether that entry is present in that column. For the data above, the output should look like:
Code:
     a    b    c    d
0    1    1    1    1
1    1    0    1    0
2    1    1    0    1
3    1    0    1    0
4    1    0    1    1
5    1    1    0    0
6    0    0    0    1
8    0    1    1    1
9    0    0    1    0
10   0    1    0    1
15   0    1    0    0

Thanks in advance. I'll appreciate your response.

Cheers
# 2  
Old 03-15-2011
What is present? The row number? How do you determine what to check for?

In other words, why does the row
2 5 3 4 give 1 1 0 1? (row number == 2)
# 3  
Old 03-15-2011
Present means existing.

To break it down, for example:
An entry 0 is present in all columns a, b, c and d, so its existence is denoted by 1, 1, 1 and 1 under a, b, c and d.
An entry 1 is present only in columns a and c, not in columns b and d, so it gets a 1 in columns a and c and a 0 in columns b and d.
An entry 2 is present in columns a, b and d but not in column c, so it gets a 1 in columns a, b and d and a 0 in column c.
The same holds for every other number in the data set, across all columns.
Hope it's clear.
# 4  
Old 03-15-2011
Here is a script solution for the example, assuming the data file is named file.tmp and the field separator is a tab:
Code:
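#!/bin/bash
# Header line: a blank first cell, then the column names.
echo -e " \ta\tb\tc\td"
# Loop over every distinct value in the data rows (line 2 onward): tr puts one
# value per line, sort -un sorts numerically and drops duplicates.
for i in $(tail -n+2 file.tmp | tr '\t' '\n' | sort -un)
do
  # grep -w exits with 0 when column 1 contains $i as a whole word and with 1
  # when it does not, so 1-$? is 1 for present and 0 for absent.
  tail -n+2 file.tmp | cut -f1 | grep -w $i 1>/dev/null 2>&1
  let a=1-$?
  # Same test for columns 2, 3 and 4.
  tail -n+2 file.tmp | cut -f2 | grep -w $i 1>/dev/null 2>&1
  let b=1-$?
  tail -n+2 file.tmp | cut -f3 | grep -w $i 1>/dev/null 2>&1
  let c=1-$?
  tail -n+2 file.tmp | cut -f4 | grep -w $i 1>/dev/null 2>&1
  let d=1-$?
  # One output row: the value followed by its four presence flags.
  echo -e "$i\t$a\t$b\t$c\t$d"
done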
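
If the file ever has more than four columns, the four near-identical blocks can be folded into a loop over the column numbers. The following is only a sketch under the same assumptions (tab-separated file, header on the first line). The function name presence_table is made up for illustration, and it uses grep -qxF (quiet, whole-line, fixed-string match) in place of grep -w, which is my substitution rather than part of the script above.

```shell
# presence_table FILE
# Hypothetical helper: prints the presence table for a tab-separated FILE
# whose first line is the header. The body runs in a subshell, so its
# variables do not leak into the calling shell.
presence_table() (
  file=$1
  tab=$(printf '\t')

  # Number of columns, counted from the header row.
  ncols=$(head -n 1 "$file" | awk -F'\t' '{ print NF }')

  # Header: an empty first cell, then the original column names.
  printf '\t%s\n' "$(head -n 1 "$file")"

  # Every distinct value in the data rows, sorted numerically.
  for value in $(tail -n +2 "$file" | tr '\t' '\n' | sort -un)
  do
    line=$value
    col=1
    while [ "$col" -le "$ncols" ]
    do
      # Exit status 0 means some row of this column equals $value exactly.
      if tail -n +2 "$file" | cut -f"$col" | grep -qxF -- "$value"
      then line="$line${tab}1"
      else line="$line${tab}0"
      fi
      col=$((col + 1))
    done
    printf '%s\n' "$line"
  done
)
```

After sourcing or pasting the function, it would be called as presence_table file.tmp. It still rereads the file once per value and column, so it is no faster than the original, only shorter to extend.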

This User Gave Thanks to Dahu For This Post:
# 5  
Old 03-15-2011
This worked perfectly!

Could you please comment on the following so that I understand what it means? That would be a great help for experimenting with the code myself.

Code:
for i in $(tail -n+2 file.tmp | tr '\t' '\n' | sort -un)
do
  tail -n+2 file.tmp | cut -f1 | grep -w $i 1>/dev/null 2>&1
  let a=1-$?

To go one step further: in the output, how can I add another column, say "e", after "d" that contains the sum of the values in a, b, c and d? In other words, because 0 was present in all 4 initial columns in file.tmp, it got a value of 1 (for existence) in all 4 output columns. How can I generate a column "e" that contains the sum of these new entries, so that for 0 I get 4, and so on for the rest of the list?
I know I can do this in Excel, but I want to learn how to do it on the command line.
Thanks a lot for the useful input.
# 6  
Old 03-15-2011
Code:
# cat tst
a    b    c    d
0    0    0    0
1    2    1    2
2    5    3    4
3    8    4    6
4    10   9    8
5    15   8    10
# printf "%s\t%s\t%s\t%s\t%s\n" "" a b c d && awk 'NR>1{for (i=1;i<NF+1;i++) I[$i]=$i;A[$1]=($1==0)?1:$1;B[$2]=($2==0)?1:$2;C[$3]=($3==0)?1:$3;D[$4]=($4==0)?1:$4}END{for (j in I) print I[j],A[j]?1:0,B[j]?1:0,C[j]?1:0,D[j]?1:0}' OFS="\t" tst | sort -k1n
        a       b       c       d
0       1       1       1       1
1       1       0       1       0
2       1       1       0       1
3       1       0       1       0
4       1       0       1       1
5       1       1       0       0
6       0       0       0       1
8       0       1       1       1
9       0       0       1       0
10      0       1       0       1
15      0       1       0       0
#
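
For the extra column "e" asked about earlier in the thread (how many of a, b, c and d contain each value), one possible approach is sketched below. It is not the one-liner above rewritten line for line: it records value/column pairs in a single awk array instead of using one array per column. The function name presence_with_sum is made up for illustration, and the sketch assumes whitespace-separated input with the header on the first line.

```shell
# presence_with_sum FILE
# Hypothetical helper: presence table plus a final column "e" holding the
# number of input columns in which each value occurs.
presence_with_sum() {
  # Header: an empty first cell, the original names, then the new column "e".
  awk 'NR == 1 {
         out = ""
         for (i = 1; i <= NF; i++) out = out "\t" $i
         print out "\te"
         exit
       }' "$1"

  # Data rows: remember every value and the columns it appears in, then
  # print one line per value with its flags and their sum.
  awk '
    NR == 1 { ncols = NF; next }      # column count comes from the header
    {
      for (i = 1; i <= NF; i++) {
        all[$i] = 1                   # value occurs somewhere in the grid
        seen[$i, i] = 1               # value occurs in column i
      }
    }
    END {
      for (v in all) {
        row = v
        sum = 0
        for (i = 1; i <= ncols; i++) {
          hit = ((v, i) in seen) ? 1 : 0
          row = row "\t" hit
          sum += hit
        }
        print row "\t" sum
      }
    }
  ' "$1" | sort -k1,1n                # awk array order is arbitrary, so sort
}
```

It would be called as presence_with_sum tst. Testing membership with "in" avoids the special case for 0 that the one-liner handles with ($1==0)?1:$1, because the check no longer depends on the stored value being non-zero.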

This User Gave Thanks to ctsgnb For This Post:
# 7  
Old 03-15-2011
Could you please comment the code so that I can understand it a bit better?

Thanks for your help, though.
 