Data reformat and rearrangement problem asking


 
# 1  
Old 12-19-2011

Input file:
Code:
dependent    general_process    
dependent    general_process    
regulation    general_process    
-    -    
template    component    
food    component    
binding    data_rearrangement    
binding    data_rearrangement    
specific_activity    data_rearrangement    
-    -    
regulation    general_process    
regulation    general_process    
practise    general_process

Desired output:
Code:
regulation    3    binding    2    template    1    -    2
dependent    2    specific_activity    1    food    1

Commands I tried:
Code:
awk '$2=="general_process"{print $1}' input.txt | sort | uniq -c | sort -nrk1 | head -2 | awk '{print $2"\t"$1"\t"}' > tmp1.txt
awk '$2=="data_rearrangement"{print $1}' input.txt | sort | uniq -c | sort -nrk1 | head -2 | awk '{print $2"\t"$1"\t"}' > tmp2.txt
awk '$2=="component"{print $1}' input.txt | sort | uniq -c | sort -nrk1 | head -2 | awk '{print $2"\t"$1"\t"}' > tmp3.txt
awk '$2=="-"{print $1}' input.txt | sort | uniq -c | sort -nrk1 | head -2 | awk '{print $2"\t"$1"\t"}' > tmp4.txt
paste -d '\t' tmp1.txt tmp2.txt tmp3.txt tmp4.txt > desired_output.txt

My approach seems inefficient because it reads input.txt again for every category.
Hopefully there is a better way to do this.
# 2  
Old 12-19-2011
Is this what you're looking for?
Code:
perl -ane '$x{$F[0]}++;END{for(keys %x){print "$_ $x{$_}\n"}}' inputfile

Or, in the weird format as per desired output:
Code:
perl -ane '$x{$F[0]}++;END{$i=1;for(keys %x){($i%4==0)?print "$_ $x{$_}\n":print "$_ $x{$_}\t";$i++}}' inputfile


Last edited by balajesuri; 12-19-2011 at 07:36 AM..
# 3  
Old 12-19-2011
Thanks for the reply, balajesuri.
But it gives a different result from the desired output.
# 4  
Old 12-20-2011
1. Please explain the logic that is being used to arrive at the desired output from input.
2. Why are you considering only the first two lines by using head -2 in your commands?
3. There is no entry for "practise" in your output. Why so?
4. In what order is your desired output printed?
# 5  
Old 12-20-2011
Columns 1 and 2 should show the top 2 hits for "general_process", columns 3 and 4 the top 2 hits for "data_rearrangement", columns 5 and 6 the top 2 hits for "component", and columns 7 and 8 the top 2 hits for "-".
# 6  
Old 12-20-2011
Quote:
Originally Posted by cpp_beginner
Columns 1 and 2 should show the top 2 hits for "general_process", columns 3 and 4 the top 2 hits for "data_rearrangement", columns 5 and 6 the top 2 hits for "component", and columns 7 and 8 the top 2 hits for "-".
Maybe something like this?

Code:
$ cat f26
dependent    general_process
dependent    general_process
regulation    general_process
-    -
template    component
food    component
binding    data_rearrangement
binding    data_rearrangement
specific_activity    data_rearrangement
-    -
regulation    general_process
regulation    general_process
practise    general_process
$ perl -ane '$x{$F[1]}{$F[0]}++;    # count each (category, value) pair
             END {
               foreach $i (0..1) {  # rank 0 is the top hit, rank 1 the runner-up
                 # the qw() list needs its own parentheses on Perl 5.18 and later
                 foreach $k (qw(general_process data_rearrangement component -)) {
                   @keys = sort {$x{$k}{$b} <=> $x{$k}{$a}} keys %{$x{$k}};
                   printf ("%-20s %3d    ", $keys[$i], $x{$k}{$keys[$i]}) if defined $x{$k}{$keys[$i]};
                 }
                 print "\n";
               }
             }
            ' f26
regulation             3    binding                2    food                   1    -                      2
dependent              2    specific_activity      1    template               1

tyler_durden
# 7  
Old 12-20-2011
Thanks, durden_tyler. I just tried your Perl code.
It gives the following output:
Code:
regulation             3    binding                2    food                   1    
dependent              2    specific_activity      1    template               1

It seems to be missing the "- 2" entry?
Thanks for checking.
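
For comparison, the same top-2-per-category logic can probably be done with awk alone in a single pass over the file, which avoids both the repeated reads of input.txt and the temporary files. This is only a minimal sketch under a few assumptions: the function name top_hits is made up for illustration, the category list is hard-coded in the column order described in post #5, and ties are broken by first appearance in the input (the thread never states a tie rule, but this one reproduces the template/food order of the desired output).

```shell
# top_hits FILE  (hypothetical helper name, not from the thread)
# Reads FILE once and prints, for each category in column 2, its two most
# frequent column-1 values with their counts, tab-separated.
top_hits() {
    awk '
    BEGIN {
        OFS = "\t"
        # Category order is hard-coded to match the requested column layout.
        ncat = split("general_process data_rearrangement component -", cat, " ")
    }
    NF >= 2 {
        key = $2 SUBSEP $1
        if (!(key in count)) order[++n] = key   # first-seen order breaks ties
        count[key]++
    }
    END {
        for (rank = 1; rank <= 2; rank++) {
            line = ""
            for (c = 1; c <= ncat; c++) {
                best = ""
                for (i = 1; i <= n; i++) {
                    k = order[i]
                    split(k, part, SUBSEP)
                    if (part[1] != cat[c] || (k in used)) continue
                    if (best == "" || count[k] > count[best]) best = k
                }
                if (best == "") continue        # no entry left for this category
                used[best] = 1
                split(best, part, SUBSEP)
                line = line (line == "" ? "" : OFS) part[2] OFS count[best]
            }
            print line
        }
    }' "$1"
}
```

Running `top_hits input.txt > desired_output.txt` on the sample input should give the two tab-separated lines shown under "Desired output". Like the Perl version, it simply skips a category that has no second entry, so the columns only stay aligned when the short categories come last.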