delete the same column


 
# 1  

Code:
var1 var2 var3 var4 var5 var6....
A G A T G T
G A A A A A
A A A A A A
G A G T A T

Let's call this "file1.txt".

We can see that var2 and var5 have identical values.
I would like to write a script so that if any variables (possibly more than two) have identical values, it prints the variable names.

As a harder problem,
I want the script to print the variables whose values are at least 95% identical.

Thanks,

Moderator's Comments:
Please use code tags next time for your code and data.

Last edited by zaxxon; 07-12-2012 at 11:12 AM. Reason: code tags
# 2  
What have you tried so far?
# 3  
Code:
[root@node3 ~]# cat infile 
var1 var2 var3 var4 var5 var6
A G A T G T
G A A A A A
A A A A A A
G A G T A T
[root@node3 ~]# cat matrix_transpose 
#!/bin/bash

transpose() 
{ 
  awk ' 
      { 
         if (max_nf<NF) 
             max_nf=NF 
         max_nr=NR 
         for(x=1; x<=NF; ++x) 
             matrix[x, NR]=$x 
      } 
END   { 
         for (x=1; x<=max_nf; ++x) { 
              for (y=1; y<=max_nr; ++y) 
                   printf("%s ", matrix[x, y]) 
              printf("\n") 
         } 
      }' "${1}"
} 

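transpose "${1}" | \
awk   '
      {
         # key = all values of the column (everything after the name),
         # so the script is not tied to exactly four data rows
         key = $0
         sub(/^[^ ]+ /, "", key)
         ++a[key]
         b[key]=b[key]FS$1
      }
END   {
         # print the names of columns whose values occur more than once
         for(i in a) 
            if(a[i]>1) 
               print b[i]
      }' 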
[root@node3 ~]# bash matrix_transpose infile 
 var2 var5
 var4 var6
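
For anyone following along, the idea behind the script above is that each column becomes a single line after transposing, so identical columns turn into identical strings that can be used as array keys. A minimal sketch of the same idea in one awk pass, without the separate transpose step, might look like the following. The function name identical_columns and the temporary demo file are assumptions made for this sketch, not part of the original post, and it assumes every row has as many fields as the header.

```shell
# identical_columns FILE: print the names of columns whose values all match.
identical_columns() {
  awk '
    NR == 1 { nf = split($0, name, FS); next }   # header: keep the names
    {
      # Build one string per column, e.g. " A G A G" for var1.
      for (x = 1; x <= NF; x++)
        col[x] = col[x] FS $x
    }
    END {
      if (NR < 2) exit                           # no data rows
      # Identical columns share the same string, so group names by it.
      # order[] keeps first-seen order so the output is stable.
      for (x = 1; x <= nf; x++) {
        k = col[x]
        if (size[k]++ == 0) { order[++n] = k; group[k] = name[x] }
        else group[k] = group[k] FS name[x]
      }
      for (i = 1; i <= n; i++)
        if (size[order[i]] > 1)
          print group[order[i]]
    }' "$1"
}

# Demo on the sample data from post #1 (temporary file, assumed name).
demo=$(mktemp)
cat > "$demo" <<'EOF'
var1 var2 var3 var4 var5 var6
A G A T G T
G A A A A A
A A A A A A
G A G T A T
EOF
identical_columns "$demo"
rm -f "$demo"
```

On the sample data this prints "var2 var5" and "var4 var6". Three or more identical columns end up together on one line, which covers the "could be more than 2" case from the question.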

# 4  
Uh... I have to say I don't quite understand your script.
Can you explain it a little bit?
Also, I need a script for a bigger data set, so it won't be a 6*6 table.
# 5  
This should work for any size...
Code:
$ cat file1.txt
var1 var2 var3 var4 var5 var6
A G A T G T
G A A A A A
A A A A A A
G A G T A T

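$ awk 'NR == 1 {
               # header line: remember the variable names
               split($0, var, FS)
               next
       }

       {
               # data line: count the rows on which columns x and y agree
               for (x = 1; x < NF; x++)
                       for (y = x + 1; y <= NF; y++)
                               if ($x == $y)
                                       cnt[x, y]++
       }

       END {
               # tot = number of data rows (the header is not counted)
               tot = NR - 1
               # print every pair that agrees on at least perc percent of the rows
               for (x = 1; x < NF; x++)
                       for (y = x + 1; y <= NF; y++)
                               if (cnt[x, y] >= perc / 100 * tot)
                                       print var[x], var[y]

       } '  perc=95 file1.txt > file2.txt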

$ cat file2.txt
var2 var5
var4 var6

$
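
To make the threshold in the END block above concrete: with 4 data rows, perc=95 needs cnt >= 3.8, so a pair only qualifies when all 4 rows match. A pair with a single mismatch can pass the 95% test only once there are 20 or more data rows. Here is a rough sketch that wraps the same counting logic in a function and also prints the match count, so a lower threshold can be tried on the sample data. The name similar_columns and the extra count/total column are assumptions for illustration, not part of the post above. The comparison is written as cnt * 100 >= perc * tot only to keep the arithmetic in whole numbers.

```shell
# similar_columns PERC FILE: print column pairs that agree on at least
# PERC percent of the data rows, followed by matches/total.
similar_columns() {
  awk -v perc="$1" '
    NR == 1 { nf = split($0, var, FS); next }    # header: keep the names
    {
      # Count the rows on which columns x and y hold the same value.
      for (x = 1; x < nf; x++)
        for (y = x + 1; y <= nf; y++)
          if ($x == $y) cnt[x, y]++
    }
    END {
      tot = NR - 1                               # data rows, header excluded
      if (tot < 1) exit
      for (x = 1; x < nf; x++)
        for (y = x + 1; y <= nf; y++)
          if (cnt[x, y] * 100 >= perc * tot)
            printf "%s %s %d/%d\n", var[x], var[y], cnt[x, y], tot
    }' "$2"
}

# Demo on the sample data from the thread (temporary file, assumed name).
demo=$(mktemp)
cat > "$demo" <<'EOF'
var1 var2 var3 var4 var5 var6
A G A T G T
G A A A A A
A A A A A A
G A G T A T
EOF
similar_columns 75 "$demo"
rm -f "$demo"
```

With the threshold lowered to 75, the sample data gives "var1 var3 3/4" in addition to "var2 var5 4/4" and "var4 var6 4/4". With 95 only the two 4/4 pairs remain, matching file2.txt above.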

This User Gave Thanks to Ygor For This Post:
