remove column based on the same value


 
# 1  
Old 09-28-2012

Hello,

I am having trouble removing the columns in which every row has the value -9, while keeping the columns where -9 appears only in some rows.

The input file is shown below:
Code:
Col1 Col2 Col3 Col4 Col5 Col6
A  1  A  -9  0  -9
B  2   T  -9  -9  -9
C  3   D  -9  1   -9
D  4   R  -9  2   -9

The output should look like this:
Code:
A   1   A  0 
B   2    T  -9
C   3    D  1
D   4    R   2

Col4 and Col6 contain -9 in every row, so I would like to remove them.
First, I tried transposing the input and then running sed -i '/-9/d' inputfile, but this deletes every line that contains -9 anywhere, not only the lines that consist entirely of -9. So it is still wrong.
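To illustrate the difference described above, here is a minimal sketch on an already transposed excerpt of the sample (one line per original column; the excerpt and the variable names are made up for the demonstration). The sed pattern matches any line containing -9, while the awk filter keeps a line as soon as it finds a value other than -9 after the header.

```shell
#!/bin/bash
# Transposed excerpt: header first, then the values of that column.
transposed=$(mktemp)
cat > "$transposed" <<'EOF'
Col4 -9 -9 -9 -9
Col5 0 -9 1 2
Col6 -9 -9 -9 -9
EOF

# sed deletes every line that merely contains -9, so Col5 is lost as well.
sed_result=$(sed '/-9/d' "$transposed")

# awk prints a line as soon as one value after the header differs from -9.
awk_result=$(awk '{ for (i = 2; i <= NF; i++) if ($i != "-9") { print; next } }' "$transposed")

echo "sed kept: [$sed_result]"
echo "awk kept: [$awk_result]"
rm -f "$transposed"
```

With this excerpt, sed keeps nothing and awk keeps only the Col5 line.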

Please advise.
Thank you very much.

Last edited by Franklin52; 09-28-2012 at 08:05 AM.. Reason: Please use code tags for data and code samples
# 2  
Old 09-28-2012
Try:
Code:
awk '{$4=$5}NF=4' infile

# 3  
Old 09-28-2012
Something like:
Code:
cut -d" " -f 4,6 --complement infile >outfile

Then check outfile. If it's fine, then:
Code:
mv outfile file
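A small sketch of the same cut idea for input like the sample, where columns are separated by a varying number of spaces. It assumes GNU cut (the --complement option is a GNU extension) and squeezes repeated spaces with tr first, because cut treats every single space as a delimiter. The file contents and variable names are only for the demonstration.

```shell
#!/bin/bash
infile=$(mktemp)
cat > "$infile" <<'EOF'
Col1 Col2 Col3 Col4 Col5 Col6
A  1  A  -9  0  -9
B  2   T  -9  -9  -9
EOF

# Squeeze runs of spaces into one, then drop fields 4 and 6 (GNU cut only).
result=$(tr -s ' ' < "$infile" | cut -d' ' -f 4,6 --complement)

echo "$result"
rm -f "$infile"
```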

--
Bye
# 4  
Old 09-28-2012
Thank you very much, Lem and Scrutinizer. This code works when the columns that contain -9 are known in advance.

However, my input data has many columns and rows, so I do not know in advance which columns contain only -9. I am not sure whether this problem can be solved with a shell script. How should I solve this case? Thank you very much.
# 5  
Old 09-29-2012
I guess your problem can easily be solved with awk alone, but since I don't know awk...

A possible alternative solution is this one: if you know the column delimiter in your file (if it isn't a single character, there are workarounds), you can use cut to select the columns one by one. If a column is made of "-9"s and nothing else, then running it through uniq gives a single "-9". If so, you can remember the column number (to discard the column later).

Something like:
Code:
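#!/bin/bash
infile="$1"
DELIM=$'\t'                         ### this is a tab, but it could be also a space: check your input file,
                                    ### and set it accordingly. It must be one single char only, though.

COLS=$(head -n1 "$infile" | wc -w)  ### How many columns are there in your file?
declare -a DISCARD
for ((n=1; n<=COLS; n++)); do
 ### tail skips the header line (assumed present, as in the sample input).
 if [[ "$(tail -n +2 "$infile" | cut -d "$DELIM" -f "$n" | uniq)" = "-9" ]]; then
  DISCARD+=( "$n" )
 fi
done
if (( ${#DISCARD[@]} == 0 )); then
 cat "$infile"                      ### no column holds only -9: print the file unchanged
else
 ### IFS is set as a separate command so that "${DISCARD[*]}" is joined with commas.
 cut -d "$DELIM" -f "$(IFS=","; echo "${DISCARD[*]}")" --complement "$infile"
fi
exit 0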

Save it as myscript, and then give it exec permission with: chmod +x myscript.

Usage: ./myscript inputfile or ./myscript inputfile >newfile.
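One detail worth checking in a script like this is how the array of column numbers gets joined with commas for cut. The sketch below (the array contents are hypothetical) compares two ways of setting IFS in bash: as a prefix to echo, where the words are expanded before the assignment takes effect, and as a separate command inside the command substitution.

```shell
#!/bin/bash
# Hypothetical list of column numbers to discard.
DISCARD=(4 6)

# Prefix assignment: "${DISCARD[*]}" is expanded first, with the old IFS.
prefix_form=$(IFS="," echo "${DISCARD[*]}")

# Separate command in the subshell: the expansion sees the new IFS.
separate_form=$(IFS=","; echo "${DISCARD[*]}")

echo "prefix form:   $prefix_form"
echo "separate form: $separate_form"
```

Only the second form yields a list such as 4,6 that cut accepts as a single -f argument.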
--
Bye

Last edited by Lem; 09-29-2012 at 08:10 AM..
# 6  
Old 09-29-2012
You could also try this awk version:
Code:
awk '
  NR==FNR{
    if(NR>1){
      for(i=1;i<=NF;i++) if($i==-9) A[i]++
      m++
    }
    next
  }
  {
    for(i=1;i<=NF;i++) if(A[i]==m) $i=x
    $0=$0
    $1=$1
  }
  1
' file file

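The file is named twice because awk reads it in two passes: the first pass counts, for each column, the data rows that contain -9, and the second pass prints every line with the columns emptied where that count equals the number of data rows.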
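As a rough sketch of the same two-pass idea, the variant below builds each output line from the kept columns instead of blanking fields. It assumes whitespace-separated input with a header on the first line, as in the sample; the names hits, rows, line and sep are made up for this example.

```shell
#!/bin/bash
sample=$(mktemp)
cat > "$sample" <<'EOF'
Col1 Col2 Col3 Col4 Col5 Col6
A  1  A  -9  0  -9
B  2   T  -9  -9  -9
C  3   D  -9  1   -9
D  4   R  -9  2   -9
EOF

result=$(awk '
  # Pass 1: count, per column, the data rows that hold -9 (line 1 is the header).
  NR == FNR {
    if (FNR > 1) {
      for (i = 1; i <= NF; i++) if ($i == -9) hits[i]++
      rows++
    }
    next
  }
  # Pass 2: join only the columns that did not hold -9 in every data row.
  {
    line = ""; sep = ""
    for (i = 1; i <= NF; i++)
      if (rows == 0 || hits[i] != rows) { line = line sep $i; sep = OFS }
    print line
  }
' "$sample" "$sample")

echo "$result"
rm -f "$sample"
```

Setting OFS (for example with -v OFS='\t') changes the output separator without touching the rest of the logic.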
# 7  
Old 10-01-2012
Thank you very, very much, Lem and Scrutinizer. I have tried both solutions and they work. Very useful.