remove column based on the same value

09-28-2012

Hello,

I am having trouble removing the columns in which every row has the value -9. Columns where -9 appears in only some rows must be kept.

The input file is shown below:

Code:

Col1 Col2 Col3 Col4 Col5 Col6
A 1 A -9 0 -9
B 2 T -9 -9 -9
C 3 D -9 1 -9
D 4 R -9 2 -9

The output should look like this:

Code:

A 1 A 0
B 2 T -9
C 3 D 1
D 4 R 2

Col4 and Col6 contain -9 in every row, so I would like to remove them.
First, I tried transposing the input and then running: sed -i '/-9/d' inputfile. But that deletes every line that contains -9 anywhere, so after transposing it also removes columns that have -9 in only some rows (such as Col5). It is still wrong.
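For reference, the transpose idea can work if the sed pattern matches only lines made entirely of -9, rather than any line containing -9. A sketch, assuming whitespace-separated fields and a header line as in the sample (after transposing, the header becomes the first word of each line); transpose and drop_all_minus9_cols are hypothetical helper names, and awk does the transposition.

```shell
# Hypothetical helper: transpose a whitespace-separated table (rows <-> columns).
transpose() {
  awk '
    { for (i = 1; i <= NF; i++) cell[NR, i] = $i
      if (NF > maxnf) maxnf = NF }
    END {
      for (i = 1; i <= maxnf; i++) {
        line = cell[1, i]
        for (r = 2; r <= NR; r++) line = line OFS cell[r, i]
        print line
      }
    }
  ' "$@"
}

# After transposing, each line reads "<header> <value> <value> ...".
# The anchored pattern deletes only lines whose values are ALL -9,
# unlike /-9/d, which deletes any line containing -9 anywhere.
drop_all_minus9_cols() {
  transpose "$@" | sed -E '/^[^ ]+( -9)+$/d' | transpose
}
```

Usage would be drop_all_minus9_cols inputfile > newfile. Each transpose holds the whole table in memory, and without a header line the pattern would treat the first value of each column as its name.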

Any suggestions?
Thank you very much.

awil

09-28-2012

Try:

Code:

awk '{$4=$5}NF=4' infile
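This one-liner is tied to the sample layout: it copies column 5 over column 4 and then cuts the record down to four fields by assigning NF. A slightly more general sketch, assuming whitespace-separated input, drops any fixed list of column numbers by rebuilding each line from the fields it keeps; drop_cols and its comma-separated list argument are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: drop_cols LIST [FILE...]
# LIST is a comma-separated list of column numbers to remove, e.g. 4,6.
drop_cols() {
  cols=$1; shift
  awk -v drop="$cols" '
    BEGIN { n = split(drop, d, ","); for (k = 1; k <= n; k++) skip[d[k]] = 1 }
    {
      out = ""; sep = ""
      for (i = 1; i <= NF; i++)
        if (!(i in skip)) { out = out sep $i; sep = OFS }   # keep this field
      print out
    }
  ' "$@"
}
```

For example, drop_cols 4,6 infile prints the sample without Col4 and Col6. The column numbers still have to be known in advance.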

Scrutinizer

09-28-2012

Something like:

Code:

cut -d" " -f 4,6 --complement infile >outfile

Then check outfile. If it's fine, then:

Code:

mv outfile infile
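If you want the original to be overwritten only when cut succeeded, the two steps can be chained. A minimal sketch, assuming GNU cut (--complement is a GNU extension) and single-space delimiters; remove_cols_inplace is a hypothetical name. It writes back with cat rather than mv, so the file keeps its own permissions instead of the restrictive ones a mktemp file is created with.

```shell
# Hypothetical helper: remove_cols_inplace FILE LIST
# Removes the columns in LIST (e.g. 4,6) from the single-space-delimited FILE,
# overwriting FILE only if cut succeeds. Needs GNU cut for --complement.
remove_cols_inplace() {
  file=$1 cols=$2
  tmp=$(mktemp) || return 1
  if cut -d ' ' -f "$cols" --complement "$file" > "$tmp"; then
    cat "$tmp" > "$file"      # write back in place: keeps the file's permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"              # cut failed: leave FILE untouched
    return 1
  fi
}
```

Usage: remove_cols_inplace infile 4,6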

--
Bye

Lem

09-28-2012

Thank you very much, Lem and Scrutinizer. This code works when I know in advance which columns contain only -9.

However, my real input has many columns and rows, so I do not know in advance which columns contain only -9. I'm not sure whether this can be solved with a shell script. How should I approach it? Thank you very much.

awil

09-29-2012

I guess your problem can easily be solved with awk alone, but since I don't know awk...

A possible alternative: if you know the column delimiter used in your file (if it isn't a single character, there are workarounds), you can use cut to extract the columns one by one. If a column is made of "-9"s and nothing else, piping it through uniq gives a single "-9". In that case, remember the column number so that the column can be discarded later.

Something like:

Code:

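#!/bin/bash
infile="$1"
DELIM=$'\t'                         ### this is a tab, but it could also be a space: check your input file,
                                    ### and set it accordingly. It must be one single char only, though.

COLS=$(head -n1 "$infile" | wc -w)  ### How many columns are there in your file?
declare -a DISCARD
for ((n=1;n<=COLS;n++)); do
 ### tail skips the header line; a column made only of -9s collapses to a single "-9"
 if [[ "$(tail -n +2 "$infile" | cut -d "$DELIM" -f "$n" | uniq)" = "-9" ]]; then
  DISCARD+=( "$n" )
 fi
done
if (( ${#DISCARD[@]} == 0 )); then ### nothing to discard: print the file unchanged
 cat "$infile"
 exit 0
fi
### join the column numbers with commas in a subshell, e.g. "4,6"
cut -d "$DELIM" -f "$(IFS=,; echo "${DISCARD[*]}")" --complement "$infile"
exit 0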

Save it as myscript, and then give it exec permission with: chmod +x myscript.

Usage: ./myscript inputfile or ./myscript inputfile >newfile.
--
Bye
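The script above runs cut and uniq once per column, so on a file with many columns it reads the input many times. A sketch of a one-pass alternative, assuming whitespace-separated fields and a header line as in the sample: let awk collect the numbers of the columns that are -9 in every data row and hand that list to cut. all_minus9_cols is a hypothetical helper name.

```shell
# Hypothetical helper: print the comma-separated numbers of the columns whose
# data rows (header skipped) are all -9, reading the input only once.
all_minus9_cols() {
  awk '
    NR == 1 { ncols = NF; next }       # header: remember the column count
    { for (i = 1; i <= NF; i++) if ($i != -9) notall[i] = 1 }
    END {
      sep = ""
      for (i = 1; i <= ncols; i++)
        if (!(i in notall)) { printf "%s%d", sep, i; sep = "," }
      print ""
    }
  ' "$@"
}
```

For example, cols=$(all_minus9_cols infile), then, if [ -n "$cols" ] holds, cut -d' ' -f "$cols" --complement infile. The emptiness check matters because cut needs a non-empty field list.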

Lem

09-29-2012

You could also try this awk version:

Code:

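awk '
  NR==FNR{                  # first pass over the file
    if(NR>1){               # skip the header line
      for(i=1;i<=NF;i++) if($i==-9) A[i]++   # count -9s per column
      m++                   # count data rows
    }
    next
  }
  {                         # second pass
    for(i=1;i<=NF;i++) if(A[i]==m) $i=x      # blank columns that are -9 in every data row
    $0=$0                   # re-split so the blanked fields disappear
    $1=$1                   # rebuild the line with single OFS separators
  }
  1
' file file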

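The file is read twice (hence "file file"): the first pass counts, for each column, how many data rows hold -9, and the second pass blanks the columns whose count equals the number of data rows.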
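Because the two-pass version names the file twice, it cannot read from a pipe. A sketch of a single-pass variant, assuming the whole input fits in memory and the first line is a header as in the sample: it stores every line and decides which columns to drop in the END block. drop_all_minus9 is a hypothetical name.

```shell
# Hypothetical helper: same counting idea as above, but the lines are kept
# in memory so the input is read only once (works on a pipe).
drop_all_minus9() {
  awk '
    {
      row[NR] = $0                       # keep every line for the END block
      if (NR > 1)                        # the header line is not counted
        for (i = 1; i <= NF; i++) if ($i == -9) cnt[i]++
    }
    END {
      m = NR - 1                         # number of data rows
      for (r = 1; r <= NR; r++) {
        n = split(row[r], f)             # re-split on whitespace
        out = ""; sep = ""
        for (i = 1; i <= n; i++)
          if (!(m > 0 && cnt[i] == m)) { out = out sep f[i]; sep = OFS }
        print out
      }
    }
  ' "$@"
}
```

Usage: some_command | drop_all_minus9, or drop_all_minus9 file. The trade-off is memory: the two-pass version keeps only one counter per column.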

Scrutinizer

10-01-2012

Thank you very much, Lem and Scrutinizer. I have tried both solutions and they work. Very useful.

awil
