Delete columns if a pattern met

02-11-2013

Hi,

I'd like to ask for some help with the following task, please:

There is a big file with a header (this is file.in):

Code:

NAME A_1.X A_1.Y A_1.Z B_1.X B_1.Y B_1.Z
name1 AB 0.11 0.12 BB 0.45 0.67 
name2 BB 0.34 0.56 AA 0.89 0.68

What I need is to recognize a pattern in the header of this file (the patterns are listed in another file) and delete every column whose header contains that pattern.

For example, the file with the patterns looks like this (this is file.with.patterns):

Code:

A_1
A_2
C_4
D_7

So, it would recognize A_1 and delete all the columns whose header contains A_1; the output would then look like this (this is file.out):

Code:

NAME B_1.X B_1.Y B_1.Z
name1 BB 0.45 0.67 
name2 AA 0.89 0.68

I am not sure I've got the best approach. What I was thinking of doing is to write all the columns whose header does not contain the specified pattern to one output file (so the columns whose header does match the pattern are left out, i.e. deleted):

Code:

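while read -r i
do
  # pass the shell variable in with -v; $i is not expanded inside single quotes
  awk -v pat="$i" '
    NR==1 { for (a=1; a<=NF; a++) if ($a !~ pat) f[++n]=a }   # columns to keep
    { for (a=1; a<=n; a++) printf "%s%s", (a>1 ? " " : ""), $f[a]; print "" }
  ' file.in >> file.out
done < file.with.patterns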

One problem is that I want file.out to contain all the columns whose header matches none of the patterns in file.with.patterns, and I am not sure the append operator (>>) achieves that. It hasn't really worked well so far.

Another option I was thinking about is to work out the numbers of the columns whose header contains the pattern and then delete them with cut -f, but I don't know how to do that.
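A rough sketch of that cut idea, in case it helps: let awk work out the numbers of the columns to keep from the header line, then pass that list to cut -f. The sample data is copied from this post; the scratch directory and the variable name keep_list are made up for the illustration. It assumes the columns are separated by exactly one space and that file.with.patterns is not empty.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data from the post.
printf '%s\n' 'NAME A_1.X A_1.Y A_1.Z B_1.X B_1.Y B_1.Z' \
              'name1 AB 0.11 0.12 BB 0.45 0.67' \
              'name2 BB 0.34 0.56 AA 0.89 0.68' > file.in
printf '%s\n' A_1 A_2 C_4 D_7 > file.with.patterns

# keep_list (hypothetical name) becomes a comma-separated list of column numbers.
keep_list=$(awk '
  NR==FNR { drop[$1]; next }            # first file: prefixes to drop
  {
    for (i = 1; i <= NF; i++) {
      name = $i
      sub(/\.[^.]*$/, "", name)          # A_1.X -> A_1
      if (!(name in drop)) list = list (list == "" ? "" : ",") i
    }
    print list
    exit                                 # only the header line is needed
  }' file.with.patterns file.in)

# cut treats every single space as a delimiter, so runs of spaces would break this.
cut -d ' ' -f "$keep_list" file.in > file.out
cat file.out
```

Building the list of columns to keep, rather than the list to delete, avoids needing cut's --complement option, which is a GNU extension.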

Any ideas will be greatly appreciated!

Many thanks for your time!

zajtat

02-11-2013

From your post, I understand that you need a file containing the header with some fields filtered out, where the filter rules come from another file.
Let's say the rule file is rule.txt (containing A_1, A_2, etc.) and the file with the header is file.html.

Code:

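# split the header into one field per line so grep -v drops single fields
head -1 file.html | tr -s ' ' '\n' > temp

for pat in $(cat rule.txt)
do
  # drop header fields such as A_1.X; double quotes let the shell expand $pat
  grep -v "^${pat}\.[A-Z]\$" temp > temp1
  mv temp1 temp
done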

Last edited by Scrutinizer; 02-11-2013 at 07:55 PM.. Reason: code tags

kg_gaurav

02-11-2013

try also:

Code:

awk '
NR==FNR {p[$0]=$0; next}
FNR==1 {for (i=1; i<=NF; i++) {s=$i; sub("[.].*","",s); if (p[s]) o[i]=s}}
{l=""; for (i=1; i<=NF; i++) if (!o[i]) l=l $i" "; $0=l;}
1
' file.with.patterns file.in > file.out

rdrtx1

02-11-2013

try

Code:

awk '
  FNR==NR{P[$1];next}
  FNR==1{
    for(i=1;i<=NF;i++) {
      c=$i
      sub(/\.[XYZ]$/,"",c)
      if(c in P)S[i]
    }
  }
  { a=x
    for(i=1;i<=NF;i++)
    if(!(i in S)) a=a " " $i;
    print substr(a,2)
  }' file.with.patterns file.in > file.out

Edit: Just a bit late with an awk solution, but this one doesn't put a space on the end of each output line.

Last edited by Chubler_XL; 02-11-2013 at 08:04 PM..

Chubler_XL

02-11-2013

Another awk:

Code:

awk '
  NR==FNR{
    A[$1]
    next
  }
  {
    s=$1
    for(i=2; i<=NF; i++){
      if(FNR==1) for(j in A) if($i~j) D[i]
      if( ! (i in D) ) s=s OFS $i
    }
    print s
  }
' file.with.patterns file.in
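One thing to watch with regex matching such as $i~j: a pattern like A_1 also matches a header such as A_10.X. If that is not wanted, a possible variant compares the literal prefix instead. The sketch below uses made-up sample data (the A_10.X column is not in the original file) and a scratch directory, so treat it as an illustration only.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data with an A_10 column next to A_1.
printf '%s\n' 'NAME A_1.X A_10.X B_1.X' 'name1 AB 0.11 BB' > file.in
printf '%s\n' A_1 > file.with.patterns

awk '
  NR==FNR { A[$1]; next }               # first file: patterns
  {
    s = $1
    for (i = 2; i <= NF; i++) {
      # index(...) == 1: the header starts with the literal text "A_1."
      if (FNR == 1) for (j in A) if (index($i, j ".") == 1) D[i]
      if (!(i in D)) s = s OFS $i
    }
    print s
  }
' file.with.patterns file.in > file.out
cat file.out
```

Using index() instead of ~ also means that regex metacharacters in the pattern file are taken literally.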

Scrutinizer

02-12-2013

Thanks a lot for the scripts! They work perfectly on the example file. However, they do nothing for my big file. I am not sure, but maybe it is the field separator? The big file's columns are separated by a space, not a tab; could that affect the scripts?

Many thanks in advance!

zajtat

02-12-2013

Are there any carriage return characters in the 2 files? Post the output of:

Code:

head -2 file.with.patterns|od -bc

and

Code:

head -2 file.in|od -bc
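If od does show \r characters (for instance because a file was saved with DOS line endings), the patterns would read as "A_1\r" and never equal "A_1", which could explain why nothing gets deleted. This is only a guess until the od output is posted. A minimal sketch of stripping them with tr, using a simulated pattern file in a scratch directory; the .clean file name is just an example:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)" || exit 1

# Simulate a pattern file with CRLF line endings.
printf 'A_1\r\nA_2\r\n' > file.with.patterns

# Delete every carriage return and write a cleaned copy.
tr -d '\r' < file.with.patterns > file.with.patterns.clean

# Inspect the result: no \r should appear before the \n any more.
od -c file.with.patterns.clean
```

The same tr command can be applied to file.in, after which the awk scripts above can be run on the cleaned copies.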

elixir_sinari
