Delete columns if a pattern met


 
# 1  
Old 02-11-2013

Hi,

I'd like to ask for some help with the following task, please:

there is a big file with a header (this is file.in):

HTML Code:
NAME A_1.X A_1.Y A_1.Z B_1.X B_1.Y B_1.Z
name1 AB 0.11 0.12 BB 0.45 0.67 
name2 BB 0.34 0.56 AA 0.89 0.68
What I need is to recognize a pattern in the header of this file (the patterns are in another file) and delete every column whose header matches it.

For example, the file with the patterns looks like this (this is file.with.patterns):
HTML Code:
A_1
A_2
C_4
D_7
So it would recognize A_1 and delete all the columns whose header contains A_1; the output would then look like this (this is file.out):

HTML Code:
NAME B_1.X B_1.Y B_1.Z
name1 BB 0.45 0.67 
name2 AA 0.89 0.68
I am not sure I've got the best approach. What I was thinking of doing is writing all the columns whose header does not contain the specified pattern to one output file (so the columns whose header does match the pattern are left out, i.e. deleted):

HTML Code:
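while read -r i
do
  # pass the shell variable in with -v; $i is not expanded inside single quotes
  # note: each pass appends a full filtered copy of file.in to file.out
  awk -v pat="$i" 'NR==1{for(a=1;a<=NF;a++) if ($a !~ pat) f[n++]=a}
  {for(a=0;a<n;a++) printf "%s%s", (a?" ":""), $f[a]; print ""}' file.in >> file.out
done < file.with.patterns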
One problem is that I would like all the columns whose header does not match any of the patterns in file.with.patterns to end up in file.out, and I am not sure whether the append operator (>>) would do that. It hasn't really worked so far.


Another option I was thinking about is to work out the numbers of the columns whose header contains the pattern and then delete them with cut -f, but I don't know how to do that.
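The cut -f idea can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name drop_columns is made up here, and it assumes the columns are separated by exactly one space and that a header such as A_1.X should be compared to the patterns by the part before the first dot. awk reads only the header to build the list of column numbers to keep, and cut then does the work on the big file.

```shell
# Hypothetical helper (name is an assumption): drop_columns PATTERN_FILE DATA_FILE
drop_columns() {
  patterns=$1
  input=$2

  # Build a comma-separated list of the column numbers to KEEP,
  # looking only at the header line of the data file.
  keep=$(awk '
    NR==FNR { pat[$1]; next }          # first file: remember each pattern
    {
      for (i = 1; i <= NF; i++) {
        name = $i
        sub(/\..*$/, "", name)         # A_1.X -> A_1
        if (!(name in pat)) list = (list == "" ? "" : list ",") i
      }
      print list
      exit                             # header only, skip the rest of the file
    }' "$patterns" "$input")

  # Assumes single-space separators; cut treats every space as a delimiter.
  cut -d ' ' -f "$keep" "$input"
}
```

It would be called as drop_columns file.with.patterns file.in > file.out. Because awk stops after the header, the big file is read in full only once, by cut.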

Any ideas will be greatly appreciated!

Many thanks for your time!
# 2  
Old 02-11-2013
From your post, what I understood is that you need the header filtered, with the filter rules given by a file.
Let's say the rule file is rule.txt (containing A_1, A_2, etc.) and the file with the header is file.html.

Code:
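# one header field per line, so grep -v drops single fields rather than the whole header
head -1 file.html | tr -s ' ' '\n' >temp

while read -r pat
do
  # double quotes so that $pat is expanded by the shell
  grep -v "^$pat\.[A-Z]" temp >temp1
  mv temp1 temp
done < rule.txt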


Last edited by Scrutinizer; 02-11-2013 at 07:55 PM.. Reason: code tags
# 3  
Old 02-11-2013
try also:
Code:
awk '
NR==FNR {p[$0]=$0; next}
FNR==1 {for (i=1; i<=NF; i++) {s=$i; sub("[.].*","",s); if (p[s]) o[i]=s}}
{l=""; for (i=1; i<=NF; i++) if (!o[i]) l=l $i" "; $0=l;}
1
' file.with.patterns file.in > file.out

This User Gave Thanks to rdrtx1 For This Post:
# 4  
Old 02-11-2013
try
Code:
awk '
  FNR==NR{P[$1];next}
  FNR==1{
    for(i=1;i<=NF;i++) {
      c=$i
      sub(/\.[XYZ]$/,"",c)
      if(c in P)S[i]
    }
  }
  { a=x
    for(i=1;i<=NF;i++)
    if(!(i in S)) a=a " " $i;
    print substr(a,2)
  }' file.with.patterns file.in > file.out

Edit: Just a bit late with an awk solution, but this one doesn't put a space on the end of each output line.

Last edited by Chubler_XL; 02-11-2013 at 08:04 PM..
This User Gave Thanks to Chubler_XL For This Post:
# 5  
Old 02-11-2013
Another awk:
Code:
awk '
  NR==FNR{
    A[$1]
    next
  }
  {
    s=$1
    for(i=2; i<=NF; i++){
      if(FNR==1) for(j in A) if($i~j) D[i]
      if( ! (i in D) ) s=s OFS $i
    }
    print s
  }
' file.with.patterns file.in

This User Gave Thanks to Scrutinizer For This Post:
# 6  
Old 02-12-2013
Thanks a lot for the scripts! They work perfectly on the example file. However, they do nothing for my big file. I am not sure, but maybe it is the field separator? The big file's columns are separated by a space, not a tab; could that affect the scripts?

Many thanks in advance!
# 7  
Old 02-12-2013
Are there any carriage return characters in the 2 files? Post the output of:
Code:
head -2 file.with.patterns|od -bc

and
Code:
head -2 file.in|od -bc

.
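If the od output does show \r at the end of the lines, the files have DOS line endings. In that case a pattern is read as "A_1\r" and never equals the header prefix "A_1", which would explain scripts that do nothing. The space-versus-tab difference is unlikely to matter, since awk's default field splitting treats runs of spaces and tabs alike. A minimal sketch that strips the carriage return before matching, along the lines of the awk replies above (the function name drop_columns_crlf is made up for illustration):

```shell
# Hypothetical helper (name is an assumption): drop_columns_crlf PATTERN_FILE DATA_FILE
drop_columns_crlf() {
  awk '
    { sub(/\r$/, "") }                 # remove a trailing carriage return, if any
    NR==FNR { pat[$1]; next }          # first file: remember each pattern
    FNR==1 {                           # header of the data file
      for (i = 1; i <= NF; i++) {
        name = $i
        sub(/\..*$/, "", name)         # A_1.X -> A_1
        if (name in pat) drop[i]       # mark this column for removal
      }
    }
    {
      out = ""
      for (i = 1; i <= NF; i++)
        if (!(i in drop)) out = (out == "" ? "" : out " ") $i
      print out
    }' "$1" "$2"
}
```

It would be called as drop_columns_crlf file.with.patterns file.in > file.out. Converting both files once with tr -d '\r' beforehand would work as well.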