awk based script to ignore all columns from a file which contains character strings


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk based script to ignore all columns from a file which contains character strings
# 1  
Old 10-16-2012
awk based script to ignore all columns from a file which contains character strings

Hello All,

I have a .CSV file where I expect all numeric data in all the columns other than column headers.
But sometimes I get the files (result of statistics computation by other persons) like below( sample data)

Code:
SNO,Data1,Data2,Data3
1,2,3,4
2,3,4,SOME STRING
3,4,Inf,5
4,5,4,4

I want to remove the columns where ever character strings( I cannot provide the complete list as it is unknown) appear in any row of a column.
So the output will be like this
Code:
SNo,Data1
1,2
2,3
3,4
4,5

I have 10000 columns in my file, so manually deleting all these columns taking lot of time.

Thanks
Sidda
# 2  
Old 10-16-2012
try..


Code:
awk -F, '{ if(NR==1){a=split($0,P,",")};
for(i=1;i<=NF;i++){if($i !~ /[A-z]/ || NR==1){arr[NR,P[i]]=$i}else{K[++t]=i}}}
END{for(i in K){P[K[i]]=0}}
END{for(i=1;i<=NR;i++){
for(j=1;j<=NF;j++){if(arr[i,P[j]] != ""){s=s?s","arr[i,P[j]]:arr[i,P[j]]}}
print s;s=""}
}' file

# 3  
Old 10-16-2012
A bit shorter:
Code:
awk -F, 'NR>1{for (i=1;i<=NF;i++) if ($i~"[^0-9]") print i}' file| sort | uniq | awk -F, 'NR==FNR{a[$0]=1;next}{for (i=1;i<=NF;i++) if (!(i in a)) printf $i",";printf "\n"}' - file

# 4  
Old 10-16-2012
Code:
awk 'FNR==NR{
if(FNR>1 && $0 !~ /^[0-9, \t]$/)
 for(i=1;i<=NF;i++)
  if(!(i in drop) && $i !~ /^[0-9, \t]$/)
   drop[i]
next}
{t="";n=1
for(i=1;i<=NF;i++)
 if(!(i in drop))
  t=(n++ == 1)? $i : t FS $i
$0=t
}1' FS=, file file

# 5  
Old 10-16-2012
Code:
awk -F, '{for (i=1;i<=NF;i++) {
             Ar[NR,i] = $i
             if (NR>1) L[i] = L[i] || $i !~ /^[0-9]*$/
            }
         }
         END {for (j=1;j<=NR;j++)
                {for (i=1;i<=NF;i++)
                  if (!L[i]) printf "%s%s", (i>1)?",":"", Ar[j ,i]
                 print""
                }
             }
        ' file


Last edited by RudiC; 10-16-2012 at 06:11 AM..
# 6  
Old 10-16-2012
Thanks

Hi Pamu/All,
Your code works perfectly.
I am sorry for little confusion here.
My actual input data looks like this
Code:
Type,Data1,Data2,Data3 
A,2,3,4 
B,3,4,SOME STRING 
C,4,Inf,5 
D,5,4,4

Where column header Type is a string which says case of each row. Some times there will be many like Type1,Type2 etc columns and I want to preserve them
and make your code work on the rest of the columns like 4 to last column.
Basically I want to offset column number from where I need to remove columns that have strings.

If possible modify your code to account for my request.

Thanks
Sidda

Last edited by ks_reddy; 10-16-2012 at 06:23 AM.. Reason: Small correction in input.
# 7  
Old 10-16-2012
Quote:
Originally Posted by ks_reddy
But which part in your code specifies, how many columns to exclude starting from 1st column ? ( In this example we printed the column1 as it is right)
Suppose If I want to exclude first 2 columns later, how can I do that ??

Regards
Sidda
Doesn't your input data decide which column to exclude? It is not possible to know the column numbers beforehand (as far as I can understand your requirement).
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How to ignore mutiple strings when using shell script?

Hi All, I am trying to use below syntax to find ignore multiple locations while searching for a file. find / -name "$serviceitem" ! -size 0 2>&1 |egrep -v "tmp|docker|WinSxS|Permission|HISTORY|alternatives|bearer11ssl|manifest" I tried to assign all the ignore strings to one variable... (2 Replies)
Discussion started by: sravani25
2 Replies

2. Shell Programming and Scripting

awk to ignore multiple rows based on a condition

All, I have a text file(Inputfile.csv) with millions of rows and 100 columns. Check the sample for 2 columns below. Key,Check A,1 A,2 A, A,4 B,0 B,1 B,2 B,3 B,4 .... million rows. My requirement is to delete all the rows corresponding to all the keys which ever has at least one... (4 Replies)
Discussion started by: ks_reddy
4 Replies

3. Shell Programming and Scripting

Delimit file based on character length using awk

Hi, I need help with one problem, I came across recently. I have one input file which I need to delimit based on character length. $ cat Input.txt 12345sda231453 asd760kjol62569 sdasw4g76gdf57 And, There is one comma separated file which mentions "start of the field" and "length... (6 Replies)
Discussion started by: Prathmesh
6 Replies

4. Shell Programming and Scripting

How to remove certain character strings with awk?

Hi all, I need to remove DBPATH= and /db from the string below using awk (or sed, as it also exists on the machine). Input: DBPATH=/some/path/database/db Desired output: /some/path/database Thank you! (8 Replies)
Discussion started by: ejianu
8 Replies

5. Shell Programming and Scripting

awk error - validating strings in columns

can someone please help me fix this command: somecommand.sh | awk -F"---" 'BEGIN{count=0} /P/ && /ERROR/ {if (($3 ~ /^P$/) && ($6 ~ /ERROR/)) {print; count++ }END { print count } ;}' basically, what i'm attempting to do here is print all the matching lines, then, at the end, print the total... (2 Replies)
Discussion started by: SkySmart
2 Replies

6. Shell Programming and Scripting

awk script to split file into multiple files based on many columns

So I have a space delimited file that I'd like to split into multiple files based on multiple column values. This is what my data looks like 1bc9A02 1 10 1000 FTDLNLVQALRQFLWSFRLPGEAQKIDRMMEAFAQRYCQCNNGVFQSTDTCYVLSFAIIMLNTSLHNPNVKDKPTVERFIAMNRGINDGGDLPEELLRNLYESIKNEPFKIPELEHHHHHH 1ku1A02 1 10... (9 Replies)
Discussion started by: viored
9 Replies

7. Shell Programming and Scripting

Help with awk replacing identical columns based on another file

Hello, I am using Awk in UBUNTU 12.04. I have a file like following with three fields and 44706 rows. F1 A A F2 G G F3 A T I have another file like this: AL_1 F1 A A AL_2 F1 A T AL_3 F1 A A AL_1 F2 G G AL_2 F2 G A AL_3 F2 G G BO_1 F1 A A BO_2 F1 A T... (6 Replies)
Discussion started by: Homa
6 Replies

8. Shell Programming and Scripting

awk based script to find the average of all the columns in a data file

Hi All, I need the modification for the below mentioned code (found in one more post https://www.unix.com/shell-programming-scripting/27161-script-generate-average-values.html) to find the average values for all the columns(but for a specific rows) and print the averages side by side. I have... (4 Replies)
Discussion started by: ks_reddy
4 Replies

9. Shell Programming and Scripting

Awk based script to find the median of all individual columns in a data file

Hi All, I have some data like below. Step1,Param1,Param2,Param3 1,2,3,4 2,3,4,5 2,4,5,6 3,0,1,2 3,0,0,0 3,2,1,3 ........ so on Where I need to find the median(arithmetic) of each column from Param1...to..Param3 for each set of Step1 values. (Sort each specific column, if the... (5 Replies)
Discussion started by: ks_reddy
5 Replies

10. Linux

Linux script to remove a character in a file based on position.

Greetings, We have a requirement where we need to loop in a fixed width file in linux and remove a character based on a position for every record. It would highly appreciate if someone can help to automate this. Appreciate your time and help! Regards (3 Replies)
Discussion started by: mailme0205
3 Replies
Login or Register to Ask a Question