Grepping non-alphanumerics from first column only


 
# 1  
Old 10-22-2014

I have data in the following tab-separated format (200 columns altogether; this is just a sample):

Code:
</s> 0.001701 0.002025 0.002264 0.001430 -0.001300 
. -0.205240 0.177341 -0.426209 -0.661049 -0.048884 0.027032 
the -0.159145 0.084377 0.056968 0.050934 0.160689 
of -0.230698 0.030112 0.021657 -0.091374 0.069027 
, -0.282318 -0.692638 0.350441 -0.600493 -0.370671 
is -0.074473 -0.245787 0.246335 -0.504011 -0.322308 
in -0.086738 -0.004564 0.163076 -0.114565 -0.156633 
to 0.178787 0.249158 -0.115754 -0.282477 -0.290229 
was -0.293781 -0.435587 -0.142019 -0.624197 -0.103400

I want to remove all lines in which the FIRST column contains a non-alphanumeric character.

The desired result is this:

Code:
the -0.159145 0.084377 0.056968 0.050934 0.160689 
of -0.230698 0.030112 0.021657 -0.091374 0.069027 
is -0.074473 -0.245787 0.246335 -0.504011 -0.322308 
in -0.086738 -0.004564 0.163076 -0.114565 -0.156633 
to 0.178787 0.249158 -0.115754 -0.282477 -0.290229 
was -0.293781 -0.435587 -0.142019 -0.624197 -0.103400

I have tried this with grep and awk, with no success.

Code:
cat INPUT | cut -f 1 | grep -v "[[:punct:]]"

Code:
awk 'NR>1{t=$1;gsub(/[^[:punct:]]/,"");$0=t "\t" $0}1' INPUT

How can I solve this?

Last edited by owwow14; 10-22-2014 at 08:06 AM..
# 2  
Old 10-22-2014
Hello owwow14,

Could you please try the following? It is not tested, though.

Code:
awk '{if($1 !~  /[[:punct:]]/ && $1 !~ /[[:digit:]]/) {print $0}}'  Input_file

Thanks,
R. Singh
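
A note on the command above: it also drops lines whose first column contains a digit, although digits are alphanumeric. If the requirement is read literally (keep a line only when the whole first column is alphanumeric), a minimal sketch could look like the following. It assumes the file really is tab-separated and an awk that supports POSIX character classes; the sample rows and the temporary file are made up for illustration.

```shell
# Hypothetical sample data written to a temporary file (not the OP's real input).
tmp=$(mktemp)
printf '</s>\t0.001701\n.\t-0.205240\nthe\t-0.159145\nab1\t0.5\nth<e\t0.1\n' > "$tmp"

# Keep a line only if the entire first tab-separated field is alphanumeric.
# No action block is given, so awk prints matching lines unchanged.
kept=$(awk -F'\t' '$1 ~ /^[[:alnum:]]+$/' "$tmp")
printf '%s\n' "$kept"

rm -f "$tmp"
```

With this sample, only the `the` and `ab1` rows should survive; `th<e` is dropped because the pattern is anchored at both ends of the field.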
# 3  
Old 10-22-2014
Great R. Singh! Thanks. As usual, you provided a great solution. It successfully removed all of the unwanted characters.
# 4  
Old 10-22-2014
How about
Code:
grep  "^[[:alnum:]]" file
the -0.159145 0.084377 0.056968 0.050934 0.160689 
of -0.230698 0.030112 0.021657 -0.091374 0.069027 
is -0.074473 -0.245787 0.246335 -0.504011 -0.322308 
in -0.086738 -0.004564 0.163076 -0.114565 -0.156633 
to 0.178787 0.249158 -0.115754 -0.282477 -0.290229 
was -0.293781 -0.435587 -0.142019 -0.624197 -0.103400

# 5  
Old 10-22-2014

Hello Rudy,

The above solution will still keep lines in which punctuation or digits appear in the middle of the first column. The OP wants to remove a line completely if its first column contains any digits or punctuation. Here is an example (I changed one line of the input file to test it):
Code:
grep  "^[[:alnum:]]" test24
th<e -0.159145 0.084377 0.056968 0.050934 0.160689
of -0.230698 0.030112 0.021657 -0.091374 0.069027
is -0.074473 -0.245787 0.246335 -0.504011 -0.322308
in -0.086738 -0.004564 0.163076 -0.114565 -0.156633
to 0.178787 0.249158 -0.115754 -0.282477 -0.290229
was -0.293781 -0.435587 -0.142019 -0.624197 -0.103400

Thanks,
R. Singh

Last edited by RavinderSingh13; 10-22-2014 at 08:57 AM.. Reason: grammar correction
# 6  
Old 10-22-2014

I think Ravinder's solution can be further simplified like this:

Code:
awk 'gsub(/^[[:alnum:]]/,"&",$1)' file
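
One thing to watch with the gsub() form above: it only tests the first character of $1, and because gsub() assigns to a field, awk rebuilds the record with the default output separator, so tabs come out as single spaces. Below is a small sketch of the difference, using one made-up tab-separated line. A plain match on $1 selects the same lines without touching the record.

```shell
# One hypothetical tab-separated line, for illustration only.
line=$(printf 'the\t-0.159145\t0.084377')

# gsub() on $1 modifies the field, so awk rebuilds $0 using OFS
# (a single space by default) and the tabs are lost.
rebuilt=$(printf '%s\n' "$line" | awk 'gsub(/^[[:alnum:]]/,"&",$1)')

# A plain pattern match selects the same line but leaves $0, and its tabs, intact.
intact=$(printf '%s\n' "$line" | awk '$1 ~ /^[[:alnum:]]/')

printf '%s\n%s\n' "$rebuilt" "$intact"
```

If the gsub() form is preferred anyway, setting `FS` and `OFS` to a tab in a `BEGIN` block should keep the separators.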


Last edited by Akshay Hegde; 10-22-2014 at 09:13 AM..
# 7  
Old 10-22-2014
This depends on what the "first column" is: people frequently use that term for the first character position, and the sample gave no hint on how to interpret it. Nevertheless, accepting your argument, try
Code:
grep  "^[[:alnum:]]* " file
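
Since the data is described as tab-separated, the literal space after the bracket expression may not match, and `*` also accepts an empty first column. Here is a hedged variant, assuming `grep -E` and that any whitespace character ends the first column. The sample rows and the temporary file are made up for illustration.

```shell
# Hypothetical sample data; the fourth row has an empty first column.
tmp=$(mktemp)
printf 'the\t-0.159145\nth<e\t0.5\n.\t-0.2\n\t0.1\nab1\t0.7\n' > "$tmp"

# One or more alphanumerics from the start of the line, followed directly
# by whitespace (tab or space), i.e. the whole first column is alphanumeric.
matched=$(grep -E '^[[:alnum:]]+[[:space:]]' "$tmp")
printf '%s\n' "$matched"

rm -f "$tmp"
```

Here only the `the` and `ab1` rows should be printed. A line with just one column and no trailing whitespace would not match this pattern.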
