Extracting column if above certain values and repeated over a number of times continuously


 
# 1  
Old 09-22-2010

Hi,
I am new to the forum and would like to ask a question.
I have a file with thousands of columns, in this form:
Code:
id.1   A01   A01   A68   A68       
id.2   A5   A5   A3   A3       
1001   0   0   0.136   0.136       
1002   0   0   0.262   0.183       
1003   0   0   0.662   0.662       
1004   0   0   0.758   0.607       
1005   0   0   0.855   0.855       
1006   0   0   0.867   0.867       
1007   0   0   0.872   0.692       
1008   0   0   0.902   0.902       
1009   0   0   0.906   0.906

and thousands of rows, for example:
Code:
1317   0   0   0.3   1       
1318   0   0   0.3   0.618       
1319   0   0   0.3   1       
1320   0   0   0.3   1

Apart from the 2 header rows, I would like to extract part of a column (together with the 2 header rows and the first index column)
if
the values in the column are above a threshold, say 0.35,
and stay above it continuously in the same column for, say, 50 rows.

One step further would be to allow a certain number of interruptions, say 50 rows of which at most 2 have a value below the threshold.


Thanks for suggesting a command to solve this.

Last edited by vbe; 09-22-2010 at 01:56 PM.. Reason: use code tags, it will keep the format...
# 2  
Old 09-22-2010
Hey, try this. I am not sure my understanding of your text above is correct, though. If the current script does not work, please elaborate:


Code:
cols_to_print="1,2"

while col_no in 3 4 5 6 7 8 9 10 ##type all columns to include in search
do
y=`cut -f ${field_no}  -d " " filename | uniq -c | sort -b -n -r | \
sed -n -e '1s/[^0-9]*\([0-9][0-9]*\).*/\1/p'`

[ $y -ge 50 ] && cols_to_print=${cols_to_print}",${col_no}"

done

cut -d " " -f ${cols_to_print}


Last edited by vbe; 09-22-2010 at 02:55 PM..
# 3  
Old 09-23-2010
Thank you very much.
The script did not work and returned with

"cut: invalid byte, character or field list"

I ran the script on CentOS 5.4.

newbeeuk

---------- Post updated at 07:43 AM ---------- Previous update was at 05:29 AM ----------

I may not have explained this clearly in my first post.

Here is another file for illustration; the first 2 rows and the first column are identifiers:
Code:
id.1 A01 A01 A01 A68 A68
id.2 A5 A5 A5 A3 A3
1001 0.01 0 abc 0.136 0.136
1002 0.02 0 abc 0.262 0.183
1003 0.03 0 abc 0.662 0.662
1004 0.04 0 def 0.758 0.607
1005 0.05 0 def 0.855 0.855
1006 0.06 0 efg 0.867 0.867
1007 0 0 hij 0.872 0.692
1008 0.04 0 def 0.902 0.902
1009 0 0.06 def 0.51 0.906
1010 0 0 efg 0.51 0.906
1011 0 0.04 hij 0.51 0.969
1013 0.05 0 abc 0.07 0.743
1014 0.06 0 def 0.971 0.971
1015 0.07 0 def 0.971 0.743
1021 0 0.51 efg 0.996 0.996
1022 0 0.51 hij 0.996 0.996
1023 0 0.51 abc 0 0.995
1026 0.51 0 def 0 0.996
1027 0.51 0 poi 0 0.996
1028 0 0 sdf 0.05 0.996

I want every value that is greater than k (assume k = 0.5 for now)
and that stays above k continuously in that particular column for at least j rows (assume j = 3 for now).
Then I would like the output in 1 or multiple files, in the form of

first segment from column 3
Code:
id.1 A01
id.2 A5
1021 0.51
1022 0.51
1023 0.51

second segment from column 5
Code:
id.1 A68
id.2 A3
1003 0.662
1004 0.758
1005 0.855
1006 0.867
1007 0.872
1008 0.902
1009 0.51
1010 0.51
1011 0.51
1013 0.07

third segment, also from column 5, split from the second by 1 row with value < 0.5
Code:
id.1 A68
id.2 A3
1014 0.971
1015 0.971
1021 0.996
1022 0.996

and the last from column 6
Code:
id.1 A68
id.2 A3
1003 0.662
1004 0.607
1005 0.855
1006 0.867
1007 0.692
1008 0.902
1009 0.906
1010 0.906
1011 0.969
1013 0.743
1014 0.971
1015 0.743
1021 0.996
1022 0.996
1023 0.995
1026 0.996
1027 0.996
1028 0.996


None comes from column 4 because all of its values are letters, and none from column 2 because it has no 3 continuous rows with value > 0.5.

Thanks for help

newbeeuk

Last edited by Franklin52; 09-23-2010 at 10:02 AM.. Reason: Please use code tags!
# 4  
Old 09-23-2010
An example for column 3; for other columns, change the column number:
Code:
awk "NR<3{print $1, $3; next} $3 > 0.5{print $1, $3}" file

# 5  
Old 09-23-2010
Hi Franklin52,

Thanks for help.
The code failed and returned a syntax error.

newbeeuk
# 6  
Old 09-23-2010
Sorry, I used double quotes instead of single quotes, try this for column 3:
Code:
awk 'NR<3{print $1, $3; next} $3 > 0.5{print $1, $3}' file

# 7  
Old 09-23-2010
Thanks Franklin52
It doesn't work either. The output is only the first 2 rows of column 3.
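
The one-liner in post #6 tests each row on its own, so it cannot enforce the "at least j rows in a row" rule from post #3. Below is a minimal sketch of one way to do that with awk, not a definitive solution. The function name `extract_runs` and its arguments (file, threshold k, minimum run length j, tolerated low rows m) are made up for this illustration. It assumes whitespace-separated input, two header rows, and values written as plain non-negative decimals. It prints every segment to standard output under a `# column N` marker line instead of writing separate files. With m greater than 0 it handles the "allow a few interruptions" idea from post #1 greedily: a segment absorbs up to m low rows and is cut at the next one, which may not be the longest possible split.

```shell
# extract_runs FILE [K] [J] [M]   (hypothetical helper, not from the thread)
# For every column after the first, print each run of at least J rows whose
# values are greater than K, tolerating up to M lower rows inside the run.
extract_runs() {
    awk -v k="${2:-0.5}" -v j="${3:-3}" -v m="${4:-0}" '
    # Print the buffered run of column c (minus trailing low rows) if it
    # spans at least j rows, then reset the buffer for that column.
    function emit(c,    i, n) {
        n = len[c] - trail[c]
        if (n >= j) {
            printf "# column %d\n%s %s\n%s %s\n", c, id1, h1[c], id2, h2[c]
            for (i = 1; i <= n; i++) print buf[c, i]
        }
        len[c] = low[c] = trail[c] = 0
    }
    # Remember both header rows for every column.
    NR == 1 { id1 = $1; for (c = 2; c <= NF; c++) h1[c] = $c; next }
    NR == 2 { id2 = $1; for (c = 2; c <= NF; c++) h2[c] = $c; next }
    {
        if (NF > maxnf) maxnf = NF
        for (c = 2; c <= NF; c++) {
            # Only plain decimal numbers count, so text columns never match.
            high = ($c ~ /^[0-9]*\.?[0-9]+$/ && $c + 0 > k)
            if (high) {
                buf[c, ++len[c]] = $1 " " $c
                trail[c] = 0
            } else if (len[c] > 0 && low[c] < m) {
                # Tolerate this low row inside an open run.
                buf[c, ++len[c]] = $1 " " $c
                low[c]++; trail[c]++
            } else {
                emit(c)
            }
        }
    }
    # Flush runs that are still open at the end of the file.
    END { for (c = 2; c <= maxnf; c++) emit(c) }
    ' "$1"
}
```

For the case in post #1 the call would look like `extract_runs file 0.35 50 2`; for the example in post #3, `extract_runs file 0.5 3`. Segments are printed in the order in which they end, not sorted by column. With m left at 0, the first column 5 segment stops at row 1011, because the value 0.07 in row 1013 is below the threshold.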
 