Getting the non-homogenous letter row from a text file


 
# 1  
06-06-2013

I have a large tab-delimited file in the following format:

Code:
CCCCCGCCCCCCCCCCcCCCCCCCCCCCCCCCC 23 65 3 4
AAAAAAAAAAAAAAAAaAAAAAAAAAAAAAAAA 24 6 89 90
TGTTTTTTTTTTTTGGtTTTTTTTTTTTTTTTT 2 4 8 90
TTTT-TTTTTTTTTTTtTTTTTTTTTTTTTTTT 1 34 89 50
GGGGGGGGGGGGGGGGTGGGGGGGGGGGGGGGG 87 6 78 66
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT 8 78 45 61
AAAAATAAAAAAGGGAAAAAAAAAAAAAAAAAA 78 8 9 23

Each row has 33 letters drawn from the pool ATGC (in upper or lower case, i.e. atgc as well), and some rows may also contain '-'. I would like to extract the rows that are not homogeneous: if one or more letters differ from the rest, grab that entire row. As the desired output shows, case should be ignored and '-' should not count as a different letter.

This is the desired output:

Code:
CCCCCGCCCCCCCCCCcCCCCCCCCCCCCCCCC 23 65 3 4
TGTTTTTTTTTTTTGGtTTTTTTTTTTTTTTTT 2 4 8 90
GGGGGGGGGGGGGGGGTGGGGGGGGGGGGGGGG 87 6 78 66
AAAAATAAAAAAGGGAAAAAAAAAAAAAAAAAA 78 8 9 23

Please let me know the best way to do this in awk.
# 2  
06-06-2013
Here's a Perl one-liner:
Code:
perl -ane '/^(.)/ && ($x = $1); print if ($F[0] !~ /^[$x-]+$/i)' file
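Since the original question asked for awk, here is a minimal POSIX awk sketch of the same filter, wrapped in a shell function for convenience. The function name `nonhomogeneous_rows` is hypothetical, and the sketch assumes the letter string is the first whitespace-separated field, that upper and lower case count as the same letter, and that '-' is a gap rather than a letter (which is what the desired output implies).

```shell
# Hypothetical helper: print the rows whose letter string (field 1) is not
# made of a single repeated letter. Assumes case is ignored and that the
# - character is a gap, not a letter.
nonhomogeneous_rows() {
    awk '{
        seq = toupper($1)                  # compare case-insensitively
        gsub(/-/, "", seq)                 # drop gap characters
        rest = seq
        gsub(substr(seq, 1, 1), "", rest)  # delete every copy of the first letter
        if (rest != "") print              # anything left is a different letter
    }' "$@"
}

# Usage: nonhomogeneous_rows file
```

Matching rows are printed unchanged, numeric columns included, because `print` with no arguments prints the whole input line.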

# 3  
06-09-2013
Alphabet counting

Thanks, that worked.
I would also like an awk solution to a related problem. I have a file in the following format:

Code:
CCCCCGCCCCCCCCCCcCCCCCCCCCCCCCCC
AAAATAAAAAAAAAAAaAAAAAAAAAAAAAAA
TGTTTTTTTTTTTTGGtTTTTTTTTTTTTTTT
TTTT-TTTTTTTTTCTtTTTTTTTTTTTTTTT

Each row has 32 letters and contains two or more distinct letters from the pool ATGC (in upper or lower case); some rows may also contain '-'. I would like to count the occurrences of each letter in a row, ignoring case and skipping '-', and output the positions of every occurrence of each letter.

Desired output is
Code:
CCCCCGCCCCCCCCCCcCCCCCCCCCCCCCCC C 1 2 3 4 5 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 G 6
AAAATAAAAAAAAAAAaAAAAAAAAAAAAAAA A 1 2 3 4 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 T 5
TGTTTTTTTTTTTTGGtTTTTTTTTTTTTTTT T 1 3 4 5 6 7 8 9 10 11 12 13 14 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 G 2 15 16
TTTT-TTTTTTTTTCTtTTTTTTTTTTTTTTT T 1 2 3 4 6 7 8 9 10 11 12 13 14 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 C 15

Please let me know the best way to do this in awk.

---------- Post updated at 09:30 AM ---------- Previous update was at 05:41 AM ----------

Is there a way to do this in either Perl or awk? Looking forward to your suggestions.
# 4  
06-09-2013
Try:
Code:
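perl -F -lape '
  %posits = ();   # letter => list of 1-based positions, reset for every line
  map { push @{$posits{uc($F[$_])}}, $_ + 1 unless $F[$_] eq "-" } 0 .. $#F;
  # append "LETTER pos pos ..." groups, most frequent letter first
  $_ .= " " . join(" ", map { $_, @{$posits{$_}} }
                        sort { @{$posits{$b}} <=> @{$posits{$a}} } keys %posits)' file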
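For an awk version of the same grouping, here is a minimal POSIX awk sketch. It is an illustration under stated assumptions, not a drop-in replacement: the function name `letter_positions` is hypothetical, the sequence is taken to be the first field, letters are upper-cased before grouping, '-' is skipped, and letters are listed by descending count to mirror the desired output. A stable insertion sort is used so that letters with equal counts keep the order in which they first appear.

```shell
# Hypothetical helper: for every row, append each letter followed by all of
# its 1-based positions. Case is folded to upper case, the - gap character
# is skipped, and letters are listed most frequent first.
letter_positions() {
    awk '{
        seq = $1
        n = 0
        split("", cnt); split("", pos)     # reset the per-row arrays
        for (i = 1; i <= length(seq); i++) {
            c = toupper(substr(seq, i, 1))
            if (c == "-") continue         # gaps are not letters
            if (!(c in cnt)) order[++n] = c
            cnt[c]++
            pos[c] = pos[c] " " i
        }
        # stable insertion sort of the letters by descending count
        for (i = 2; i <= n; i++) {
            t = order[i]
            for (j = i - 1; j >= 1 && cnt[order[j]] < cnt[t]; j--)
                order[j + 1] = order[j]
            order[j + 1] = t
        }
        out = seq
        for (i = 1; i <= n; i++) out = out " " order[i] pos[order[i]]
        print out
    }' "$@"
}

# Usage: letter_positions file
```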

# 5  
07-15-2013
How do I modify the above code so that, for each line, it reports only the letters that differ from the first letter, together with the positions of those differing letters?

Desired output:

Code:
CCCCCGCCCCCCCCCCcCCCCCCCCCCCCCCC G 6
AAAATAAAAAAAAAAAaAAAAAAAAAAAAAAA T 5
TGTTTTTTTTTTTTGGtTTTTTTTTTTTTTTT G 2 15 16
TTTT-TTTTTTTTTCTtTTTTTTTTTTTTTTT C 15

---------- Post updated at 08:20 PM ---------- Previous update was at 04:51 PM ----------

Can we do this using awk?
# 6  
07-15-2013
Quote:
Originally Posted by Lucky Ali
How do I modify the above code so that it would count the occurrence of the alphabet that is different from the first alphabet in each of the lines and output the position number/ numbers of that different alphabet only.

can we do this using awk?
How many times do we need to answer the same question for you?

What was wrong with the answer you got to this question a year and a half ago: alphabet counting?
# 7  
07-15-2013
That answer was based on the least frequent letter. Here I need the positions of the letters that differ from the first letter, and the differing letter is not necessarily the least frequent one in each line.
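A minimal POSIX awk sketch for this variant follows. It assumes the reference letter is the first non-'-' character of the row, compared case-insensitively, and that the sequence is the first field; the function name `differing_positions` is hypothetical. Every letter other than the reference is grouped in the order it first appears, so the result does not depend on which letter happens to be least frequent.

```shell
# Hypothetical helper: for every row, list only the letters that differ from
# the row's reference letter, each followed by its 1-based positions.
# Assumes case is ignored, - is a gap, and the reference is the first
# non-gap letter of the row.
differing_positions() {
    awk '{
        seq = $1
        ref = toupper(seq)
        sub(/^-+/, "", ref)                # skip leading gaps
        ref = substr(ref, 1, 1)            # reference letter
        n = 0
        split("", pos)                     # reset the per-row array
        for (i = 1; i <= length(seq); i++) {
            c = toupper(substr(seq, i, 1))
            if (c == "-" || c == ref) continue
            if (!(c in pos)) order[++n] = c
            pos[c] = pos[c] " " i
        }
        out = seq
        for (i = 1; i <= n; i++) out = out " " order[i] pos[order[i]]
        print out
    }' "$@"
}

# Usage: differing_positions file
```

Rows with no differing letter are printed on their own; replacing the final `print out` with `if (n) print out` would drop them instead.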