09-18-2012
Parse tab delimited file, check condition and delete row
I am fairly new to programming and am trying to solve this problem. I have a tab-delimited file like this:
CHROM POS REF ALT 10_sample.bam 11_sample.bam 12_sample.bam 13_sample.bam 14_sample.bam 15_sample.bam 16_sample.bam |
tg93 77 T C T T T T T |
tg93 79 C - C C C - - |
tg93 79 C G C C C C G C |
tg93 80 G A G G G G A A G |
tg93 81 A C A A A A C C C |
tg93 86 C A C C A A A A C |
tg93 105 A G A A A A A G A |
tg93 108 A G A A A A G A A |
tg93 114 T C T T T T T C T |
tg93 131 A C A A A A A A A |
tg93 136 G C C G C C G G G |
tg93 150 CTCTC - CTCTC - CTCTC CTCTC |
In this file, the header columns are:
CHROM - name
POS - position
REF - reference
ALT - alternate
10_sample.bam to 16_sample.bam - samples

Now I want to see how many times the letters in the REF and ALT columns occur among the samples. If either of them occurs fewer than two times, I need to delete that row.

For example, in the first row I have 'T' in REF and 'C' in ALT. In the 7 samples there are 5 T's, 2 blanks and no C, so I need to delete this row. In the second row, REF is 'C' and ALT is '-'. In the seven samples we have 3 C's, 2 '-'s and 2 blanks, so we keep this row because both C and '-' occur at least two times. Blanks are always ignored while counting.
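The rule above can be sketched for a single row as follows. This is only an illustration in Python, not code from the original post: the function name `keep_row`, the `min_count` parameter, and the positions of the blank cells in the two example rows are assumptions (the post gives only the totals, not where the blanks fall).

```python
from collections import Counter


def keep_row(fields, min_count=2):
    """Return True if both REF and ALT occur at least min_count times
    among the sample columns (column 5 onward). Blank cells are ignored."""
    ref, alt = fields[2], fields[3]
    # Count every non-blank sample value; Counter returns 0 for missing keys.
    counts = Counter(s.strip() for s in fields[4:] if s.strip())
    return counts[ref] >= min_count and counts[alt] >= min_count


# Hypothetical blank positions: only the totals are known from the post.
row1 = ["tg93", "77", "T", "C", "T", "T", "", "T", "T", "", "T"]
row2 = ["tg93", "79", "C", "-", "C", "C", "", "C", "-", "", "-"]

print(keep_row(row1))  # five T's but no C, so the row is dropped
print(keep_row(row2))  # three C's and two '-', so the row is kept
```

Using a `Counter` avoids a separate loop per letter, and it also handles multi-character alleles such as CTCTC, because whole cell values are counted rather than single characters.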
The final file after filtering is
#CHROM POS REF ALT 10_sample.bam 11_sample.bam 12_sample.bam 13_sample.bam 14_sample.bam 15_sample.bam 16_sample.bam |
tg93 79 C - C C C - - |
tg93 80 G A G G G G A A G |
tg93 81 A C A A A A C C C |
tg93 86 C A C C A A A A C |
tg93 108 A G A A A A G A A |
tg93 136 G C C G C C G G G |
I am able to read the columns into arrays and display them in my code, but I am not sure how to start the loops that read the bases, count their occurrences, and decide whether to keep the row. Can anyone tell me how I should proceed with this? It would also be helpful if you have any example code I can build upon.
Thank you for the help!
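One possible way to structure the loop, as a minimal Python sketch rather than a definitive answer: read the file line by line, pass the header through unchanged, split each data line on tabs, count the non-blank sample values, and emit the line only if REF and ALT both reach the threshold. The generator name `filter_lines`, the `min_count` parameter, and the blank positions in the demo data are assumptions made for illustration.

```python
from collections import Counter


def filter_lines(lines, min_count=2):
    """Yield the header plus every data line whose REF and ALT values
    each occur at least min_count times among the sample columns."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return  # empty input
    yield header
    for line in lines:
        # Strip only the newline so trailing empty (blank) cells survive.
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete data row
        ref, alt = fields[2], fields[3]
        counts = Counter(s.strip() for s in fields[4:] if s.strip())
        if counts[ref] >= min_count and counts[alt] >= min_count:
            yield line


# In-memory demo; the blank cells in the first data row are hypothetical.
demo = [
    "CHROM\tPOS\tREF\tALT\t10_sample.bam\t11_sample.bam\t12_sample.bam\t"
    "13_sample.bam\t14_sample.bam\t15_sample.bam\t16_sample.bam\n",
    "tg93\t77\tT\tC\tT\tT\t\tT\tT\t\tT\n",
    "tg93\t80\tG\tA\tG\tG\tG\tG\tA\tA\tG\n",
    "tg93\t105\tA\tG\tA\tA\tA\tA\tA\tG\tA\n",
]
kept = list(filter_lines(demo))
print("".join(kept), end="")
```

To run it on a real file, the same generator can be fed an open file handle, for example `dst.writelines(filter_lines(src))` with `src` and `dst` opened on an input and output path of your choosing.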