Parsing the test file


 
# 1  
Old 05-13-2014

Hello,

I want to keep only the rows with a unique count (column 4) for every refgene (column 7), choosing between duplicates on the basis of strand (column 8) and tss (column 5).
If a refgene has several rows with the same count and it is on the negative strand, keep the row with the highest tss.
If a refgene has several rows with the same count and it is on the positive strand, keep the row with the lowest tss.

I am working on data in this format:
Code:
CHR	TSS-25bp	TSS+25bp	count 	tss	Ensemble transcript	refgene	strand
chr15	79554474	79554524	2	79554499	ENSMUST00000089311	Sun2	-
chr15	79554475	79554525	2	79554500	ENSMUST00000100439	Sun2	-
chr15	79554477	79554527	2	79554502	ENSMUST00000046259	Sun2	-
chr15	79569054	79569104	1	79569079	ENSMUST00000159660	Sun2	-
chr15	79570243	79570293	4	79570268	ENSMUST00000160355	Sun2	-
chr17	44914075	44914125	2	44914100	ENSMUST00000050630	Supt3h	+
chr17	44914248	44914298	3	44914273	ENSMUST00000130623	Supt3h	+
chr17	44914319	44914369	3	44914344	ENSMUST00000127798	Supt3h	+
chr11	87551028	87551078	2	87551053	ENSMUST00000152700	Supt4h1	+
chr11	87551029	87551079	2	87551054	ENSMUST00000141169	Supt4h1	+
chr7	29099891	29099941	2	29099916	ENSMUST00000003527	Supt5h	-
chr11	78020504	78020554	3	78020529	ENSMUST00000108314	Supt6h	-



I would expect this in the output:
Code:
CHR	TSS-25bp	TSS+25bp	count 	tss	Ensemble transcript	refgene	strand
chr15	79554477	79554527	2	79554502	ENSMUST00000046259	Sun2	-
chr15	79569054	79569104	1	79569079	ENSMUST00000159660	Sun2	-
chr15	79570243	79570293	4	79570268	ENSMUST00000160355	Sun2	-
chr17	44914075	44914125	2	44914100	ENSMUST00000050630	Supt3h	+
chr17	44914248	44914298	3	44914273	ENSMUST00000130623	Supt3h	+
chr11	87551028	87551078	2	87551053	ENSMUST00000152700	Supt4h1	+
chr7	29099891	29099941	2	29099916	ENSMUST00000003527	Supt5h	-
chr11	78020504	78020554	3	78020529	ENSMUST00000108314	Supt6h	-


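So far I have this:
Code:
#!/bin/bash

example=Workbook4.txt
# Print the header line unchanged
head -n 1 "$example"
# Loop over every distinct refgene (column 7), skipping the header line
for gene in $(tail -n +2 "$example" | cut -f7 | sort -u)
do
    # Strand (column 8) of this gene
    sign=$(awk -F'\t' -v g="$gene" '$7 == g { print $8; exit }' "$example")
    # Every distinct count (column 4) of this gene
    for count in $(awk -F'\t' -v g="$gene" '$7 == g { print $4 }' "$example" | sort -u)
    do
        # Rows with exactly this gene and count, sorted by tss (column 5), ascending
        rows=$(awk -F'\t' -v g="$gene" -v c="$count" '$7 == g && $4 == c' "$example" | sort -t $'\t' -k5,5n)
        if [ "$sign" = "-" ]
        then
            # Negative strand: the highest tss is the last row
            printf '%s\n' "$rows" | tail -n 1
        else
            # Positive strand: the lowest tss is the first row
            printf '%s\n' "$rows" | head -n 1
        fi
    done
done

I am not sure about the two lines that pick the row with head and tail. It would be nice if you could help me solve this.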

Thanks for your time
Kirthi

Moderator's Comments:
Please use code tags next time for your code and data. Thanks

Last edited by BCW_123; 05-13-2014 at 02:16 PM.. Reason: code tags
# 2  
Old 05-13-2014
I'm not quite sure what you want. 'On the basis of' doesn't tell us what basis, just that there is one involving certain columns. "unique count" in particular doesn't make sense -- the counts in your resulting output are not all unique.

Do you mean that if, for a particular refgene, there is more than one row with the same count, you want to keep the one with the lowest or highest tss depending on the +/- in the last column?
# 3  
Old 05-13-2014
I am sorry, when I say unique count (column 4) I mean unique within every refgene (column 7).

So Sun2 should have the unique counts 2, 1, and 4.
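
To see which counts each refgene has, something like the following might help. It is only a sketch: the function name `distinct_counts` is made up for illustration, and it assumes the file is tab-separated with a single header line, as in the sample data.

```shell
#!/bin/bash
# Sketch: print each distinct (refgene, count) pair once, in order of appearance.
# Assumes tab-separated input with one header line (hypothetical helper name).
distinct_counts() {
    awk -F'\t' '
    # Skip the header; print a pair only the first time it is seen
    NR > 1 && !seen[$7, $4]++ { print $7 "\t" $4 }
    ' "$1"
}
```

Calling `distinct_counts Workbook4.txt` should list one line per refgene and count, so rows that share a count within the same refgene show up only once.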

The way I would like to filter the duplicates is:

If a refgene has several rows with the same count and it is on the negative strand, keep the row with the highest tss (column 5).
If a refgene has several rows with the same count and it is on the positive strand, keep the row with the lowest tss.

I hope you got my question.

Thanks for your time
Kirti
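
The rule described above could be sketched in a single awk pass. This is not a definitive solution: the function name `filter_tss` is invented here, and the sketch assumes tab-separated input with one header line and that every row of a refgene carries the same strand.

```shell
#!/bin/bash
# Sketch: keep one row per (refgene, count) pair.
# Negative strand keeps the highest tss, positive strand keeps the lowest.
# Assumes tab-separated input with a single header line (hypothetical helper name).
filter_tss() {
    awk -F'\t' '
    NR == 1 { print; next }              # pass the header through unchanged
    {
        key = $7 SUBSEP $4               # refgene (col 7) plus count (col 4)
        if (!(key in best)) {
            order[++n] = key             # remember first-seen order for output
            best[key] = $0
            tss[key] = $5
        } else if (($8 == "-" && $5 + 0 > tss[key] + 0) ||
                   ($8 == "+" && $5 + 0 < tss[key] + 0)) {
            best[key] = $0               # this row has a better tss for its strand
            tss[key] = $5
        }
    }
    END { for (i = 1; i <= n; i++) print best[order[i]] }
    ' "$1"
}
```

It would be called as `filter_tss Workbook4.txt > filtered.txt` (the output file name is only an example). Rows come out in the order in which each refgene and count pair first appears in the input, so the input does not need to be sorted beforehand.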