Help with generating a script
# 1  
Old 09-14-2011
Help with generating a script

I am a biologist who is new to Linux and am having difficulty writing a script to do what I want. I have tried basic grep commands, but even those do not give me back the data I want.

I have many files that are all currently in .xlsx format, and I'm not sure whether they need to be .csv or .txt for this to work. Each of these files has ~90,000 lines.

Basically I want a script that will ask me what I am looking for:
Code:
echo Please input gene name you wish to look for
read GeneName

Then I want to look through all of the lines in all of my files (~200) and see if $GeneName is present in field 13 (it may be present multiple times in the one file, or not at all).

If it is present in field 13, I want the entire line/s it is in, plus the file name/s it is found in to be printed to a .txt file.

I have tried using the grep command, and it just gives me back all lines from the file, not only the lines containing $GeneName. I apologise that I do not have a better unix background, but I have been trying for days and I just cannot get it to work. I would appreciate any help.

Kelly
# 2  
Old 09-14-2011
Hi Kelly,
Please provide some sample data from the files.


Regards,
Mayur
# 3  
Old 09-14-2011
The files all come in the format below.

There is a heading line at the top of each file, and each subsequent line starts with chr. For this example, I want to identify the lines containing WASH7P in file1.txt and file2.txt (or would this also work with .csv files?).

file1.txt
Code:
chr_name	chr_start	chr_end	ref_base	alt_base	hom_het	snp_quality	tot_depth	alt_depth	dbSNP	dbSNP131	region	gene	change	annotation	dbSNP132	1000genomes	allele freq
chr01	14907	14907	A	G	het	108	52	39	snp131	rs6682375	ncRNA	WASH7P	.	.	rs6682375	.	.
chr01	14930	14930	A	G	het	148	62	44	snp131	rs6682385	ncRNA	WASH7P	.	.	rs6682385	1000g2010nov_all	0.71
chr01	761752	761752	C	T	hom	225	69	69	snp131	rs1057213	ncRNA	NCRNA00115	.	.	rs1057213	1000g2010nov_all	0.544
chr01	761800	761800	A	T	hom	42	11	11	snp131	rs1064272	ncRNA	NCRNA00115	.	.	rs1064272	1000g2010nov_all	0.114

file2.txt
Code:
chr_name	chr_start	chr_end	ref_base	alt_base	hom_het	snp_quality	tot_depth	alt_depth	dbSNP	dbSNP131	region	gene	change	annotation	dbSNP132	1000genomes	allele freq
chr01	17556	17556	C	T	het	43	30	9	.	.	ncRNA	WASH7P	.	.	.	.	.
chr01	69511	69511	A	G	hom	225	106	106	snp131	rs2691305	exonic	OR4F5	nonsynonymous SNV	"OR4F5:NM_001005484:exon1:c.A421G:p.T141A,"	rs2691305	1000g2010nov_all	0.789
chr01	761732	761732	C	T	hom	225	103	102	snp131	rs2286139	ncRNA	NCRNA00115	.	.	rs2286139	1000g2010nov_all	0.537


I would like to be asked to type in GeneName (e.g. WASH7P) and the output in a .txt file to be something like:
Code:
file1.txt:chr01	14907	14907	A	G	het	108	52	39	snp131	rs6682375	ncRNA	WASH7P	.	.	rs6682375	.	.
file1.txt:chr01	14930	14930	A	G	het	148	62	44	snp131	rs6682385	ncRNA	WASH7P	.	.	rs6682385	1000g2010nov_all	0.71
file2.txt:chr01	17556	17556	C	T	het	43	30	9	.	.	ncRNA	WASH7P	.	.	.	.	.

Many thanks,

Kelly

Last edited by Franklin52; 09-14-2011 at 04:00 AM.. Reason: Please use code tags for code and data samples, thank you
# 4  
Old 09-14-2011
Hi Kelly,
Go through this and let me know if there's any problem. Please use code tags when posting your queries. The code below works for me and gives the expected output.

Code:
echo "the text"
read Genename
grep "$Genename" file1.txt > somefile.txt


Regards,
Mayur
# 5  
Old 09-14-2011
You will need to convert the files to txt or csv first, as awk and grep work on plain text files and .xlsx is a binary format.

Try this:

Code:
echo Please input gene name you wish to look for
read GeneName
for file in *.txt
do
    awk -v F=$file -vN=$GeneName '$13 ~ N { print F": "$0 }' $file
done

This User Gave Thanks to Chubler_XL For This Post:
# 6  
Old 09-14-2011
Hi Kelly,
I missed one detail: use the following line for grep, so that the file name is printed in front of each matching line.
Code:
grep -H "$Genename" file1.txt > somefile.txt
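
Note that grep looks for the pattern anywhere on the line, not only in field 13, and that without further options it also matches longer names that merely contain the search term. A minimal sketch that at least restricts matches to whole words is shown here; the function name `grep_gene` is hypothetical, and the combination of options is a suggestion rather than what was posted in the thread:

```shell
#!/bin/sh
# grep_gene is a hypothetical helper name, not from the thread.
# Usage: grep_gene GENE FILE...
grep_gene() {
    gene=$1
    shift
    # -H prefixes each match with its file name, even for a single file.
    # -w matches whole words only, so WASH7P does not match WASH7P2.
    # -F treats GENE as a literal string instead of a regular expression.
    # -- stops option parsing in case GENE starts with a dash.
    grep -HwF -- "$gene" "$@"
}
```

It could be called as `grep_gene "$Genename" *.txt > somefile.txt`. This still searches every column, so a name that also appears in the annotation column would be reported for that column too.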


Regards,
Mayur
This User Gave Thanks to mayursingru For This Post:
# 7  
Old 09-14-2011
To Chubler_XL

Thank you so so much Chubler_XL, this is what happened when I ran your script:
Code:
Please input gene name you wish to look for
SOD1
awk: invalid -v option

awk: invalid -v option

awk: invalid -v option

awk: invalid -v option

awk: invalid -v option

awk: invalid -v option

awk: invalid -v option

Any clue why?
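
A possible cause, which the thread does not confirm: some awk implementations reject `-vN=value` written without a space after `-v`, and an unquoted `$file` or `$GeneName` that contains spaces is split into stray arguments. The sketch below avoids both, assuming the files are tab-separated as in the samples. The function name `find_gene` and the exact comparison `==` are illustrative choices, not part of the original script:

```shell
#!/bin/sh
# find_gene is a hypothetical helper name, not from the thread.
# Usage: find_gene GENE FILE...
# Prints "FILE:LINE" for each line whose 13th tab-separated field equals GENE.
find_gene() {
    gene=$1
    shift
    # A space after -v and a quoted value keep the option portable.
    # FILENAME is awk's built-in name of the file currently being read,
    # so no shell loop over the files is needed.
    awk -F '\t' -v N="$gene" '$13 == N { print FILENAME ":" $0 }' "$@"
}
```

After the `read GeneName` line, `find_gene "$GeneName" *.txt` would take the place of the `for` loop. Replacing `==` with `~` restores the pattern matching of the original post, which also accepts longer names that contain the search term.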

Something I forgot to mention, I would like the output file to be $GeneName_date.txt please
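
Written literally in a script, `$GeneName_date.txt` would be read by the shell as a variable called `GeneName_date`, which is probably unset. A small sketch of one way to build such a name follows; the helper name `make_outfile` and the YYYY-MM-DD date format are assumptions:

```shell
#!/bin/sh
# make_outfile is a hypothetical helper name; the date format is an assumption.
# Usage: make_outfile GENE   -> prints e.g. GENE_2011-09-14.txt
make_outfile() {
    GeneName=$1
    # The braces end the variable name before the underscore;
    # $(date +%Y-%m-%d) inserts today's date.
    echo "${GeneName}_$(date +%Y-%m-%d).txt"
}
```

The result can be used as a redirection target, for example `> "$(make_outfile "$GeneName")"`. If the output lands in the same directory as the data, a later run over `*.txt` would pick it up as input, so a separate output directory or a different extension may be safer.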

Thank you so much again for your help,

Kelly

---------- Post updated at 09:22 PM ---------- Previous update was at 09:16 PM ----------

Dear Mayur,

Thank you so much for providing me with this script, but I am having the same problem as before.

This is the script I ran:
Code:
echo "Please input gene name you wish to look for"
         read GeneName
         grep -H $GeneName *.txt > $GeneName.txt

All it did was concatenate the 6 .txt files I have in that directory? I just want single lines from these files that contain $GeneName.
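
One possible explanation, not confirmed in the thread: text files exported from a spreadsheet program sometimes end each line with a bare carriage return (CR) instead of a line feed (LF). grep and awk split lines on LF, so such a file looks like a single very long line, and any match prints the whole file. The sketch below converts the line endings under that assumption; `fix_line_endings` is a hypothetical helper name:

```shell
#!/bin/sh
# fix_line_endings is a hypothetical helper name, not from the thread.
# Usage: fix_line_endings INPUT OUTPUT
fix_line_endings() {
    # tr turns every CR into LF; for CRLF files this leaves empty lines,
    # which the awk filter drops (it keeps only lines of non-zero length).
    tr '\r' '\n' < "$1" | awk 'length($0) > 0' > "$2"
}
```

A quick check before converting is `wc -l < file1.txt`: a count of 0 or 1 for a file that should have about 90,000 lines points to this problem.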

Thank you again for your help.

Kelly

P.S. Maybe I can email two of my data files?