I am a biologist who is new to Linux and am having difficulty writing a script to do what I want it to do! I have tried basic grep commands, but even that does not give me back the data I want.
I have many files that are all currently in .xlsx format, and I'm not sure if they need to be .csv or .txt for this to work... Each of these files has ~90,000 lines.
Basically I want a script that will ask me what I am looking for:
Code:
echo Please input gene name you wish to look for
read GeneName
Then I want to look through all of the lines in all of my files (~200) and see if $GeneName is present in field 13 (it may be present multiple times in the one file, or not at all).
If it is present in field 13, I want the entire line/s it is in, plus the file name/s it is found in to be printed to a .txt file.
I have tried using the grep command, and it just gives me back all lines from the file, not the lines containing $GeneName. I apologise that I do not have a better unix background, but I have been trying for days and I just cannot get it to work! I would appreciate any help.
There is a heading line at the top of each file, and each subsequent line starts with chr. For this example, I would be wanting to identify the lines containing WASH7P in file1.txt and file2.txt (or if this would work with .csv files?)
file1.txt
Code:
chr_name chr_start chr_end ref_base alt_base hom_het snp_quality tot_depth alt_depth dbSNP dbSNP131 region gene change annotation dbSNP132 1000genomes allele freq
chr01 14907 14907 A G het 108 52 39 snp131 rs6682375 ncRNA WASH7P . . rs6682375 . .
chr01 14930 14930 A G het 148 62 44 snp131 rs6682385 ncRNA WASH7P . . rs6682385 1000g2010nov_all 0.71
chr01 761752 761752 C T hom 225 69 69 snp131 rs1057213 ncRNA NCRNA00115 . . rs1057213 1000g2010nov_all 0.544
chr01 761800 761800 A T hom 42 11 11 snp131 rs1064272 ncRNA NCRNA00115 . . rs1064272 1000g2010nov_all 0.114
file2.txt
Code:
chr_name chr_start chr_end ref_base alt_base hom_het snp_quality tot_depth alt_depth dbSNP dbSNP131 region gene change annotation dbSNP132 1000genomes allele freq
chr01 17556 17556 C T het 43 30 9 . . ncRNA WASH7P . . . . .
chr01 69511 69511 A G hom 225 106 106 snp131 rs2691305 exonic OR4F5 nonsynonymous SNV "OR4F5:NM_001005484:exon1:c.A421G:p.T141A," rs2691305 1000g2010nov_all 0.789
chr01 761732 761732 C T hom 225 103 102 snp131 rs2286139 ncRNA NCRNA00115 . . rs2286139 1000g2010nov_all 0.537
I would like to be asked to type in GeneName (e.g. WASH7P) and the output in a .txt file to be something like:
Code:
file1.txt:chr01 14907 14907 A G het 108 52 39 snp131 rs6682375 ncRNA WASH7P . . rs6682375 . .
file1.txt:chr01 14930 14930 A G het 148 62 44 snp131 rs6682385 ncRNA WASH7P . . rs6682385 1000g2010nov_all 0.71
file2.txt:chr01 17556 17556 C T het 43 30 9 . . ncRNA WASH7P . . . . .
Many thanks,
Kelly
Last edited by Franklin52; 09-14-2011 at 05:00 AM..
Reason: Please use code tags for code and data samples, thank you
Hi Kelly,
Go through this and let me know if there's any problem. Please use code tags when posting your queries. The code below works fine for me and gives the expected output.
Code:
echo "the text"
read Genename
grep "$Genename" file1.txt > somefile.txt
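To search several files at once and get the file name in front of each hit, grep can be given all the files in one call. The following is only a sketch: `grep_gene` is a made-up helper name, and `-w` (whole-word matching) is assumed to be available, which is the case for GNU and BSD grep but is not guaranteed by POSIX. Note that this matches the gene name anywhere on the line, not only in field 13.

```shell
# grep_gene GENE FILE...   (hypothetical helper, not from the original post)
# Prints "filename:line" for every line that contains GENE as a whole word.
grep_gene() {
    gene=$1
    shift
    # -F : treat the gene name as a fixed string, not a regular expression
    # -w : match whole words only, so WASH7P does not match WASH7P2
    # /dev/null as an extra file makes grep print the file name
    #   even when only one real input file is given
    grep -Fw -- "$gene" "$@" /dev/null
}
```

It could be called as `grep_gene "$Genename" *.txt > somefile.out`. Writing the results to a name that does not end in .txt keeps them out of the `*.txt` list the next time the search is run.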
You will need to convert the .xlsx files to .txt or .csv first: awk and grep work on plain text, and .xlsx is a compressed (zipped XML) format, not text.
Try this:
Code:
echo Please input gene name you wish to look for
read GeneName
for file in *.txt
do
# $13 ~ N is a regex match, so it also hits gene names that merely contain N
awk -v F="$file" -v N="$GeneName" '$13 ~ N { print F": "$0 }' "$file"
done
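A variation on the loop above, as a sketch rather than a definitive version: awk can read all the files in one call and knows the current file name through its built-in `FILENAME` variable, so the loop is not strictly needed. Here `find_gene` and its argument order are made up for illustration, the files are assumed to be whitespace- or tab-separated with the gene name in field 13, and the comparison is an exact match (`==`) instead of a regex match.

```shell
# find_gene GENE OUTFILE FILE...   (hypothetical helper, not from the original post)
# Writes "filename:line" to OUTFILE for every line whose 13th field equals GENE.
find_gene() {
    gene=$1
    out=$2
    shift 2
    # FNR > 1     : skip the heading line at the top of each file
    # $13 == gene : exact comparison, so WASH7P does not match WASH7P2
    # FILENAME    : awk built-in holding the name of the file being read
    awk -v gene="$gene" 'FNR > 1 && $13 == gene { print FILENAME ":" $0 }' "$@" > "$out"
}
```

After the `read GeneName` prompt it could be called as `find_gene "$GeneName" "${GeneName}_hits.out" *.txt`. The output name here deliberately does not end in .txt so that a later run over `*.txt` does not pick up the earlier results; writing the .txt output to a different directory would work as well.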