Not sure what that "invalid option" error is. What's your Unix/Linux system and version of awk? In short, what's the output of the following commands?
Code:
uname -a
uname --all
awk
awk --version
You may want to try this script:
Code:
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
awk -vNAME=$GENE '$13 ~ NAME { print FILENAME": "$0 }' $file >> $OUT
done
Assuming "file1.txt" and "file2.txt" are tab-delimited files in the current directory, the script runs as follows:
Code:
$
$
$ cat file1.txt
chr_name chr_start chr_end ref_base alt_base hom_het snp_quality tot_depth alt_depth dbSNP dbSNP131 freqlenomes
chr01 14907 14907 A G het 108 52 39 snp131 rs6682375 ncRNA WASH7P . . rs6682375 . .
chr01 14930 14930 A G het 148 62 44 snp131 rs6682385 ncRNA WASH7P . . rs6682385 1000g0.71nov_all
chr01 761752 761752 C T hom 225 69 69 snp131 rs1057213 ncRNA NCRNA00115 . . rs1057213 0.5442010nov_all
chr01 761800 761800 A T hom 42 11 11 snp131 rs1064272 ncRNA NCRNA00115 . . rs1064272 0.1142010nov_all
$
$
$ cat file2.txt
chr_name chr_start chr_end ref_base alt_base hom_het snp_quality tot_depth alt_depth dbSNP dbSNP131 freqlenomes
chr01 17556 17556 C T het 43 30 9 . . ncRNA WASH7P . . . . .
chr01 69511 69511 A G hom 225 106 106 snp131 rs2691305 exonic OR4F5 nonsynonymous SNV "OR4F5:NM_0010.7892010nov_all421G:p.T141A,"
chr01 761732 761732 C T hom 225 103 102 snp131 rs2286139 ncRNA NCRNA00115 . . rs2286139 0.5372010nov_all
$
$
$ cat search.sh
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
awk -vNAME=$GENE '$13 ~ NAME { print FILENAME": "$0 }' $file >> $OUT
done$
$
$ # Now run the script
$
$ . search.sh
Please input gene name you wish to look for
WASH7P
The output file is WASH7P20110914.txt
$
$ # Display the content of the output file
$
$ cat WASH7P20110914.txt
file1.txt: chr01 14907 14907 A G het 108 52 39 snp131 rs6682375 ncRNA WASH7P . . rs668.375
file1.txt: chr01 14930 14930 A G het 148 62 44 snp131 rs6682385 ncRNA WASH7P . . rs6680.71g2010nov_all
file2.txt: chr01 17556 17556 C T het 43 30 9 . . ncRNA WASH7P . . . . .
$
$
$
Or you could try the following script, which uses Perl:
Code:
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
perl -F"\t" -lane "print \"\$ARGV:\$_\" if \$F[12] eq $GENE" $file >> $OUT
done
The execution of the script:
Code:
$
$
$ rm WASH7P20110914.txt
$
$ # Display the script content
$
$ cat search1.sh
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
perl -F"\t" -lane "print \"\$ARGV:\$_\" if \$F[12] eq $GENE" $file >> $OUT
done$
$
$ # Now run the script
$
$ . search1.sh
Please input gene name you wish to look for
WASH7P
The output file is WASH7P20110914.txt
$
$
$ # Display the content of the output file
$
$ cat WASH7P20110914.txt
file1.txt:chr01 14907 14907 A G het 108 52 39 snp131 rs6682375 ncRNA WASH7P . . rs6682375 .
file1.txt:chr01 14930 14930 A G het 148 62 44 snp131 rs6682385 ncRNA WASH7P . . rs6682385 0.71g2010nov_all
file2.txt:chr01 17556 17556 C T het 43 30 9 . . ncRNA WASH7P . . . . .
$
$
$
tyler_durden
I have many files that are all currently in .xlsx format, and I'm not sure whether they need to be .csv or .txt for this to work. Each of these files has ~90,000 lines.
Kelly
Kelly, to use durden_tyler's solution you do need to export the files as tab-delimited text files. I assume you mean Excel spreadsheet files (xlsx). The xlsx format is not plain text (it is a zipped set of XML documents), so awk and perl cannot search it directly.
Thank you for your help. Unfortunately, the script is still not working. I have tried it on the two computers in my laboratory running Linux. Here is the output of the commands you suggested from computer 1 (via Terminal on a MacBook Pro):
Code:
$ uname -a
Darwin anzac-172-16-75-136.anzac.edu.au 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386
$ uname --all
uname: illegal option -- -
usage: uname [-amnprsv]
$ awk
usage: awk [-F fs] [-v var=value] [-f progfile | 'prog'] [file ...]
$ awk --version
awk version 20070501
And computer2 (running RedHat):
Code:
$ uname -a
Linux neuro.anzac.edu.au 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
$ uname --all
Linux neuro.anzac.edu.au 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
$ awk
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options: GNU long options:
-f progfile --file=progfile
-F fs --field-separator=fs
-v var=val --assign=var=val
-m[fr] val
-W compat --compat
-W copyleft --copyleft
-W copyright --copyright
-W dump-variables[=file] --dump-variables[=file]
-W exec=file --exec=file
-W gen-po --gen-po
-W help --help
-W lint[=fatal] --lint[=fatal]
-W lint-old --lint-old
-W non-decimal-data --non-decimal-data
-W profile[=file] --profile[=file]
-W posix --posix
-W re-interval --re-interval
-W source=program-text --source=program-text
-W traditional --traditional
-W usage --usage
-W version --version
To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.
gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.
Examples:
gawk '{ sum += $1 }; END { print sum }' file
gawk -F: '{ print $1 }' /etc/passwd
$ awk --version
GNU Awk 3.1.5
Copyright (C) 1989, 1991-2005 Free Software Foundation.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA.
So when I ran your first script on computer1
Quote:
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
awk -vNAME=$GENE '$13 ~ NAME { print FILENAME": "$0 }' $file >> $OUT
done
I got the following output again
Code:
$ ./3SNPs_in_gene.sh
Please input gene name you wish to look for
WASH7P
The output file is WASH7P20110915.txt
awk: invalid -v option
awk: invalid -v option
awk: invalid -v option
awk: invalid -v option
awk: invalid -v option
awk: invalid -v option
awk: invalid -v option
$
And when I ran script 2 on computer 1:
Code:
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
perl -F"\t" -lane "print \"\$ARGV:\$_\" if \$F[12] eq $GENE" $file >> $OUT
done
The WASH7P20110915.txt file was empty.
Similarly, when I ran both scripts on computer 2, the WASH7P20110915.txt file was empty.
If you could help, that would be great. Thank you so much for the help you have already given. Also, when the script loops over *.txt, will that include the $OUT file itself?
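On the last question: yes, `*.txt` also matches the output file, because `$OUT` ends in `.txt` and is created in the same directory. The sketch below shows one possible way to skip it. The function name `search_gene` is hypothetical, and unlike the scripts above it empties the output file at the start of each run rather than appending to an existing one. It also writes the awk option as `-v NAME=...` with a space, the form shown in the usage message of the awk on computer 1.

```shell
#!/bin/sh
# Sketch only: search column 13 of every .txt file for a gene name,
# skipping the output file. search_gene is a hypothetical helper name.
search_gene() {
    gene=$1
    out="${gene}$(date '+%Y%m%d').txt"

    # Start from an empty output file (assumption: old results are not wanted).
    : > "$out"

    for file in *.txt
    do
        # The glob also matches the output file, so skip it explicitly.
        if [ "$file" = "$out" ]; then
            continue
        fi
        # Fields are split on whitespace, as in the original script.
        awk -v NAME="$gene" '$13 ~ NAME { print FILENAME": "$0 }' "$file" >> "$out"
    done

    # Print the output file name so the caller can display or reuse it.
    echo "$out"
}
```

Calling `search_gene WASH7P` in the directory that holds the data files prints the name of the output file. Writing the results to a different directory or with a different suffix would avoid the problem without the explicit check.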
Thank you Corona688, this stopped the "invalid -v option" error, but the output file is still empty.
This is what I am using, but I am still getting an empty output file.
Code:
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
awk -v NAME="$GENE" '$13 ~ NAME { print FILENAME": "$0 }' $file >> $OUT
done
I have attached 2 example files. A GENE that is in both (thus will give an output) is SOX13.
# Your original file showed this, and a plain grep for SOX did not work on it:
root@bt:/tmp# file file1.txt
file1.txt: Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators
# Then I opened it in gedit and saved it again with Character Encoding set to "Current Locale (UTF-8)", after which it started working:
root@bt:/tmp# gedit file1.txt
root@bt:/tmp# file file1.txt
file1.txt: ASCII text, with very long lines, with CRLF line terminators
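The same conversion can probably be done without an editor, which may be easier when there are many files. In UTF-16 text most characters are stored as two bytes, so a pattern such as SOX13 never appears as a contiguous byte sequence and grep, awk and perl find nothing. The sketch below assumes `iconv` is installed and that the exported files are UTF-16 with a byte-order mark, as the `file` output above suggests. The function name `to_utf8_unix` is hypothetical. It also removes the carriage returns from the CRLF line endings.

```shell
#!/bin/sh
# Sketch only: convert a UTF-16 text file to UTF-8 with Unix line endings.
# to_utf8_unix is a hypothetical helper name.
#   $1 = input file (assumed UTF-16 with a byte-order mark)
#   $2 = output file (must be a different file from the input)
to_utf8_unix() {
    # iconv re-encodes the text; tr deletes the CR of each CRLF pair.
    iconv -f UTF-16 -t UTF-8 "$1" | tr -d '\r' > "$2"
}
```

For example, `to_utf8_unix file1.txt file1.utf8.txt` writes a converted copy, and running `file` on that copy should then no longer report UTF-16 or CRLF line terminators. If the copies keep a `.txt` suffix in the same directory, the search loop will read both the originals and the copies, so it may be simpler to write them to a separate directory.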