Help with generating a script


 
# 8  
Old 09-14-2011
Not sure what that "invalid option" error is. What's your Unix/Linux system and version of awk? In short, what's the output of the following commands?

Code:
uname -a
uname --all
awk
awk --version

You may want to try this script:

Code:
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
  awk -vNAME=$GENE '$13 ~ NAME { print FILENAME": "$0 }' $file >> $OUT
done

Assuming "file1.txt" and "file2.txt" are tab-delimited files in the current directory, the script runs as follows:

Code:
$
$
$ cat file1.txt
chr_name        chr_start       chr_end ref_base        alt_base        hom_het snp_quality     tot_depth       alt_depth       dbSNP   dbSNP131     freqlenomes
chr01   14907   14907   A       G       het     108     52      39      snp131  rs6682375       ncRNA   WASH7P  .       .       rs6682375       .    .
chr01   14930   14930   A       G       het     148     62      44      snp131  rs6682385       ncRNA   WASH7P  .       .       rs6682385       1000g0.71nov_all
chr01   761752  761752  C       T       hom     225     69      69      snp131  rs1057213       ncRNA   NCRNA00115      .       .       rs1057213    0.5442010nov_all
chr01   761800  761800  A       T       hom     42      11      11      snp131  rs1064272       ncRNA   NCRNA00115      .       .       rs1064272    0.1142010nov_all
$
$
$ cat file2.txt
chr_name        chr_start       chr_end ref_base        alt_base        hom_het snp_quality     tot_depth       alt_depth       dbSNP   dbSNP131     freqlenomes
chr01   17556   17556   C       T       het     43      30      9       .       .       ncRNA   WASH7P  .       .       .       .       .
chr01   69511   69511   A       G       hom     225     106     106     snp131  rs2691305       exonic  OR4F5   nonsynonymous   SNV     "OR4F5:NM_0010.7892010nov_all421G:p.T141A,"
chr01   761732  761732  C       T       hom     225     103     102     snp131  rs2286139       ncRNA   NCRNA00115      .       .       rs2286139    0.5372010nov_all
$
$
$ cat search.sh
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
  awk -vNAME=$GENE '$13 ~ NAME { print FILENAME": "$0 }' $file >> $OUT
done
$
$
$ # Now run the script
$
$ . search.sh
Please input gene name you wish to look for
WASH7P
The output file is WASH7P20110914.txt
$
$ # Display the content of the output file
$
$ cat WASH7P20110914.txt
file1.txt: chr01        14907   14907   A       G       het     108     52      39      snp131  rs6682375       ncRNA   WASH7P  .       .       rs668.375
file1.txt: chr01        14930   14930   A       G       het     148     62      44      snp131  rs6682385       ncRNA   WASH7P  .       .       rs6680.71g2010nov_all
file2.txt: chr01        17556   17556   C       T       het     43      30      9       .       .       ncRNA   WASH7P  .       .       .       .    .
$
$
$

Or you could try the following script, which uses Perl:

Code:
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
  perl -F"\t" -lane "print \"\$ARGV:\$_\" if \$F[12] eq $GENE" $file >> $OUT
done

The execution of the script:

Code:
$
$
$ rm WASH7P20110914.txt
$
$ # Display the script content
$
$ cat search1.sh
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
  perl -F"\t" -lane "print \"\$ARGV:\$_\" if \$F[12] eq $GENE" $file >> $OUT
done
$
$
$ # Now run the script
$
$ . search1.sh
Please input gene name you wish to look for
WASH7P
The output file is WASH7P20110914.txt
$
$
$ # Display the content of the output file
$
$ cat WASH7P20110914.txt
file1.txt:chr01 14907   14907   A       G       het     108     52      39      snp131  rs6682375       ncRNA   WASH7P  .       .       rs6682375    .
file1.txt:chr01 14930   14930   A       G       het     148     62      44      snp131  rs6682385       ncRNA   WASH7P  .       .       rs6682385    0.71g2010nov_all
file2.txt:chr01 17556   17556   C       T       het     43      30      9       .       .       ncRNA   WASH7P  .       .       .       .       .
$
$
$

tyler_durden
# 9  
Old 09-14-2011
I don't believe anyone has addressed this point:
Quote:
Originally Posted by kellywilliams
I have many files that are all currently in .xslx and I'm not sure if they need to be .csv or .txt for this to work... Each of these files has ~90,000 lines.

Kelly
Kelly, to use durden_tyler's solution you do need to export the files to tab-delimited text files. I assume you mean Excel spreadsheet files (xlsx). An xlsx file is a zipped bundle of XML documents rather than plain text, so tools such as awk and grep cannot search it directly.
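
One rough way to check an export before running the search is to count the tab-separated fields on each line. This is only a sketch: the file name sample.txt and its two rows are made up for illustration, so substitute one of your own exported files.

```shell
# Hypothetical sample file; the rows are invented for illustration.
tmpdir=$(mktemp -d)
printf 'chr01\t14907\tA\tG\n'  > "$tmpdir/sample.txt"
printf 'chr01\t14930\tA\n'    >> "$tmpdir/sample.txt"

# For each field count seen when splitting on tabs, print how many lines
# have that count. A row with too few fields cannot have a 13th column.
field_counts=$(awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$tmpdir/sample.txt" | sort -n)
echo "$field_counts"
```

If every line reports a single field, the file is probably not tab-delimited (or not plain text at all), and a test on $13 will never match.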

Andrew
# 10  
Old 09-14-2011
To durden_tyler

Hi Tyler_Durden,

Thank you for your help. Unfortunately the script is still not working. I have tried it on the two computers in my laboratory running Linux. Here is the output of the commands you suggested, from computer 1 (via Terminal on a MacBook Pro):
Code:
$ uname -a
Darwin anzac-172-16-75-136.anzac.edu.au 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386
$ uname --all
uname: illegal option -- -
usage: uname [-amnprsv]
$ awk
usage: awk [-F fs] [-v var=value] [-f progfile | 'prog'] [file ...]
$ awk --version
awk version 20070501

And computer 2 (running Red Hat):
Code:
$ uname -a
Linux neuro.anzac.edu.au 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
$ uname --all
Linux neuro.anzac.edu.au 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
$ awk
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:          GNU long options:
       -f progfile             --file=progfile
       -F fs                   --field-separator=fs
       -v var=val              --assign=var=val
       -m[fr] val
       -W compat               --compat
       -W copyleft             --copyleft
       -W copyright            --copyright
       -W dump-variables[=file]        --dump-variables[=file]
       -W exec=file            --exec=file
       -W gen-po               --gen-po
       -W help                 --help
       -W lint[=fatal]         --lint[=fatal]
       -W lint-old             --lint-old
       -W non-decimal-data     --non-decimal-data
       -W profile[=file]       --profile[=file]
       -W posix                --posix
       -W re-interval          --re-interval
       -W source=program-text  --source=program-text
       -W traditional          --traditional
       -W usage                --usage
       -W version              --version

To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
       gawk '{ sum += $1 }; END { print sum }' file
       gawk -F: '{ print $1 }' /etc/passwd
$ awk --version
GNU Awk 3.1.5
Copyright (C) 1989, 1991-2005 Free Software Foundation.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA.

So when I ran your first script on computer 1
Quote:
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
awk -vNAME=$GENE '$13 ~ NAME { print FILENAME": "$0 }' $file >> $OUT
done
I got the following output again:
Code:
$ ./3SNPs_in_gene.sh 
Please input gene name you wish to look for
WASH7P
The output file is WASH7P20110915.txt
awk: invalid -v option

awk: invalid -v option

awk: invalid -v option

awk: invalid -v option

awk: invalid -v option

awk: invalid -v option

awk: invalid -v option

$

And when I ran script 2 on computer 1:
Code:
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
  perl -F"\t" -lane "print \"\$ARGV:\$_\" if \$F[12] eq $GENE" $file >> $OUT
done

The WASH7P20110915.txt file was empty.

Similarly, when I ran both scripts on computer 2, the WASH7P20110915.txt file was empty.

If you could help that would be great - thank you so much already for your help. Also, when the script is looking in *.txt, will that include looking in the $OUT.txt file?

Kelly
# 11  
Old 09-14-2011
There should be a space between "-v" and NAME=$var.

You should also be quoting it so it doesn't split on spaces.

So:
Code:
awk -v NAME="${VAR}"
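
A minimal sketch of why the quotes matter, using a made-up value that contains a space (VAR and its contents are placeholders, not part of the original script):

```shell
# Hypothetical value with a space in it, just to show the effect of quoting.
VAR="two words"

# With the space after -v and the double quotes, awk receives the whole
# value as a single assignment.
result=$(awk -v NAME="$VAR" 'BEGIN { print NAME }')
echo "$result"
```

Without the quotes the shell would split the value into two arguments, and awk would treat the second word as its program text.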

# 12  
Old 09-14-2011
Code:
awk -v pattern="WASH7P" '$13 ~ pattern {print FILENAME":"$0}' file1.txt file2.txt > out.txt
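
On the earlier question of whether *.txt also picks up the output file: it does once that file exists, for example when the script is run a second time on the same day. Naming the input files explicitly, as above, avoids that. If you want to keep the loop, here is a sketch that skips the output file; the gene name is hard-coded, the two input files are invented sample rows, and tab is assumed as the field separator.

```shell
# Work in a scratch directory with two invented 13-column input files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\t%s\n' WASH7P > file1.txt
printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\t%s\n' OR4F5  > file2.txt

GENE=WASH7P                          # hard-coded instead of "read GENE"
OUT="$GENE$(date '+%Y%m%d').txt"

for pass in 1 2                      # two passes imitate running the script twice
do
  for file in *.txt
  do
    [ "$file" = "$OUT" ] && continue # never search our own output file
    awk -F'\t' -v NAME="$GENE" '$13 == NAME { print FILENAME ": " $0 }' "$file"
  done > "$OUT"                      # one redirect for the whole loop, overwriting old results
done

cat "$OUT"
```

Writing the results to a different directory or giving them a different extension would work just as well.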

If you are using Solaris, use nawk instead of awk.

--ahamed

---------- Post updated at 03:50 PM ---------- Previous update was at 03:41 PM ----------

or

Code:
grep WASH7P file1.txt file2.txt >> out.txt
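
Keep in mind that grep matches the name anywhere on the line, and the ~ operator in awk matches substrings, so a search for SOX1 would also report SOX13. A small sketch of the difference, using two invented rows and assuming tab-separated columns:

```shell
# Two invented 13-column rows whose gene names share a prefix.
tmp=$(mktemp)
printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\t%s\n' SOX1 SOX13 > "$tmp"

# "~" is a regular-expression match, so SOX1 also matches SOX13.
regex_hits=$(awk -F'\t' -v pattern="SOX1" '$13 ~ pattern { n++ } END { print n + 0 }' "$tmp")

# "==" compares the whole field, so only the exact gene name counts.
exact_hits=$(awk -F'\t' -v pattern="SOX1" '$13 == pattern { n++ } END { print n + 0 }' "$tmp")

echo "regex: $regex_hits, exact: $exact_hits"
```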

--ahamed
# 13  
Old 09-14-2011
Thank you Corona688, this stopped the "invalid -v option" error, but the output file is still empty.

This is what I am using now:

Code:
echo Please input gene name you wish to look for
read GENE
OUT="$GENE$(date '+%Y%m%d').txt"
echo "The output file is $OUT"
for file in *.txt
do
  awk -v NAME="$GENE" '$13 ~ NAME { print FILENAME": "$0 }' $file >> $OUT
done

I have attached 2 example files. A GENE that is in both (thus will give an output) is SOX13.

Many thanks,

Kelly
# 14  
Old 09-14-2011
I think there is an issue with the file type (the encoding):

Code:
# Your original file is reported as UTF-16, and a plain grep for SOX did not match anything in it
root@bt:/tmp# file file1.txt 
file1.txt: Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators

# Then I opened it in gedit and saved it again with the character encoding set to "Current Locale (UTF-8)", and after that it started working
root@bt:/tmp# gedit file1.txt 
root@bt:/tmp# file file1.txt 
file1.txt: ASCII text, with very long lines, with CRLF line terminators

file2.txt has just one single line??
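
If you would rather not re-save every file by hand, the conversion can probably be scripted. This is a sketch only, assuming the exports are UTF-16 with a byte-order mark and CRLF line endings, as the file output above suggests. The names export.txt and clean.txt are placeholders, and the sample line is invented.

```shell
workdir=$(mktemp -d)

# Build a small UTF-16 file with CRLF endings to stand in for the real export.
printf 'gene\tSOX13\r\n' | iconv -f UTF-8 -t UTF-16 > "$workdir/export.txt"

# Convert to UTF-8 and drop the carriage returns so awk and grep see plain text.
iconv -f UTF-16 -t UTF-8 "$workdir/export.txt" | tr -d '\r' > "$workdir/clean.txt"

# The gene name can now be found with an ordinary grep.
hits=$(grep -c 'SOX13' "$workdir/clean.txt")
echo "matching lines: $hits"
```

Run file on the converted copy afterwards to confirm that it is now reported as ASCII or UTF-8 text.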

--ahamed