When it comes to programming and UNIX, I know just enough to be really, really dangerous.
I have written a Python script to parse a file that contains ~1 million lines. Depending on whether a certain string is matched, each line is copied into a particular file. For the sake of brevity, the lines are something like this:
Quote:
ABC-1
ABC-1
CCC-33
CCC-33
CCC-33
...
I tried the Python code out on a small file, and everything seems to work. However, since the actual file is massive, I want to double-check it with grep to make sure that the total number of ABC-1's in file x is the same as the number of ABC-1's in file y.
On the command line, I wrote a simple script that will check this for me.
This seems to work just fine.
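A check of this kind might look roughly like the sketch below. It is not necessarily the exact command used; the file names x and y come from the description above, and the sample contents are made up so the snippet runs on its own.

```shell
# Sketch only: build two small stand-in files in a scratch directory.
workdir=$(mktemp -d)
printf 'ABC-1\nABC-1\nCCC-33\nCCC-33\nCCC-33\n' > "$workdir/x"
printf 'ABC-1\nABC-1\n' > "$workdir/y"

# Count the lines containing ABC-1 in each file.
# The pipe matters: without it, "wc" and "-l" are passed to grep as arguments.
in_x=$(grep 'ABC-1' "$workdir/x" | wc -l)
in_y=$(grep 'ABC-1' "$workdir/y" | wc -l)

echo "x: $in_x  y: $in_y"
```

If the two numbers agree, the Python script copied every ABC-1 line.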
Problem: The contents of the original file are copied into 10 other files. I want to check each file AND, since there are ~50 unique strings (e.g. ABC-1), I would like to check for each string. Writing the simple script ~500 times is tedious.
I wrote a bash script, but when I execute the file from the command line (bash counts.sh), I get an error saying wc is an illegal option and that it cannot be found.
Ideally, I think I should make a vector/list/array of file names and a vector/list/array of searchable strings and use a loop that will print out the string, the filename, and the number of times the string occurs in the file...but I don't know how to do that.
So if anyone knows how to re-arrange my 1-liner script - thank you. If anyone can help me with writing a loop script - thank you. Either option would be awesome.
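The loop described above could be sketched roughly as follows. The file names (out1.txt, out2.txt) and their contents are hypothetical stand-ins; a real run would list the 10 output files and the ~50 strings instead.

```shell
# Sketch only: two small stand-in output files in a scratch directory.
workdir=$(mktemp -d)
printf 'ABC-1\nABC-1\nCCC-33\n' > "$workdir/out1.txt"
printf 'CCC-33\nCCC-33\n' > "$workdir/out2.txt"

report=$(
    for file in "$workdir/out1.txt" "$workdir/out2.txt"; do
        for string in 'ABC-1' 'CCC-33'; do
            # -c prints the number of matching lines; -F takes the string literally.
            # grep exits non-zero when nothing matches (it still prints 0),
            # so "|| true" keeps the loop going in that case.
            count=$(grep -c -F -- "$string" "$file" || true)
            # Print: string, file name without its directory, count.
            printf '%s\t%s\t%s\n' "$string" "${file##*/}" "$count"
        done
    done
)
echo "$report"
```

Note that ABC-1 would also be counted on a line such as ABC-10. If the string has to be the entire line, adding -x to grep restricts the match to whole lines.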
Last edited by errcricket; 10-17-2011 at 12:18 PM..
Reason: better title
Hrm. I don't understand why the bash script thinks wc is an option of grep, but really, this is the wrong approach:
If the file is sorted, you can do:
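Given the later mention of uniq, the idea for a sorted file is presumably along these lines. This is a sketch under that assumption; sorted.txt and its contents are invented for the example.

```shell
# Sketch only: a small, already-sorted stand-in file.
workdir=$(mktemp -d)
printf 'ABC-1\nABC-1\nCCC-33\nCCC-33\nCCC-33\n' > "$workdir/sorted.txt"

# uniq -c collapses each run of identical adjacent lines into one line,
# prefixed with the length of the run. It only works on sorted input
# because it compares neighbouring lines.
counts=$(uniq -c "$workdir/sorted.txt")
echo "$counts"
```

This gives the count for every distinct string in a single pass, so there is no need to grep for each of the ~50 strings separately.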
If it's not sorted, you can sort it with sort. If you run out of memory, you can split the file into n files, sort each one individually, and then merge them with sort (see the man page for the merge option). The resulting file will then be sorted, and you can just use the uniq command, above.
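The split, sort, and merge steps might be strung together like this. It is only a sketch: big.txt, the chunk. prefix, and the tiny chunk size of 2 lines are placeholders chosen so the example is easy to follow.

```shell
# Sketch only: a small unsorted stand-in for the large file.
workdir=$(mktemp -d)
printf 'CCC-33\nABC-1\nCCC-33\nABC-1\nCCC-33\n' > "$workdir/big.txt"

# Split into chunks of 2 lines each (a real run would use a far larger number).
split -l 2 "$workdir/big.txt" "$workdir/chunk."

# Sort each chunk in place; sort -o may safely name its own input file.
for chunk in "$workdir"/chunk.*; do
    sort -o "$chunk" "$chunk"
done

# -m merges files that are already sorted, without re-sorting them.
merged_counts=$(sort -m "$workdir"/chunk.* | uniq -c)
echo "$merged_counts"
```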
But let's say you just want to count the number of lines matching a particular string. Try:
The -F ensures your string won't be interpreted as a regular expression pattern.
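To illustrate what -F changes, here is a small sketch that assumes the counting is done with grep -c. The file sample.txt and the search string A.C-1 are made up for the demonstration.

```shell
# Sketch only: one line with a literal dot, two lines without.
workdir=$(mktemp -d)
printf 'A.C-1\nABC-1\nABC-1\n' > "$workdir/sample.txt"

# Without -F the dot is a regex wildcard, so 'A.C-1' also matches ABC-1.
as_regex=$(grep -c -- 'A.C-1' "$workdir/sample.txt")

# With -F only the literal text A.C-1 matches.
as_literal=$(grep -c -F -- 'A.C-1' "$workdir/sample.txt")

echo "regex: $as_regex  literal: $as_literal"
```

For strings like ABC-1 the two forms give the same count, but -F is the safer habit when the search strings come from a list and might contain characters such as . or *.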
Thank you otheus. Using your script from the command line works too. As for running the file from the command line...it does not work, but I suspect it has something to do with the execution path. Regardless, I think I am on the correct $Path to solving this problem.