Reducing input file size after pattern search

# 8  
04-23-2017
With the code RudiC suggested, RS is already set to the default <newline> character.

Are there any @ characters in your input file other than the 1st character of each (multi-line) record?

Could the strings that you are searching for appear on any line other than the 2nd line in a record?
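To illustrate the point about RS: with awk's defaults (RS is a newline, FS is runs of blanks), each line of a multi-line record is read as a separate record, so $2 never refers to the second line of a record. This is a minimal sketch. The function name first_words and the sample input are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print each record number with its first field,
# using the default RS (newline) and the default FS (blanks).
first_words() {
    awk '{ printf("%d: %s\n", NR, $1) }'
}

# Made-up three-line sample: every line becomes its own record.
printf '@M01 sample\nACGTACGT\n+\n' | first_words
# prints "1: @M01", "2: ACGTACGT" and "3: +", one per line
```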
# 9  
04-23-2017
Quote:
Are there any @ characters in your input file other than the 1st character of each (multi-line) record?
Yes, that's why I originally decided to use ^@M: there is only one ^@M per record, and it is always at the beginning.
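Here is a minimal sketch of why anchoring on ^@M matters. It assumes, as stated above, that only header lines start with @M, while other lines (for example a quality line) may start with a plain @. The function count_headers and the sample data are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count the input lines that match the regex given as $1.
count_headers() {
    awk -v re="$1" '$0 ~ re { n++ } END { print n + 0 }'
}

# Made-up sample: two records, one of whose non-header lines starts with "@".
sample='@M1\nACGT\n+\n@@AB\n@M2\nTTTT\n+\nAB@C\n'
printf '%b' "$sample" | count_headers '^@'    # prints 3: "@@AB" is counted too
printf '%b' "$sample" | count_headers '^@M'   # prints 2: one per record
```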
Quote:
Could the strings that you are searching for appear on any line other than the 2nd line in a record?

No. The DNA sequence is always the second line of each record.
P.S. I meant to say FS, not RS.
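Because the sequence is always the second line, one portable alternative is to keep awk's default line-by-line records and count lines since the last @M header. This is not the approach used later in this thread. The sketch assumes each record starts with a line beginning with @M, and the function sequences and the sample input are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the second line of every record, where a record
# starts at a line beginning with "@M". Records stay one line each (default RS),
# so no multi-character RS support is needed.
sequences() {
    awk '/^@M/ { ln = 0 } { ln++ } ln == 2 { print }'
}

printf '@M1\nAAACCC\n+\nIIIIII\n@M2\nGGGTTT\n+\nIIIIII\n' | sequences
# prints AAACCC and GGGTTT, one per line
```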

Last edited by Xterra; 04-23-2017 at 09:35 PM.
# 10  
04-23-2017
The following was written and tested using a Korn shell, but it will work with any POSIX-conforming shell. It does, however, depend on your version of awk supporting multi-character record separators. (The standard allows awk to use a multi-character RS value, but only requires that awk use the first character of RS.) The GNU awk available on most Linux systems supports multi-character RS values, so I assume it will work on your biolinux 8 system:
Code:
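#!/bin/ksh
# Split a file of "@M"-prefixed multi-line records into MID-<n>.txt files,
# one per search string (MID-0.txt gets records that match none of them).
IAm=${0##*/}
if [ $# -ne 1 ]
then	printf 'Usage: %s input_file\n' "$IAm" >&2
	exit 1
fi
file=$1

awk -v strings="string-1 string-2 string-3 string-4" '
BEGIN {	RS = "@M"	# each record starts with "@M" (needs multi-character RS support)
	FS = "\n"	# each line of a record is one field, so $2 is the DNA sequence
	ns = split(strings, s, / /)
}
FNR > 1 {	# record 1 is the empty text before the first "@M"
	# Try the strings from last to first; i is left at 0 if none match.
	for(i = ns; i > 0; i--)
		if(index($2, s[i])) {
			cnt++
			break
		}
	c[i]++
	# Put back the "@M" consumed by RS and append the record to its file.
	printf("%s%s", RS, $0) > ("MID-" i ".txt")
}
END {	printf("Total\t%d\n", cnt)
	for(i = 1; i <= ns; i++) {
		close("MID-" i ".txt")
		# Percentage of matched records (unmatched ones are not in cnt).
		printf("MID-%d\t%.1f\n", i, cnt ? 100 * c[i] / cnt : 0)
	}
	if(c[0])
		printf("\n%d unmatched record%s written to MID-0.txt\n", c[0],
		    (c[0] > 1) ? "s" : "")
}' "$file"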
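To check the record handling on a small sample before running the script on real data, here is a cut-down sketch of the same classification loop that prints the chosen bin for each record instead of writing MID-<n>.txt files. The function classify and the sample records are hypothetical, and the sketch assumes an awk that honors a multi-character RS (such as GNU awk):

```shell
#!/bin/sh
# Hypothetical cut-down version of the classification loop above: print
# each record header and its MID bin instead of writing MID-<n>.txt files.
# Assumes an awk that honors a multi-character RS (e.g. GNU awk).
classify() {
    awk -v strings="$1" '
    BEGIN { RS = "@M"; FS = "\n"; ns = split(strings, s, / /) }
    FNR > 1 {                           # record 1 is the empty text before the first @M
        for (i = ns; i > 0; i--)        # last string wins; i ends at 0 if none match
            if (index($2, s[i]))
                break
        printf("%s\tMID-%d\n", $1, i)   # $1 is the header line without its @M
    }'
}

printf '@M1\nAAACCC\n+\nIIIIII\n@M2\nGGGTTT\n+\nIIIIII\n@M3\nAAAGGG\n+\nIIIIII\n' |
    classify "AAA CCC"
```

Because the loop runs from the last string to the first, the sequence AAACCC (which contains both strings) lands in MID-2, AAAGGG lands in MID-1, and GGGTTT ends with i at 0 (MID-0).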

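Because the script depends on multi-character RS support, one quick way to check a given awk is to feed it input where honoring the whole RS and honoring only its first character give different record counts. This is a sketch. The function rs_record_count is a hypothetical name, and because POSIX leaves multi-character RS unspecified, other results are possible:

```shell
#!/bin/sh
# Hypothetical check: with RS = "XY", an awk that honors the whole string
# splits "aXbXYc" into "aXb" and "c" (2 records); one that uses only the
# first character, "X", splits it into "a", "b" and "Yc" (3 records).
rs_record_count() {
    printf 'aXbXYc' | awk 'BEGIN { RS = "XY" } END { print NR }'
}

if [ "$(rs_record_count)" -eq 2 ]; then
    echo "this awk honors multi-character RS"
else
    echo "this awk does not appear to honor multi-character RS"
fi
```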
This User Gave Thanks to Don Cragun For This Post:
# 11  
04-24-2017
Don
Thank you very much! That worked like a charm.


10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Grep/awk using a begin search pattern and end search pattern

I have this fileA TEST FILE ABC this file contains ABC; TEST FILE DGHT this file contains DGHT; TEST FILE 123 this file contains ABC, this file contains DEF, this file contains XYZ, this file contains KLM ; I want to have a fileZ that has only (begin search pattern for will be... (2 Replies)
Discussion started by: vbabz

2. Shell Programming and Scripting

Grep command to search pattern corresponding to input from user

One more question: I want to grep "COS_12_TM_4 pattern from a file that looks like : "COS_12_TM_4" " ];I am taking scan_out as the input from the user. How to search "COS_12_TM_4" in the file which corresponds to scan_out (12 Replies)
Discussion started by: Preeti Chandra

3. Shell Programming and Scripting

Search pattern in a file taking input from another file

Hi, Below is my requirement File1: svasjsdhvassdvasdhhgvasddhvasdhasdjhvasdjsahvasdjvdasjdvvsadjhv vdjvsdjasvdasdjbasdjbasdjhasbdasjhdbjheasbdasjdsajhbjasbjasbhddjb svfsdhgvfdshgvfsdhfvsdadhfvsajhvasjdhvsajhdvsadjvhasjhdvjhsadjahs File2: sdh hgv I need a command such that... (8 Replies)
Discussion started by: imrandec85

4. Shell Programming and Scripting

Reducing the decimal points of numbers (3d coordinates) in a file; how to input data to e.g. Python

I have a file full of coordinates of the form: 37.68899917602539 58.07500076293945 57.79100036621094 The numbers don't always have the same number of decimal points. I need to reduce the decimal points of all the numbers (there are 128 rows of 3 numbers) to 2. I have tried to do this... (2 Replies)
Discussion started by: crunchgargoyle

5. Shell Programming and Scripting

How to use sed to search a particular pattern in a file backward after a pattern is matched.?

Hi, I have two files file1.txt and file2.txt. Please see the attachments. In file2.txt (which actually is a diff output between two versions of file1.txt.), I extract the pattern corresponding to 1172c1172. Now ,In file1.txt I have to search for this pattern 1172c1172 and if found, I have to... (9 Replies)
Discussion started by: saurabh kumar

6. Shell Programming and Scripting

Search for a pattern in a String file and count the occurrence of each pattern

I am trying to search a file for a patterns ERR- in a file and return a count for each of the error reported Input file is a free flowing file without any format example of output ERR-00001=5 .... ERR-01010=10 ..... ERR-99999=10 (4 Replies)
Discussion started by: swayam123

7. Shell Programming and Scripting

How to assign the Pattern Search string as Input Variable

guys, I need to know how to assign pattern matched string as an input command variable. Here it goes: My script is something like this. ./routing.sh <Server> <enable|disable> ## This Script takes an input <Server> variable from this line of the script ## echo $1 | egrep... (1 Reply)
Discussion started by: raghunsi

8. Solaris

reducing to root file size

My root file size has reached 80% and I am looking where all I can reduce the file size . Here is the output of top directories in / . To me none of this looks useful but not sure . We use an application and email. Which all can be deleted . Please advise . 2016989 989445 /var 930059 ... (2 Replies)
Discussion started by: Hitesh Shah

9. Programming

reducing size of executable in C under Unix

Hi, Could anyone tell me how to reduce the size of an executable file of C under Unix? thanks (2 Replies)
Discussion started by: useless79

10. Shell Programming and Scripting

Search file for pattern and grab some lines before pattern

I want to search a file for a string and then if the string is found I need the line that the string is on - but also the previous two lines from the file (that the pattern will not be found in) This is on solaris Can you help? (2 Replies)
Discussion started by: frustrated1