awk to compare each file in two directories by storing in variable


 
# 15  
10-01-2016
The sample inputs and output shown in post #7 in this thread show headers in both input files and in the output file that are not handled by the code shown in any of your posts. The output shown in post #7 does not contain any of the data shown in either input file in post #7. The output format shown in post #7 seems to contain complete lines from the input files, but your code only copies three fields from the input files to the output file. So, with absolutely no idea what your awk code is really supposed to do and no data that can be used for testing, the following has only undergone completely unrealistic testing. It may still give you an idea of how to write a shell and awk script that grabs related files from two directories and produces related output files in a third directory.
Code:
#!/bin/ksh
IAm=${0##*/}

InDir1='/home/cmccabe/Desktop/comparison/reference/10bp'
InDir2='/home/cmccabe/Desktop/comparison/validation/files'
OutDir='/home/cmccabe/Desktop/comparison/ref_val'

cd "$InDir1" || exit 1
for file1 in *.txt
do	# Grab file prefix.
	p=${file1%%_*}

	# Find matching file2.
	file2=$(printf '%s' "$InDir2/$p"_*.vcf)
	if [ ! -f "$file2" ]
	then	printf '%s: No single file matching %s found.\n' "$IAm" \
		    "$file1" >&2
		continue
	fi

	# Create matching output filename.
	out=${file2##*/}
	out=${out%.vcf}_comparison.txt

	printf '%s\t%s\t%s\n' "$InDir1/$file1" "$file2" "$OutDir/$out"
done | awk '
BEGIN {	FS = OFS = "\t"
}
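# Each line from the shell loop names: reference file, IDP file, output file.
{	in1 = $1
	in2 = $2
	out = $3
	# Load the key (fields 2, 4 and 5) of every line of both files.
	print "Reading from " in1
	while((getline < in1) == 1)
		f1[$2 OFS $4 OFS $5]
	close(in1)
	print "Reading from " in2
	while((getline < in2) == 1)
		f2[$2 OFS $4 OFS $5]
	close(in2)
	print "Writing to " out
	# Keys found in both files are written first and removed from both arrays.
	print "Match:" > out
	for(k in f1)
		if(k in f2) {
			print k > out
			delete f1[k]
			delete f2[k]
		}
	# Keys left in f2 exist only in the IDP file; keys left in f1 only in the reference.
	print "Missing in Reference but found in IDP:" > out
	for(k in f2) {
		print k > out
		delete f2[k]
	}
	print "Missing in IDP but found in Reference:" > out
	for(k in f1) {
		print k > out
		delete f1[k]
	}
	# Both arrays are now empty, ready for the next pair of files.
	close(out)
	print "***"
}'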

This was written and tested using a Korn shell, but this will also work with bash or any other shell that uses Bourne shell syntax AND performs parameter expansions required by the POSIX standards.
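The filename handling in the script relies on a few of those POSIX parameter expansions, plus a printf trick for checking that a glob matched exactly one file. The sketch below replays them on made-up names: the variable script stands in for $0, and the paths compare.ksh, sample7_10bp.txt and sample7_run2.vcf are hypothetical. It also assumes mktemp -d is available (as it is on Linux and macOS).

```shell
#!/bin/sh
# Hypothetical names, used only to show what each expansion produces.
script=/usr/local/bin/compare.ksh   # stands in for $0
file1=sample7_10bp.txt
file2=/data/validation/sample7_run2.vcf

IAm=${script##*/}               # strip longest "*/" prefix: compare.ksh
p=${file1%%_*}                  # strip longest "_*" suffix: sample7
out=${file2##*/}                # directory removed: sample7_run2.vcf
out=${out%.vcf}_comparison.txt  # sample7_run2_comparison.txt
printf '%s\n' "$IAm" "$p" "$out"

# Exactly-one-match test: printf '%s' joins every glob match with no
# separator, so the result names an existing file only for a single match.
dir=$(mktemp -d) || exit 1
touch "$dir/sample7_run2.vcf"
one=$(printf '%s' "$dir/$p"_*.vcf)
touch "$dir/sample7_run3.vcf"
two=$(printf '%s' "$dir/$p"_*.vcf)
[ -f "$one" ] && echo "one match: ${one##*/}"
[ -f "$two" ] || echo "more than one match: rejected"
rm -rf "$dir"
```

When no file matches at all, the shell leaves the glob unexpanded by default, so the same -f test also rejects that case.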
One user thanked Don Cragun for this post.
# 16  
10-03-2016
Thank you very much for all of your help!
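For readers without the original data: the heart of the script in post #15 is a set comparison on a key built from fields 2, 4 and 5 of each tab-separated line. The sketch below runs that classification on two tiny made-up files. It is a variant, not the posted code: it reads both files in one awk pass with the common NR == FNR idiom instead of getline, and it assumes mktemp -d is available.

```shell
#!/bin/sh
# Made-up sample data (not from the thread): tab-separated lines whose
# fields 2, 4 and 5 form the comparison key, as in the post #15 script.
tmp=$(mktemp -d) || exit 1
printf 'chr1\t100\tx\tA\tG\nchr1\t200\tx\tC\tT\n' > "$tmp/ref.txt"
printf 'chr1\t100\ty\tA\tG\nchr1\t300\ty\tG\tA\n' > "$tmp/idp.vcf"

# NR == FNR is true only while the first file is read (assuming it is not
# empty), so f1 collects reference keys and f2 collects IDP keys.
result=$(awk '
BEGIN { FS = OFS = "\t" }
NR == FNR { f1[$2 OFS $4 OFS $5]; next }
{ f2[$2 OFS $4 OFS $5] }
END {
	print "Match:"
	for (k in f1) if (k in f2) print k
	print "Missing in Reference but found in IDP:"
	for (k in f2) if (!(k in f1)) print k
	print "Missing in IDP but found in Reference:"
	for (k in f1) if (!(k in f2)) print k
}' "$tmp/ref.txt" "$tmp/idp.vcf")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Unlike the posted script, this version never deletes array elements while looping over them; it only tests membership with (k in ...), which leaves both arrays unchanged.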

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Storing file contents to a variable

Hi All, I was trying a shell script. I was unable to store file contents in a variable in the script. I have tried the following but was unable to do it: Input = `cat /path/op.diary` and Input = $(<op.diary). I am using ksh. I want to store the 'op.diary' file contents in the variable 'Input'... (12 Replies)
Discussion started by: am24

2. Shell Programming and Scripting

Storing awk command in a variable

I'm working on a script which gives certain details in its output depending on user-specified options. So, what I'd like to do is something like: if then awkcmd='some_awk_command' else awkcmd='some_other_awk_command' fi Then, later in the script, we'd do something like: ... (5 Replies)
Discussion started by: treesloth

3. Shell Programming and Scripting

Storing command output in a variable and using cut/awk

Hi, my aim is to get the md5 hash of a file and store it in a variable. var1="md5sum file1" $var1 The above outputs fine but also contains the filename, so something like this: 243ASsf25 file1. I just need to get the first part and put it into a variable. var1="md5sum file1"... (5 Replies)
Discussion started by: JustALol

4. Shell Programming and Scripting

Storing multiple file paths in a variable

I am working on a script for Mac OS X that, among many other things, gets a list of all the installed Applications. I am pulling the list from the system_profiler command and formatting it using grep and awk. The problem is that I want to be able to use each result individually later in the script.... (3 Replies)
Discussion started by: cranfordio

5. Shell Programming and Scripting

storing a value from another file as a variable[solved]

Hi all, I'm having snags creating a variable which uses commands like cut and grep. In the instance below I'm simply trying to take a value from another file and assign it to a variable. When I do this it only prints the $a rather than the actual value. I know it's simple but does anyone have any... (1 Reply)
Discussion started by: somersetdan

6. Shell Programming and Scripting

Storing lines of a file in a variable

I want to store the output of 'tail -5000 file' in a variable. If I want to access the contents of that variable, it becomes kind of difficult because when the data is stored in the variable, everything is mushed together. You don't know where a line begins or ends. So my question is, how can I... (3 Replies)
Discussion started by: SkySmart

7. Shell Programming and Scripting

Reading from a file and storing it in a variable

Hi folks, I'm using bash and would like to do the following. I would like to read some values from a file, store them in variables and use them. My file is 1.txt and its contents are VERSION=5.6 UPDATE=4. I would like to read "5.6" and "4" and store them in variables in the shell... (6 Replies)
Discussion started by: scriptfriend

8. Shell Programming and Scripting

Storing the contents of a file in a variable

There is a file named file.txt whose contents are:
+-----------------------------------+-----------+
| Variable_name                     | Value     |
+-----------------------------------+-----------+
| Aborted_clients                   | 0         |
| Aborted_connects                  | 25683... (6 Replies)
Discussion started by: proactiveaditya

9. UNIX Desktop Questions & Answers

problem while storing the output of awk to variable

Hi, I have some files in one directory (say, some sample dir) whose names are like the following: some_file1.txt, some_file2.txt. I need to get the size of the last modified file based on a file name pattern like some_. Here I am able to get the value of the last modified file size using the... (5 Replies)
Discussion started by: eswarreddya

10. Shell Programming and Scripting

storing output of awk in variable

Hi, I am trying to store the output of this awk command: awk -F, {(if NR==2) print $1} test.sr in a variable. When I try v= awk -F, {(if NR==2) print $1} test.sr or $v = awk -F, {(if NR==2) print $1} test.sr it's not working out. Any suggestions? Thanks, Arif (3 Replies)
Discussion started by: mab_arif16
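Several of the discussions listed above (1, 3, 6 and 10) come down to capturing a command's output in a shell variable. A minimal POSIX sketch using command substitution follows. The files test.sr and op.diary are created here with made-up contents, and the md5sum-style line is simulated with printf (its hash is a placeholder value) rather than produced by md5sum.

```shell
#!/bin/sh
# Made-up input files, created only for this illustration.
tmp=$(mktemp -d) || exit 1
printf 'a,b\nc,d\ne,f\n' > "$tmp/test.sr"
printf 'line one\nline two\n' > "$tmp/op.diary"

# No spaces around "=". $(...) captures standard output and strips
# trailing newlines.
Input=$(cat "$tmp/op.diary")

# The awk program is quoted, and "NR == 2" is a pattern selecting line 2;
# print the first comma-separated field of that line.
v=$(awk -F, 'NR == 2 { print $1 }' "$tmp/test.sr")

# Keep only the first word of a command's output (simulated md5sum-style
# line; "d41d8cd9" is just a placeholder value).
first=$(printf 'd41d8cd9 file1\n' | awk '{ print $1 }')

# Quote expansions: unquoted, $Input would be split into words and its
# line structure would not be preserved.
printf '%s\n' "$Input"
printf '%s %s\n' "$v" "$first"
rm -rf "$tmp"
```

In ksh and bash, Input=$(<file) is a shortcut for Input=$(cat file) that avoids running cat; it is not part of POSIX sh.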