awk to compare each file in two directories by storing in variable


 
# 8  
Old 10-01-2016
What output file is (or output files are) supposed to be created?

It looks like you want to run your awk script 250 times each with a pair of input files, but the output from each of those 250 runs goes to a single output file and that single output file is overwritten (not appended to) each time your awk script is run.
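
As a small illustration of the overwrite-versus-append point above, here is a sketch in which a plain echo stands in for the awk run (the real awk script is not shown in this part of the thread, and the scratch file and the variable names are made up for the example):

```shell
# Stand-in for the awk run: each "run" writes one line to the same file.
out=$(mktemp)                               # hypothetical scratch output file

for pair in F13 H19 T42; do
    echo "result for $pair" > "$out"        # '>' truncates: earlier results are lost
done
overwritten=$(awk 'END { print NR }' "$out")    # only the last run survives

: > "$out"                                  # empty the file before the second try
for pair in F13 H19 T42; do
    echo "result for $pair" >> "$out"       # '>>' appends: every result is kept
done
appended=$(awk 'END { print NR }' "$out")

echo "overwrite kept $overwritten line(s), append kept $appended"
rm -f "$out"
```

With `>` inside the loop only the final pair's result remains; with `>>` all three do. Neither gives one file per pair, which is why the question of the intended output files matters.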

Is the intent to create 250 different output files with a name corresponding to the names of the input files, or do you want one output file containing the concatenated contents of the 250 individual file comparisons? If it is one output file that you want, does there need to be some header to each section of the output specifying the input files processed to produce the following section of data in that output file? And, if so, what is the format of that header?
# 9  
Old 10-01-2016
The awk does run on pairs, and each run should create its own output file: 250 different output files, each with a name corresponding to the names of its input files. Thank you.

Last edited by cmccabe; 10-01-2016 at 11:15 AM.. Reason: added details
# 10  
Old 10-01-2016
Quote:
Originally Posted by cmccabe
The awk does run on pairs, and each run should create its own output file: 250 different output files, each with a name corresponding to the names of its input files. Thank you.
And what are the names of those output files supposed to be???
# 11  
Old 10-01-2016
Using the files below as an example: when the awk compares F13_ref_FP_10bp.txt to F13_epilepsy.vcf, the output should be F13_epilepsy_comparison.txt.

Likewise, comparing H19_ref_FP_10bp.txt to H19_marfan.vcf should produce H19_marfan_comparison.txt. Both the input to the awk and its output are tab delimited. Thank you.

REF
Code:
F13_ref_FP_10bp.txt
H19_ref_FP_10bp.txt

VAL
Code:
F13_epilepsy.vcf
H19_marfan.vcf
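
The naming rule described in this post can be sketched with shell parameter expansion. The helper name `out_name` is hypothetical, and the sketch assumes every VAL name ends in .vcf:

```shell
# Hypothetical helper: derive the comparison file name from a VAL file name.
out_name() {
    val=${1##*/}                                # drop any leading directory
    printf '%s_comparison.txt\n' "${val%.vcf}"  # replace .vcf with _comparison.txt
}

out_name F13_epilepsy.vcf
out_name H19_marfan.vcf
```

For the two VAL files listed above this prints F13_epilepsy_comparison.txt and H19_marfan_comparison.txt.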

# 12  
Old 10-01-2016
If we go back to post #3 in this thread, you said that you have a file named out containing pairs of input filenames:
Code:
F13_ref_FP_10bp.txt F13_epilepsy.vcf
M29_ref_FP_10bp.txt M29_epilepsy
H68_ref_FP_10bp.txt H68_marfan.vcf
T42_ref_FP_10bp.txt T42_epilepsy.vcf
H19_ref_FP_10bp.txt H19_marfan.vcf
T48_ref_FP_10bp.txt T48_marfan.vcf

and from post #1 we have pairs of filenames:
Code:
S1234_ref.txt S1234_panel.vcf
A5678_ref.txt A5678_panel.vcf
T1111_ref.txt T1111_panel.vcf

From these examples, am I correct in assuming that we can skip creating the out file and just look at the /home/cmccabe/Desktop/comparison/validation/files/*_*.txt files, knowing that for each one there will be a corresponding file in /home/cmccabe/Desktop/comparison/validation/files/ whose name starts with the same unique string before the first underscore character and ends in .vcf? Note that one name quoted above from post #3, M29_epilepsy (marked in red in the original post), does not end in .vcf. Was that a typo, or do some names in that directory not end in .vcf?

Is the assumption that there is only one file with the string before the first underscore in both of those directories correct? Or, do you want a script that depends on you creating a file named /home/cmccabe/Desktop/comparison/ref_val/out that contains lines containing three values (file1 filename, file2 filename, and output file filename)?

Is it OK to use an awk script that processes all 250 sets of input files in one invocation instead of invoking awk 250 times?
# 13  
Old 10-01-2016
All of the REF files end in .txt and are located in a folder at /home/cmccabe/Desktop/comparison/reference/10bp.

All of the VAL files end in .vcf and are located in a folder at /home/cmccabe/Desktop/comparison/validation/files. I did have a typo in post #3.

Quote:
Is the assumption that there is only one file with the string before the first underscore in both of those directories correct?
Yes, there will only be one file in each separate directory with the string before the first _. So there is no need for an out file other than to know which samples were processed.
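
Under that assumption the pairs can be found directly from the file names, without an out file. This is only a sketch: the function name `list_pairs` and its three-column output are made up for illustration, and a REF file without exactly one .vcf partner is skipped with a warning.

```shell
# list_pairs REF_DIR VAL_DIR   (hypothetical helper)
# Print "ref<TAB>val<TAB>output-name" for every REF .txt file that has exactly
# one .vcf partner sharing the text before the first underscore.
list_pairs() {
    ref_dir=$1
    val_dir=$2
    for ref in "$ref_dir"/*_*.txt; do
        [ -e "$ref" ] || continue               # the glob matched nothing
        base=${ref##*/}                         # e.g. F13_ref_FP_10bp.txt
        id=${base%%_*}                          # e.g. F13
        set -- "$val_dir/$id"_*.vcf             # candidate partners for this ID
        if [ $# -ne 1 ] || [ ! -e "$1" ]; then
            echo "skipping $base: no unique .vcf partner" >&2
            continue
        fi
        val=${1##*/}                            # e.g. F13_epilepsy.vcf
        printf '%s\t%s\t%s\n' "$ref" "$1" "${val%.vcf}_comparison.txt"
    done
}
```

With the directories named in this post it would be called as `list_pairs /home/cmccabe/Desktop/comparison/reference/10bp /home/cmccabe/Desktop/comparison/validation/files`, and its output can also serve as the record of which samples were processed.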

Quote:
Is it OK to use an awk script that processes all 250 sets of input files in one invocation instead of invoking awk 250 times?
There may not always be 250 sets of input; that number varies. But yes, all of them can be processed at once rather than each set individually. Is that what you mean? Thank you.
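
To make "one invocation" concrete, here is a rough sketch of a single awk process that reads any number of REF/VAL pairs and writes one output file per pair. The helper name `run_all` and the common/val_only labelling are placeholders only, since the real comparison logic is not shown in this part of the thread. The sketch also assumes that no input file is empty and that the files are passed in REF, VAL order.

```shell
# run_all OUT_DIR REF1 VAL1 [REF2 VAL2 ...]   (hypothetical helper)
# One awk process handles every pair and writes one output file per pair.
# Assumption: no input file is empty (an empty file never triggers FNR == 1).
run_all() {
    out_dir=$1
    shift
    awk -v out_dir="$out_dir" '
        FNR == 1 {                          # first record of a new input file
            nfile++
            if (nfile % 2 == 1) {           # odd position: a REF file
                split("", seen)             # forget the previous pair
            } else {                        # even position: a VAL file
                if (out != "") close(out)   # finish the previous output file
                out = FILENAME
                sub(/.*\//, "", out)        # drop the directory part
                sub(/\.vcf$/, "", out)      # drop the extension
                out = out_dir "/" out "_comparison.txt"
            }
        }
        nfile % 2 == 1 { seen[$0] = 1; next }   # REF record: remember it
        {                                       # VAL record: placeholder logic
            label = ($0 in seen) ? "common" : "val_only"
            print label "\t" $0 > out
        }
    ' "$@"
}
```

Because the pairs arrive as arguments, the same call works for 3 pairs or 250. Closing each output file before opening the next keeps the number of open files small.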

Last edited by cmccabe; 10-01-2016 at 02:16 PM.. Reason: added details
# 14  
Old 10-01-2016
Quote:
Originally Posted by cmccabe
.
.
.
it looks like the script reads all the vcf files from REF and puts them in a variable FN.
No, unless the statement about REF's contents in post #1 was not true. Given that it IS true, FN takes on three file names ending in .txt.

Quote:
How do the txt files from VAL get used by the awk.
They are not. The proposal assumes that for every .txt file name's prefix ID, a corresponding .vcf file exists, possibly in another path (as mentioned in my post).

Quote:
The awk looks at each REF file and compares it to each VAL file looking for what’s common and what’s different. If a difference is found it identifies which file the missing data came from.
.
.
.
The operation of the awk script is not the topic of this thread, nor is the desired output.

Please check whether any of the proposals so far provide the needed input file pairs and could be adapted to also provide the needed output file name, and comment on their suitability.

And, please please please, get some structure into your future requests and relieve us from guessing!