awk to create subdirectory based on match between two files


 
# 1  
Old 03-18-2019

In the awk below I am trying to mkdir based on an exact match between a line in file2 starting with R_2019... and the line in file1 starting with R_2019.... When a match is found, there is a folder at /home/cmccabe/run with the same name as the match, and each $2 in file1 should become a new subdirectory in that folder. There will always be a match to an R_2019... line, but there may be more than one: 2 or 3 R_2019... lines may have matches, and they will always be unique. The awk as written executes but produces nothing, so I tried adding cmd_fmt='mkdir -p "%s/%s"' to hold the command for each new subdirectory, and then added -v cmd_fmt="$cmd_fmt" to the start of the awk to create the matched subdirectory, but that did not work as expected. Any line in file1 that contains Negative can be skipped and does not need a subdirectory. I am using Ubuntu 14.04 and have added comments to the code. Thank you :).
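
For reference, a format string passed in with -v only has an effect if the awk program expands it with sprintf() and hands the result to system(). A minimal sketch of that mechanism, assuming a scratch directory from mktemp in place of /home/cmccabe/run and hard-coded example names (neither is part of the attempt described above):

```shell
#!/bin/sh
# Scratch directory stands in for /home/cmccabe/run (assumption for this demo).
demo_base=$(mktemp -d)
cmd_fmt='mkdir -p "%s/%s"'

# sprintf() fills the two %s slots; system() runs the resulting shell command.
awk -v cmd_fmt="$cmd_fmt" -v base="$demo_base" '
    BEGIN {
        run = "R_2019_example_run"     # hypothetical run folder name
        sample = "19-0000-example"     # hypothetical sample name
        cmd = sprintf(cmd_fmt, base "/" run, sample)
        system(cmd)
    }'

ls "$demo_base/R_2019_example_run"
```

Without a sprintf()/system() pair (or a pipe to sh) somewhere in the program, the cmd_fmt variable is simply never used.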



awk
Code:
awk '
    # create an associative array (key/value pairs) based on the file1
    NR==FNR { for(i=2; i<NF; i+=2) a[substr($i,1,7)] = $NF; next } 

    # retrieve the first 7-char of each line in file2 as the key to test against the above hash
    { k = substr($0, 1, 7) }

    # if find k, then print
    k in a { print a[k] "\t" $0 "\t" l }

    # save prev line to 'l' which is the ID
    { l = $0  } 

' RS= file1 RS='\n' file2

file1
Code:
IonCode_0267 Negative_water
IonCode_0255 19-0000-LastName-FirstName
IonCode_xxxx 19-0002-L-F
IonCode_xxxx 19-0003-LaNa-FiNa
IonCode_xxxx 19-0004-La-Fi
IonCode_xxxx Control-Positive-0318
R_2019_03_12_13_59_54_user_S5-0000-000-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions

file2
Code:
R_2019_02_15_11_56_40_user_S5-0000-00-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
R_2019_03_12_11_10_20_user_S5-0000-01-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
R_2019_03_12_13_59_54_user_S5-0000-000-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions


Last edited by cmccabe; 03-18-2019 at 09:15 PM.. Reason: fixed format
# 2  
Old 03-18-2019
I added a debug line (the print statement) to your array-building code:

Code:
awk '
    # create an associative array (key/value pairs) based on the file1
    NR==FNR { for(i=2; i<NF; i+=2) {
        a[substr($i,1,7)] = $NF
        print "a[" substr($i,1,7)"] = " $NF
    }
    next } 

    # retrieve the first 7-char of each line in file2 as the key to test against the above hash
    { k = substr($0, 1, 7) }

    # if find k, then print
    k in a { print a[k] "\t" $0 "\t" l }

    # save prev line to 'l' which is the ID
    { l = $0  } 

' RS= file1 RS='\n' file2

From the example files we get an array as such:

Code:
a[Negativ] = R_2019_03_12_13_59_54_user_S5-0000-000-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
a[19-0000] = R_2019_03_12_13_59_54_user_S5-0000-000-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
a[19-0002] = R_2019_03_12_13_59_54_user_S5-0000-000-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
a[19-0003] = R_2019_03_12_13_59_54_user_S5-0000-000-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
a[19-0004] = R_2019_03_12_13_59_54_user_S5-0000-000-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
a[Control] = R_2019_03_12_13_59_54_user_S5-0000-000-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions

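The keys are the first 7 characters of the sample names, while every line in file2 starts with R_2019_. Because no line in file2 starts with Control, Negativ, or 19-0000 through 19-0004, you get no output.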
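
One way to key the lookup on the run name instead is sketched here, under a few assumptions: file2 is read first as plain lines, file1 is read in paragraph mode so that $NF is its R_2019 line, and trimmed copies of the sample files are recreated in a scratch directory. It only prints the paths that would be created; the names runs, base, and paths are illustrative.

```shell
#!/bin/sh
# Recreate trimmed copies of the sample files in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1

cat > file1 <<'EOF'
IonCode_0267 Negative_water
IonCode_0255 19-0000-LastName-FirstName
IonCode_xxxx 19-0002-L-F
IonCode_xxxx Control-Positive-0318
R_2019_03_12_13_59_54_user_S5-0000-000-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
EOF

cat > file2 <<'EOF'
R_2019_02_15_11_56_40_user_S5-0000-00-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
R_2019_03_12_13_59_54_user_S5-0000-000-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
EOF

paths=$(awk -v base=/home/cmccabe/run '
    # file2 (line mode): remember every run name as an array key
    NR == FNR { runs[$0]; next }

    # file1 (paragraph mode): $NF is the R_2019 line, even fields are sample names
    ($NF in runs) {
        for (i = 2; i < NF; i += 2)
            if ($i !~ /Negative/)      # skip negative controls
                print base "/" $NF "/" $i
    }
' file2 RS= file1)

printf '%s\n' "$paths"
```

The whole run name is used as the key rather than a 7-character prefix, since every run name begins with the same R_2019_ characters and a prefix could not tell them apart.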
This User Gave Thanks to Chubler_XL For This Post:
# 3  
Old 03-18-2019
Switching file1 and file2 should match the R_2019... lines, but I think the sub-directories are going to be created in the directory where file1 and file2 exist, not in the desired one. Thank you :).
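
To make the sub-directories land under the run folder rather than the current directory, each path can be built with the base directory prepended before it reaches mkdir -p. A hedged sketch: base points at a scratch directory here so the script is safe to run (on the real system it would be /home/cmccabe/run), the sample files are trimmed copies, and reading file2 first is only one possible way to do the switch.

```shell
#!/bin/sh
work=$(mktemp -d)
base="$work/run"    # stand-in for /home/cmccabe/run
run=R_2019_03_12_13_59_54_user_S5-0000-000-v5.6_Oncomine_Childhood_Cancer_Research_DNA_and_Fusions
mkdir -p "$base/$run"    # the run folder already exists on the real system

# Trimmed sample data: three samples plus the run line, and a one-line file2.
printf '%s\n' 'IonCode_0267 Negative_water' \
    'IonCode_0255 19-0000-LastName-FirstName' \
    'IonCode_xxxx 19-0004-La-Fi' "$run" > "$work/file1"
printf '%s\n' "$run" > "$work/file2"

# awk prints run/sample pairs; the shell loop prepends base and creates them.
awk '
    NR == FNR { runs[$0]; next }
    ($NF in runs) {
        for (i = 2; i < NF; i += 2)
            if ($i !~ /Negative/)
                print $NF "/" $i
    }
' "$work/file2" RS= "$work/file1" |
while IFS= read -r rel; do
    mkdir -p "$base/$rel"
done

ls "$base/$run"
```

Because the target is an absolute path built from base, the result does not depend on the directory the script is run from.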