awk to update file based on partial match in field1 and exact match in field2


 
# 1  
Old 02-27-2017
I am trying to create a cron job that runs at startup and checks list.txt for a later version of each database listed in database.txt. The matching lines are written to output.

$1 of database.txt appears in $1 of list.txt as a partial match. $2 of database.txt also appears in list.txt, as an exact match in $2.

If the output file and database.txt match, then "all are current"; if one or more lines differ between the two files, then "newer version of line available".

Using the first line of database.txt as an example, refGene is a partial match to $1 of the hg19_refGene lines in list.txt, and $2 is the same in both files. There may be multiple matching lines, as in this case, but the dates will always match.

The awk below seems to find the partial match, but that is as far as I get. Thank you :).

database.txt (always two fields separated by a space: the first field is the name and the second field is the date)
Code:
refGene 20151211
clinvar 20170215
popfreq_all 20150413
dbnsfp 20170123
spidex 20150827

list.txt (tab-delimited and variable in length; $1 contains the name as a partial match and $2 is the date)
Code:
hg19_clinvar_20130905.txt.gz	20140527	415781
hg19_clinvar_20130905.txt.idx.gz	20140527	73218
hg19_clinvar_20131105.txt.gz	20140527	580838
hg19_clinvar_20131105.txt.idx.gz	20140527	167090
hg19_clinvar_20140211.txt.gz	20140527	694067
hg19_clinvar_20140211.txt.idx.gz	20140527	181049
hg19_clinvar_20140303.txt.gz	20140527	773948
hg19_clinvar_20140303.txt.idx.gz	20140527	182842
hg19_clinvar_20140702.txt.gz	20140712	1111503
hg19_clinvar_20140702.txt.idx.gz	20140712	367271
hg19_clinvar_20140902.txt.gz	20140911	1503198
hg19_clinvar_20140902.txt.idx.gz	20140911	389069
hg19_clinvar_20140929.txt.gz	20141002	1521398
hg19_clinvar_20140929.txt.idx.gz	20141002	389735
hg19_clinvar_20150330.txt.gz	20150413	1988285
hg19_clinvar_20150330.txt.idx.gz	20150413	426235
hg19_clinvar_20150629.txt.gz	20150724	2211904
hg19_clinvar_20150629.txt.idx.gz	20150724	428773
hg19_clinvar_20151201.txt.gz	20160303	1978309
hg19_clinvar_20151201.txt.idx.gz	20160303	188549
hg19_clinvar_20160302.txt.gz	20160303	2070491
hg19_clinvar_20160302.txt.idx.gz	20160303	195824
hg19_clinvar_20161128.txt.gz	20161205	2762808
hg19_clinvar_20161128.txt.idx.gz	20161205	239561
hg19_clinvar_20170130.txt.gz	20170215	4756134
hg19_clinvar_20170130.txt.idx.gz	20170215	312735
hg19_dbnsfp30a.txt.gz	20151015	2916074880
hg19_dbnsfp30a.txt.idx.gz	20151015	4981998
hg19_dbnsfp31a_interpro.txt.gz	20151223	147102844
hg19_dbnsfp31a_interpro.txt.idx.gz	20151223	2445036
hg19_dbnsfp33a.txt.gz	20170123	3610182452
hg19_dbnsfp33a.txt.idx.gz	20170123	5034641
hg19_popfreq_all_20150413.txt.gz	20150413	1059027804
hg19_popfreq_all_20150413.txt.idx.gz	20150413	212518299
hg19_refGeneMrna.fa.gz	20151211	41379833
hg19_refGene.txt.gz	20151211	5304233
hg19_refGeneVersion.txt.gz	20151211	131417
hg19_spidex.zip	20150827	2991981619

desired output
Code:
refGene 20151211
clinvar 20170215
popfreq_all 20150413
dbnsfp 20170123
spidex 20150827

awk used to generate list.txt
Code:
awk 'FNR==NR{a[$1]; next} {for (i in a) if (index($0, i)) print}' database hg19_avdblist.txt > list

# 2  
Old 02-27-2017
Code:
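awk '
        # First file (database.txt): remember the date for each name
        NR == FNR {
                A[$1] = $2
                next
        }
        # Second file (list.txt): record names that partially match $1
        # and whose date equals $2 exactly
        {
                for ( k in A )
                {
                        if ( $1 ~ k && $2 == A[k] )
                                F[k] = $2
                }
        }
        # Print each matched name with its date (order is unspecified)
        END {
                for ( k in F )
                        print k, F[k]
        }
' database.txt list.txt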
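
The first post also asks for an "all are current" / "newer version available" check. A minimal sketch of one way to do that follows, assuming "newer" means that list.txt holds a date in $2 later than the one recorded in database.txt for the same name. The function name check_versions and the array names are made up for illustration and do not come from the thread. It uses index() for a literal substring test instead of the ~ regex match, and it reports names in database.txt order.

```shell
# check_versions DATABASE LIST  (hypothetical helper, not from the thread)
# Prints one status line per DATABASE entry, in DATABASE order.
check_versions() {
    awk '
        # First file: keep the names in input order and store each date
        NR == FNR { names[++n] = $1; date[$1] = $2 + 0; next }
        # Second file: track the latest date seen for every name that
        # occurs as a literal substring of $1
        {
            for (i = 1; i <= n; i++) {
                k = names[i]
                if (index($1, k) && (!(k in latest) || $2 + 0 > latest[k]))
                    latest[k] = $2 + 0
            }
        }
        # Compare the latest listed date with the recorded date
        END {
            for (i = 1; i <= n; i++) {
                k = names[i]
                if (!(k in latest))
                    print k, date[k], "not found in list"
                else if (latest[k] > date[k])
                    print k, date[k], "newer version available:", latest[k]
                else
                    print k, date[k], "current"
            }
        }
    ' "$1" "$2"
}
```

Called as check_versions database.txt list.txt, it should report every name in the sample files as current, because the latest date in list.txt for each name equals the date in database.txt.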

# 3  
Old 03-01-2017
Thank you very much :).