Post 302301980 by Michelangelo on Sunday 29th of March 2009 04:20:56 PM
AWK Matching Fields and Combining Files

Hello!

I am writing a program that runs through two large lists of data (~300,000 rows), finds the rows in one file that match rows in the other, and combines them based on the matching fields. Given the large file sizes, I'm guessing AWK will be the most efficient way to do this. The input and output I'm looking for are similar to this:

File1: *first three columns are coordinates in (x, y, z)*
123 456 678 A B C
234 345 567 D F B
234 456 324 H J K
765 432 987 M N K


File2: *the last three columns are coordinates in (x, y, z)*
45 234 345 567
46 765 432 987
47 111 222 333
48 234 345 567
49 987 765 432
50 444 555 666
51 765 432 987
... and so on

Output file:
45 234 345 567 D F B
46 765 432 987 M N K
48 234 345 567 D F B
51 765 432 987 M N K

File2 has many more entries than File1, and every coordinate in File1 appears somewhere in File2. The problem I am having is how to search through all of File2 to find every row where a File1 coordinate is listed, along with the number in column 1 of File2 that corresponds to that coordinate.

In a nutshell:
Make new file3
Find where File2($2, $3, $4) is equal to File1($1, $2, $3)
print to file3 File2($1, $2, $3, $4), File1($4, $5, $6)

Thank you!
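
The three steps above could be sketched with a two-pass awk lookup, for example as below. This is only a sketch, and it assumes that fields are separated by whitespace, that File1 is small enough to hold in memory, and that each coordinate occurs at most once in File1 (a later duplicate would overwrite the earlier one). The `workdir` scratch directory and the array name `rest` are made up for illustration; the sample rows are the ones from the post.

```shell
# Hypothetical scratch directory holding the sample data from the post.
workdir=$(mktemp -d)

cat > "$workdir/File1" <<'EOF'
123 456 678 A B C
234 345 567 D F B
234 456 324 H J K
765 432 987 M N K
EOF

cat > "$workdir/File2" <<'EOF'
45 234 345 567
46 765 432 987
47 111 222 333
48 234 345 567
49 987 765 432
50 444 555 666
51 765 432 987
EOF

awk '
    # First file (File1): store columns 4-6 under the (x, y, z) key.
    NR == FNR { rest[$1, $2, $3] = $4 " " $5 " " $6; next }
    # Second file (File2): on a key match, print the row plus the stored columns.
    ($2, $3, $4) in rest { print $1, $2, $3, $4, rest[$2, $3, $4] }
' "$workdir/File1" "$workdir/File2" > "$workdir/File3"

cat "$workdir/File3"
```

Loading the smaller file into an array and streaming the larger one means each file is read only once, so File2 is not rescanned for every File1 coordinate.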
 
