Post 302544585 by heecha on Thursday 4th of August 2011 09:22:34 AM
awk to match multiple regex and create separate output files

Howdy Folks,

I have a list (file2.txt) that looks like this:

AAA
BBB
CCC
DDD

There are 24 of these short words in total.

I am matching these patterns against another file, file1.txt, which has 755795 lines.

I have this code for matching:

Code:
awk -v f2=file2.txt '
    BEGIN {
        while( (getline<f2) > 0 )   # read and collect records from f2
        {
            key = $1;
            ki = kidx[key]++;        # track number of duplicate keys (0 based)
            k2rec[key,ki] = $0;      # save unique record by key and dup count
        }
        close( f2 );
    }

    {
        key = $1;
        for( i = 0; i < kidx[key]; i++ )          # for each duplicate of key
            printf( "%s\t%s\n", k2rec[key,i], $0 );   # print f2 record, followed by current f1 record
    }
' <file1.txt > output

In this form, every line in file1.txt whose first field matches a key from file2.txt goes into the same output file. What I would like to do is change the code so that each of the patterns in file2.txt gets its own file. For example, all of the lines in file1.txt that match AAA would go in AAA.txt, while all of the lines matching BBB would go in BBB.txt.

I'm not exactly sure how to control where the output goes.
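
One direction that might work, as a minimal sketch rather than a tested solution for the real data: awk allows print and printf to redirect to a file name built from an expression, so the output file can be derived from the key. The sample data, the scratch directory name split_demo, and the outdir variable below are made up for illustration, and the sketch assumes every key is a plain word that is safe to use as a file name.

```shell
# Made-up sample data in a scratch directory (the name "split_demo" is arbitrary).
mkdir -p split_demo
printf '%s\n' AAA BBB CCC > split_demo/file2.txt
printf '%s\n' 'AAA 1 x' 'BBB 2 y' 'AAA 3 z' 'ZZZ 4 w' > split_demo/file1.txt

awk -v f2=split_demo/file2.txt -v outdir=split_demo '
    BEGIN {
        while( (getline<f2) > 0 )   # read and collect records from f2
        {
            key = $1;
            ki = kidx[key]++;        # track number of duplicate keys (0 based)
            k2rec[key,ki] = $0;      # save record by key and dup count
        }
        close( f2 );
    }

    {
        key = $1;
        out = outdir "/" key ".txt";              # one output file per key, e.g. AAA.txt
        for( i = 0; i < kidx[key]; i++ )          # body never runs when key is not in f2
            printf( "%s\t%s\n", k2rec[key,i], $0 ) > out;   # redirect to the per-key file
    }
' split_demo/file1.txt
```

On the sample data this writes two lines to split_demo/AAA.txt and one line to split_demo/BBB.txt, and creates no file for CCC (no matching lines) or ZZZ (not a key). Within a single awk run, > truncates each file only the first time it is opened, and later writes to the same name are appended. Some awk implementations limit how many files can be open at once; 24 keys will probably fit, but if not, a possible fallback is to write with >> and call close( out ) after each write, at some cost in speed.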

Thanks guys, I appreciate your help.
Robert
 
