awk to match multiple regex and create separate output files Post: 302544585

Posted by heecha on Thursday 4th of August 2011 09:22:34 AM in UNIX for Dummies Questions & Answers

Howdy Folks,

I have a list that looks like this:
(file2.txt)

AAA
BBB
CCC
DDD

There are 24 of these short words in total.

I am matching them against another file, file1.txt, which has 755795 lines.

I have this code for matching:

Code:

awk -v f2=file2.txt '
    BEGIN {
        while( (getline<f2) > 0 )   # read and collect records from f2
        {
            key = $1;
            ki = kidx[key]++;        # track number of duplicate keys (0 based)
            k2rec[key,ki] = $0;      # save unique record by key and dup count
        }
        close( f2 );
    }

    {
        key = $1;
        for( i = 0; i < kidx[key]; i++ )          # for each duplicate of key
            printf( "%s\t%s\n", k2rec[key,i], $0 );   # print f2 record, followed by current f1 record
    }
' <file1.txt > output

In this form, every line in file1.txt that matches goes into the same output file. What I would like to do is change the code so that each of the patterns in file2.txt gets its own file. For example, all of the lines in file1.txt that match AAA would go in AAA.txt, while all of the lines matching BBB would go in BBB.txt.

I'm not sure how to control where the output goes.

Thanks guys, I appreciate your help.
Robert
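
One possible way to get a file per pattern, offered as a minimal sketch rather than a definitive answer: awk can redirect `print` to a file name built from the key itself. The sketch assumes, as the posted code does, that the key is the whole first field of each line in file1.txt. The `split_demo` directory and the sample file contents are made up for illustration.

```shell
# Work in a scratch directory (hypothetical name) so real data is untouched.
mkdir -p split_demo && cd split_demo || exit 1

# Tiny made-up stand-ins for file2.txt (keys) and file1.txt (data).
printf '%s\n' AAA BBB CCC > file2.txt
printf '%s\n' 'AAA 1 x' 'BBB 2 y' 'AAA 3 z' 'ZZZ 4 w' > file1.txt

awk '
    NR == FNR { keys[$1]; next }   # first file (file2.txt): remember each key
    $1 in keys {                   # second file (file1.txt): first field is a known key
        out = $1 ".txt"            # build the name first; "print > $1 \".txt\"" is ambiguous in some awks
        print > out                # truncated on first open, later writes in this run append
    }
' file2.txt file1.txt
```

With the sample data this leaves AAA.txt holding the two AAA lines and BBB.txt holding the one BBB line; no file is created for CCC (no matching lines) or ZZZ (not a key). If the words should match anywhere in the line rather than equal the first field, the `$1 in keys` test would need to become a loop over `keys` that checks `$0 ~ k` for each key `k`. With only 24 keys the number of simultaneously open output files should stay well within what common awk implementations allow, though that limit varies by implementation.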
