Assistance with an awk code to split files but keep the header


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Assistance with an awk code to split files but keep the header
# 1  
Old 12-20-2013
Assistance with an awk code to split files but keep the header

---------- Post updated at 11:48 AM ---------- Previous update was at 11:46 AM ----------

Hello all

I have an awk code that successfully creates separate text files based on the first six letters of the second field. What it doesn't do is preserve the header into each resulting file.

Here is the awk line:

Code:
awk -F "\t" 'NR > 1{f=substr($2,1,6)".txt";print >> f;close(f)}' workbook.txt


Have been reading on the forums that using FNR==1 or NR==1 should do the trick, but if I replace the NR > 1 with either FNR==1 or NR==1, only the header file is saved into a new text file.

The workbook.txt file is generated from software that applies barcodes to client-supplied databases, so the headers will be different from one database to another, so using the line " header = ... (and then putting the what the field heads would be)" would not work.

I have the feeling I've missed something rather simple. Can anybody help me with this issue?
# 2  
Old 12-20-2013
If you show us sample contents of workbook.txt and the output files (names and contents) you want produced as output from your sample input file, we'll be able to help you fix your awk script to do what you want.
# 3  
Old 12-20-2013
here is a sample of the contents. it is actually about 25-30 fields long, depending on what headers the client had in their database. I've redacted it to four fields for this sample. fields 1 and 2 will always be sort order and sort code.

here is what my awk code does so far:

Code:
start with

workbook.txt
Sort_Order    SortCode    Clientno    Clientname
377    BD 001 0886    L61479908    John Smith    
378    BD 001 0886    L74058317    Jane Smith    
379    BD 002 4518    L61508821    Jane Doe    
380    BD 002 4518    L60998465    John Doe    
381    BD 002 4518    L25019592    John Jones    
382    BD 002 4518    L25143489    Jane Jones    

resulting output

BD 001.txt
377    BD 001 0886    L61479908    John Smith    
378    BD 001 0886    L74058317    Jane Smith    

BD 002.txt
379    BD 002 4518    L61508821    Jane Doe    
380    BD 002 4518    L60998465    John Doe    
381    BD 002 4518    L25019592    John Jones    
382    BD 002 4518    L25143489    Jane Jones

here is what I would like to happen:

Code:
start with

workbook.txt
Sort_Order    SortCode    Clientno    Clientname
377    BD 001 0886    L61479908    John Smith    
378    BD 001 0886    L74058317    Jane Smith    
379    BD 002 4518    L61508821    Jane Doe    
380    BD 002 4518    L60998465    John Doe    
381    BD 002 4518    L25019592    John Jones    
382    BD 002 4518    L25143489    Jane Jones    

desired output

BD 001.txt
Sort_Order    SortCode    Clientno    Clientname
377    BD 001 0886    L61479908    John Smith    
378    BD 001 0886    L74058317    Jane Smith    

BD 002.txt
Sort_Order    SortCode    Clientno    Clientname
379    BD 002 4518    L61508821    Jane Doe    
380    BD 002 4518    L60998465    John Doe    
381    BD 002 4518    L25019592    John Jones    
382    BD 002 4518    L25143489    Jane Jones

please let me know if there is any more information that is required

Colly
# 4  
Old 12-20-2013
The awk -F "\t" in your code indicates that the fields in workbook.txt are tab separated, but the sample input you showed us does not contain any tabs. That isn't unusual when text is copied from one window and pasted into the text when you start a thread in this forum. But having tab stops set at column positions 8, 23, and 36 is highly unusual. Is the field separator in your input files really a tab character, or is your field separator four adjacent space characters (as shown in your sample input file).

In your sample input file, all lines with the same 1st six characters of the SortCode field are adjacent. Will that also occur in all of your input files?
# 5  
Old 12-20-2013
hello there.

Yes, the database is tab separated. For the purposes of the forum it was copied and pasted from textwrangler, but rest assured the database is separated by tabs.

To answer the second question, I believe the answer is yes. Put simply, the sort code will be either the letters BR, BD or UR followed by a space and then either three numbers OR two OR three letters. I wrote a GREP that would find them:

Code:
(BD|BR|UR)\s.{2,3}

but was having difficulty implementing the code.

As my first post said, the awk line that I have works for all intents and purposes with the exception of the header not being retained in the resulting file. I am sure that I'm missing something that has to do with FNR==1 or NR==1 instead of NR >1 but my attempts using these codes yield the header line in a file on its own, and then nothing else.

Colly
# 6  
Old 12-21-2013
Assuming that your field separator is a tab character and that SortCode fields with the same values are always adjacent, the following seems to do what you want:
Code:
awk -F '\t' '
NR == 1 {hdr = $0
        next
}
f != substr($2, 1, 6)".txt" {
        if(f) close(f)
        f = substr($2, 1, 6)".txt"
        print hdr > f
}
{       print >> f
}' workbook.txt

If you want to try this on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of awk.

With three more simple lines of code, it will handle cases where some input lines destined for one of the output files are not adjacent to the others.

Since your sample input file has fields separated by 4 spaces instead of by tabs, this script will not work with your sample input. To make it work with your sample input, change the 1st line of the script to:
Code:
awk -F '    ' '

To make the spacing clear (with each space character replaced by <space>) that is:
Code:
awk<space>-F<space>'<space><space><space><space>'<space>'

Note also that if the 2nd field starts with XX XX rather than XX XXX, your output filenames will not only have an embedded space character, they will also have a trailing space character. That trailing space could be easily eliminated using a different way to create the output filenames (and it would work with any combination of spaces and tabs as field separators).
This User Gave Thanks to Don Cragun For This Post:
# 7  
Old 12-21-2013
thank you very much for the solution, and indeed it worked well!

as I said in my previous post, the database is tab delimited. the fact that it is displaying four spaces on-screen is unfortunate as indeed the database is tab delimited.

thank you for noting about the trailing space. in this instance it is not an issue.

Colin
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Split large xml into mutiple files and with header and footer in file

Split large xml into mutiple files and with header and footer in file tried below it splits unevenly and also i need help in adding header and footer command : csplit -s -k -f my_XML_split.xml extrfile.xml "/<Document>/" {1} sample xml <?xml version="1.0" encoding="UTF-8"?><Recipient>... (36 Replies)
Discussion started by: karthik
36 Replies

2. UNIX for Beginners Questions & Answers

awk assistance - Comparing 2 csv files

Hello all, I have searched high and low for a solution to this, many have come really close but not quite what I'm after. I have 2 files. One contains GUID's, for example: 8121E002-96FE-4C9C-BC5A-6AFF20DACECD 84468F30-F3B7-418B-81F0-0908E80792BF A second file, contains a path to the... (8 Replies)
Discussion started by: tirmUK
8 Replies

3. Shell Programming and Scripting

Split file by column value, each with header

Hello all, I have a csv with with different testcase values in column 5. year,min,max,Instrument,Testcase 2016,201,1003,GEOTROPH-02116,TATA7980 2016,53,1011,GEOTROPH-01963,TATA7980 2016,3,1024,GEOTROPH-02067,TATA7980 2016,203,1027,GEOTROPH-02011,TATA7980... (16 Replies)
Discussion started by: senhia83
16 Replies

4. Shell Programming and Scripting

Adding header to sub files after splitting the main file using AWK

Hi Folks, I have a file like: mainfile.txt: ------------- file1 abc def xyz file1 aaa pqr xyz file2 lmn ghi xyz file2 bbb tuv xyz I need output having two files file1 and file2. file1: ------ Name State Country abc def xyz aaa pqr xyz file2: (3 Replies)
Discussion started by: tanmay.gemini
3 Replies

5. UNIX for Dummies Questions & Answers

Merge all csv files in one folder considering only 1 header row and ignoring header of all others

Friends, I need help with the following in UNIX. Merge all csv files in one folder considering only 1 header row and ignoring header of all other files. FYI - All files are in same format and contains same headers. Thank you (4 Replies)
Discussion started by: Shiny_Roy
4 Replies

6. Shell Programming and Scripting

Need Header for all splitted files - awk

Input file: i have a file and need to split into multiple files based on first column. i need the header for all the splitted files. I'm unable to get the header. $ cat log.txt id,mailtype,value 1252468812,yahoo,3.5 1252468812,hotmail,2.4 1252468819,yahoo,1.2 1252468812,msn,8.9... (6 Replies)
Discussion started by: mannefromdetroi
6 Replies

7. Shell Programming and Scripting

awk split and rename files

I have a file test1.html like below: <dctm_topnav_en_US> <html> ..... </html> <dctm_topnav_en_CA> <html> ..... </html> <dctm_topnav_en_FR> <html> ..... </html> I need to use awk to split this into three file names like en_US.html , en_CA.html, en_FR.html each having content between... (4 Replies)
Discussion started by: vijay52
4 Replies

8. Shell Programming and Scripting

Need assistance with simple shell script to organize files. [Code attached]

I need some help with this shell script for class. All it does is organize your files. It works, but in the log file, it needs to show the new filepaths of the moved files. Heres my log of my output: Starting to organize... movie2.wmv --> movie3.mov --> movie1.mpg --> song1.mp3 --> ... (3 Replies)
Discussion started by: ryandamartini
3 Replies

9. Shell Programming and Scripting

Split large file and add header and footer to each small files

I have one large file, after every 200 line i have to split the file and the add header and footer to each small file? It is possible to add different header and footer to each file? (7 Replies)
Discussion started by: ashish4422
7 Replies

10. Shell Programming and Scripting

awk command to split in to 2 files

Hi, I have a problem in grepping a file for 2 strings and writing them to 2 appropriate files. I need to use the awk command and read the file only once and write to the appropriate file. My file is very huge in size and it is taking a long time using cat command and grep command. Can anyone... (3 Replies)
Discussion started by: m_subra_mani
3 Replies
Login or Register to Ask a Question