awk to create separate files but not include specific field in output

05-09-2018


I am trying to use awk to create (in this example) 3 separate text files, one for each unique id in $1 of file that starts with the pattern aa. The rows belonging to each id populate its text file, except for $1, which is not needed. It seems I am close but not quite there. Thank you.

file (tab-delimited)

Code:

aa1110-0	12	47259533	47259533	G	A	Comment:heterozygous_snv
aa1110-1	11	23892795	23892799	G	C	Comment:heterozygous_snv
	2	7581601	7581601	T	A	Comment:heterozygous_snv
aa1110-2	1	237837422	237837422	C	TTC	Comment:substitution
	3	7583892	7583892	G	A	Comment:heterozygous_snv
	19	23892788	23892799	G	-	Comment:deletion

awk (first attempt)

Code:

awk -F'\t' '/^aa/{                     # if line starts with aa
        if(!w)                          # if negate of w is true
           f=sprintf($1"%d.txt",++n);   # pre increment n, and set up variable f 
        w=1;                            # set variable w = 1
        print >f;                       # write record/row/line to file
        next                            # go to next line
     }
     {                                  # for which does not start with aa  
        close(f);                       # close file
        w=0                             # set w = 0 for next line with aa use newfile
     }
' file

Current output is two files, each containing the rows that start with aa, but with $1 still included.
Here is one:

Code:

aa1110-0	12	47259533	47259533	G	A	Comment:heterozygous_snv
aa1110-1	11	23892795	23892799	G	C	Comment:heterozygous_snv

awk (second attempt)

Code:

awk '{for(i=2;i<=NF;i++){printf "%s ", $i >> $1".txt"};printf "\n" >> $1".txt"; close($1".txt")}' file

Current output is three files with no $1 in them, but each contains only one line.
Here is the same file as above:

Code:

12	47259533	47259533	G	A	Comment:heterozygous_snv

desired output (tab-delimited)

Code:

aa1110-0.txt
12	47259533	47259533	G	A	Comment:heterozygous_snv

aa1110-1.txt
11	23892795	23892799	G	C	Comment:heterozygous_snv
2	7581601	7581601	T	A	Comment:heterozygous_snv

aa1110-2.txt
1	237837422	237837422	C	TTC	Comment:substitution
3	7583892	7583892	G	A	Comment:heterozygous_snv
19	23892788	23892799	G	-	Comment:deletion

Last edited by cmccabe; 05-09-2018 at 02:45 PM.. Reason: fixed format

cmccabe


05-09-2018


Try this:

Code:

awk -F'\t' '
/^aa/{                             # if line starts with aa
   if(f) close(f)                  # close already open file
   f=$1 ".txt"                     # name the output file after field #1
}
f {                                # if a file name has been set
   $1=""                           # blank field #1 (rebuilds $0 using OFS)
   $0=substr($0, 2)                # strip the leading tab left by the blank field
   print >f                        # write record/row/line to file
}
' OFS='\t' file
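
A quick way to see why blanking $1 is followed by substr($0, 2): assigning to a field makes awk rebuild $0 with OFS, which leaves a leading separator behind. The snippet below is only a sketch on one made-up input line (id1, x, y are not data from this thread), and the variable names are chosen just for illustration:

```shell
# One made-up, tab-delimited record: an id followed by two data fields.
line=$(printf 'id1\tx\ty')

# Blanking $1 rebuilds $0 with OFS, so a leading tab remains.
blanked=$(printf '%s\n' "$line" |
  awk -F'\t' -v OFS='\t' '{ $1 = ""; print }')

# Dropping the first character of $0 removes that leading tab.
stripped=$(printf '%s\n' "$line" |
  awk -F'\t' -v OFS='\t' '{ $1 = ""; $0 = substr($0, 2); print }')

# Brackets make the leading tab visible when the two results are compared.
printf '[%s]\n[%s]\n' "$blanked" "$stripped"
```

Setting OFS matters here: without it the rebuilt record would be joined with single spaces instead of tabs.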


Chubler_XL


05-09-2018


The following seems to do what I think you want. It assumes that you don't want extra whitespace characters added to the ends of your output lines, that you want <tab>-delimited output from your <tab>-delimited input, and that you just want the contents of field 1 with .txt appended as the filename for each output file (with no sequence numbering added to the filenames):

Code:

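awk '
BEGIN {	FS = OFS = "\t"		# tab-delimited input and output
}
/^aa/ {	if(f != "")		# a new ID starts: close the previous output file
		close(f)
	f = $1 ".txt"		# name the new output file after field 1
}
{	for(i = 2; i <= NF; i++)	# print every field except field 1
		printf("%s%s", $i, (i == NF) ? ORS : OFS) > f
}' file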

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
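
For anyone who wants to try the approach end to end, here is a rough sketch that rebuilds the sample input in a scratch directory and runs the same logic against it. It assumes that continuation rows start with a single tab (an empty field 1) and that all rows for one ID are contiguous. The `f != ""` guard is an addition for illustration, so that rows appearing before the first aa line are skipped rather than written to an empty file name:

```shell
# Work in a scratch directory so no existing files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Recreate the tab-delimited sample; '' gives an empty field 1.
# printf reuses the 7-field format for each group of 7 arguments.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  aa1110-0 12 47259533 47259533 G A Comment:heterozygous_snv \
  aa1110-1 11 23892795 23892799 G C Comment:heterozygous_snv \
  '' 2 7581601 7581601 T A Comment:heterozygous_snv \
  aa1110-2 1 237837422 237837422 C TTC Comment:substitution \
  '' 3 7583892 7583892 G A Comment:heterozygous_snv \
  '' 19 23892788 23892799 G - Comment:deletion > file

awk '
BEGIN { FS = OFS = "\t" }
/^aa/ {                         # new ID: switch to a new output file
    if (f != "") close(f)
    f = $1 ".txt"
}
f != "" {                       # added guard: ignore rows seen before any ID
    for (i = 2; i <= NF; i++)   # every field except field 1
        printf("%s%s", $i, (i == NF) ? ORS : OFS) > f
}' file

# Show what was written for the second ID.
cat aa1110-1.txt
```

Because each file is closed before the next one is opened, only one output file is open at a time, which avoids running into open-file limits when the input contains many IDs.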


Don Cragun


05-10-2018


Thank you both very much.

cmccabe
