Format DATA

12-16-2014


Input File

Code:

AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_archive,7337.19,4681.66,2655.54
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_sas,7557.19,4681.66,2600.54

Output File

Code:

 
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
,,,,,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
,,,,,clarata_archive,7337.19,4681.66,2655.54
,,,,,clarata_sas,7557.19,4681.66,2600.54

Please help!
Basically, within each group of records that share the same leading columns, blank out those common columns on every record except the first, keeping the "," field separators.

For example, in the input above the first 5 columns are common to the first 2 records and to the next 3 records.

Last edited by rbatte1; 12-22-2014 at 12:08 PM.. Reason: Corrected reverse case

greycells


12-17-2014


Code:

awk  'x[$1,$2,$3,$4,$5]++{$1=$2=$3=$4=$5=""}1' FS=, OFS=, infile

Akshay Hegde


12-17-2014


This works only if the patterns don't repeat further down the file. For example, given this input:

Code:

AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_archive,7337.19,4681.66,2655.54
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_sas,7557.19,4681.66,2600.54
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87

it would yield

Code:

AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
,,,,,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
,,,,,clarata_archive,7337.19,4681.66,2655.54
,,,,,clarata_sas,7557.19,4681.66,2600.54
,,,,,NAS SILVER,12287.99,3293.98,6946.02
,,,,,NAS ARCHIVE,12287.99,3327.12,6912.87

Try

Code:

awk     '!x[$1,$2,$3,$4,$5]     {delete x}
         x[$1,$2,$3,$4,$5]++    {$1=$2=$3=$4=$5=""}
         1
        ' FS=, OFS=, file

RudiC


12-17-2014


Thanks RudiC, you are right. But if all the fields of a line are the same as those of an earlier line, the output should not print that line again at all. For example, for the input above, the output should be

Code:

AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
,,,,,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
,,,,,clarata_archive,7337.19,4681.66,2655.54
,,,,,clarata_sas,7557.19,4681.66,2600.54

But if the input is like this (one or more fields after $5 are different)

Code:

 
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_archive,7337.19,4681.66,2655.54
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_sas,7557.19,4681.66,2600.54
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS BRONZE,12287.99,3293.98,6946.02

then the output should be

Code:

 
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
,,,,,NAS ARCHIVE,12287.99,3327.12,6912.87
,,,,,NAS BRONZE,12287.99,3293.98,6946.02
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
,,,,,clarata_archive,7337.19,4681.66,2655.54
,,,,,clarata_sas,7557.19,4681.66,2600.54

thanks !

greycells


12-17-2014


Hello greycells,

Could you please try the following and let us know if it helps?
Let's say the input file is as follows.

Code:

cat testt1
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_archive,7337.19,4681.66,2655.54
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_sas,7557.19,4681.66,2600.54
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS BRONZE,12287.99,3293.98,6946.02

Code:

sort -k1,1 testt1 | awk -F, '!X[$1,$2,$3,$4,$5] {delete X} X[$1,$2,$3,$4,$5]++ {$1=$2=$3=$4=$5=""} 1' OFS=,

Output will be as follows.

Code:

AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87
,,,,,NAS BRONZE,12287.99,3293.98,6946.02
,,,,,NAS SILVER,12287.99,3293.98,6946.02
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
,,,,,clarata_archive,7337.19,4681.66,2655.54
,,,,,clarata_sas,7557.19,4681.66,2600.54

EDIT: One point to add: this solution uses the sort utility to sort the file's contents by the first column before awk processes them, so the output order can differ from the input order. Kindly let us know if you have any other requirements or queries.
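If the original line order matters, or exact duplicate lines should be dropped as requested earlier in the thread, a sort-free variant can be sketched as follows. This is only a sketch under assumptions: the function name group_csv and the array names are made up for illustration, the grouping key is taken to be the first five fields, and the whole input is held in memory until the end.

```shell
# Hypothetical helper (name is illustrative): reads comma-separated
# records on stdin and writes the grouped result to stdout.
group_csv() {
  awk -F, -v OFS=, '
    {
      if (seen[$0]++) next                  # skip exact duplicate lines
      key = $1 FS $2 FS $3 FS $4 FS $5      # the five grouping fields
      if (key in rows) {
        $1 = $2 = $3 = $4 = $5 = ""         # blank the repeated fields
        rows[key] = rows[key] "\n" $0       # append to the existing group
      } else {
        order[++n] = key                    # remember first-appearance order
        rows[key] = $0                      # first line of a new group
      }
    }
    END {
      for (i = 1; i <= n; i++)
        print rows[order[i]]                # emit groups in input order
    }'
}

# Usage (assuming the sample file from this post): group_csv < testt1
```

Because groups are emitted in order of first appearance, a line such as the NAS BRONZE record that shows up after another group is still attached to its own group, and no sort is needed.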

Thanks,
R. Singh

Last edited by RavinderSingh13; 12-17-2014 at 09:28 AM.. Reason: Added a point for solution

RavinderSingh13


12-22-2014


I'm trying to understand how awk works with arrays and am keen to learn. Can anyone explain how this solution works? Reference material would also be helpful.

sort -k1,1 testt1 | awk -F, '!X[$1,$2,$3,$4,$5] {delete X} X[$1,$2,$3,$4,$5]++ {$1=$2=$3=$4=$5=""} 1' OFS=,

Thanks in advance

r_t_1601


12-22-2014


A great way to learn about any utility is to read the manual page for that utility. In this case that could be done by looking at the output from the commands:

Code:

man awk

and:

Code:

man sort

The code Ravinder suggested (reformatted with comments added) is:

Code:

sort -k1,1 file |			# Sort file with the 1st field as the
					# primary sort key using sequences of
					# blanks as the field separators.
awk -F, ' 				# Use awk to process the sorted data
					# with comma as the input field
					# separator.
!X[$1,$2,$3,$4,$5] {delete X}		# If the element of array X with index
					# set to the 1st 5 fields on the line
					# separated by the contents of the
					# SUBSEP variable has the value zero,
					# delete all elements from the array X.
					# Since array elements are initialized
					# to zero if no value has been stored,
					# this will happen on the 1st line of a
					# set of lines with the same strings in
					# the 1st five fields on the line.
X[$1,$2,$3,$4,$5]++ {$1=$2=$3=$4=$5=""}	# Increment the value of the element of
					# X corresponding to this line.  If the
					# element of X corresponding to this
					# line had a value greater than zero
					# before it was incremented, set the
					# first five fields to the empty string.
1' OFS=, 				# Print the (possibly updated) line.
					# Set the output field separator to a
					# comma.
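To see the post-increment test in isolation, here is a small sketch on made-up data. The function name show_counts and the three-field input are illustrative only and are not from the thread. It prints the value the pattern X[$1,$2]++ would test (the count before the increment) in front of each line.

```shell
# Hypothetical demo function: prefixes each input line with the value of
# X[$1,$2] *before* the post-increment is applied. That value is what a
# pattern such as  X[$1,$2]++ {...}  tests: 0 (false) on the first line
# with a given key, non-zero (true) on later lines with the same key.
show_counts() {
  awk -F, '{ before = X[$1,$2]++; print before, $0 }'
}

printf '%s\n' 'a,b,1' 'a,b,2' 'c,d,3' 'a,b,4' | show_counts
# Output:
# 0 a,b,1
# 1 a,b,2
# 0 c,d,3
# 2 a,b,4
```

The subscript $1,$2 is a single string built by joining the two fields with the contents of SUBSEP, so each distinct pair of leading fields gets its own counter.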

Note that the sort utility uses a default field separator of any combination of blanks (i.e., spaces and tabs), while the input uses a comma as the field separator. And since five fields determine which lines are grouped together, all five of those fields should be included in the primary sort key. That would be:

Code:

sort -t, -k1,5 file

But since the primary key is the first five fields on the line and variable-length numeric fields are not part of the key, specifying a field separator and sort key is redundant: the default behavior of sort (comparing whole lines) provides the desired order.
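This can be checked on the sample data from this thread. The sketch below is only a spot check, not a proof: the variable names are made up, and LC_ALL=C is an added assumption so that both commands compare bytes rather than using locale collation rules.

```shell
# Sample records from the thread, deliberately not in sorted order.
sample='II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_sas,7557.19,4681.66,2600.54
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS BRONZE,12287.99,3293.98,6946.02
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_archive,7337.19,4681.66,2655.54
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87'

# Whole-line comparison (the default) versus an explicit 5-field key.
plain=$(printf '%s\n' "$sample" | LC_ALL=C sort)
keyed=$(printf '%s\n' "$sample" | LC_ALL=C sort -t, -k1,5)

if [ "$plain" = "$keyed" ]; then
  echo "same order"
else
  echo "different order"
fi
```

On this sample both commands group the AU01 lines together ahead of the II18 lines, so either form feeds awk the grouping it needs.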

Note that delete X (deleting a whole array at once) is not required by the standards, but is available in some versions of awk. Note also that the statement:

Code:

!X[$1,$2,$3,$4,$5] {delete X}

could be removed without changing the output. But doing so will cause the amount of memory used by awk to increase slightly for each new group of lines. If there are millions of groups in the input file being processed, this could significantly slow down processing. If there are only a few hundred groups, the difference might not be noticed at all.

I don't see the need for arrays here. If you're going to destroy the entire array every time you create a new array element, creating and destroying the array is just overhead. I would simplify the code to:

Code:

sort file | awk -F, '
{	if(p == $1 FS $2 FS $3 FS $4 FS $5)
		$1 = $2 = $3 = $4 = $5 = ""
	else	p = $1 FS $2 FS $3 FS $4 FS $5
}
1' OFS=,

which produces exactly the same output (unless your implementation of awk gives you a syntax error for delete array_name) and doesn't depend on non-standard awk features.

Don Cragun
