Format DATA


 
# 1  
Old 12-16-2014

Input File

Code:
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_archive,7337.19,4681.66,2655.54
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_sas,7557.19,4681.66,2600.54

Output File

Code:
 
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
,,,,,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
,,,,,clarata_archive,7337.19,4681.66,2655.54
,,,,,clarata_sas,7557.19,4681.66,2600.54

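Please help!!
Basically, within each group of records that share the same leading columns, keep those columns only on the first record and blank them on the rest, leaving just the "," field separators.

For example, in the input above the first 5 columns are the same for the first 2 records, and again for the next 3 records.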

Last edited by rbatte1; 12-22-2014 at 12:08 PM.. Reason: Corrected reverse case
# 2  
Old 12-17-2014
Code:
awk  'x[$1,$2,$3,$4,$5]++{$1=$2=$3=$4=$5=""}1' FS=, OFS=, infile

These 2 Users Gave Thanks to Akshay Hegde For This Post:
# 3  
Old 12-17-2014
This works if the patterns don't repeat further down the file. For example, given
Code:
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_archive,7337.19,4681.66,2655.54
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_sas,7557.19,4681.66,2600.54
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87

it would yield
Code:
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
,,,,,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
,,,,,clarata_archive,7337.19,4681.66,2655.54
,,,,,clarata_sas,7557.19,4681.66,2600.54
,,,,,NAS SILVER,12287.99,3293.98,6946.02
,,,,,NAS ARCHIVE,12287.99,3327.12,6912.87

Try

Code:
awk     '!x[$1,$2,$3,$4,$5]     {delete x}
         x[$1,$2,$3,$4,$5]++    {$1=$2=$3=$4=$5=""}
         1
        ' FS=, OFS=, file

This User Gave Thanks to RudiC For This Post:
# 4  
Old 12-17-2014
Thanks RudiC, you are right. But if all the fields are the same, the output should not print the record again at all. For example, for the above input the output should be

Code:
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
,,,,,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
,,,,,clarata_archive,7337.19,4681.66,2655.54
,,,,,clarata_sas,7557.19,4681.66,2600.54

but if the input is like this (one or more fields after $5 differ)
Code:
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_archive,7337.19,4681.66,2655.54
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_sas,7557.19,4681.66,2600.54
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS BRONZE,12287.99,3293.98,6946.02

then the output should be

Code:
 
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
,,,,,NAS ARCHIVE,12287.99,3327.12,6912.87
,,,,,NAS BRONZE,12287.99,3293.98,6946.02
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
,,,,,clarata_archive,7337.19,4681.66,2655.54
,,,,,clarata_sas,7557.19,4681.66,2600.54

thanks !
# 5  
Old 12-17-2014
Hello greycells,

Could you please try the following and let us know if it helps.
Let's say the input file is as follows.
Code:
cat testt1
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS SILVER,12287.99,3293.98,6946.02
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_archive,7337.19,4681.66,2655.54
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clarata_sas,7557.19,4681.66,2600.54
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS BRONZE,12287.99,3293.98,6946.02

Code:
sort -k1,1 testt1 | awk -F, '!X[$1,$2,$3,$4,$5] {delete X} X[$1,$2,$3,$4,$5]++ {$1=$2=$3=$4=$5=""} 1' OFS=,

Output will be as follows.
Code:
AU01NAS002,FCNVX133800117,AU01_Melbourne_Australia,ATT,Internal,NAS ARCHIVE,12287.99,3327.12,6912.87
,,,,,NAS BRONZE,12287.99,3293.98,6946.02
,,,,,NAS SILVER,12287.99,3293.98,6946.02
II18NAS001,CK200110400822,II18_Mumb,MFFi(COD),Internal,clar_r5_performance,6667.88,2187.03,4254.13
,,,,,clarata_archive,7337.19,4681.66,2655.54
,,,,,clarata_sas,7557.19,4681.66,2600.54

EDIT: One point to add: the solution uses the sort utility, which sorts the file's contents on the first column before awk processes them, so the records can come out in a different order than in the input. Kindly let us know if you have any other requirements or queries.
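
One case the pipeline above does not cover is the earlier request that fully identical records should not be printed again. A minimal sketch, assuming it is acceptable for the output to come out in sorted order: `sort -u` discards exact duplicate lines before awk sees them. The function name `format_sorted` and the array name `seen` are made up for illustration, and the `delete X` rule is left out because sorted input never revisits a finished group.

```shell
# Hypothetical wrapper: sort, drop exact duplicate lines, then blank the
# first five fields on every record after the first one of each group.
format_sorted() {
    sort -u "$@" |
    awk -F, -v OFS=, 'seen[$1,$2,$3,$4,$5]++ { $1 = $2 = $3 = $4 = $5 = "" } 1'
}

# Example: the third line is an exact duplicate of the first.
printf 'A,B,C,D,E,x,1\nA,B,C,D,E,y,2\nA,B,C,D,E,x,1\n' | format_sorted
```

With no arguments the function reads standard input; with a file name (for example `format_sorted testt1`) it reads that file.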


Thanks,
R. Singh

Last edited by RavinderSingh13; 12-17-2014 at 09:28 AM.. Reason: Added a point for solution
This User Gave Thanks to RavinderSingh13 For This Post:
# 6  
Old 12-22-2014
I'm trying to understand how awk works with arrays and am keen to learn. Can anyone explain how this solution works? Pointers to reference material would also be helpful.

Code:
sort -k1,1 testt1 | awk -F, '!X[$1,$2,$3,$4,$5] {delete X} X[$1,$2,$3,$4,$5]++ {$1=$2=$3=$4=$5=""} 1' OFS=,

Thanks in advance
# 7  
Old 12-22-2014
A great way to learn about any utility is to read the manual page for that utility. In this case that could be done by looking at the output from the commands:
Code:
man awk

and:
Code:
man sort

The code Ravinder suggested (reformatted with comments added) is:
Code:
sort -k1,1 file |			# Sort file with the 1st field as the
					# primary sort key using sequences of
					# blanks as the field separators.
awk -F, ' 				# Use awk to process the sorted data
					# with comma as the input field
					# separator.
!X[$1,$2,$3,$4,$5] {delete X}		# If the element of array X with index
					# set to the 1st 5 fields on the line
					# separated by the contents of the
					# SUBSEP variable has the value zero,
					# delete all elements from the array X.
					# Since array elements are initialized
					# to zero if no value has been stored,
					# this will happen on the 1st line of a
					# set of lines with the same strings in
					# the 1st five fields on the line.
X[$1,$2,$3,$4,$5]++ {$1=$2=$3=$4=$5=""}	# Increment the value of the element of
					# X corresponding to this line.  If the
					# element of X corresponding to this
					# line had a value greater than zero
					# before it was incremented, set the
					# first five fields to the empty string.
1' OFS=, 				# Print the (possibly updated) line.
					# Set the output field separator to a
					# comma.
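
For anyone following the array question, here is a small sketch of the two idioms the commented script relies on: a post-increment used as a pattern, and a comma-separated subscript being joined with SUBSEP. The sample data, the function name `show_counts`, and the array names are made up for illustration.

```shell
# Hypothetical helper: prints, in front of each line, the value that
# x[$1,$2]++ yields for it. That value is the count *before* the increment:
# 0 (false) the first time a key is seen, non-zero (true) afterwards.
show_counts() {
    awk -F, '{ printf "%d %s\n", x[$1,$2]++, $0 }'
}

printf 'a,b,1\na,b,2\nc,d,3\n' | show_counts

# A subscript written as "a","b" is really the single string a SUBSEP b,
# so it can be split back into its parts.
awk 'BEGIN {
    x["a","b"] = 1
    for (k in x) {
        n = split(k, part, SUBSEP)
        print n, part[1], part[2]
    }
}'
```

The first command prints `0 a,b,1`, `1 a,b,2` and `0 c,d,3`; the second prints `2 a b`.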

Note that the sort utility uses a default field separator of any combination of blanks (i.e., spaces and tabs), while the input uses a comma as its field separator. And since there are five fields used to determine which lines are to be grouped, all five of those fields should be included in the primary sort key. That would be:
Code:
sort -t, -k1,5 file

But, since the primary key is the 1st five fields on the line and variable length numeric fields are not part of the key, specifying a field separator and sort key is redundant since the default behavior of sort provides the desired order.

Note that the delete X is not required by the standards, but is available on some versions of awk. Note also that the statement:
Code:
!X[$1,$2,$3,$4,$5] {delete X}

could be removed and still get the same output. But, doing so will cause the amount of memory used by awk to increase slightly for each new group of lines. If there are millions of groups in the input file being processed, this could significantly slow down processing. If there are a few hundred groups, the difference might not be noticed at all.

I don't see the need for arrays here. If you're going to destroy the entire array every time you create a new array element, creating and destroying the array is just overhead. I would simplify the code to:
Code:
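sort file | awk -F, '
{	# p holds the first five fields of the current group.
	if(p == $1 FS $2 FS $3 FS $4 FS $5)
		$1 = $2 = $3 = $4 = $5 = ""	# same group: blank the key fields
	else	p = $1 FS $2 FS $3 FS $4 FS $5	# new group: remember its key
}
1' OFS=,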

which produces exactly the same output (unless your implementation of awk gives you a syntax error for delete array_name) and doesn't depend on non-standard awk features.
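
If the groups need to stay in the order in which they first appear in the input (as in the expected output shown earlier in the thread, where NAS BRONZE joins the first group and the group keeps its position), sorting is not an option. The following is only a sketch under the assumption that the whole file fits in memory. It is not the code posted above, and the names `format_grouped`, `seen`, `rows` and `order` are made up for illustration. It also drops exact duplicate records.

```shell
# Hypothetical helper: collects records per group, keeps groups in order of
# first appearance, skips exact duplicates, and blanks the first five fields
# on every record after the first one in its group.
format_grouped() {
    awk -F, -v OFS=, '
    seen[$0]++ { next }                   # skip exact duplicate records
    {
        key = $1 FS $2 FS $3 FS $4 FS $5
        if (!(key in rows)) {             # first record of a new group
            order[++n] = key
            rows[key] = $0
            next
        }
        $1 = $2 = $3 = $4 = $5 = ""       # later record: blank the key fields
        rows[key] = rows[key] "\n" $0
    }
    END { for (i = 1; i <= n; i++) print rows[order[i]] }
    ' "$@"
}

# Example: the third record belongs to the first group, the fourth is a duplicate.
printf 'A,B,C,D,E,x,1\nP,Q,R,S,T,z,3\nA,B,C,D,E,y,2\nA,B,C,D,E,x,1\n' | format_grouped
```

Because every record is buffered until the END block, memory use grows with the size of the input; for very large files the sorted approach is likely the better fit.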
This User Gave Thanks to Don Cragun For This Post: