How to delete 'duplicated' column values and make a delimited file too? Post: 302980680

Sponsored Content

Top Forums Shell Programming and Scripting How to delete 'duplicated' column values and make a delimited file too? Post 302980680 by Don Cragun on Wednesday 31st of August 2016 04:11:55 PM

08-31-2016

Registered User

Hi newbie_01,
Maybe something like this would come closer to what you want. The other suggestions don't seem to convert the output to CSV format and don't seem to maintain the fixed-width fields either.

Code:

#!/bin/ksh
awk '
# Function to extract fields from an input line into an array.
function ef(line,	i) {
	# Extract fileds based on column positions and strip leading and
	# trailing whitespace.
	for(i = 1; i <= nf; i++) {
		array[i] = substr(line, P[i], L[i])
		gsub(/^[[:space:]]*|[[:space:]]*$/, "", array[i])
	}
}

# Function to print a line from an array replacing initial fields that
# duplicate data printed on the previous output line with an empty field.
function pl(	dup, i) {
	# Initialize...
	dup = 1	# Set clear duplicates flag.
	for(i = 1; i <= nf; i++) {
		if(dup && array[i] == last[i])
			printf("%s", (i == nf) ? array[i] ORS : OFS)
		else {	printf("%s%s", array[i], (i == nf) ? ORS : OFS)
			last[i] = array[i]
			dup = 0
		}
	}
}
BEGIN {	# Set output field separator before reading any input lines.
	OFS = ","
}
NR == 1 {
	# Save the heading on the 1st line of the file for later processing.
	hl = $0
	next
}
NR == 2 {
	# Calculate the number of fields, their starting positions, and their
	# lengths from the 2nd header line.
	P[1] = 1
	for(i = 1; i <= NF; i++)
		P[i + 1] = P[i] + 1 + (L[i] = length($i))
	nf = NF

	# Extract the headers from the saved header line.
	ef(hl)
	# Print header.
	pl()
	next
}
{	# Extract data from the current input line and print it.
	ef($0)
	pl()
}' file

This will skip printing any leading fields that duplicate data found on the previous line (except the last field on an input line will always be printed).

With your sample input file, this produces the output:

Code:

CHANGE,REQUESTOR,START,END,STATUS,SERVICE_NO,GROUP,RESOURCE_PERSON
153281,User AAA,2016-07-21 23:00:00,2016-07-22 01:00:00,Closed,466814,Support Number 1,Mars
153282,User ABCDE,2016-07-28 10:00:00,2016-07-28 11:00:00,Closed,466875,Linux,Martian 01
,,,,,466876,DBA,Earthling 01
153283,User BBB,2016-07-28 12:00:00,2016-07-28 15:00:00,Closed,467055,Storage,Jupiter
153286,User WXYZ,2016-07-28 18:00:00,2016-08-02 20:00:00,Closed,466877,DBA,Earthling 02
,,,,,467105,Unix,Martian 02
153287,User ABCDEF,2016-08-01 10:00:00,2016-08-01 11:00:00,Closed,466923,Linux,Martian 01
,,,,,466924,DBA,Earthling 01
153288,User XXX123456,2016-08-12 10:00:00,2016-08-12 11:00:00,Closed,466812,Linux,Martian 01
,,,,,466813,DBA,Earthling 01
153290,User XXXYYYZZZ,2016-08-15 18:30:00,2016-08-15 19:30:00,Closed,467098,Linux,Martian 01
,,,,,467099,DBA,Earthling 01

You haven't said what operating system you're using. If you want to run this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk. (Note that neither awk nor nawk on a Solaris system will not work with this script.)

These 2 Users Gave Thanks to Don Cragun For This Post:

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Delete parts of a string of character in one given column of a tab delimited file

I would like to remove characters from column 7 so that from an input file looking like this: >HWI-EAS422_12:4:1:69:89 GGTTTAAATATTGCACAAAAGGTATAGAGCGT U0 1 0 0 ref_chr8.fa 6527777 F DD I get something like that in an output file: ...

2. Shell Programming and Scripting

Delete first column in tab-delimited text-file

I have a large text-file with tab-delimited genetic data that looks like: KSC112 KSC234 0 0 1 1 A G C T I simply wan to delete the first column, but since the file has 600 000 columns, it is not possible with awk (seems to be limited at 32k columns). Does anyone have an idea how to do this?

3. Shell Programming and Scripting

Changing one column of delimited file column to fixed width column

Hi, Iam new to unix. I have one input file . Input file : ID1~Name1~Place1 ID2~Name2~Place2 ID3~Name3~Place3 I need output such that only first column should change to fixed width column of 15 characters of length. Output File: ID1<<12 spaces>>Name1~Place1 ID2<<12...

4. UNIX for Dummies Questions & Answers

How do you delete cells from a space delimited text file given row and column number?

How do you delete cells from a space delimited text file given row and column number? Letś say the row number is r and the column number is c. Thanks!

5. UNIX for Dummies Questions & Answers

Extracting rows from a space delimited text file based on the values of a column

I have a space delimited text file. I want to extract rows where the third column has 0 as a value and write those rows into a new space delimited text file. How do I go about doing that? Thanks!

6. Shell Programming and Scripting

How to make tab delimited file to space delimited?

Hi How to make tab delimited file to space delimited? in put file: ABC kgy jkh ghj ash kjl o/p file: ABC kgy jkh ghj ash kjl Use code tags, thanks.

7. Shell Programming and Scripting

Delete an entire column from a tab delimited file

Hi, Can anyone please tell me about how we can delete an entire column from a tab delimited file? Mu input_file.txt looks like this: And I want the output as: I used the below code nawk -v d="1" 'BEGIN{FS=OFS="\t"}{$d=""}{print}' input_file.txtBut in the output, the first column is...

8. UNIX for Dummies Questions & Answers

Sort csv file by duplicated column value

hello, I have a large file (about 1gb) that is in a file similar to the following: I want to make it so that I can put all the duplicates where column 3 (delimited by the commas) are shown on top. Meaning all people with the same age are listed at the top. The command I used was ...

9. Shell Programming and Scripting

awk script to append suffix to column when column has duplicated values

Please help me to get required output for both scenario 1 and scenario 2 and need separate code for both scenario 1 and scenario 2 Scenario 1 i need to do below changes only when column1 is CR and column3 has duplicates rows/values. This inputfile can contain 100 of this duplicated rows of...

10. UNIX for Beginners Questions & Answers

Replace a column in tab delimited file with column in other tab delimited file,based on match

Hello Everyone.. I want to replace the retail col from FileI with cstp1 col from FileP if the strpno matches in both files FileP.txt ...