Help with file processing using awk


 
# 8  
Old 03-05-2016
Quote:
Originally Posted by looney
Code:
{ read A; read B; printf "%s\n" "$A" "$B";  sort; } < file | awk '!T[$1]++'

Hi RudiC,

Could you please explain the code and the flow.
The first part:
Code:
{ read A; read B; printf "%s\n" "$A" "$B";  sort; } < file

reads the first line of file into the shell variable A (read A) and the second line into the shell variable B (read B), reprints those two lines unchanged (printf "%s\n" "$A" "$B"), and sorts the remaining lines of file (sort). This sorts the data in file while keeping the (unsorted) headers at the start of the output.
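
As a rough illustration of that first part, here is a sketch run against a few made-up lines (the sample data and the sample function are hypothetical, not the original poster's file). It feeds the group from a pipe instead of < file, and it uses IFS= read -r and LC_ALL=C sort, which are my additions: a plain read trims surrounding whitespace and interprets backslashes, and the C locale makes the sort order predictable.

```shell
# Hypothetical sample input: two header lines followed by unsorted data.
sample() {
	printf '%s\n' \
		'user role  location eid' \
		'-------------------------' \
		'CCC  delete  AU  2' \
		'AAA  delete  UK  5' \
		'AAA  add  UK  1'
}

# Copy the two header lines through untouched, then let sort consume
# whatever is left on standard input.
sorted=$(sample | {
	IFS= read -r A
	IFS= read -r B
	printf '%s\n' "$A" "$B"
	LC_ALL=C sort
})
printf '%s\n' "$sorted"
```

The two header lines come out first, followed by the data lines in sorted order, with the "add" line for AAA ahead of its "delete" line.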

All of the output from the first part is then piped (|) into the second part:
Code:
awk '!T[$1]++'

which creates an array indexed by the first field of each input line (T[$1]) and sets the value of that array element to the number of times that index has been seen (after using the previous value of the element). An awk program is a series of rules, each consisting of a condition section and an action section. In this case the condition is !T[$1]++ and there is no action section. When there is no action section, the default action (print the line) is performed if, and only if, the condition evaluates to a non-zero numeric value or a non-empty string value. Since unset array elements are treated as zero or as an empty string (depending on context), the condition !T[$1]++ is evaluated as follows the first time a line is seen with a given string in the first field:
  1. T[$1] is seen as a zero value
  2. !T[$1] yields a value of 1 for the condition
  3. T[$1]++ increments the value of the array element to 1
and since the condition's value is non-zero, the line is printed. On subsequent lines with the same first field the evaluation is:
  1. T[$1] is seen as a non-zero value (a count of the number of times this value has been seen before)
  2. !T[$1] yields a value of 0 for the condition
  3. T[$1]++ increments the value of the array element
and since the condition's value is zero, the line is not printed.
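
To see the evaluation above in action, here is a small sketch on made-up input (the data and the variable names input, terse and verbose are only for illustration). It runs the terse condition next to a spelled-out version that tests the old count, prints, and then increments; the two should behave the same.

```shell
# Hypothetical input; the 1st field is the key.
input='AAA add
AAA delete
BBB delete
AAA add'

# Terse form: the condition doubles as the counter.
terse=$(printf '%s\n' "$input" | awk '!T[$1]++')

# Spelled-out form: test the old count, print, then increment.
verbose=$(printf '%s\n' "$input" | awk '{ if (T[$1] == 0) print; T[$1]++ }')

printf '%s\n' "$terse"
```

Only the first line for each key ("AAA add" and "BBB delete") gets through.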

Note that the sort in the first part groups all lines with the same 1st field together and, if both an "add" and a "delete" appear for that value, the line(s) with "add" for that 1st field value will come before any line(s) with "delete" (sort compares whole lines, and "add" collates before "delete").

Nice, simple, straightforward magic. The only likely problem with this approach is that if the data in your file contains a 1st field value that is identical to the 1st field in one of the header lines, those data lines will be lost. But, as long as you don't have a user named "user", that shouldn't be a problem for you.
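
If that header collision ever did matter, one possible guard (my variation, not part of the original suggestion) is to pass the two header lines through before the duplicate test is applied. A sketch on hypothetical data that contains a user named "user":

```shell
# Hypothetical input where a data line has the same 1st field as the header.
input='user role  location eid
-------------------------
AAA  add  UK  1
user  add  US  9'

# Original filter: the data line for "user" is dropped as a duplicate.
plain=$(printf '%s\n' "$input" | awk '!T[$1]++')

# Guarded filter: pass the two header lines through, dedupe only the data.
guarded=$(printf '%s\n' "$input" | awk 'NR < 3 { print; next } !T[$1]++')

printf '%s\n' "$guarded"
```

The guarded version keeps all four lines, while the plain version loses the last one.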
# 9  
Old 03-06-2016
try also:
Code:
awk 'NR==FNR {a[$1]++; next} $2 ~ /add/ {print; a[$1]=0} ++b[$1]==a[$1]' infile infile
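
My reading of that one-liner, spread over several lines with comments and run on made-up data (the sample lines, the temporary file and the variable names are hypothetical). As far as I can tell it prints every line whose 2nd field matches the pattern add (so repeated "add" lines are all printed, and the match is a regular expression, not an exact comparison), and for keys that never have an "add" it prints the last line seen.

```shell
# Hypothetical sample input (not the thread's original file).
infile=$(mktemp)
printf '%s\n' \
	'user role  location eid' \
	'-------------------------' \
	'AAA  add  UK  1' \
	'CCC  delete  AU  2' \
	'AAA  delete  UK  5' \
	'CCC  delete  AU  6' > "$infile"

twopass=$(awk '
NR == FNR {		# 1st pass: count the lines for each 1st field
	a[$1]++
	next
}
$2 ~ /add/ {		# 2nd pass: print each "add" line and zero the count
	print		# so that no later line for this key can match below
	a[$1] = 0
}
++b[$1] == a[$1]	# otherwise print the last line seen for the key
' "$infile" "$infile")
rm -f "$infile"

printf '%s\n' "$twopass"
```

With this input the header lines are printed because each appears exactly once, AAA is represented by its "add" line, and CCC by its last "delete" line.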

# 10  
Old 03-06-2016
Note that if you don't care about the order of the data lines in the output, you can also try something like:
Code:
awk '
NR < 3 {# Copy header lines unchanged.
	print
	next
}
!($1 in A) && $2 == "add" {
	# Print 1st occurrence of an "add" line for each user.
	print
	A[$1]
	next
}
!($1 in A) && !($1 in D) && $2 == "delete" {
	# If user does not have an "add" line yet and does not have a "delete"
	# line yet, capture the 1st "delete" line.
	D[$1] = $0
}
END {	# After we reach EOF on the input file, print captured "delete" lines
	# for users who did not have an "add" line.
	for(u in D)
		if(!(u in A))
			print D[u]
}' file

which doesn't need sort and reads the input file only once. (It will print all users with "add" in the 2nd field in the order in which they were first found in the input file, followed by users who have no "add" but do have a "delete" in the 2nd field in unspecified order, since awk does not define the order in which for(u in D) visits array elements.) With your latest sample input file, the output will be:
Code:
user role  location eid
-------------------------
AAA  add  UK  1
DDD  add  FR  3
BBB  add  IN  4
CCC  delete  AU  2
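
If the order of the trailing "delete" lines matters, one possible variation (a sketch only; the order array, the counter n and the sample data are my own additions) is to remember the order in which the "delete" users were first seen and walk that list in END instead of using for(u in D):

```shell
# Hypothetical sample input, fed on standard input instead of from file.
ordered=$(printf '%s\n' \
	'user role  location eid' \
	'-------------------------' \
	'ZZZ  delete  DE  7' \
	'AAA  add  UK  1' \
	'CCC  delete  AU  2' \
	'AAA  delete  UK  5' |
awk '
NR < 3 {	# Copy header lines unchanged.
	print
	next
}
!($1 in A) && $2 == "add" {
	print
	A[$1]
	next
}
!($1 in A) && !($1 in D) && $2 == "delete" {
	D[$1] = $0
	order[++n] = $1	# Remember the order of first appearance.
}
END {	# Print captured "delete" lines in input order.
	for(i = 1; i <= n; i++)
		if(!(order[i] in A))
			print D[order[i]]
}')

printf '%s\n' "$ordered"
```

Here the "delete" lines for ZZZ and CCC come out in the order they appeared in the input, after the "add" line for AAA.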


Last edited by Don Cragun; 03-06-2016 at 04:12 PM.. Reason: Remove single quotes in comments.
# 11  
Old 03-06-2016
Combining MadeInGermany's modification in post #4 of my suggestion with the bit that saves the header, you could also try:
Code:
awk 'FNR<3{print; next} !($1 in A) || $2=="add" {A[$1]=$0} END{for(i in A) print A[i]}' file
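
A quick trace of that one-liner on made-up data (the sample lines and the variable name latest are hypothetical, and the input comes from a pipe instead of file). As I read it, it keeps the first line seen for each user, replaces it with any later "add" line, and prints the saved data lines at the end in unspecified order.

```shell
# Hypothetical sample input, fed on standard input instead of from file.
latest=$(printf '%s\n' \
	'user role  location eid' \
	'-------------------------' \
	'CCC  delete  AU  2' \
	'AAA  delete  UK  5' \
	'AAA  add  UK  1' \
	'CCC  delete  AU  6' |
awk 'FNR<3{print; next} !($1 in A) || $2=="add" {A[$1]=$0} END{for(i in A) print A[i]}')

printf '%s\n' "$latest"
```

The headers are printed first; AAA ends up with its "add" line and CCC with its first "delete" line, in whichever order the for loop visits them.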
