Specifically, I need to look for duplicates in column 3 of a CSV file and, when a duplicate is found, remove the lines that repeat that column 3 value. In the instance above, line two is removed (filtered out).
Does anyone know if the Unix uniq command can be used for this, or Perl? uniq doesn't seem to have a delimiter flag, only options that work by character count.
awk has associative arrays - the key for the mail array is field #3 ($3).
The first time a given $3 value shows up, mail[$3] is unset, which awk treats as zero, so the line prints; the post-increment in mail[$3]++ then bumps that array element to one. The next time the same $3 value is found, mail[$3] is already 1, so the line is not printed.
!mail[$3] evaluates true only when mail[$3] == 0, so when it is 1, 2, 3, ... it evaluates as false.
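For reference, here is a minimal sketch of how the expression described above is typically run. The sample rows and the temporary file are made up for illustration (the real file is not shown in this thread), and -F, assumes the fields are separated by plain commas with no quoted commas inside them.

```shell
# Hypothetical sample data: lines 1 and 2 share the same third column.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
alice,smith,alice@example.com
al,smith,alice@example.com
bob,jones,bob@example.com
EOF

# -F, sets the input field separator to a comma (assumption about the file).
# !mail[$3]++ is true only the first time a given $3 is seen,
# and a true pattern with no action prints the line.
deduped=$(awk -F, '!mail[$3]++' "$tmpfile")
printf '%s\n' "$deduped"

rm -f "$tmpfile"
```

With this input, the second line is dropped because its third field was already seen on the first line.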
This seemed to work, but I noticed that a few duplicates seem to be left behind. How does the array know what the delimiter is? $3 is the field, but I'm not clear on the delimiter. Would the same work with tabs as the delimiter?
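On the delimiter: the array itself knows nothing about it. awk splits each line into $1, $2, $3, ... using the field separator FS before the expression runs, and the default FS is any run of spaces or tabs. A rough sketch with tabs follows; the sample rows are invented, and whether a missing -F explains the leftover duplicates in your file is only a guess.

```shell
# Hypothetical tab-separated data; note the space inside field 2 of the first row.
tmpfile=$(mktemp)
printf 'a\tx y\tk1\nb\tz\tk1\nc\tw\tk2\n' > "$tmpfile"

# -F'\t' splits on tabs only, so $3 is k1, k1, k2 and the second row is dropped.
tab_deduped=$(awk -F'\t' '!mail[$3]++' "$tmpfile")
printf '%s\n' "$tab_deduped"

# With the default separator, "x y" counts as two fields, so $3 of the
# first row becomes "y" and the duplicate k1 row is not detected.
default_split=$(awk '!mail[$3]++' "$tmpfile")
printf '%s\n' "$default_split"

rm -f "$tmpfile"
```

So the same expression should work with tabs as long as -F matches the file. Stray whitespace or differences in letter case inside column 3 would also make two values look distinct to awk.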