Using awk to find unique, how to save results?


 
# 1  
Old 07-11-2013

I am very, very new to this (as in, I didn't even know awk existed until today).

I have a huge CSV file. Column 1 contains a ton of email addresses. I need to find which emails are unique and save those rows to a separate file. I also need to find which emails are duplicated and save a record of those too.

I have found a ton of different lines of code that can seemingly do this. Here are some I came across for finding unique values:
  • awk '{ a[$1]++ } END { for (b in a) { print b } }' file
  • cut -d',' -f6 file.csv | sort | uniq
  • cut -f 3 | uniq
  • $ awk '{print $4}' testfile | sort -u

However, I don't know how to save the output of these commands to a file. I tried adding > desktop/output.txt to the end, and while the file always appears on my desktop, it has the same number of rows as my original CSV. Nothing changes at all.

What am I doing wrong?
# 2  
Old 07-11-2013
Can you post a few sample lines from your input file?
# 3  
Old 07-11-2013
All of your examples are operating on totally different columns, so I suspect that's part of the problem.

If we knew which column, we could give a more exact answer. Post your data please.
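To illustrate the point about columns, here is a minimal sketch assuming a comma-separated file with the email in column 1. The file names and sample rows are made up for the example. Note that awk splits on whitespace unless `-F','` is given, and `uniq` only collapses adjacent lines, so the input is sorted first.

```shell
# Hypothetical three-row sample; column 1 holds the email
printf '%s\n' 'a@x.com,Ann' 'b@x.com,Bob' 'a@x.com,Al' > cols.csv

# awk: -F',' sets the field separator; $1 is the first column
awk -F',' '{ print $1 }' cols.csv | sort -u > emails_awk.txt

# cut: -d',' sets the delimiter; -f1 selects the first column
cut -d',' -f1 cols.csv | sort -u > emails_cut.txt
```

Both commands select the same column, and the trailing `> file` saves the result instead of printing it to the terminal.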
# 4  
Old 07-11-2013
Right now the CSV I'm using to test with contains only emails.

email1@email.com
email2@email.com
email3@email.net
etc...

There are no headers and nothing else in any column.

Oh, and I have been changing the examples I found to match my column ($1, right?), and I have been changing the filenames to match mine, of course.
# 5  
Old 07-11-2013
Code:
sort file.csv | uniq -c | awk '$1==1{print $2}' > unique.csv
sort file.csv | uniq -c | awk '$1>1{print $2}' > duplicate.csv
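The pipelines above print only the email itself, which is fine for a one-column file. If the real file has more columns and whole rows need to be kept, one possible alternative is a two-pass awk script. This is only a sketch: the sample data and output file names are invented, and it assumes plain comma-separated fields with no quoted commas.

```shell
# Hypothetical sample data standing in for the real CSV
printf '%s\n' 'a@x.com,1' 'b@x.com,2' 'a@x.com,3' 'c@x.net,4' > sample.csv

# The file is read twice: pass 1 (NR == FNR) counts each email in
# column 1, pass 2 writes each full row to one of two output files.
awk -F',' '
    NR == FNR      { count[$1]++; next }               # pass 1: count only
    count[$1] == 1 { print > "unique_rows.csv"; next } # email seen once
                   { print > "duplicate_rows.csv" }    # email seen 2+ times
' sample.csv sample.csv
```

Unlike the `sort | uniq -c` approach, this keeps the rows in their original order. An output file is only created if at least one row is written to it.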

# 6  
Old 07-11-2013
The first one displayed ~300 results, and there should be ~380k.
The second one displayed nothing.


Could there be something wrong with my csv file, maybe?
# 7  
Old 07-11-2013
Make sure your csv file isn't full of carriage returns.
Code:
tr -d '\r' < inputfile > outputfile
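Before stripping anything, it can help to check whether the file contains carriage returns at all. The sketch below uses made-up sample files and counts CR and newline bytes with standard tools. If a file has many CRs but almost no newlines, it may use CR-only line endings, and deleting the CRs would join every row into one line; translating them to newlines is the safer choice in that case.

```shell
# Hypothetical sample with Windows-style CRLF line endings
printf 'a@x.com\r\nb@x.com\r\na@x.com\r\n' > crlf.csv

# Count carriage-return bytes and newline bytes
# (tr -d ' ' strips the padding some wc implementations add)
cr_count=$(tr -cd '\r' < crlf.csv | wc -c | tr -d ' ')
nl_count=$(wc -l < crlf.csv | tr -d ' ')
echo "CR: $cr_count  LF: $nl_count"

# CRLF case (CRs alongside newlines): delete the CRs
tr -d '\r' < crlf.csv > clean.csv

# CR-only case (hypothetical second sample): translate CR to newline
printf 'a@x.com\rb@x.com\r' > cr_only.csv
tr '\r' '\n' < cr_only.csv > from_cr_only.csv
```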
