Remove lines with duplicate first field


 
# 1  
Old 03-17-2012
Remove lines with duplicate first field

Trying to cut down the size of some log files. Now that I write this out, it looks more difficult than I thought it would be.

Need a bash script or command that goes sequentially through all lines of a file, and does this:

if field1 (space separated) is the number 2012 print the entire line. Do this DEFINITELY ALWAYS.


if field1 is not the number 2012, follow this rule:

if field1 of the current line is the same as field1 of the previous line, DON'T print the line, otherwise DO print the line.


Another way of saying the rule is:
only if field1 of the current line is DIFFERENT from field1 of the previous line, print the entire line (except 2012, always print lines with 2012 for field1)
# 2  
Old 03-17-2012
Please provide a sample input and desired output.
# 3  
Old 03-18-2012
Per the description:

Code:
awk ' /^2012 / { print; x = "";  next; } $1 != x { x = $1; print; } ' input-file >output-file

# 4  
Old 03-18-2012
sample input/output

---

input
Code:
2012 aaa bbb cccc ddd
2012 eee fff ggg hhh
XYZ aaa bbb ccc ddd
XYZ eee fff ggg hhh <---remove this line
2012 hhh iii jjj
2012 hhh iii 123
ABC mmm nnn ooo
ABC ppp qqq rrr <---remove this line
ABC www xxx yyy <---remove this line
2012 mmm nnn ooo
ABC sss ttt uuu

output
Code:
2012 aaa bbb cccc ddd
2012 eee fff ggg hhh
XYZ aaa bbb ccc ddd
2012 hhh iii jjj
2012 hhh iii 123
ABC mmm nnn ooo
2012 mmm nnn ooo
ABC sss ttt uuu


---
It keeps lines that start with 2012 but gets rid of lines where field1 is the same as field1 of the previous line.

---
Also, thank you agama for the code, I will check it out on my data. Really appreciate the replies!! Y'all are so awesome! :-)

Last edited by Franklin52; 03-18-2012 at 11:54 AM.. Reason: Please use code tags for data and code samples, thank you
# 5  
Old 03-18-2012
Try this:
Code:
awk '$1!="2012" && $1==s{next}{s=$1}1' file

# 6  
Old 03-18-2012
Code:
awk '$1=="2012" || $1!=p; {p=$1}' infile

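For anyone reading these later: the one-liners above all give the same result on the sample data. Spelled out with comments, a rough equivalent might look like the sketch below. The function name dedup_field1 is just something made up here so it can sit in a pipe, and comparing field 1 to the literal string 2012 is an assumption about what "is the number 2012" should mean (a field like 20125 is treated as an ordinary value).

```shell
# Hypothetical helper: reads the named files, or stdin if none are given.
dedup_field1() {
    awk '
    {
        # Print when field 1 is exactly 2012, or when it differs from
        # field 1 of the line immediately before this one.
        if ($1 == "2012" || $1 != prev)
            print
        prev = $1    # remember field 1 for the next line
    }' "$@"
}
```

Usage would be something like `dedup_field1 input-file >output-file`.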
# 7  
Old 03-18-2012
PERFECT

Ok, these are all great! AWK is great for this. It amazes me how smart and elegant everyone is with awk.

Now, what if I want to print a number for each time a duplicate field1 was removed? For example, the above output would be something like:
Code:
2012 aaa bbb cccc ddd
2012 eee fff ggg hhh
XYZ aaa bbb ccc ddd (1)
2012 hhh iii jjj
2012 hhh iii 123
ABC mmm nnn ooo (2)
2012 mmm nnn ooo
ABC sss ttt uuu

Probably going to be difficult and require more of a script than a command. But I am still very pleased with the awk. Thanks everyone TOO MUCH!
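
It can probably still be done in awk. A possible sketch: hold each line back until the next line shows whether the run of duplicates is over, then print it with the count appended. The names count_dups and emit are made up for this example, and it assumes the number should be how many lines were dropped after the one that is kept.

```shell
# Hypothetical helper: reads the named files, or stdin if none are given.
count_dups() {
    awk '
    # Print the held line, adding "(n)" when n duplicates were dropped after it.
    function emit() {
        if (!have) return
        if (dropped > 0) print held " (" dropped ")"
        else print held
        have = 0
        dropped = 0
    }
    # Same field 1 as the held line and not 2012: drop it and count it.
    $1 != "2012" && have && $1 == key { dropped++; next }
    # Otherwise the previous run is over: print it and hold this line.
    { emit(); held = $0; key = $1; have = 1 }
    # Print whatever is still held at end of input.
    END { emit() }
    ' "$@"
}
```

Usage would be something like `count_dups input-file >output-file`. Lines starting with 2012 never get a count because they are never dropped.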