Duplicate rows in a text file


 
# 1  
Old 03-08-2011
Duplicate rows in a text file

Note: I am using Cygwin and Notepad++ to check this, and my OS is Windows XP.
Code:
#!/bin/bash
typeset -i totalvalue=(wc -w /cygdrive/c/cygwinfiles/database.txt)
typeset -i totallines=(wc -l /cygdrive/c/cygwinfiles/database.txt)
typeset -i columnlines=`expr $totalvalue / $totallines`
awk -F' ' -v columnlines=$columnlines '{ if($1==$columnlines) {print $0} }' /cygdrive/c/cygwinfiles/database.txt

This is my first script, so please bear with me; I just need your help. In the script:

totalvalue is the number of values (words) in the data
totallines is the number of lines in the data
these two are needed to compute the total number of columns
(a pretty lame and very basic script, since I don't know much yet)

If I have a data file that looks like:
Code:
aaa bbb ccc aaa
ccc eee ggg hhh
eee bbb eee eee

the script should return the rows that contain duplicate values, so the output is:
Code:
aaa bbb ccc aaa <two aaa's>
eee bbb eee eee <two eee's>

Any help would be appreciated.

Errors are returned, and I think the problem is in the variables; they don't seem to be recognized as integers.
I get an error with the message: )division by 0 (error token is "/c/cygwinfiles/database.txt)

---------- Post updated at 06:17 PM ---------- Previous update was at 06:15 PM ----------

Correction: the second returned row actually contains three eee's (sorry for that).
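For the record, the "division by 0" most likely comes from the assignments: without `$(...)`, `(wc -w /path/file)` is not command substitution, and the `/` characters in the path end up being parsed as division. Also, when `wc` is given a filename argument it appends the filename to the count. A minimal corrected sketch (using a small sample file here as a stand-in for the Cygwin path):

```shell
# Corrected sketch: the key fix is $(...) command substitution.
# database.txt below is a sample stand-in for /cygdrive/c/cygwinfiles/database.txt.
printf 'aaa bbb ccc aaa\nccc eee ggg hhh\neee bbb eee eee\n' > database.txt

# Redirecting the file into wc makes it print only the count (no trailing filename).
totalvalue=$(wc -w < database.txt)     # number of values (words)
totallines=$(wc -l < database.txt)     # number of lines
columnlines=$((totalvalue / totallines))
echo "$columnlines"                    # 12 words / 3 lines = 4 columns
```

With the substitution in place, `typeset -i` also works as intended in bash, though it is no longer strictly needed.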

# 2  
Old 03-08-2011
Code:
awk '{for (i=1;i<=NF;i++) {if ($i in a) {print;break} else {a[$i]}};delete a}' infile
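For anyone who wants to follow the logic, the same one-liner can be written out with comments (a formatted equivalent, not a change in behavior; `infile` here holds the sample data from the first post):

```shell
# Sample data from the first post, written to a scratch file "infile".
printf 'aaa bbb ccc aaa\nccc eee ggg hhh\neee bbb eee eee\n' > infile

awk '{
    for (i = 1; i <= NF; i++) {
        if ($i in a) {   # this field already appeared on this line
            print        # print the whole line once
            break        # and stop scanning the rest of the fields
        } else {
            a[$i]        # referencing a[$i] creates the key (a "seen" marker)
        }
    }
    delete a             # reset the seen-set before the next line
}' infile
# prints: aaa bbb ccc aaa
#         eee bbb eee eee
```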

# 3  
Old 03-08-2011
Try this awk file; you can run it with:
Code:
awk -f awkfile inputfile

Code:
{
   for(i=1;i<=4;i++)a[$i]++
   if(a[$1]+a[$2]+a[$3]+a[$4] > 4)
      printf "%s <",$0;
   for(i=1;i<=4;i++){
      if(a[$i]>2){
         printf "%d %s's ",a[$i],$i
         break;
      }else if(a[$i] == 2 && $i != save){
         printf "%d %s's ",a[$i],$i
         save=$i
      }
   }
   if(a[$1]+a[$2]+a[$3]+a[$4] > 4)
      printf ">\n"
   delete a
   save=""
}
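As a quick check, saving the program above as awkfile (here via a heredoc; file names are just examples) and running it on the sample data from the first post produces the annotated output the OP asked for:

```shell
# Save the awk program above as "awkfile".
cat > awkfile <<'EOF'
{
   for(i=1;i<=4;i++)a[$i]++
   if(a[$1]+a[$2]+a[$3]+a[$4] > 4)
      printf "%s <",$0;
   for(i=1;i<=4;i++){
      if(a[$i]>2){
         printf "%d %s's ",a[$i],$i
         break;
      }else if(a[$i] == 2 && $i != save){
         printf "%d %s's ",a[$i],$i
         save=$i
      }
   }
   if(a[$1]+a[$2]+a[$3]+a[$4] > 4)
      printf ">\n"
   delete a
   save=""
}
EOF
# Sample data from the first post.
printf 'aaa bbb ccc aaa\nccc eee ggg hhh\neee bbb eee eee\n' > inputfile
awk -f awkfile inputfile
# aaa bbb ccc aaa <2 aaa's >
# eee bbb eee eee <3 eee's >
```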

# 4  
Old 03-08-2011
Code:
awk '{for (i=1;i<=NF;i++) {if ($i in a) {print;break} else {a[$i]}};delete a}' infile

Whoa! It worked like a charm! All I have to do now is educate myself about this code. Thank you very much, rdcwayx!

@homeboy
I'm thankful for your help as well. I'd just like to know why I can't run bash scripts properly in Cygwin. I'll try this right away and look for a way to run it on XP.

Thank you very much, guys.

---------- Post updated at 10:40 PM ---------- Previous update was at 08:01 PM ----------

Now I have an odd follow-up case.
If the data were:
Code:
As1d Pooa1 982ah
ghqyqt1 ss92 a82ss
Bg1ja Bg1ja 13ss

how can I achieve an output of:
Code:
Bg1ja Bg1ja 13ss

meaning that the line contains a duplicate.

This assumes all alphanumeric characters can appear, not only lowercase letters.
Should I use [A-Za-z0-9]? How would I work it into the code?
# 5  
Old 03-08-2011
I don't quite understand; with my code, I can still get the line:

Code:
Bg1ja Bg1ja 13ss

Are you asking for case-insensitive matching?

Code:
awk '{for (i=1;i<=NF;i++) {if (tolower($i) in a) {print;break} else {a[tolower($i)]}};delete a}' infile

# 6  
Old 03-08-2011
Code:
awk '{for (i=1;i<=NF;i++) {if (tolower($i) in a) {print;break} else {a[tolower($i)]}};delete a}' infile

This is perfect!

This will help me a lot with my database learning in Unix.

So it actually compares the values in lowercase but prints the line itself. Thank you again, rdcwayx!
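To illustrate the difference, here is the case-insensitive variant on a modified sample (the second Bg1ja changed to bg1ja, so a case-sensitive comparison would miss it; data.txt is just an example file name):

```shell
# Modified sample: the duplicate on line 3 differs only in case (Bg1ja vs bg1ja).
printf 'As1d Pooa1 982ah\nghqyqt1 SS92 a82ss\nBg1ja bg1ja 13ss\n' > data.txt

# tolower() normalizes each field before the duplicate check, so the
# comparison is case-insensitive while the original line is printed intact.
awk '{for (i=1;i<=NF;i++) {if (tolower($i) in a) {print;break} else {a[tolower($i)]}};delete a}' data.txt
# prints: Bg1ja bg1ja 13ss
```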
