Checking for duplicate code


 
# 1  
06-01-2012

I have a short line of code that does a very rudimentary check for duplicate code:
Code:
sort myfile.cpp | uniq -c | grep -v '^ *1 ' | grep -v "}"

It sorts the file, counts the occurrences of each line, removes lines that occur only once and removes lines containing the ubiquitous closing brace. The code is C++, but the approach is easily extensible to other programming languages.
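A minimal sketch of the same idea as a shell function (the name dup_lines is hypothetical). It assumes that only lines consisting of nothing but a closing brace should be dropped, and it filters on the numeric count field with awk, so a line whose own text happens to contain "1 " is not discarded by mistake. The most frequent lines are listed first.

```shell
# Hypothetical helper: report lines of file "$1" that occur more than once.
# Lines holding nothing but a (possibly indented) closing brace are skipped.
# awk '$1 > 1' tests the count that uniq -c puts in the first field;
# sort -rn then lists the most frequent lines first.
dup_lines() {
    grep -v '^[[:blank:]]*}[[:blank:]]*$' "$1" |
        sort |
        uniq -c |
        awk '$1 > 1' |
        sort -rn
}

# Usage:
# dup_lines myfile.cpp
```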

I would like to make this a bit more advanced. A few examples:

1- Ignore differences in indentation, so that the following lines of output are considered identical:
Code:
   2     for (i = 0; i < N; i++) {
   2        for (i = 0; i < N; i++) {

2- Allow for spaces within the code, so that the following lines of output are considered identical:
Code:
   2     for (i = 0; i < N; i++) {
   2     for ( i = 0; i < N; i++ ) {

If there are easy ways to fix this, I'd like to hear from you.

I am deliberately not excluding comment lines, such as those containing "/*", "*/" or "//", as that would weaken the case for telling developers to document their code better.

Any other one-liner ideas to check for duplicate code are also welcome.
# 2  
06-02-2012
What is your idea of duplicate code? I'm sure you cannot script out duplicate code and still keep a functioning, logically ordered execution path.
# 3  
06-03-2012
I want to be able to spot code that is a candidate for refactoring. There is no intention to script out lines of code.
# 4  
06-03-2012
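I'd start by removing blanks before doing anything else. Apart from separating tokens, blanks in C/C++ serve only two functions: making code easier to read (indentation) or appearing in output (like "printf( " \n");"). For comparing lines it rarely matters that they are all removed, because every line is stripped the same way. In the following, replace "<spc>" and "<tab>" with literal space/tab characters, or replace "<spc><tab>" with the POSIX character class "[:blank:]", which matches both (giving "[[:blank:]]").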

Code:
sed 's/[<spc><tab>]*//g'

This removes any space or tab character from the source, including indentation.
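Combined with the counting pipeline from the first post, this could look like the sketch below. It assumes a sed that understands the POSIX class [[:blank:]] (space and tab) in place of the literal characters; the function name dup_nospace is hypothetical. With all blanks gone, the example lines from the first post (different indentation, different spacing inside the parentheses) all collapse into the same line.

```shell
# Hypothetical helper: delete every space and tab, then count repeated lines.
# [[:blank:]] is the POSIX bracket class matching space and tab.
dup_nospace() {
    sed 's/[[:blank:]]//g' "$1" | sort | uniq -c | awk '$1 > 1'
}

# Usage:
# dup_nospace myfile.cpp
```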

An idea you might want to follow is to concatenate lines which do not end in a brace or semicolon. Consider the following two statements:

Code:
a=b+c;

a =
b + c;

They are equal to the compiler, but your procedure would count them as different.

You can do this concatenation with sed, but it involves a little hold space / pattern space gymnastics:

Code:
sed -n 's/[<spc><tab>]*//g
     $ { x
         G
         s/\n//g
         p
         q
       }
     /[;{}]$/ {
            x
            G
            s/\n//g
            p
            s/.*//
            x
            d
          }
     /[;{}]$/! {
            H
            d
           }' /path/to/input

What it does (I suggest you get a sed reference if you are not familiar with this): first, the substitution at the top of the script deletes all spaces/tabs from the line just read. Then there are 3 types of lines to handle:

The last line is handled first, in the block "$ { ... }". The content of the hold space is exchanged with the pattern space (x), then the content of the hold space (the former pattern space content) is appended to the pattern space after a newline (G): we concatenate the line with the previously read lines. Next, all newlines are deleted (s/\n//g), the line is printed and we quit (q).

The next type are lines ending with ";", "{" or "}" (braces end statements, too). We do practically the same as with the last line, but after printing the line we empty the pattern space (s/.*//) and swap it into the hold space (x) to "flush the buffers", then delete the pattern space (d) and start over with the next line. Otherwise the collected text would stay in the hold space and be printed a second time together with the next statement.

The last type are lines which don't end in a brace or semicolon. We append their content to the hold space (H), delete the pattern space (d) and start over with the next line.
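Two small experiments make the hold-space behaviour visible (the helper names show_hold and no_flush are hypothetical; any POSIX sed should do). show_hold shows that H puts a newline in front of each appended line, even when the hold space was empty, which is why the script deletes all newlines before printing. no_flush is the ";" branch without "s/.*//" and the second "x": the hold space is never emptied, so "a;" is printed twice.

```shell
# Hypothetical demo: H appends "\n" plus the line to the hold space, so an
# initially empty hold space ends up with a leading newline.
# Newlines are shown as "|" here instead of being deleted.
show_hold() {
    printf 'a\nb\n' | sed -n 'H
        $ {
            x
            s/\n/|/g
            p
        }'
}

# Hypothetical demo: the ";" branch without the flush step.
# The line swapped into the hold space stays there and is printed again
# together with the next statement.
no_flush() {
    printf 'a;\nb;\n' | sed -n '$ {
            x
            G
            s/\n//g
            p
            q
        }
        /;$/ {
            x
            G
            s/\n//g
            p
            d
        }'
}
```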

So, in principle, we collect text in the hold space and flush it out on specific occasions (whenever we consider a "program line" completely read).
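Putting the pieces together, a sketch under the same assumptions: [[:blank:]] stands in for the <spc>/<tab> placeholders, the function names join_statements and dup_statements are hypothetical, and lines that end up as a lone "}" are dropped as in the first post. The two spellings of "a=b+c;" from above are then counted as the same statement.

```shell
# Hypothetical helper: the sed script above with [[:blank:]] instead of the
# <spc><tab> placeholders. It strips blanks and joins each line that does
# not end in ";", "{" or "}" to the lines that follow it.
join_statements() {
    sed -n 's/[[:blank:]]//g
        $ {
            x
            G
            s/\n//g
            p
            q
        }
        /[;{}]$/ {
            x
            G
            s/\n//g
            p
            s/.*//
            x
            d
        }
        /[;{}]$/! {
            H
            d
        }' "$1"
}

# Hypothetical helper: count joined statements that occur more than once,
# skipping lines that consist only of a closing brace.
dup_statements() {
    join_statements "$1" | grep -v '^}$' | sort | uniq -c | awk '$1 > 1'
}

# Usage:
# dup_statements myfile.cpp
```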

I hope this helps.

bakunin

Last edited by bakunin; 06-03-2012 at 01:53 PM.