I have a short one-liner that does a very rudimentary check for duplicate code:
It sorts the file, counts the occurrences of each line, removes lines that occur only once and removes the ubiquitous closing brace. The language is C++, but the approach is easily extended to other programming languages.
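The pipeline described above could be sketched roughly as follows. This is only a reconstruction from the description, not the original one-liner; the function name find_dup_lines is made up for illustration.

```shell
# Hypothetical helper: list lines that occur more than once in a source file.
# Steps, in pipeline order:
#   sort     brings identical lines next to each other
#   uniq -c  prefixes each distinct line with its number of occurrences
#   awk      keeps only lines whose count is greater than 1
#   grep -v  drops the entry for a lone closing brace
find_dup_lines() {
    sort "$1" |
    uniq -c |
    awk '$1 > 1' |
    grep -v '^[[:blank:]]*[0-9][0-9]*[[:blank:]]*}[[:blank:]]*$'
}
```

Called as `find_dup_lines foo.cpp`, it prints each repeated line preceded by its count. Blank lines that occur more than once are reported too, since the description does not mention filtering them.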
I would like to make this a bit more advanced. A few examples:
1- Allow for spaces, so that the following lines of output are considered identical:
2- Allow for spaces within the code, so that the following lines of output are considered identical:
If there are easy ways to fix this, I would like to hear from you.
I am deliberately not excluding comment lines, such as those containing "/*" or "*/" or "//", as excluding them would weaken the case for telling developers to document their code better.
Any other one-liner ideas to check for duplicate code are also welcome.
What is your idea of duplicate code? I'm sure you cannot script out duplicate code and still keep a functioning, logically ordered execution path by doing that.
I'd start by removing blanks before doing anything else. In C/C++, apart from separating tokens, blanks serve only two functions: to make code easier to read (indentation) or as part of output (like "printf( " \n");"). In the following, replace "<spc>" and "<tab>" with literal space/tab characters.
This removes any space or tab character from the source, including indentation.
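A minimal sketch of that step, assuming a POSIX sed. It uses the character class [[:blank:]] (space and tab) instead of typing the literal characters; the function name strip_blanks is made up for illustration.

```shell
# Hypothetical helper: delete every space and tab from each line.
# Reads the files given as arguments, or standard input if there are none.
strip_blanks() {
    sed 's/[[:blank:]]//g' "$@"
}
```

Feeding the result into the duplicate check, for example `strip_blanks foo.cpp | sort | uniq -c | awk '$1 > 1'`, makes lines that differ only in indentation or inner spacing count as identical. Blanks inside string literals are removed as well, which is acceptable for comparison but means the output is no longer valid source code.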
An idea you might want to follow is to concatenate lines which do not end in a closing brace or semicolon. Consider the following two lines:
They are equal to the compiler, but your procedure would count them as different.
You can do this concatenation with sed, but it involves a little hold space / pattern space gymnastics:
What it does (I suggest you get a sed reference if you are not familiar with this): the first command deletes all spaces and tabs from the line. Then there are three types of lines to handle:
The last line of the file is handled first, in the block starting with "$ {..". The content of the hold space is exchanged with the pattern space, then the content of the hold space (the former pattern space content) is appended to the end of the pattern space, so the line is concatenated with the lines read before it. Next, all line feeds are deleted (s/\n//g), the line is printed and we quit.
The next type of line is one ending with a ";", a "{" or a "}" (braces end expressions too). We do practically the same as with the last line, but after printing the line we clear the pattern space and the hold space to "flush the buffers". Otherwise portions of the text would be duplicated.
The last type of line is one that does not end in a brace or a semicolon. We append its content to the hold space, delete the pattern space and start over with the next line.
So, in principle, we are collecting text in the hold space and flush that out on specific occasions (whenever we feel a "program line" is completely read).
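The procedure described above could be written roughly like this. It is a sketch reconstructed from the explanation, not necessarily the script originally posted, and the function name join_statements is made up for illustration.

```shell
# Hypothetical helper: strip blanks and join continuation lines so that each
# output line is one "program line" ending in ";", "{" or "}".
# The sed script has four parts, in order:
#   1. s/[[:blank:]]//g  deletes all spaces and tabs
#   2. ${...}            last line: prepend the collected text, drop the
#                        embedded line feeds, print and quit
#   3. /[;{}]$/{...}     line ends in ; { or }: same as above, then empty
#                        both buffers and start the next cycle
#   4. H                 any other line: append it to the hold space
#                        (with -n nothing is printed for it)
join_statements() {
    sed -n '
        s/[[:blank:]]//g
        ${
            x
            G
            s/\n//g
            p
            q
        }
        /[;{}]$/{
            x
            G
            s/\n//g
            p
            s/.*//
            x
            s/.*//
            d
        }
        H
    ' "$@"
}
```

With this in place, `join_statements foo.cpp | sort | uniq -c | awk '$1 > 1'` also catches statements that are split differently across lines. Lines that end in a trailing comment or a preprocessor directive do not end in ";", "{" or "}", so they are glued to the following line; whether that matters depends on the code base.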
I hope this helps.
bakunin
Last edited by bakunin; 06-03-2012 at 01:53 PM..