Sponsored Content
Top Forums Shell Programming and Scripting finding duplicate files by size and finding pattern matching and its count Post 302098149 by jerome Sukumar on Friday 1st of December 2006 02:38:07 AM
Old 12-01-2006
finding duplicate files by size and finding pattern matching and its count

Hi,

I have a challenging task,in which i have to find the duplicate files by its name and size,then i need to take anyone of the file.Then i need to open the file and find for more than one pattern and count of that pattern.

Note:These are the samples of two files,but i can have more duplicate and original pairs.

Input:
------
File_1 and File_2

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
???????????????????????????????????
Name=Jerome
City=chicago
Name/city:Jerome-Chicago
Address#???????????????????
Place:/Chicago
counry::/US

Name=John
City=Detroit
Name/city:John-Detroit
Address#???????????????????
Place:/Detroit
counry::/US

Name=Josephine
City=Chicago
Name/city:Josephine-Chicago
Address#???????????????????
counry::/US

Check1:
------------
-rwxrwxrwx 1 tstibill tstibill 374 Dec 1 13:03 File1
-rwxrwxrwx 1 tstibill tstibill 374 Dec 1 13:02 File2

374 bytes

Check 2:
-----------
take anyone file suppose File_1 and find the pattern and count for
Name/city:
Address#
Place:/
counry::/

Output
----------
pattern,count,filename
Name/city:,3,File_1
Address#,3,File_1
Place:/,2,File_1
counry::/,3,File_1


I hope,I didnt confuse anyone
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Finding cumulative size of files older than certain days

Hi All, I've got a ton of files in a particular directory. I want to find pdf files older than 30 days in that directory and then the cumulative size of those files. Ex: find /home/jk/a -name "*.pdf" -mtime +30 consider it finds the below 4 files. /home/jk/a/1.pdf /home/jk/a/2.pdf... (1 Reply)
Discussion started by: rohan076
1 Replies

2. Shell Programming and Scripting

Finding Duplicate files

How do you delete and and find duplicate files? (1 Reply)
Discussion started by: Jicom4
1 Replies

3. Shell Programming and Scripting

Finding conserved pattern in different files

Hi power user, For examples, I have three different files: file 1: file2: file 3: AAA CCC ZZZ BBB BBB CCC CCC DDD DDD DDD TTT AAA EEE AAA XXX I... (8 Replies)
Discussion started by: anjas
8 Replies

4. UNIX for Dummies Questions & Answers

finding all files that do not match a certain pattern

I hope I'm asking this the right way -- I've been sending out a lot of resumes and some of them I saw on Craigslist -- so I named the file as 'Craigslist -- (filename)'. Well I noticed that at least one of the files was misspelled as 'Craigslit.' I want to eventually try to write a shell... (5 Replies)
Discussion started by: Straitsfan
5 Replies

5. UNIX for Dummies Questions & Answers

Finding the files and count then

Hi i was trying to find the files which are not older than one day and copy them to other location . but i need to count the number of files and the copy them if the count is matches my number A=`find $SOURCE/* -type f -mtime -1 ` in the code above i need to count the number of file A has... (8 Replies)
Discussion started by: vikatakavi
8 Replies

6. Shell Programming and Scripting

Finding size of files with spaces in their file names

I am running a UNIX script to get unused files and their sizes from the server. The issue is arising due to the spaces present in the filename/folder names.Due to this the du -k command doesn't work properly.But I need to calculate the size of all files including the ones which have spaces in them.... (4 Replies)
Discussion started by: INNSAV1
4 Replies

7. Programming

Finding duplicate files in two base directories

Hello All, I have got some assignment to complete till this Monday and problem statement is as follow :- Problem :- Find duplicate files (especially .c and .cpp) from two project base directories with following requirement :- 1.Should be extendable to search in multiple base... (4 Replies)
Discussion started by: anand.shah
4 Replies

8. Shell Programming and Scripting

Finding all files based on pattern

Hi All, I need to find all files in a directory which are containing specific pattern. Thing is that file name should not consider if pattern is only in commented area. all contents which are under /* */ are commented all lines which are starting with -- or if -- is a part of some sentence... (13 Replies)
Discussion started by: Lakshman_Gupta
13 Replies

9. Shell Programming and Scripting

Finding matching patterns in two files

Hi, I have requirement to find the matching patterns of two files in Unix. One file is the log file and the other is the error list file. If any pattern in the log file matches the list of errors in the error list file, then I would need to find the counts of the match. For example, ... (5 Replies)
Discussion started by: Bobby_2000
5 Replies

10. Shell Programming and Scripting

Finding lines of specific size in files using sed

i am using sed to detect any lines that are not exactly 21. the following gives me the lines that ARE exactly 21. i want the opposite , i want the two lines that are not size 21 (shown in bold) type a.a 000008050110010201NNN 000008060810010201NNN 21212000008070110010201NNN... (5 Replies)
Discussion started by: boncuk
5 Replies
comm(1) 						      General Commands Manual							   comm(1)

NAME
comm - Compares two sorted files. SYNOPSIS
comm [-123] file1 file2 STANDARDS
Interfaces documented on this reference page conform to industry standards as follows: command: XCU5.0 Refer to the standards(5) reference page for more information about industry standards and associated tags. OPTIONS
Suppresses output of the first column (lines in file1 only). Suppresses output of the second column (lines in file2 only). Suppresses output of the third column (lines common to file1 and file2). The command comm -123 produces no output. OPERANDS
A pathname of the first file to be compared. If file1 is a hyphen (-), the standard input is used. A pathname of the second file to be compared. If file2 is a hyphen (-), the standard input is used. If both file1 and file2 refer to standard input or to the same FIFO special, block special or character special file, the results are unde- fined. DESCRIPTION
The comm command reads file1 and file2 and writes three columns to standard output, showing which lines are common to the files and which are unique to each. The leftmost column of standard output includes lines that are in file1 only. The middle column includes lines that are in file2 only. The rightmost column includes lines that are in both file1 and file2. If you specify a hyphen (-) in place of one of the file names, comm reads standard input. Generally, file1 and file2 should be sorted according to the collating sequence specified by the LC_COLLATE environment variable. (See sort(1).) If the input files are not sorted properly, the output of comm might not be useful. EXIT STATUS
Successful completion. Error occurred. EXAMPLES
In the following examples, file1 contains the following sorted list of North American cities: Anaheim Baltimore Boston Chicago Cleveland Dallas Detroit Kansas City Milwaukee Minneapolis New York Oakland Seattle Toronto The second file, file2, contains this sorted list: Atlanta Chicago Cincinnati Houston Los Angeles Montreal New York Philadelphia Pittsburgh San Diego San Francisco St. Louis To display the lines unique to each file and common to the two files, enter: comm file1 file2 This command results in the following output: Anaheim Atlanta Baltimore Boston Chicago Cincinnati Cleveland Dal- las Detroit Houston Kansas City Los Angeles Milwaukee Minneapolis Montreal New York Oakland Philadel- phia Pittsburgh San Diego San Francisco Seattle St. Louis Toronto The leftmost column contains lines in file1 only, the middle column contains lines in file2 only, and the rightmost column contains lines common to both files. To display any one or two of the three output columns, include the appropriate flags to suppress the columns you do not want. For example, the following command displays columns 1 and 2 only: comm -3 file1 file2 Anaheim Atlanta Baltimore Boston Cincinnati Cleveland Dallas Detroit Houston Kansas City Los Angeles Milwaukee Minneapolis Montreal Oakland Philadelphia Pittsburgh San Diego San Francisco Seattle St. Louis Toronto The following command displays output from only the second column: comm -13 file1 file2 Atlanta Cincinnati Houston Los Angeles Montreal Philadelphia Pittsburgh San Diego San Francisco St. Louis The following command displays output from only the third column: comm -12 file1 file2 Chicago New York SEE ALSO
Commands: cmp(1), diff(1), sdiff(1), sort(1), uniq(1) comm(1)
All times are GMT -4. The time now is 09:12 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy