Find common lines with one file and with all of the files in another folder Post: 303013673


Post 303013673 by Don Cragun on Sunday 25th of February 2018 02:11:51 PM

02-25-2018


Quote:

Originally Posted by abdulbadii

Code:

for i in /notinthatfolder/*.* {
 comm -12 filetobecompared $i >filetobecompared-$i
 }

This can't possibly work... Your output filenames (not pathnames) would contain at least two <slash> characters, because $i expands to a pathname such as /notinthatfolder/name.ext. And that still assumes that filetobecompared doesn't contain any <slash> characters, which isn't clear from the above code or from the specification of the problem in the first post in this thread.
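For comparison, here is a minimal sketch of the quoted loop with the <slash> problem removed. It assumes, as the quoted code does, that the files to compare match /notinthatfolder/*.* and that the reports should go in the current directory; the function name compare_all is hypothetical. The POSIX ${i##*/} parameter expansion strips the directory part, so no report filename contains a <slash>. It still uses comm, so both inputs must already be sorted.

```shell
#!/bin/sh
# Sketch (hypothetical helper): one report per file in a directory, named
# after the basenames of both inputs so no <slash> ends up in the name.
# Assumes both inputs are already sorted, as comm requires.
compare_all() {
	base_file=$1     # e.g. filetobecompared
	dir=$2           # e.g. /notinthatfolder
	for i in "$dir"/*.*
	do	[ -f "$i" ] || continue          # skip if the glob matched nothing
		name=${i##*/}                    # basename of $i
		comm -12 "$base_file" "$i" > "${base_file##*/}-$name"
	done
}

# Usage: compare_all filetobecompared /notinthatfolder
```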

Furthermore, comm is only specified to work if both input files being processed "are ordered in the current collating sequence" (i.e., in sorted order). And, despite the first post in this thread specifying comm, there is no indication that the input files being processed meet this requirement.
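If the inputs might not be sorted, one way to meet comm's requirement is to sort copies of both files first, using the same collating sequence for sort and comm. The following is a minimal sketch, assuming the widely available mktemp utility can be used for temporary files; the function name common_sorted is hypothetical. Note that its output comes out in sorted order, not in either file's original line order.

```shell
# Sketch (hypothetical helper): print lines common to two possibly
# unsorted files. sort and comm both run with LC_ALL=C so they agree
# on the collating sequence.
common_sorted() {
	t1=$(mktemp) || return 1
	t2=$(mktemp) || { rm -f "$t1"; return 1; }
	LC_ALL=C sort "$1" > "$t1" &&
	LC_ALL=C sort "$2" > "$t2" &&
	LC_ALL=C comm -12 "$t1" "$t2"
	status=$?
	rm -f "$t1" "$t2"     # always clean up the sorted copies
	return "$status"
}

# Usage: common_sorted filetobecompared /notinthatfolder/some.file
```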

The awk code RudiC suggested doesn't care whether the input files are sorted, but it does have a problem with <slash> characters in output filenames. But I'm confused by Eve's statement that RudiC's code puts its output in a single file. It doesn't; it creates an output pathname for each output line based on the two input file pathnames. It should do exactly what was requested in post #1 if all of the files being compared are in the current working directory and none of the filename operands passed to his awk script contains a <slash> character.
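An awk approach needs no sorting because it loads every line of the first file into an associative array and then tests each line of the other file for membership. A minimal sketch of that common NR == FNR idiom (not RudiC's exact code, which isn't reproduced here; the function name common_unsorted is hypothetical):

```shell
# Sketch (hypothetical helper): print the lines of $2 that also appear
# anywhere in $1, in $2's original order; neither file needs sorting.
common_unsorted() {
	awk 'NR == FNR { seen[$0]; next }   # first file: remember every line
	     $0 in seen                      # other file: print lines seen before
	' "$1" "$2"
}

# Usage: common_unsorted filetobecompared /notinthatfolder/some.file
```

Unlike comm, this prints a line once for every time it occurs in the second file, and the output keeps the second file's line order.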

If you want a tested script that creates report files in a specified directory, where each report file contains the lines common to one initial file (in any directory) and one of the other files being compared (each in any directory), you could try something more like the following:

Code:

#!/bin/ksh
usage() {
	printf "$Usage\n" "$IAm" >&2
}

# Define script variables.
IAm=${0##*/}
Usage="Usage: %s output_directory initial_file file_to_compare..."

# Verify that we have at least 3 arguments.
if [ $# -lt 3 ]
then	printf '%s: Not enough operands\n' "$IAm" >&2
	usage
	exit 1
fi

# Create output directory if it doesn't already exist and error out if we can't
# create it.
REPORT_DIR=$1
if ! mkdir -p "$REPORT_DIR"
then	usage
	exit 2
fi
printf '%s: Reports will be created in the directory "%s"\n' \
    "$IAm" "$REPORT_DIR"

# Shift off the output directory operand and invoke awk with the remaining
# arguments.
shift
awk -v destdir="$REPORT_DIR" '
FNR == 1 {
	# Create output pathname in destdir based on basename of input filenames
	if(NR == 1) {
		# Set the 1st part of the output pathname based on destdir and
		# the basename of the first input pathname.
		path1 = FILENAME
		sub(".*/", "", path1)
		path1 = destdir "/" path1 "-"
	} else {# Set the new output pathname that will be used if any
		# common lines are found in the current input file based on
		# path1 and the basename of the current input pathname.
		new = FILENAME
		sub(".*/", "", new)
		new = path1 new
	}
}
NR == FNR {
	# Grab contents of the 1st input file.
	CMP[$0]
	next
}
$0 in CMP {
	# A line in the current file matched a line in the first file...
	if(last != new) {
		if(last)
			close(last)	# close the previous output file
		# Set the name of the new output file.
		last = new
	}
	# Print this duplicated line into the current output file.
	print >> last
}' "$@"

This was written and tested using a Korn shell and also tested with bash. It should work with any shell that uses Bourne shell syntax and performs the parameter expansions required by the POSIX standards.

Note, however, that the output filenames use the basenames of the two input files found to have common lines. If files from different directories are processed in a single run and some of them share a basename, lines from both would end up in the same report file. It would be easy to also add a line to each output file naming the input file (or both input files) from which the following lines were copied, but that wasn't done here because it wasn't requested.
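A minimal sketch of that extension, written as a compact variant of the script above rather than the author's own code. It assumes a header of the hypothetical form "# common lines between FILE1 and FILE2" is acceptable, and wraps the awk program in a hypothetical shell function common_reports. The header is printed only when a file's first common line is found, so files with nothing in common still produce no report. Like the original, it appends with >>, so rerunning it adds to existing reports.

```shell
# Sketch (hypothetical helper): like the script above, but each report
# begins with a line naming both input pathnames.
common_reports() {
	destdir=$1
	shift
	mkdir -p "$destdir" || return 1
	awk -v destdir="$destdir" '
	FNR == 1 && NR == 1 {
		first = FILENAME              # full pathname of the initial file
		path1 = FILENAME
		sub(".*/", "", path1)
		path1 = destdir "/" path1 "-"
	}
	FNR == 1 && NR != 1 {
		new = FILENAME                # report name for the current file
		sub(".*/", "", new)
		new = path1 new
	}
	NR == FNR { CMP[$0]; next }       # remember lines of the initial file
	$0 in CMP {
		if (last != new) {
			if (last) close(last)
			last = new
			# First match in this file: name both inputs.
			print "# common lines between " first " and " FILENAME >> last
		}
		print >> last
	}' "$@"
}

# Usage: common_reports output_directory initial_file file_to_compare...
```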

If you want to try this on a Solaris/SunOS system, change awk in the script to /usr/xpg4/bin/awk or nawk.
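Instead of editing the script by hand, the awk could also be chosen at run time. A minimal sketch, assuming /usr/xpg4/bin/awk should be preferred whenever it exists and plain awk from PATH used otherwise; the variable name AWK is hypothetical, and this does not fall back to nawk.

```shell
# Sketch: prefer the XPG4 (POSIX) awk on Solaris/SunOS, else awk from PATH.
if [ -x /usr/xpg4/bin/awk ]
then	AWK=/usr/xpg4/bin/awk
else	AWK=awk
fi
# Then invoke "$AWK" wherever the script says awk, e.g.:
#   "$AWK" -v destdir="$REPORT_DIR" '...' "$@"
```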

Don Cragun


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

To find all common lines from 'n' no. of files

Hi, I have one situation. I have some 6-7 no. of files in one directory & I have to extract all the lines which exist in all these files. means I need to extract all common lines from all these files & put them in a separate file. Please help. I know it could be done with the help of...

2. UNIX for Dummies Questions & Answers

find common lines using just one column to compare and result with all columns

Hi. If we have this file A B C 7 8 9 1 2 10 and this other file A C D F 7 9 2 3 9 2 3 4 The result I'm looking for is intersection with A B C D F so the answer here will be

3. Shell Programming and Scripting

Find all text files in folder and then copy to a new folder

Hi all, *I use Uwin and Cygwin emulator. I'm trying to search for all text files in the current folder (C/Files) and its sub folders using find -depth -name "*.txt" The above command worked for me, but now I would like to copy all found text files to a new folder (C/Files/Text) with ...

4. Shell Programming and Scripting

Common lines from files

Hello guys, I need a script to get the common lines from two files with a criteria that if the first two columns match then I keep the maximum value of the 3rd column.(tab separated columns) Sample input: file1: 111 222 0.1 333 444 0.5 555 666 0.4 file 2: 111 222 0.7 555 666...

5. Shell Programming and Scripting

Common lines from files

Hello guys, I need a script to get the common lines from two files with a criteria that if the first two columns match then I keep the maximum value of the 5th column.(tab separated columns) . 3rd and 4th columns corresponds to the row which has highest value for the 5th column. Sample...

6. Shell Programming and Scripting

Find common lines between multiple files

Hello everyone A few years Ago the user radoulov posted a fancy solution for a problem, which was about finding common lines (gene variation names) between multiple samples (files). The code was: awk 'END { for (R in rec) { n = split(rec, t, "/") if (n > 1) dup = dup ?...

7. Shell Programming and Scripting

Extracting lines from text files in folder based on the numbers in another file

Hello, I have a file ff.txt that looks as follows *ABNA.txt 356 24 36 112 *AC24.txt 457 458 321 2 ABNA.txt and AC24.txt are the files in the folder named foo1. Based on the numbers in the ff.txt file, I want to extract the lines from the corresponding files in the foo1 folder and...

8. Shell Programming and Scripting

Shell Script to find common lines and replace next line

I want to find common line in two files and replace the next line of first file with the next line of second file. (sed,awk,perl,bash any solution is welcomed ) Case Ignored. Multiple Occurrence of same line. File 1: hgacdavd sndm,ACNMSDC msgid "Rome" msgstr "" kgcksdcgfkdsb...

9. Shell Programming and Scripting

Find common lines between all of the files in one folder

Could it be possible to find common lines between all of the files in one folder? Just like comm -12 . So all of the files two at a time. I would like all of the outcomes to be written to a different files, and the file names could be simply numbers - 1 , 2 , 3 etc. All of the file names contain...

10. Shell Programming and Scripting

Bash to trim folder and files within a path that share a common file extension

The bash will trim the folder to trim folder. Within each of the folders (there may be more than 1) and the format is always the same, are several .bam and matching .bam.bai files (file structure) and the bash under that executes and trims the .bam as expected but repeats the .bam.bai extensions...
