Find common lines with one file and with all of the files in another folder


 
# 1  
Old 02-24-2018

Hi! I would like to
Code:
comm -12

with one file and with all of the files in another folder that holds 100 files or more (the one file is not in that folder) to find common text lines. Each pair of files that has common lines should get its own output file, and the name of each output file should be the two file names joined by a dash sign, for instance
filetobecompared-filethathadacommonline

Sincerely grateful if anyone can help!
I don't have Python; could it be done with awk or anything else that works?

Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 02-24-2018 at 10:45 AM.. Reason: Added CODE tags.
# 2  
Old 02-24-2018
Welcome to the forum.

Untested (due to lack of samples):
Code:
awk 'NR == FNR {CMP[$0]; next} $0 in CMP {print >> (TMPFN=ARGV[1] "-" FILENAME); close (TMPFN)} ' onefile anotherdir/*

You can drop the close() if the file count is less than the system parameter OPEN_MAX.
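To see what that limit is on a given system, something like the following should do. This is only a sketch: it prints the limits and does not say how a particular awk implementation manages its open output files.

```shell
# Maximum number of files one process may have open at once.
getconf OPEN_MAX

# The current shell's soft limit on open files, inherited by awk.
ulimit -n
```

If the number of files in the other directory is well below these values, keeping every output file open should be safe.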
# 3  
Old 02-25-2018
Hi! Thank you for your help! All of the outputs are correct, but it puts all of them into one file. I hoped that each output would go into a different file and that each file name would be the two file names that had common lines, joined by a dash sign.
# 4  
Old 02-25-2018
Code:
for i in /notinthatfolder/*.* {
 comm -12 filetobecompared $i >filetobecompared-$i
 }

# 5  
Old 02-25-2018
Quote:
Originally Posted by abdulbadii
Code:
for i in /notinthatfolder/*.* {
 comm -12 filetobecompared $i >filetobecompared-$i
 }

This can't possibly work... Your output filenames (not pathnames) contain at least two slash characters. And that is still assuming that filetobecompared doesn't contain any <slash> characters; which isn't clear in the above code nor in the specification of the problem in the first post in this thread.

Furthermore, comm is only specified to work if both input files being processed "are ordered in the current collating sequence" (i.e., in sorted order). And, despite the first post in this thread specifying comm, there is no indication that the input files being processed meet this requirement.
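A minimal sketch of a loop that avoids both problems, assuming the files live in a directory called otherdir (a made-up name, as is the sample data): it sorts each input before handing it to comm and builds the output name from basenames only.

```shell
export LC_ALL=C                      # make sort and comm agree on collation
work=$(mktemp -d) && cd "$work" || exit 1

# Made-up sample data, deliberately unsorted.
mkdir otherdir
printf '%s\n' pear apple fig > filetobecompared
printf '%s\n' fig apple > otherdir/one.txt
printf '%s\n' melon > otherdir/two.txt

sort filetobecompared > first.sorted
for f in otherdir/*; do
	[ -f "$f" ] || continue
	out="filetobecompared-${f##*/}"  # basename only, so no slash in the name
	sort "$f" | comm -12 first.sorted - > "$out"
	[ -s "$out" ] || rm -f "$out"    # drop reports with no common lines
done
ls
```

Sorting into a temporary copy leaves the original files untouched; if the inputs are already sorted in the current locale, the two sort steps can be dropped.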

The awk code RudiC suggested doesn't care about input files being sorted, but it does have a problem with <slash> characters in output file names. But, I'm confused by Eve's statement saying that RudiC's code puts output in a single file. It doesn't; it creates an output file pathname for each output line based on the two input file pathnames. It should do exactly what was requested in post #1 if all of the files being compared are in the current working directory and no <slash> characters are used in any of the filename operands passed to his awk script.
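As an illustration of the <slash> issue, here is a variation on that one-liner that strips the directory part of each pathname before building the output name. It is a sketch, not RudiC's original code, and the sample files and the names base, first, and out are made up for the demonstration.

```shell
work=$(mktemp -d) && cd "$work" || exit 1

# Made-up sample data.
mkdir anotherdir
printf '%s\n' apple banana cherry > onefile
printf '%s\n' banana kiwi > anotherdir/f1
printf '%s\n' plum > anotherdir/f2
printf '%s\n' cherry apple > anotherdir/f3

awk '
FNR == 1  { base = FILENAME; sub(".*/", "", base) }  # basename of current input
NR == FNR { first = base; CMP[$0]; next }            # remember lines of 1st file
$0 in CMP { out = first "-" base; print >> out; close(out) }
' onefile anotherdir/*
ls
```

With this data the run should leave onefile-f1 and onefile-f3 in the current directory and create nothing for f2, which has no line in common with onefile.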

If you want a tested script that can create report files in a specified directory where each report file contains lines that are common between one file (specified to be in any directory) and one or more files (each specified to be in any directory) you could try something more like the following:
Code:
#!/bin/ksh
usage() {
	printf "$Usage\n" "$IAm" >&2
}

# Define script variables.
IAm=${0##*/}
Usage="Usage: %s output_directory initial_file file_to_compare..."

# Verify that we have at least 3 arguments.
if [ $# -lt 3 ]
then	printf '%s: Not enough operands\n' "$IAm" >&2
	usage
	exit 1
fi

# Create output directory if it doesn't already exist and error out if we can't
# create it.
REPORT_DIR=$1
if ! mkdir -p "$REPORT_DIR"
then	usage
	exit 2
fi
printf '%s: Reports will be created in the directory "%s"\n' \
    "$IAm" "$REPORT_DIR"

# Shift off the output directory operand and invoke awk with the remaining
# arguments.
shift
awk -v destdir="$REPORT_DIR" '
FNR == 1 {
	# Create output pathname in destdir based on basename of input filenames
	if(NR == 1) {
		# Set the 1st part of the output pathname based on destdir and
		# the basename of the first input pathname.
		path1 = FILENAME
		sub(".*/", "", path1)
		path1 = destdir "/" path1 "-"
	} else {# Set the new output pathname that will be used if any
		# common lines are found in the current input file based on
		# path1 and the basename of the current input pathname.
		new = FILENAME
		sub(".*/", "", new)
		new = path1 new
	}
}
NR == FNR {
	# Grab contents of the 1st input file.
	CMP[$0]
	next
}
$0 in CMP {
	# A line in the current file matched a line in the first file...
	if(last != new) {
		if(last)
			close(last)	# close the previous output file
		# Set the name of the new output file.
		last = new
	}
	# Print this duplicated line into the current output file.
	print >> last
}' "$@"

This was written and tested using a Korn shell and also tested with bash. It should work with any shell that uses Bourne shell syntax and performs the parameter expansions required by the POSIX standards.

Note, however, that each output filename is built from the basenames of the two input files found to have common lines. If files from different directories are being processed in a single run and some of them have identical basenames, their common lines end up in the same output file. It would be easy to also add a line in each output file naming the input file (or both input files) from which the following lines were copied, but that wasn't done here because it wasn't requested.
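A rough sketch of that header-line idea, stripped down from the script above. The names src, prefix, out, and seen, the header wording, and the sample files are all assumptions made for illustration.

```shell
work=$(mktemp -d) && cd "$work" || exit 1

# Made-up sample data: two input files share the basename data.txt.
mkdir dirA dirB reports
printf '%s\n' red green blue > first.txt
printf '%s\n' green > dirA/data.txt
printf '%s\n' blue red > dirB/data.txt

awk -v destdir=reports '
FNR == 1 {
	base = FILENAME
	sub(".*/", "", base)
	if (NR == 1) {
		src = FILENAME                  # full pathname of the first file
		prefix = destdir "/" base "-"
	} else
		out = prefix base
}
NR == FNR { CMP[$0]; next }
$0 in CMP {
	# Before the first common line of each input file, name both sources.
	if (seen[FILENAME]++ == 0)
		print "# common lines of " src " and " FILENAME >> out
	print >> out
	close(out)
}' first.txt dirA/data.txt dirB/data.txt
cat reports/first.txt-data.txt
```

Both data.txt files still land in reports/first.txt-data.txt, but each group of lines is now preceded by a header showing which pathname it came from.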

If you want to try this on a Solaris/SunOS system, change awk in the script to /usr/xpg4/bin/awk or nawk.
# 6  
Old 03-03-2018
Sorry that I didn't reply in time. All of the files were previously sorted. In the end it was acceptable for me in this case that the outcomes were all in one file. I could continue like that and I didn't want to bother you any longer. I couldn't get the very long code that Don Cragun offered to work: my C Shell window closed by itself every time I tried to use it. And since I can also work with all of the outcomes in one file, you don't need to find a solution for that any longer. I'm grateful to you and you were very helpful!
# 7  
Old 03-04-2018
This is why it is crucial that you always start a thread in this forum by explaining the environment you're using. You might note that I explicitly stated the requirements for running the code I suggested:
Quote:
This was written and tested using a Korn shell and also tested with bash. It should work with any shell that uses Bourne shell syntax and performs the parameter expansions required by the POSIX standards.
The csh shell does not use Bourne shell syntax and does not perform any of the parameter expansions required by the POSIX standards. Therefore, you should have expected that it might not work unless you used a shell that met the requirements I specified. But, if you had stored my suggestion in a file and used csh to run that file, it should have given you syntax errors running that script; it should not have closed your window.

I am sorry that I wasted your time by trying to help you with a script that should have worked perfectly for you if you had saved it into a file and then run it with ksh or bash instead of csh.
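To make that concrete: the interactive shell doesn't matter as long as the saved file is handed to a Bourne-type shell by name. A minimal sketch, using a made-up script name common_lines that contains only the parameter expansion the real script depends on:

```shell
dir=$(mktemp -d) || exit 1
script=$dir/common_lines             # hypothetical name for the saved script

# Save a tiny script that uses the same expansion as the real one.
cat > "$script" <<'EOF'
IAm=${0##*/}                         # strip everything up to the last slash
printf 'running as %s\n' "$IAm"
EOF

# Name the interpreter explicitly; this works the same when typed at a csh prompt.
sh "$script"
```

The last line could just as well use ksh or bash in place of sh.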