Unique files in a given directory Post: 302544693

Post 302544693 by DGPickett on Thursday 4th of August 2011 03:24:41 PM

xargs is a very nice way to get economy of scale in shell scripting, like calling grep once for every 99 files rather than once per file. -n99 tells xargs to put at most 99 arguments on each command line (really, commands that are execvp()'d take an array of pointers to arrays of characters, not one string). On some implementations it also means the command is not run at all when the input is empty; GNU xargs still runs the command once on empty input unless you add -r.
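A minimal sketch of the batching, assuming a scratch directory from mktemp, made-up file names and a made-up search string "needle". It uses -n 3 rather than -n 99 so the batches are visible with only seven files.

```shell
# Scratch directory with seven small files; names and contents are made up.
tmp=$(mktemp -d)
for i in 1 2 3 4 5 6 7; do
    echo "line $i" > "$tmp/f$i.txt"
done
echo "needle" >> "$tmp/f4.txt"

# Count how often xargs starts the command: with -n 3, seven names are
# split into batches of 3, 3 and 1, so echo runs three times.
batches=$(find "$tmp" -type f -name '*.txt' | xargs -n 3 echo | wc -l | tr -d ' ')
echo "commands run: $batches"

# Same batching with grep; -l prints only the names of matching files.
# /dev/null guarantees grep always sees more than one file argument.
# xargs exits non-zero when some batch finds nothing, hence "|| true".
matches=$(find "$tmp" -type f -name '*.txt' | xargs -n 3 grep -l needle /dev/null || true)
echo "matching file: $matches"

rm -rf "$tmp"
```

This assumes file names without blanks or quotes, which plain xargs would otherwise split or misread.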

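sort has old and new key syntax. These are old-style keys, zero-based and counted in whole white-space-separated fields, so sort -u +0 -1 sorts on the first field and tosses any later record with a duplicate first field. The new-style spelling of the same key is sort -u -k1,1, and many current sort implementations no longer accept the old form. If many files have the same checksum, they are probably identical, in fact probably empty!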
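A small sketch of dropping duplicate first-field records, written with the new-style key -k1,1. The "checksum size name" records are made up for illustration.

```shell
# Made-up "checksum size name" records; the first two share a checksum.
list='111 10 a.txt
111 10 b.txt
222 20 c.txt'

# -k1,1 limits the sort key to field 1; -u keeps one record per distinct key.
unique=$(printf '%s\n' "$list" | sort -u -k1,1)
printf '%s\n' "$unique"
```

Only one record per checksum survives, so two lines come out. Which of the duplicates is kept should not be relied on.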

You can "man sort" and "man xargs" for this, or use the "Man Pages" link above, or google.

I make lists, like database tables. I can cut off the first (key) field to make key lists, then run them through comm to find out what is only in list 1 (not in list 2, not in both). Then I can use that still-sorted key list in join to pull the desired file names. "while read x y z" reads lines and splits each one on $IFS (white space by default), putting the first field in x, the second in y and the rest in z.
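The list handling could be sketched like this, assuming two small made-up "key name" lists that are already sorted on the key. The file names under the scratch directory are hypothetical.

```shell
tmp=$(mktemp -d)

# Two made-up "key name" lists, already sorted on the key.
printf '%s\n' '111 a.txt' '222 b.txt' '333 c.txt' > "$tmp/list1"
printf '%s\n' '222 x.txt' '444 y.txt' > "$tmp/list2"

# Cut off the key field to make key lists.
cut -d' ' -f1 "$tmp/list1" > "$tmp/keys1"
cut -d' ' -f1 "$tmp/list2" > "$tmp/keys2"

# comm -23 suppresses columns 2 and 3, leaving the keys found only in list 1.
comm -23 "$tmp/keys1" "$tmp/keys2" > "$tmp/only1"

# Join the surviving keys back to list 1 to recover the file names,
# then let read split each line into key and name.
names=$(join "$tmp/only1" "$tmp/list1" | while read -r key name; do
    echo "$name"
done)
printf '%s\n' "$names"

rm -rf "$tmp"
```

Here keys 111 and 333 are only in list 1, so the loop prints a.txt and c.txt.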

GNU parallel is much like xargs, but on steroids. I am not sure how it distributes the lines and how it syncs them back to sequential order, in terms of cost, latency, disk space and such. I have several parallel tools, but xargs is good enough for many things. Since this feeds a sort, line buffering might be fine for many fds writing to one pipe, and who cares about order! I will look into it! One wonders if and how it buffers the output of threads 2 to n until thread 1 is done. Thanks!

Speedup: find all files in Stuff, then use sort, cut and comm to find out which files are new (not on the old Stuff list), cksum only those, and finally add these cksums to the old entries to make the new Stuff list.
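One way the speedup could look, as a sketch only. It assumes the old list is stored as plain cksum output ("checksum size name"), that names contain no blanks, and that Stuff lives under a scratch directory; the list file names are hypothetical.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/Stuff"
echo one > "$tmp/Stuff/a"
echo two > "$tmp/Stuff/b"
echo new > "$tmp/Stuff/c"

# Old list in "cksum size name" form, covering only a and b, sorted by name.
( cd "$tmp/Stuff" && cksum a b ) | sort -k3 > "$tmp/old.list"

# Names on disk now, and names already on the old list, both sorted.
( cd "$tmp/Stuff" && find . -type f | sed 's|^\./||' | sort ) > "$tmp/now.names"
cut -d' ' -f3 "$tmp/old.list" | sort > "$tmp/old.names"

# Names found only on disk are the new files; cksum just those.
comm -23 "$tmp/now.names" "$tmp/old.names" > "$tmp/new.names"
( cd "$tmp/Stuff" && xargs -n 99 cksum < "$tmp/new.names" ) > "$tmp/new.cksums"

# New list = old entries plus the freshly computed ones.
sort -k3 "$tmp/old.list" "$tmp/new.cksums" > "$tmp/new.list"
new_files=$(cat "$tmp/new.names")
list_len=$(wc -l < "$tmp/new.list" | tr -d ' ')
cat "$tmp/new.list"

rm -rf "$tmp"
```

Only c gets checksummed here. The sketch does not handle files that were deleted or changed since the old list was made; those would need a separate pass.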

Last edited by DGPickett; 08-04-2011 at 04:45 PM..

LEARN ABOUT NETBSD

join

JOIN(1) 						    BSD General Commands Manual 						   JOIN(1)

NAME

     join -- relational database operator

SYNOPSIS

     join [-a file_number | -v file_number] [-e string] [-j file_number field] [-o list] [-t char] [-1 field] [-2 field] file1 file2

DESCRIPTION

     The join utility performs an ``equality join'' on the specified files and writes the result to the standard output.  The ``join field'' is
     the field in each file by which the files are compared.  The first field in each line is used by default.	There is one line in the output
     for each pair of lines in file1 and file2 which have identical join fields.  Each output line consists of the join field, the remaining
     fields from file1 and then the remaining fields from file2.

     The default field separators are tab and space characters.  In this case, multiple tabs and spaces count as a single field separator, and
     leading tabs and spaces are ignored.  The default output field separator is a single space character.
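As an illustration of the default behaviour described above, here is a minimal sketch with two made-up files, both sorted on the first field:

```shell
tmp=$(mktemp -d)

# Made-up data: both files are sorted on field 1, the default join field.
printf '%s\n' '1 apple' '2 banana' '3 cherry' > "$tmp/file1"
printf '%s\n' '1 red' '3 dark' '4 green' > "$tmp/file2"

# One output line per pair of lines with an identical join field:
# the join field, then the rest of file1's line, then the rest of file2's.
joined=$(join "$tmp/file1" "$tmp/file2")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

Keys 2 and 4 appear in only one file, so they produce no output line.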

     Many of the options use file and field numbers.  Both file numbers and field numbers are 1 based, i.e. the first file on the command line is
     file number 1 and the first field is field number 1.  The following options are available:

     -a file_number
		 In addition to the default output, produce a line for each unpairable line in file file_number.  (The argument to -a must not be
		 preceded by a space; see the COMPATIBILITY section.)

     -e string	 Replace empty output fields with string.

     -o list	 The -o option specifies the fields that will be output from each file for each line with matching join fields.  Each element of
		 list has the form 'file_number.field', where file_number is a file number and field is a field number.  The elements of list must
		 be either comma (``,'') or whitespace separated.  (The latter requires quoting to protect it from the shell, or, a simpler
		 approach is to use multiple -o options.)

     -t char	 Use character char as a field delimiter for both input and output.  Every occurrence of char in a line is significant.

     -v file_number
		 Do not display the default output, but display a line for each unpairable line in file file_number.  The options -v 1 and -v 2
		 may be specified at the same time.

     -1 field	 Join on the field'th field of file 1.

     -2 field	 Join on the field'th field of file 2.
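A short sketch of several of these options together, using made-up colon-delimited files. -a is written as -a1, with no space, in line with the note under -a above.

```shell
tmp=$(mktemp -d)

# Made-up colon-delimited data, sorted on field 1.
printf '%s\n' '1:apple' '2:banana' '3:cherry' > "$tmp/file1"
printf '%s\n' '1:red' '3:dark' '4:green' > "$tmp/file2"

# -v 1: only the lines of file 1 that have no partner in file 2.
unpaired=$(join -t : -v 1 "$tmp/file1" "$tmp/file2")

# -a1 adds unpairable file 1 lines to the normal output; -o picks the
# output fields and -e supplies a filler for the empty ones.
filled=$(join -t : -a1 -e NONE -o 1.1,1.2,2.2 "$tmp/file1" "$tmp/file2")

printf '%s\n' "$unpaired" "$filled"
rm -rf "$tmp"
```

With this data, -v 1 reports only the line for key 2, and the -a1 run shows the same line with NONE in place of the missing file 2 field.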

     When the default field delimiter characters are used, the files to be joined should be ordered in the collating sequence of sort(1), using
     the -b option, on the fields on which they are to be joined, otherwise join may not report all field matches.  When the field delimiter
     characters are specified by the -t option, the collating sequence should be the same as sort(1) without the -b option.
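A sketch of preparing unsorted inputs for join with the default delimiters. Setting LC_ALL=C for both sort and join is an assumption added here so that the two tools agree on the collating sequence; the data is made up.

```shell
tmp=$(mktemp -d)

# Made-up, deliberately unsorted inputs.
printf '%s\n' 'b 2' 'a 1' 'c 3' > "$tmp/file1"
printf '%s\n' 'c z' 'a x' > "$tmp/file2"

# Sort each input on the join field, ignoring leading blanks (-b).
LC_ALL=C sort -b -k1,1 "$tmp/file1" > "$tmp/s1"
LC_ALL=C sort -b -k1,1 "$tmp/file2" > "$tmp/s2"

sorted_join=$(LC_ALL=C join "$tmp/s1" "$tmp/s2")
printf '%s\n' "$sorted_join"

rm -rf "$tmp"
```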

     If one of the arguments file1 or file2 is ``-'', the standard input is used.

     The join utility exits 0 on success, and >0 if an error occurs.

COMPATIBILITY

     For compatibility with historic versions of join, the following options are available:

     -a 	 In addition to the default output, produce a line for each unpairable line in both file 1 and file 2.	(To distinguish between
		 this and -a file_number, join currently requires that the latter not include any white space.)

     -j1 field	 Join on the field'th field of file 1.

     -j2 field	 Join on the field'th field of file 2.

     -j field	 Join on the field'th field of both file 1 and file 2.

     -o list ...
		 Historical implementations of join permitted multiple arguments to the -o option.  These arguments were of the form
		 ``file_number.field_number'' as described for the current -o option.  This has obvious difficulties in the presence of files named ``1.2''.

     These options are available only so historic shell scripts don't require modification and should not be used.

SEE ALSO

     awk(1), comm(1), paste(1), sort(1), uniq(1)

STANDARDS

     The join command is expected to be IEEE Std 1003.2 (``POSIX.2'') compatible.

BSD
								  April 28, 1995							       BSD
