Shell Programming and Scripting: Unique files in a given directory
Post 302544693 by DGPickett, 08-04-2011 03:24 PM
xargs is a very nice way to get economy of scale in shell scripting, like calling grep once for every 99 files instead of once for every file. -n99 does two things: it tells xargs to fit up to 99 arguments on each command line (really, commands are execvp()'d as arrays of pointers to character arrays, not as one string), and it also says do not run the command for empty input.
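For example, a minimal sketch of the batching idea (the directory and pattern are placeholders; -print0 and -0 are GNU/BSD extensions that keep odd file names from splitting):

find /some/dir -type f -print0 |
    xargs -0 -n 99 grep -l 'pattern'   # grep runs once per batch of up to 99 files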

sort has old and new key syntaxes. These are old keys: zero-based and spanning whole whitespace-separated fields, so sort -u +0 -1 sorts on the first field and tosses any later record whose first field is a duplicate. If many files have the same checksum, they are probably identical, in fact probably empty!
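For instance, on a file of "cksum bytes name" lines (cksums.txt is a made-up name), the old and the modern key syntaxes do the same thing:

sort -u +0 -1 cksums.txt     # old syntax: key is the first field (zero-based field 0)
sort -u -k 1,1 cksums.txt    # POSIX -k syntax for the same key

Both keep the first record seen for each checksum and drop later duplicates of it.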

You can "man sort" and "man xargs" for this, or use the "Man Pages" link above, or google.

I make lists, like database tables. I can cut off the first, key field and make key lists, then run them through comm to find out what is in list 1 but not in list 2 nor in both. Then I can use that still-sorted key in join to pull out the desired file names. "while read x y z" says read lines and divide the fields by $IFS (whitespace by default): the first goes to x, the second to y, and the rest to z.
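A sketch of that pipeline, assuming list1 and list2 hold sorted "cksum bytes name" lines (all file names here are made up, and names containing whitespace will break the field splitting):

cut -d ' ' -f 1 list1 > keys1          # key list for table 1
cut -d ' ' -f 1 list2 > keys2          # key list for table 2
comm -23 keys1 keys2 > only1           # keys in list 1 but not in list 2
join only1 list1 | while read x y z    # x=cksum, y=bytes, z=file name
do
    echo "only in list 1: $z"
done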

GNU parallel is much like xargs, but on steroids. I am not sure how it distributes the lines and how it syncs them back to sequential order, in terms of cost, latency, disk space and such. I have several parallel tools, but xargs is good enough for many things. Since this feeds a sort, line buffering might be fine for many file descriptors writing to one pipe, and who cares about order! I will look into it! One wonders if and how it buffers threads 2-n until thread 1 is done. Thanks!
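If you want to experiment, a rough equivalent with GNU parallel might look like this (an untested sketch with a placeholder directory; -m packs many arguments per job much like xargs -n, and -k keeps output in input order, though the sort downstream makes order moot anyway):

find /some/dir -type f | parallel -m -k cksum | sort -k 1,1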

Speedup: find all the files in Stuff, then use sort, cut and comm to work out which files are new (not on the old Stuff list), cksum only those, and finally merge the new cksums with the carried-over old entries to make the new Stuff list.
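A rough sketch of that speedup, assuming the old list lives in old.list as sorted "cksum bytes name" lines and the tree is Stuff (all the other names are made up, and file names with whitespace will break it):

find Stuff -type f | sort > now.names            # every file present now
cut -d ' ' -f 3- old.list | sort > old.names     # files already checksummed
comm -23 now.names old.names |                   # only the new arrivals
    xargs -n 99 cksum > new.list                 # get checksummed
comm -12 now.names old.names > still.names       # files present both times
sort -k 3 old.list |                             # reuse their old entries
    join -1 1 -2 3 -o 2.1,2.2,2.3 still.names - >> new.list
sort -o new.list new.list                        # the new Stuff list, sorted by cksum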
