08-04-2011
xargs is a very nice way to get economy of scale in shell scripting, like calling grep once for every 99 files rather than once per file. -n99 does two things here: it caps each command line at 99 arguments (really, execvp()'d commands get an array of pointers to arrays of characters, not one string), and, on the xargs I use, it also says do not run on empty input. That second part is not universal: GNU xargs still runs the command once on empty input unless you add -r.
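A minimal sketch of the batching idea, assuming an xargs that supports -0 and -r (GNU does; both are extensions rather than POSIX). The function name grep_tree is made up for illustration.

```shell
# Hypothetical helper: list the files under a directory that contain a pattern,
# handing grep up to 99 file names per invocation instead of one.
grep_tree() {
    pattern=$1
    dir=$2
    # -print0 / -0 keep file names with spaces intact.
    # -n99 caps each grep command line at 99 file names.
    # -r skips running grep at all when find prints nothing.
    find "$dir" -type f -print0 | xargs -0 -r -n99 grep -l -- "$pattern"
}
```

Call it as grep_tree needle /some/dir. Note that xargs returns nonzero when any grep batch finds no match, so do not read the exit status as "nothing found".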
Sort has old and new key syntax. These are old keys, zero-based and counted in whole white-space-separated fields, so sort -u +0 -1 sorts on the first field and tosses any later record with a duplicate first field. The new-style equivalent is sort -u -k1,1, and current sort implementations may reject the old form. If many files have the same checksum, they are probably identical, in fact probably empty!
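As a rough illustration with the new key syntax, assuming cksum's usual "crc size name" output; unique_by_cksum is a name invented for this sketch.

```shell
# Hypothetical helper: print one cksum line per distinct checksum under a directory.
unique_by_cksum() {
    # cksum prints "crc size name"; -k1,1 restricts the sort key to the crc field,
    # and -u keeps only one record for each distinct key.
    find "$1" -type f -exec cksum {} + | sort -u -k1,1
}
```

Which of the duplicate records survives depends on the input order, so treat the output as "one representative per checksum" and nothing more.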
You can "man sort" and "man xargs" for this, or use the "Man Pages" link above, or google.
I make lists, like database tables. I can cut off the first, key field to make key lists, then run them through comm to find what is in list 1 only, dropping what is only in list 2 or in both. Then I can use that still-sorted key list in join to pull the desired file names. "while read x y z" says read lines and split fields on $IFS (white space by default), putting the first in x, the second in y and the rest in z.
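Roughly what that looks like, assuming each list holds lines of "key size name" separated by single spaces, as cksum prints them. The function name only_in_first and the temporary file names are made up for this sketch.

```shell
# Hypothetical helper: print the names from list 1 whose key is not in list 2.
# The body runs in a subshell, so the LC_ALL change and variables stay local.
only_in_first() (
    list1=$1
    list2=$2
    LC_ALL=C; export LC_ALL          # one collation order for sort, comm and join
    tmp=$(mktemp -d) || exit 1
    # Key lists: first field only, sorted and de-duplicated for comm.
    cut -d' ' -f1 "$list1" | sort -u > "$tmp/keys1"
    cut -d' ' -f1 "$list2" | sort -u > "$tmp/keys2"
    # -23 suppresses column 2 (only in list 2) and column 3 (in both).
    comm -23 "$tmp/keys1" "$tmp/keys2" > "$tmp/only1"
    # join needs both inputs sorted on the key; output is "key size name".
    sort -k1,1 "$list1" | join "$tmp/only1" - |
    while read -r key size name; do
        # name receives the rest of the line, embedded spaces included.
        printf '%s\n' "$name"
    done
    rm -rf "$tmp"
)
```

Usage would be only_in_first old.list new.list, where both file names are placeholders.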
GNU parallel is much like xargs, but on steroids. I am not sure how it distributes the lines and how it syncs the output back to sequential order, in terms of cost, latency, disk space and such. I have several parallel tools, but xargs is good enough for many things. Since this feeds a sort, line buffering might be fine for many fds writing one pipe, and who cares about order! I will look into it! One wonders if and how it buffers the output of threads 2-n until 1 is done. Thanks!
Speedup: find all files in Stuff, then use sort, cut and comm to find out which files are new (not on the old Stuff list), cksum only those, and finally add these cksums to the old list to make the new Stuff list.
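One way the speedup might be sketched, assuming the Stuff list holds cksum lines ("crc size path"), lives outside the directory being scanned, and no file name contains a newline. update_stuff_list is a made-up name, and -0 and -r are xargs extensions (GNU has both).

```shell
# Hypothetical helper: cksum only the files that are not yet on the list.
# The body runs in a subshell, so the LC_ALL change and variables stay local.
update_stuff_list() (
    dir=$1
    list=$2                          # lines of "crc size path", as cksum prints them
    LC_ALL=C; export LC_ALL          # same collation order for sort and comm
    tmp=$(mktemp -d) || exit 1
    touch "$list"                    # first run: start from an empty list
    # Paths already on the list (field 3 onward, so embedded spaces survive).
    cut -d' ' -f3- "$list" | sort > "$tmp/old"
    # Paths that exist now.
    find "$dir" -type f | sort > "$tmp/now"
    # -13 keeps only the lines unique to the second file: the new paths.
    comm -13 "$tmp/old" "$tmp/now" | tr '\n' '\000' | xargs -0 -r cksum >> "$list"
    rm -rf "$tmp"
)
```

This sketch only appends; entries for files that were deleted or changed since the last run stay on the list as they were.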
Last edited by DGPickett; 08-04-2011 at 04:45 PM..