Split a folder with huge number of files in n folders
Post 302906679 by Scrutinizer on Monday 23rd of June 2014 01:02:25 AM
If characteristics of the filenames are used for binning, then besides the filename format you would also need a reasonable understanding of how the filenames are distributed over the filename parts chosen as bins; otherwise some bins may still end up too full.
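One way to get that understanding before moving anything is to count how many files would land in each candidate bin. This is a minimal sketch, assuming the bins are the first N characters of each name. prefix_histogram is a hypothetical helper name, and it only looks at regular files directly inside DIR:

```shell
#!/bin/sh
# prefix_histogram DIR N  (hypothetical helper name)
# Counts the regular files directly inside DIR by the first N characters of
# their names and prints the counts, fullest bin first. Dot files are skipped
# because the * glob does not match them.
prefix_histogram() {
  dir=$1
  n=$2
  for f in "$dir"/*; do
    # keep regular files only and print just the base name
    [ -f "$f" ] && printf '%s\n' "${f##*/}"
  done | cut -c "1-$n" | sort | uniq -c | sort -rn
}
```

For example, prefix_histogram . 2 | head shows the fullest two-character bins first. If the top counts are far above the average, a different filename part (or a name-independent scheme such as round-robin assignment) would be needed.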


----
Since you are using Ubuntu, an entirely different alternative might be to leave the files as-is and use locate and updatedb, but of course that would not be adequate for files created after the last database update.
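The gap for files created after the last update could be covered by listing files modified after the database file was last written. This is a minimal sketch, assuming the database's modification time marks the last updatedb run. newer_than_db is a hypothetical name, and the caller must supply the database path (GNU updatedb defaults to /var/lib/locatedb, while the mlocate package used on Ubuntu may keep it elsewhere, e.g. /var/lib/mlocate/mlocate.db):

```shell
#!/bin/sh
# newer_than_db DBFILE DIR  (hypothetical helper name)
# Lists regular files under DIR modified after DBFILE was last written, i.e.
# files that a locate database built at that moment cannot contain yet.
newer_than_db() {
  db=$1
  dir=$2
  if [ ! -e "$db" ]; then
    echo "newer_than_db: no database at $db" >&2
    return 1
  fi
  find "$dir" -type f -newer "$db"
}
```

Because -newer compares modification times, files moved into the tree while keeping an older timestamp would still be missed.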


-----
Quote:
Originally Posted by MadeInGermany
Aren't 350 000 files too many arguments for for i in *?
Safer and faster is
Code:
find . -type f |
while IFS= read -r i; do
  printf '%s\n' "$i"   # process each file name here
done

As Don mentioned: Safer? No. There is no ARG_MAX limitation here, because for i in * is expanded inside the shell itself and the resulting list is never passed as arguments to an external program. In theory find-and-pipe is slightly less safe, since it will not work for file names with newlines in them, but this is mostly theory: in practice I for one have never encountered files like that, other than the ones I created myself for testing purposes...
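The newline caveat is easy to demonstrate by counting what a line-based loop sees versus what a newline-safe loop sees. This is a minimal sketch: count_both is a hypothetical helper, and find -exec ... {} + is a swapped-in POSIX alternative that neither poster proposed (in bash, find -print0 combined with while IFS= read -r -d '' is another option):

```shell
#!/bin/sh
# count_both DIR  (hypothetical helper name)
# Prints two numbers: how many names a line-based "find | while read" loop
# sees, and how many files a newline-safe "find -exec ... {} +" loop sees.
# The two differ only when some file name contains a newline.
count_both() {
  dir=$1
  # one "x" per line the read loop receives; a name with a newline counts twice
  by_lines=$(find "$dir" -type f | while IFS= read -r f; do echo x; done | wc -l | tr -d ' ')
  # find hands complete names to sh as separate arguments, so newlines are harmless
  by_exec=$(find "$dir" -type f -exec sh -c 'for f in "$@"; do echo x; done' sh {} + | wc -l | tr -d ' ')
  echo "$by_lines $by_exec"
}
```

For a directory containing one ordinary file and one file whose name contains a newline, count_both prints 3 2. The {} + form also batches names so that each sh invocation stays within the system's argument-length limit.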

Last edited by Scrutinizer; 06-23-2014 at 02:14 AM.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

delete all folders/files and keep only the last 10 in a folder

Hi, I want to write a script that deletes all folders and keep the last 10 recent folders. I know the following: ls -ltr will sort the folders from old to recent. ls -ltr | awk '{print $9}' will list the folder names (with a blank line at the beginning) I want to get the 10th folder from... (3 Replies)
Discussion started by: melanie_pfefer

2. Shell Programming and Scripting

Split a huge data into few different files?!

Input file data contents: >seq_1 MSNQSPPQSQRPGHSHSHSHSHAGLASSTSSHSNPSANASYNLNGPRTGGDQRYRASVDA >seq_2 AGAAGRGWGRDVTAAASPNPRNGGGRPASDLLSVGNAGGQASFASPETIDRWFEDLQHYE >seq_3 ATLEEMAAASLDANFKEELSAIEQWFRVLSEAERTAALYSLLQSSTQVQMRFFVTVLQQM ARADPITALLSPANPGQASMEAQMDAKLAAMGLKSPASPAVRQYARQSLSGDTYLSPHSA... (7 Replies)
Discussion started by: patrick87

3. Shell Programming and Scripting

Compare 2 folders to find several missing files among huge amounts of files.

Hi, all: I've got two folders, say, "folder1" and "folder2". Under each, there are thousands of files. It's quite obvious that there are some files missing in each. I just would like to find them. I believe this can be done by "diff" command. However, if I change the above question a... (1 Reply)
Discussion started by: jiapei100

4. Shell Programming and Scripting

Move all files but not folders to a new folder

Hi, I have a sub directory with a number of files and folders. What I want is a subdirectory with just folders and no files, for cleanliness' sake. So I want to move the files into the new folder but keep the folders in the same place. Move all files (but not folders) to new folder. I am... (4 Replies)
Discussion started by: Hopper_no1

5. Shell Programming and Scripting

How to delete a huge number of files at a time

I met a problem on HPUX with 64G RAM and 20 CPU. There are 5 million files with file name from file0000001.dat to file9999999.dat, in the same directory, and with some other files with random names. I was trying to remove all the files from file0000001.dat to file9999999.dat at the same time.... (9 Replies)
Discussion started by: lisp21

6. Shell Programming and Scripting

Help- counting delimiter in a huge file and split data into 2 files

I’m new to Linux scripting and not sure how to filter out bad records from huge flat files (over 1.3GB each). The delimiter is a semicolon “;” Here is a sample of 5 lines in the file: Name1;phone1;address1;city1;state1;zipcode1 Name2;phone2;address2;city2;state2;zipcode2;comment... (7 Replies)
Discussion started by: lv99

7. Shell Programming and Scripting

moving files from one folder to many folders

I have more than 10K files in a folder. They accumulated over a period of more than a year (say from 13th July 2010 to 4th June 2011). I need to perform housekeeping on this. The requirement is to create a folder like 13Jul2010,14July2010,......3June2011,4June2010 and then from the main... (2 Replies)
Discussion started by: realspirituals

8. Shell Programming and Scripting

Symlink all files from one folder into all found folders

Hi. I have a folder which contains my application. I then have a flexible number of folders in another directory, called “sites”. It looks like this: -Application -- Test.html -- CSS --- Style.css -Sites --Site1 --Site2 I want to symlink all the files in the application folder... (1 Reply)
Discussion started by: Spadez

9. UNIX for Dummies Questions & Answers

Split a huge 7 GB File Based on Pattern into 4 files

Hi, I have a Huge 7 GB file which has around 1 million records, i want to split this file into 4 files to contain around 250k messages each. Please help me as Split command cannot work here as it might miss tags.. Format of the file is as below <!--###### ###### START-->... (6 Replies)
Discussion started by: KishM

10. Shell Programming and Scripting

Moving files and folders to another folder

I recently bought a Synology server and realised it can run scripts. I need a fairly simple script which moves all files and folders from ARCHIVE folder to WORKING folder. I would also need to maintain folder structure as each of the folders may contain subfolders. How would I go about it as I am... (1 Reply)
Discussion started by: ###
UPDATEDB(1)						      General Commands Manual						       UPDATEDB(1)

NAME
updatedb - update a file name database

SYNOPSIS
updatedb [options]

DESCRIPTION
This manual page documents the GNU version of updatedb, which updates file name databases used by GNU locate. The file name databases contain lists of files that were in particular directory trees when the databases were last updated. The file name of the default database is determined when locate and updatedb are configured and installed. The frequency with which the databases are updated and the directories for which they contain entries depend on how often updatedb is run, and with which arguments.

In networked environments, it often makes sense to build a database at the root of each filesystem, containing the entries for that filesystem. updatedb is then run for each filesystem on the fileserver where that filesystem is on a local disk, to prevent thrashing the network. Users can select which databases locate searches using an environment variable or command line option; see locate(1). Databases can not be concatenated together.

The file name database format changed starting with GNU find and locate version 4.0 to allow machines with different byte orderings to share the databases. The new GNU locate can read both the old and new database formats. However, old versions of locate and find produce incorrect results if given a new-format database.

OPTIONS
--findoptions='-option1 -option2...'
Global options to pass on to find. The environment variable FINDOPTIONS also sets this value. Default is none.

--localpaths='path1 path2...'
Non-network directories to put in the database. Default is /.

--netpaths='path1 path2...'
Network (NFS, AFS, RFS, etc.) directories to put in the database. The environment variable NETPATHS also sets this value. Default is none.

--prunepaths='path1 path2...'
Directories to not put in the database, which would otherwise be. Remove any trailing slashes from the path names, otherwise updatedb won't recognise the paths you want to omit (because it uses them as regular expression patterns). The environment variable PRUNEPATHS also sets this value. Default is /tmp /usr/tmp /var/tmp /afs.

--prunefs='path...'
File systems to not put in the database, which would otherwise be. Note that files are pruned when a file system is reached; any file system mounted under an undesired file system will be ignored. The environment variable PRUNEFS also sets this value. Default is nfs NFS proc.

--output=dbfile
The database file to build. Default is /var/lib/locatedb.

--localuser=user
The user to search non-network directories as, using su(1). Default is to search the non-network directories as the current user. You can also use the environment variable LOCALUSER to set this user.

--netuser=user
The user to search network directories as, using su(1). Default is nobody. You can also use the environment variable NETUSER to set this user.

--old-format
Create the database in the old format. This is a synonym for --dbformat=old.

--dbformat=F
Create the database in format F. The default format is called LOCATE02. F can be old to select the old database format (this is the same as specifying --old-format). Alternatively the slocate format is also supported. When the slocate format is in use, the database produced is marked as having security level 1. If you want to build a system-wide slocate database, you may want to run updatedb as root.

--version
Print the version number of updatedb and exit.

--help
Print a summary of the options to updatedb and exit.

SEE ALSO
find(1), locate(1), locatedb(5), xargs(1)
Finding Files (on-line in Info, or printed)

BUGS
The updatedb program correctly handles filenames containing newlines, but only if the system's sort command has a working -z option. If you suspect that locate may need to return filenames containing newlines, consider using its --null option.

The best way to report a bug is to use the form at http://savannah.gnu.org/bugs/?group=findutils. The reason for this is that you will then be able to track progress in fixing the problem. Other comments about updatedb(1) and about the findutils package in general can be sent to the bug-findutils mailing list. To join the list, send email to bug-findutils-request@gnu.org.
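Two of the man page notes are easy to check before running updatedb: --prunepaths entries must not end in a slash, and newline-safe databases need a sort with a working -z. This is a minimal POSIX sh sketch based only on this GNU man page (other updatedb implementations may differ). Both function names are hypothetical, and neither function runs updatedb itself:

```shell
#!/bin/sh
# Hypothetical pre-flight helpers for GNU updatedb.

# strip_trailing_slashes PATH...
# --prunepaths entries are used as regular expressions, so a trailing slash
# stops them from matching; print the paths space-separated without it.
strip_trailing_slashes() {
  out=
  for p in "$@"; do
    # remove every trailing slash, but leave "/" itself alone
    while [ "${#p}" -gt 1 ] && [ "${p%/}" != "$p" ]; do
      p=${p%/}
    done
    out="$out${out:+ }$p"
  done
  printf '%s\n' "$out"
}

# sort_has_working_z
# Succeeds if this system's sort accepts -z and sorts NUL-terminated records,
# the precondition the BUGS section names for handling names with newlines.
sort_has_working_z() {
  result=$(printf 'b\0a\0' | sort -z 2>/dev/null | tr '\0' ' ')
  [ "$result" = "a b " ]
}
```

A possible use is sort_has_working_z && updatedb --prunepaths="$(strip_trailing_slashes /tmp/ /var/tmp/)", which passes /tmp /var/tmp as the prune list.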
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.