Using find in a directory containing large number of files


 
# 1  
Old 08-08-2011

Hi All,

I have searched this forum for related posts but could not find one that fits mine. I have a shell script which removes all the XML tags including the text inside the tags from some 4 million XML files.

The shell script looks like this (MODIFIED):

Code:
find . "*.xml" -print | while read page
do
cat $page | sed -e 's/<.*>//g' $page>$page.txt
done

Previously, the shell script looked like this (ORIGINAL):


Code:
ls -1 *.xml | while read page
do
cat $page | sed -e 's/<.*>//g' $page>$page.txt
done

Since ls gives an "Argument list too long" error, after searching through this forum I made some modifications to my ORIGINAL to come up with the MODIFIED version (above). But the MODIFIED version does not seem to work.
# 2  
Old 08-08-2011
Hi

"ls -1 " and "find . " are not the same. find will get files from folders within the current directory as well, if it finds any.

Guru.
# 3  
Old 08-08-2011
Thanks. Is there any workaround to handle 4 million files in a directory on Linux? Many posts here point to using xargs. Let me try that; if it works I'll post my code here.
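From what I have read, the usual pattern combines find with xargs and a small sh -c wrapper, since each file needs its own output redirection. A sketch of that pattern (assuming GNU find/xargs for the -print0/-0 options):

```shell
# NUL-delimited names survive spaces in filenames; the sh -c body runs
# once per file, so each file gets its own > redirection.
find . -maxdepth 1 -name '*.xml' -print0 |
    xargs -0 -I{} sh -c 'sed -e "s/<.*>//g" "$1" > "$1.txt"' _ {}
```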

---------- Post updated at 01:47 PM ---------- Previous update was at 01:12 PM ----------

ok, so as of now this is what I have done:


Code:
echo *.xml | xargs ls -1 | while read page
do
cat $page | sed -e 's/<.*>//g' $page>$page.txt
done

When I run
Code:
echo *.xml | xargs ls -1

I can see the list of files. But the .txt files that I am getting are all empty.
# 4  
Old 08-08-2011
Hi

You can try this code for processing a large number of files in a directory.

Code:
for page in `find scripting |grep -e 'xml$'`; 
do 
  cat $page | sed -e 's/<.*>//g' $page>$page.txt_3; 
done

In the above code, "scripting" is the directory to search.

As for the command cat $page | sed -e 's/<.*>//g' $page>$page.txt:
as you said, it removes all the XML tags including the text inside the tags, in all the XML files. So obviously the output text files will be empty.
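To see why the files come out empty: .* is greedy, so on a one-line document the pattern <.*> matches everything from the first < to the last >, taking the text between the tags with it. A quick sketch:

```shell
# Greedy match: the whole span from the first "<" to the last ">"
# is deleted, including the text between the tags.
printf '<text>keep me</text>\n' | sed -e 's/<.*>//g'
# prints an empty line
```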

# 5  
Old 08-08-2011
Thanks. Sorry, I guess I was a bit vague there. When I wrote
Quote:
shell script which removes all the XML tags including the text inside the tags from some 4 million XML files
I meant the script should delete the tags and their contents only for tags like
Code:
<text>

and
Code:
 <?xml version="1.0" encoding="iso-8859-1" ?>

That is, my script should remove only the above tags, including all the text inside them (like "text" and the XML declaration), and keep the main paragraphs of the files.

---------- Post updated at 05:12 PM ---------- Previous update was at 05:09 PM ----------

Oh great!

You've pointed out one more fault. It is indeed deleting everything. This I can fix myself.
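For anyone finding this later, the fix I have in mind is to exclude > from the repeated part of the pattern, so that only the tags themselves are removed (a sketch, assuming the goal is to keep the text between the tags):

```shell
# [^>]* stops at the first ">", so each tag is removed individually
# and the text between tags survives.
printf '<text>keep me</text>\n' | sed -e 's/<[^>]*>//g'
# prints: keep me
```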
# 6  
Old 08-08-2011
Is this correct?

Code:
cat $page | sed -e 's/<.*>//g' $page>$page.txt

It should be:

Code:
sed -e 's/<.*>//g' $page>$page.txt
# 7  
Old 08-08-2011
Yes, it can also be done your way. But my way works too.
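Both forms give the same output because when sed is given a file operand it reads from that file and ignores its standard input entirely, so the cat just feeds a pipe that nobody reads. A quick sketch:

```shell
cd "$(mktemp -d)"
printf 'from-file\n' > f.xml
# sed reads f.xml and ignores the pipe: "from-stdin" never appears.
printf 'from-stdin\n' | sed -e 's/<.*>//g' f.xml
# prints: from-file
```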