Process 2 lists at the same time and move files.


 
# 1  
Old 11-10-2015

I have this while loop, which works but is quite slow. I'm hoping there might be a faster or better way to find what I'm looking for.

I have two lists of numbers and want to find only the files whose names contain a value from each list.

Each list has about 100 values.

Code:
while read lineA 
    do echo $lineA
    while read lineB
        do cp /archive/$year$month/storage/*$lineA[NIT]??????$lineB*#$year$month$day* /foundfiles/ 
    done < list1.txt
done < list2.txt

Thank you.
# 2  
Old 11-10-2015
Some possible improvements.

If the files are large, consider:
- the mv command, to move the file into the /foundfiles/ directory instead of copying it
- the ln -s command, to create a symlink in /foundfiles/ that points back to the original file.

Example: change
Code:
cp /archive/$year$month/storage/*$lineA[NIT]??????$lineB*#$year$month$day* /foundfiles/
# to:
ls /archive/$year$month/storage/*$lineA[NIT]??????$lineB*#$year$month$day* |
while read -r fname     # read takes the variable name, not $fname
do
   mv "$fname" /foundfiles/
done

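The mv command is only a cheap rename when the /foundfiles/ directory and the source directory are on the same filesystem. Across filesystems, mv falls back to copying the file and deleting the original, so nothing is saved.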

Also, your cp command assumes that a file exists for every combination of those variables.

With all those * metacharacters, each pattern may also match many files. And if all 10000 or so possible patterns do match, that is a lot of copying, so reducing I/O is a big win for performance.
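A minimal sketch of the symlink variant, which also skips patterns that match nothing. The function name link_matches and its arguments are hypothetical, and it assumes the source directory is given as an absolute path (a relative link target would dangle) and that the pattern contains no spaces:

```shell
#!/bin/sh
# Sketch only: link_matches and its arguments are hypothetical names,
# not part of the original script.
# Usage: link_matches SRC_DIR DEST_DIR GLOB
link_matches() {
    src=$1 dest=$2 pattern=$3
    # $pattern is deliberately unquoted so the shell expands it as a glob.
    for f in "$src"/$pattern; do
        # An unmatched glob is left as the literal pattern, so skip
        # anything that is not an existing regular file.
        [ -f "$f" ] || continue
        ln -s "$f" "$dest/"
    done
}
```

It would be called once per value pair, for example link_matches "/archive/$year$month/storage" /foundfiles "*$lineA[NIT]??????$lineB*#$year$month$day*". This still runs the glob for every pair, so it only removes the copying cost, not the cost of the 10000 or so pattern expansions.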
# 3  
Old 11-10-2015
You seem to be assuming that all those files with all possible value combinations exist. Should that not be the case, running cp for each and ignoring the error message is a huge waste of resources. And, although it will most probably be cached/buffered by the system, you are reading list1 about a hundred times. How about
- reading list1 and list2 once into memory
- running cp only for existing files?
Try
Code:
cd /archive/$year$month/storage
ls | awk '
FNR==1          {FC++}
FC<=2           {A[FC,FNR] = $0
                 C[FC] = FNR
                 next
                }
                {for (i=1; i<=C[1]; i++)
                   for (j=1; j<=C[2]; j++)
                     if ($0 ~ A[1,i]"[NIT]......"A[2,j]".*#") print "cp", $0, "/foundfiles"
                }
' SUBSEP="\t" path/to/list1 path/to/list2 - | sh

Please adapt the match regex to your needs!

---------- Post updated at 22:59 ---------- Previous update was at 22:50 ----------

You could further improve performance by passing a pattern to ls so that unsuitable files are excluded up front. You could also break out of the loops as soon as a file has matched.
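A rough sketch of the break-out idea, assuming both list files are non-empty and contain no regex metacharacters. The function name match_names is hypothetical, and the regex is a simplified stand-in for the real file name layout:

```shell
#!/bin/sh
# Sketch only: match_names is a hypothetical helper.
# Usage: ls PATTERN | match_names LIST1 LIST2
# Prints each candidate name at most once if it pairs a value from
# LIST1 with a value from LIST2.
match_names() {
    awk '
    FNR == 1 { fc++ }                   # count input files as they start
    fc == 1  { a[++na] = $0; next }     # first list
    fc == 2  { b[++nb] = $0; next }     # second list
    {
        found = 0
        for (i = 1; i <= na && !found; i++)
            for (j = 1; j <= nb; j++)
                if ($0 ~ a[i] "[NIT]......" b[j]) {
                    print
                    found = 1           # stops the outer loop
                    break               # stops the inner loop
                }
    }' "$1" "$2" -
}
```

Feeding it with something like ls *"#$year$month$day"* | match_names path/to/list1 path/to/list2 restricts the candidates before awk sees them, and the found flag means a name is not tested against the remaining combinations once it has matched.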
# 4  
Old 11-10-2015
Thank you for the assistance.

In the end I decided to dumb down my approach and keep it simple.

Here is what I did.

Code:
#Build full list of all files from the day in question
cd /archive/$year$month/storage/
ls *#$year$month$day* > /home/login/scripts/fulllist.txt

#Find all files that match list1 and output to new file
while read line
        do awk 'substr($0,35,7)=='"$line"' {print}' /home/login/scripts/fulllist.txt >> /home/login/scripts/fulllist2.txt
done < /home/login/scripts/list1.txt

#Find all files that match list2 from the list created above
while read line
        do awk 'substr($0,49,7)=='"$line"' {print}' /home/login/scripts/fulllist2.txt >> /home/login/scripts/fulllist3.txt
done < /home/login/scripts/list2.txt

#copy all files to temp folder
while read line ; do
        cp /archive/$year$month/storage/$line /foundfiles/
done < /home/login/scripts/fulllist3.txt



time ./program
real 0m6.327s
user 0m4.822s
sys 0m1.336s

The other methods I tried took much longer to process; my original method was taking many hours.

Last edited by whegra; 11-10-2015 at 09:58 PM..
# 5  
Old 11-12-2015
The following does two loops within awk,
and uses pipes between the different parts:
Code:
#Build full list of all files from the day in question and send to pipe
cd /archive/$year$month/storage/
ls *"#$year$month$day"* |

#Find all files that match list1 and send to pipe
awk '
FILENAME!="-" { nums[++nmax]=$0; next } 
{ for (n in nums) if (substr($0,35,7)==nums[n]) { print; break } }   # n is the index; compare against the stored value
' /home/login/scripts/list1.txt - |

#Find all files that match list2 and send to pipe
awk '
FILENAME!="-" { nums[++nmax]=$0; next } 
{ for (n in nums) if (substr($0,49,7)==nums[n]) { print; break } }   # n is the index; compare against the stored value
' /home/login/scripts/list2.txt - |

#copy all files to temp folder
while read line ; do
        cp "/archive/$year$month/storage/$line" /foundfiles/
done

# 6  
Old 11-12-2015
Just for my own sanity, could you please try the following with your data and let me know how the runtime compares to your script? I get the feeling it should be faster, but maybe all of the processing time is being spent copying files and the time used determining which files to copy doesn't matter...
Code:
#!/bin/ksh
year="2015"	# Replace with your desired year.
month="11"	# Replace with your desired month.
day="11"	# Replace with your desired day.

#Get full list of all files from the day in question
cd /archive/$year$month/storage/
ls *"#$year$month$day"* |

#Find all files that match list1 and list2...
awk '
FNR == 1 {
	f++
}
f == 1 {list1[$0]
	next
}
f == 2 {list2[$0]
	next
}
substr($0, 35, 7) in list1 && substr($0, 49, 7) in list2
' /home/login/scripts/list1.txt /home/login/scripts/list2.txt - |

#copy all files to temp folder
while read -r line
do	cp "$line" /foundfiles/
done

This User Gave Thanks to Don Cragun For This Post:
# 7  
Old 11-12-2015
Doh! I wanted the loop in awk and overlooked the smart lookup (substr($0,35,7) in nums)
Thanks for showing it!
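For anyone comparing the two approaches, here is a small sketch of the lookup on its own. The function name in_list and the sample positions are hypothetical, and it assumes the list file is non-empty:

```shell
#!/bin/sh
# Sketch only: in_list is a hypothetical helper.
# Usage: in_list LIST START LEN < names
# Prints the stdin lines whose substring at START (length LEN) appears in LIST.
in_list() {
    awk -v start="$2" -v len="$3" '
    NR == FNR { seen[$0]; next }        # list file: store each value as an array key
    substr($0, start, len) in seen      # stdin: one key lookup per line, no loop
    ' "$1" -
}
```

Because the list values are stored as array keys rather than as array elements, each file name costs one lookup instead of a pass over about 100 values, for example ls | in_list /home/login/scripts/list1.txt 35 7.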