I have this while loop that works but is quite slow to process. I'm hoping there might be a faster/better way to find what I'm looking for.
I have 2 lists of numbers and want to find only the files whose names contain a value from each list.
Each list has about 100 values.
Code:
while read lineA
do echo $lineA
while read lineB
do cp /archive/$year$month/storage/*$lineA[NIT]??????$lineB*#$year$month$day* /foundfiles/
done < list1.txt
done < list2.txt
Instead of cp, consider using:
- the mv command to place the file in the /foundfiles/ directory, or
- the ln command to create a symlink in /foundfiles/ back to the original file.
Example:
Change:
Code:
cp /archive/$year$month/storage/*$lineA[NIT]??????$lineB*#$year$month$day* /foundfiles/
# to:
ls /archive/$year$month/storage/*$lineA[NIT]??????$lineB*#$year$month$day* |
while read -r fname
do
    mv "$fname" /foundfiles/
done
The mv command assumes the /foundfiles/ directory and the source directory are on the same filesystem.
Also, your cp command assumes that the files all exist for every combination of those variables.
I think that, with all those * metacharacters and so on, you may be matching a great many files. Assuming all 10000 or so of the possible patterns match something, decreasing the I/O is a big win for performance.
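For the ln option mentioned above, here is a minimal sketch of what the symlink variant might look like. link_found is a hypothetical helper name, and the demonstration runs in a scratch directory created with mktemp rather than in /archive or /foundfiles.

```shell
#!/bin/sh
# Sketch only: link_found is a hypothetical helper, not part of the scripts above.
# It creates a symlink in the destination directory that points back to the
# original file, so no file data is copied or moved.
link_found() {
    src=$1      # absolute path of the matching file
    destdir=$2  # directory that receives the link, e.g. /foundfiles
    ln -s "$src" "$destdir/"
}

# Demonstration in a scratch directory so nothing real is touched.
work=$(mktemp -d)
mkdir "$work/storage" "$work/foundfiles"
echo data > "$work/storage/sample"
link_found "$work/storage/sample" "$work/foundfiles"
ls -l "$work/foundfiles"
```

The source path should be absolute; a relative target would be resolved relative to the directory holding the link, not the directory the command was run from.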
You seem to be assuming that all those files with all possible value combinations exist. Should that not be the case, running cp for each and ignoring the error message is a huge waste of resources. And, although it will most probably be cached/buffered by the system, you are reading list1 about a hundred times. How about
- reading list1 and list2 once into memory
- running cp only for existing files?
Try
Code:
cd /archive/$year$month/storage
ls | awk '
FNR==1 {FC++}
FC<=2 {A[FC,FNR] = $0
C[FC] = FNR
next
}
{for (i=1; i<=C[1]; i++)
for (j=1; j<=C[2]; j++)
if ($0 ~ A[1,i]"[NIT]......"A[2,j]"#") print "cp", $0, "/foundfiles"
}
' SUBSEP="\t" path/to/list1 path/to/list2 - | sh
Please adapt the match regex to your needs!
You could further improve performance by excluding unsuitable files up front with a pattern passed to ls. You could also break out of the loops as soon as a file has matched.
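A rough sketch of both ideas, assuming the same regex as in the awk script above (which still needs adapting to the real file names). match_names is a hypothetical helper, and the list values and file names are made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: match_names is a hypothetical helper; adapt the regex as needed.
match_names() {
    # $1 = first list, $2 = second list, file names arrive on stdin
    awk '
    FNR == 1 { FC++ }                        # count input files as they start
    FC <= 2  { A[FC, FNR] = $0; C[FC] = FNR; next }
    {
        found = 0
        for (i = 1; i <= C[1] && !found; i++)
            for (j = 1; j <= C[2]; j++)
                if ($0 ~ A[1, i] "[NIT]......" A[2, j] "#") {
                    print                    # print the name once
                    found = 1
                    break                    # leave inner loop; !found ends the outer one
                }
    }
    ' "$1" "$2" -
}

# Demonstration with made-up data in a scratch directory.
work=$(mktemp -d)
mkdir "$work/storage"
printf '%s\n' 111 222 > "$work/list1"
printf '%s\n' 777 888 > "$work/list2"
for name in 'x111N123456777#20151111.dat' 'x111N123456999#20151111.dat' \
            'x222T000000888#20151111.dat' 'x111N123456777#20151112.dat'
do
    : > "$work/storage/$name"
done

ymd=20151111
# The pattern given to ls drops files from other days before awk sees them.
out=$(cd "$work/storage" && ls *"#$ymd"* | match_names "$work/list1" "$work/list2")
printf '%s\n' "$out"
```

This version prints the names only, so the result can be checked before piping it into a copy loop. It assumes both list files are non-empty, as the original script does.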
In the end I decided to dumb down my approach and keep it simple.
Here is what I did.
Code:
#Build full list of all files from the day in question
cd /archive/$year$month/storage/
ls *#$year$month$day* > /home/login/scripts/fulllist.txt
#Find all files that match list1 and output to new file
while read line
do awk 'substr($0,35,7)=='"$line"' {print}' /home/login/scripts/fulllist.txt >> /home/login/scripts/fulllist2.txt
done < /home/login/scripts/list1.txt
#Find all files that match list2 from the list created above
while read line
do awk 'substr($0,49,7)=='"$line"' {print}' /home/login/scripts/fulllist2.txt >> /home/login/scripts/fulllist3.txt
done < /home/login/scripts/list2.txt
#copy all files to temp folder
while read line ; do
cp /archive/$year$month/storage/$line /foundfiles/
done < /home/login/scripts/fulllist3.txt
time ./program
real 0m6.327s
user 0m4.822s
sys 0m1.336s
The other methods I tried took far longer to process; my original method was taking many hours.
The following does two loops within awk,
and uses pipes between the different parts:
Code:
#Build full list of all files from the day in question and send to pipe
cd /archive/$year$month/storage/
ls *"#$year$month$day"* |
#Find all files that match list1 and send to pipe
awk '
FILENAME!="-" { nums[++nmax]=$0; next }
# n is the array index, so compare against the stored value nums[n]
{ for (n in nums) if (substr($0,35,7)==nums[n]) print }
' /home/login/scripts/list1.txt - |
#Find all files that match list2 and send to pipe
awk '
FILENAME!="-" { nums[++nmax]=$0; next }
# n is the array index, so compare against the stored value nums[n]
{ for (n in nums) if (substr($0,49,7)==nums[n]) print }
' /home/login/scripts/list2.txt - |
#copy all files to temp folder
while read line ; do
cp "/archive/$year$month/storage/$line" /foundfiles/
done
Just for my own sanity, could you please try the following with your data and let me know how the runtime compares to your script? I get the feeling it should be faster, but maybe all of the processing time is being spent copying files and the time used determining which files to copy doesn't matter...
Code:
#!/bin/ksh
year="2015" # Replace with your desired year.
month="11" # Replace with your desired month.
day="11" # Replace with your desired day.
#Get full list of all files from the day in question
cd /archive/$year$month/storage/
ls *"#$year$month$day"* |
#Find all files that match list1 and list2...
awk '
FNR == 1 {
f++
}
f == 1 {list1[$0]
next
}
f == 2 {list2[$0]
next
}
substr($0, 35, 7) in list1 && substr($0, 49, 7) in list2
' /home/login/scripts/list1.txt /home/login/scripts/list2.txt - |
#copy all files to temp folder
while read -r line
do cp "$line" /foundfiles/
done
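Before pointing this at the real archive, it may be worth checking the filter on a few synthetic names. The sketch below reuses the awk logic from the script above; filter_names and make_name are hypothetical helpers, and the name layout (a 34-character prefix, a 7-character value, 7 filler characters, then a second 7-character value) is only an assumption derived from the substr() offsets 35 and 49.

```shell
#!/bin/sh
# Sketch only: filter_names and make_name are hypothetical helpers.
filter_names() {
    # $1 = list1, $2 = list2, candidate file names arrive on stdin
    awk '
    FNR == 1 { f++ }                 # a new input file has started
    f == 1 { list1[$0]; next }       # remember every value from list1
    f == 2 { list2[$0]; next }       # remember every value from list2
    substr($0, 35, 7) in list1 && substr($0, 49, 7) in list2
    ' "$1" "$2" -
}

make_name() {
    # P plus 33 zeros fills columns 1-34; $1 lands in columns 35-41,
    # the filler N123456 in 42-48, and $2 in 49-55.
    printf 'P%033d%sN123456%s#20151111\n' 0 "$1" "$2"
}

work=$(mktemp -d)
printf '%s\n' 1000001 1000002 > "$work/list1"
printf '%s\n' 2000001 > "$work/list2"

result=$(
    {
        make_name 1000001 2000001   # both values listed: should be kept
        make_name 1000002 9999999   # second value not in list2
        make_name 3333333 2000001   # first value not in list1
    } | filter_names "$work/list1" "$work/list2"
)
printf '%s\n' "$result"
```

Only the first synthetic name should come out. Because the lookups use the in operator on array keys, each file name costs two lookups instead of a scan over every list value.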
This User Gave Thanks to Don Cragun For This Post: