perform 3 awk commands to multiple files in multiple directories


 
# 15  
Old 10-27-2011
Quote:
Originally Posted by CarloM
Code:
awk -v outputPath="${fileDirName}" 'BEGIN{RS =" ";}{print > outputPath "/" "n_"FILENAME}' ${fileBaseName}

You still need to pass $fname (i.e. the full path) as the input file to awk.
I think you misunderstood: defining an awk variable named 'inputPath' doesn't do anything by itself.

Change
Code:
awk -v outputPath="${fileDirName}" -v inputPath="${fileDirName}" 'BEGIN{RS =" ";}{print > outputPath "/" "n_"FILENAME}' ${fileBaseName}

to:
Code:
awk -v outputPath="${fileDirName}" 'BEGIN{RS =" ";}{print > outputPath "/" "n_"FILENAME}' "${fname}"
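
To illustrate the point about 'inputPath', here is a minimal sketch. It assumes a scratch directory and a made-up file name (cust_demo.txt), neither of which comes from the thread. It shows that an awk variable set with -v has no effect on where awk looks for its input; only the file operand matters.

```shell
# Scratch directory and file name are made up for this illustration.
workdir=$(mktemp -d)
printf 'a b c\n' > "$workdir/cust_demo.txt"

# From another directory, the bare name cannot be opened,
# no matter what the awk variable inputPath contains.
if (cd / && awk -v inputPath="$workdir" '{ print }' cust_demo.txt) 2>/dev/null
then
    bare_name_status=ok
else
    bare_name_status=failed
fi

# With the full path as the file operand, awk reads the file.
full_path_output=$(awk '{ print }' "$workdir/cust_demo.txt")

echo "bare name: $bare_name_status"
echo "full path: $full_path_output"
```

The first call fails because cust_demo.txt does not exist relative to the current directory; the second succeeds because the operand itself carries the path.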

# 16  
Old 10-27-2011
Hi CarloM,

I've tried that as well (with my shell script outside the hour_1/ directory) and I get an error for every cust_*.txt file (which is why I wrote cust_xxxx.yyyy.txt below):

Code:
awk: cmd. line1 (FILENAME=/home/tester/dataset/hour_1/cust_xxxx.yyyy.txt FNR=1) cannot redirect to `/home/tester/dataset/hour_1/n_/home/tester/datasets/hour_1/cust_xxxx.yyyy.txt

---------- Post updated at 10:40 AM ---------- Previous update was at 10:36 AM ----------

OK (this is for felipe).

I have a directory /home/datasets/ which contains 720 hourly directories,

e.g. hour_1/ hour_2/ ... up to hour_720/

(The hour_1/ example below applies to all of the hour_*/ directories mentioned above.)

For the first hour of an experiment I have a folder named hour_1/.
In this folder there is a file called hour1.txt, which was broken down record by record into many cust_xxx_yyy.txt files (for this particular hour there are 1160 cust_* files).

An example of a cust_xxxx_yyyy file:

Quote:
1 cust_3204_0002 12 43 6565 6786 221 6578 7686 435 .......
(After the $3 field, in this case the number 12, every cust file has a varying number of fields.)


What I want to do is:

set the record separator to RS = " " for every cust_xxx_yyy.txt so that it becomes a single-column file, and then remove the first 3 records from every file.

By hand, I can apply the following two awk commands to set the RS and then remove the first 3 records.

Setting the record separator:
Code:
awk '{print >  "n_"FILENAME}' RS=" " cust_*

Removing the first three records:
Code:
awk 'FNR>3 {print > "fin_"FILENAME}' n_cust*
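
As a side note, the two steps above could probably be folded into a single awk pass. The sketch below assumes each cust file holds one space-separated line, as in the example record; the sample file name and values are made up. Instead of changing RS, it prints every field after the third on its own line. One difference from the two-step version: with RS=" " the last record keeps the file's trailing newline, so the two-step output tends to end with an empty line, which this variant does not produce.

```shell
# Hypothetical sample data; the file name and values are made up for illustration.
workdir=$(mktemp -d)
printf '1 cust_3204_0002 12 43 6565 6786 221\n' > "$workdir/cust_3204_0002.txt"

# Print every field after the third on its own line
# (equivalent to RS=" " followed by FNR>3 for single-line files).
awk '{ for (i = 4; i <= NF; i++) print $i }' "$workdir/cust_3204_0002.txt" \
    > "$workdir/fin_n_cust_3204_0002.txt"

cat "$workdir/fin_n_cust_3204_0002.txt"
```

If a cust file ever contains more than one line, this variant drops the first three fields of every line, whereas the FNR>3 command drops only the first three records of the whole file.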

I want to apply this to all the hour_*/ directories.


However, I'm currently working only in hour_1/ and I get the errors I mentioned in my previous posts.

Thanks again
# 17  
Old 10-27-2011
I'm afraid that didn't clarify things at all.

What is the code you're currently running, exactly what's in the directory you're running it in, and exactly what output do you get?

EDIT: Actually, just try this:
Code:
find /home/tester/datasets/ -name "hour_*.txt" -type f | \
while read fname
do
	fileBaseName=`basename "${fname}"`
	fileDirName=`dirname "${fname}"`

	awk -v outputPath="${fileDirName}" '{print $0 > outputPath "/" $2 ".txt"}' "${fname}"
	awk -v outputPath="${fileDirName}" '{print > outputPath "/" "n_" FILENAME}' RS=" " "${fileDirName}"/cust_*.txt
	awk -v outputPath="${fileDirName}" 'FNR>3 {print > outputPath "/" "fin_" FILENAME}' "${fileDirName}"/n_cust*.txt
done


Last edited by CarloM; 10-27-2011 at 01:12 PM..
# 18  
Old 10-27-2011
The latest code I'm running after your suggestion is below (execAWK.sh).

This script is placed in /home/tester/datasets/,

while /home/tester/datasets_26_10_11/ contains all the hour_*/ directories I described in my previous post.

Code:
#!/usr/bin/sh

set -xv

find /home/tester/datasets_26_10_11/ -name "cust_*.txt" -type f | \


while read fname
do
	fileBaseName=`basename "${fname}"`
	fileDirName=`dirname "${fname}"` 

	echo "fileBaseName: [${fileDirName}][${fileBaseName}] - fname[${fname}]"


	echo "now working on: [${fname}] with [${fileBaseName}]"
	

#first awk command	
	awk -v outputPath="${fileDirName}" 'BEGIN{RS =" ";}{print > outputPath "/" "n_"FILENAME}' "${fname}"


	echo "fileBasename AFTER FIRST AWK : [${fileBaseName}]"

#second awk command - not included for now
#	awk -v outputPath="${fileDirName}" -v inputPath="${fileDirName}" 'FNR>3 {print > outputPath "/" "fin_"FILENAME}' ${fileBaseName}

done


The output I'm getting is this (I'm copying only part of it since it is extremely long):

Quote:
now working on: [/home/tester/datasets_26_10_11/hour1/cust_1064_219239.txt] with [cust_1064_219239.txt]
+ awk -v outputPath=/home/tester/datasets_26_10_11/hour1 'BEGIN{RS =" ";}{print > outputPath "/" "n_"FILENAME}' /home/tester/datasets_26_10_11/hour1/cust_1064_219239.txt
+ clip
awk: cmd. line:1: (FILENAME=/home/tester/datasets_26_10_11/hour1/cust_1064_219239.txt FNR=1) fatal: can't redirect to `/home/tester/datasets_26_10_11/hour1/n_/home/tester/datasets_26_10_11/hour1/cust_1064_219239.txt' (No such file or directory)
+ read fname
basename "${fname}"
++ basename /home/tester/datasets_26_10_11/hour1/cust_1072_220262.txt
+ fileBaseName=cust_1072_220262.txt
dirname "${fname}"
++ dirname /home/tester/datasets_26_10_11/hour1/cust_1072_220262.txt
+ fileDirName=/home/tester/datasets_26_10_11/hour1
+ echo 'fileBaseName: [/home/tester/datasets_26_10_11/hour1][cust_1072_220262.txt] - fname[/home/tester/datasets_26_10_11/hour1/cust_1072_220262.txt]'
fileBaseName: [/home/tester/datasets_26_10_11/hour1][cust_1072_220262.txt] - fname[/home/tester/datasets_26_10_11/hour1/cust_1072_220262.txt]
+ echo 'now working on: [/home/tester/datasets_26_10_11/hour1/cust_1072_220262.txt] with [cust_1072_220262.txt]'
now working on: [/home/tester/datasets_26_10_11/hour1/cust_1072_220262.txt] with [cust_1072_220262.txt]
+ awk -v outputPath=/home/tester/datasets_26_10_11/hour1 'BEGIN{RS =" ";}{print > outputPath "/" "n_"FILENAME}' /home/tester/datasets_26_10_11/hour1/cust_1072_220262.txt
+ clip
awk: cmd. line:1: (FILENAME=/home/tester/datasets_26_10_11/hour1/cust_1072_220262.txt FNR=1) fatal: can't redirect to `/home/tester/datasets_26_10_11/hour1/n_/home/tester/datasets_26_10_11/hour1/cust_1072_220262.txt' (No such file or directory)
+ read fname
basename "${fname}"
++ basename /home/tester/datasets_26_10_11/hour1/cust_1077_222034.txt
+ fileBaseName=cust_1077_222034.txt
dirname "${fname}"
++ dirname /home/tester/datasets_26_10_11/hour1/cust_1077_222034.txt
+ fileDirName=/home/tester/datasets_26_10_11/hour1
+ echo 'fileBaseName: [/home/tester/datasets_26_10_11/hour1][cust_1077_222034.txt] - fname[/home/tester/datasets_26_10_11/hour1/cust_1077_222034.txt]'
fileBaseName: [/home/tester/datasets_26_10_11/hour1][cust_1077_222034.txt] - fname[/home/tester/datasets_26_10_11/hour1/cust_1077_222034.txt]
+ echo 'now working on: [/home/tester/datasets_26_10_11/hour1/cust_1077_222034.txt] with [cust_1077_222034.txt]'
now working on: [/home/tester/datasets_26_10_11/hour1/cust_1077_222034.txt] with [cust_1077_222034.txt]
+ awk -v outputPath=/home/tester/datasets_26_10_11/hour1 'BEGIN{RS =" ";}{print > outputPath "/" "n_"FILENAME}' /home/tester/datasets_26_10_11/hour1/cust_1077_222034.txt
+ clip
awk: cmd. line:1: (FILENAME=/home/tester/datasets_26_10_11/hour1/cust_1077_222034.txt FNR=1) fatal: can't redirect to `/home/tester/datasets_26_10_11/hour1/n_/home/tester/datasets_26_10_11/hour1/cust_1077_222034.txt' (No such file or directory)
+ read fname
basename "${fname}"
++ basename /home/tester/datasets_26_10_11/hour1/cust_1080_222291.txt
+ fileBaseName=cust_1080_222291.txt
dirname "${fname}"
++ dirname /home/tester/datasets_26_10_11/hour1/cust_1080_222291.txt
+ fileDirName=/home/tester/datasets_26_10_11/hour1
+ echo 'fileBaseName: [/home/tester/datasets_26_10_11/hour1][cust_1080_222291.txt] - fname[/home/tester/datasets_26_10_11/hour1/cust_1080_222291.txt]'
fileBaseName: [/home/tester/datasets_26_10_11/hour1][cust_1080_222291.txt] - fname[/home/tester/datasets_26_10_11/hour1/cust_1080_222291.txt]
+ echo 'now working on: [/home/tester/datasets_26_10_11/hour1/cust_1080_222291.txt] with [cust_1080_222291.txt]'
now working on: [/home/tester/datasets_26_10_11/hour1/cust_1080_222291.txt] with [cust_1080_222291.txt]

I hope this helps. Thank you again.
# 19  
Old 10-27-2011
The problem is that awk's FILENAME holds the name exactly as it was passed on the command line, so here it contains the full path, not just the file name. That is why the redirect target ends up as <directory>/n_<full path>.

Try replacing FILENAME with the last component of the path:
Code:
# E.g. awk '{ns=split(FILENAME, arr, "/"); print arr[ns]}' <infile>
# Wherever FILENAME appears, replace it with arr[ns], but don't forget the: ns=split(FILENAME, arr, "/")

Code:
awk -v outputPath="${fileDirName}" 'BEGIN{RS =" ";}{ns=split(FILENAME, arr, "/"); print > outputPath "/" "n_" arr[ns]}' "${fname}"
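
A variant of the same idea, as a sketch only: it strips the directory part with sub() instead of split(), and builds the output name once per file rather than once per record. The scratch directory, the file cust_demo.txt, and the variable names base and out are assumptions for illustration. Putting the target in a variable also avoids an unparenthesized concatenation after ">", which the gawk manual describes as ambiguous in some awk versions.

```shell
# Scratch directory and file name are made up for this illustration.
workdir=$(mktemp -d)
printf 'x y z\n' > "$workdir/cust_demo.txt"

awk -v outputPath="$workdir" '
    FNR == 1 {
        base = FILENAME
        sub(/.*\//, "", base)          # drop everything up to the last slash
        out = outputPath "/n_" base    # build the target name once per file
    }
    { print > out }
' RS=" " "$workdir/cust_demo.txt"

ls "$workdir"
```

Because FNR resets for each input file, the same program should also work when several files are passed in one call.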

# 20  
Old 10-27-2011
EDIT: What felipe said.

So this should work:
Code:
find /home/tester/datasets/ -name "hour_*.txt" -type f | \
while read fname
do
	fileBaseName=`basename "${fname}"`
	fileDirName=`dirname "${fname}"`

	awk -v outputPath="${fileDirName}" '{print $0 > outputPath "/" $2 ".txt"}' "${fname}"
	awk -v outputPath="${fileDirName}" '{ns=split(FILENAME, arr, "/"); print > outputPath "/" "n_" arr[ns]}' RS=" " "${fileDirName}"/cust_*.txt
	awk -v outputPath="${fileDirName}" 'FNR>3 {ns=split(FILENAME, arr, "/"); print > outputPath "/" "fin_" arr[ns]}' "${fileDirName}"/n_cust*.txt
done

# 21  
Old 10-27-2011
Dear both, it worked perfectly! Thank you!

I've also successfully applied the second awk command inside a second while-do-done loop that runs after a new find search. The code is as follows:

Code:
#!/usr/bin/sh

set -xv

#first find for cust_*.txt files
find /home/tester/datasets_26_10_11/hour_1/ -name "cust_*.txt" -type f | \


while read fname
do
	fileBaseName=`basename "${fname}"`
	fileDirName=`dirname "${fname}"` 

	echo "fileBaseName: [${fileDirName}][${fileBaseName}] - fname[${fname}]"


	echo "now working on: [${fname}] with [${fileBaseName}]"
	
	
	awk -v outputPath="${fileDirName}" 'BEGIN{RS =" ";}{ns=split(FILENAME,arr,"/"); print > outputPath "/" "n_" arr[ns]}' "${fname}" 


done


#start second find for the n_cust_*.txt files

find /home/tester/datasets_26_10_11/hour_1/ -name "n_cust_*.txt" -type f | \

while read fname
do
		fileBaseName=`basename "${fname}"`
		fileDirName=`dirname "${fname}"`
		awk -v outputPath="${fileDirName}" 'FNR>3 {ns=split(FILENAME,arr,"/"); print > outputPath "/" "fin_" arr[ns]}' "${fname}"
done


However, it still only works for a single directory. Is there a way, using find, to make this script apply to all the hour_*/ directories?

(I'll experiment with it and let you know. If anything comes to mind, you are more than welcome to suggest it.)

Thank you again!
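
One possible way to cover every hour_*/ directory, as a hedged sketch rather than a tested answer from the thread: since dirname is computed per file, it may be enough to point find at the parent directory instead of hour_1/. The helper name process_tree and the demo tree below are made up for illustration; the real root would be something like /home/tester/datasets_26_10_11/.

```shell
# process_tree ROOT: hypothetical helper name, not from the thread.
process_tree() {
    root=$1
    # First pass: one value per line, written to n_<name> beside each input file.
    find "$root" -type f -name 'cust_*.txt' | while IFS= read -r fname
    do
        dir=$(dirname "$fname")
        base=$(basename "$fname")
        awk -v out="$dir/n_$base" '{ print > out }' RS=" " "$fname"
    done
    # Second pass: drop the first three records, written to fin_n_<name>.
    find "$root" -type f -name 'n_cust_*.txt' | while IFS= read -r fname
    do
        dir=$(dirname "$fname")
        base=$(basename "$fname")
        awk -v out="$dir/fin_$base" 'FNR > 3 { print > out }' "$fname"
    done
}

# Demo on a throwaway tree with two made-up hour directories.
demo=$(mktemp -d)
mkdir "$demo/hour_1" "$demo/hour_2"
printf '1 cust_1_1 12 43 6565\n' > "$demo/hour_1/cust_1_1.txt"
printf '1 cust_2_7 9 77 88 99\n' > "$demo/hour_2/cust_2_7.txt"
process_tree "$demo"
find "$demo" -type f | sort
```

Here the output name is built in the shell with basename, so the awk programs no longer need FILENAME at all. Each output file lands in the same hour_*/ directory as its input.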