Passing multiple files to awk for processing in bash script


 
# 1  
Old 06-12-2014

Hi,

I'm using awk in a bash script, and I'm able to pass multiple files to awk for processing. The code I use looks like this (sample code):
Code:
#!/bin/bash
awk -F "," 'BEGIN { 
...
...
...
}' file1 file2 file3

In the above code I'm passing the file names manually, which is fine as long as there are only a few files to process. But if I have thousands of files in a directory, how can I process them all with awk in one go?
Can I point awk at a directory so that it takes all the files in that directory, without my having to supply every file name on the command line?
Is that possible?

Thanks,
Shree
# 2  
Old 06-12-2014
Use a wildcard. Whenever FNR==1, awk has just moved on to a new file, so you can do per-file processing at that point (if you collect data in an array, you can even delete it there). If you want to process all the files together, do that in the END block.
Try this and you will get the idea:
Code:
awk 'FNR==1{print FILENAME; ++i}END{print "total files read : ",i}' file*
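
A small self-contained sketch of the FNR==1 idea, using throwaway files in a temporary directory (the file names and contents are made up for illustration). It counts the files and the total number of lines, and reports both in the END block:

```shell
#!/bin/bash
# Create three small sample files in a scratch directory (illustrative data only).
tmpdir=$(mktemp -d)
printf 'a,1\nb,2\n'      > "$tmpdir/file1"
printf 'c,3\n'           > "$tmpdir/file2"
printf 'd,4\ne,5\nf,6\n' > "$tmpdir/file3"

# FNR resets to 1 at the start of every input file, so FNR==1 marks a file change.
# nfiles counts the files that contained at least one line; total counts all lines.
result=$(awk -F "," '
FNR == 1 { nfiles++ }
         { total++ }
END      { printf "files=%d total_lines=%d\n", nfiles, total }
' "$tmpdir"/file*)

echo "$result"
rm -rf "$tmpdir"
```

Note that an empty file contributes no records, so the FNR==1 rule never fires for it and it would not be counted by this approach.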

# 3  
Old 06-12-2014
You could just let the shell expand the filenames:
Code:
awk stuff <dirname>/*

If you have a lot of files (enough to exceed the maximum length of an argument list) then you could use find with -exec or xargs:
Code:
find <dirname> -type f | xargs awk stuff
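
A plain `find | xargs` pipeline splits file names on whitespace, so names containing spaces can break it. The sketch below shows two common ways around that, using a scratch directory with made-up files: `find ... -exec ... {} +`, and `find -print0 | xargs -0` (the latter two options are widely supported, but it is an assumption that your find and xargs have them):

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf 'x,1\n' > "$tmpdir/plain.csv"
printf 'y,2\n' > "$tmpdir/name with spaces.csv"

# -exec ... {} + appends as many file names as fit on one command line
# and passes each name as a single argument, spaces included.
count_exec=$(find "$tmpdir" -type f -exec awk -F "," 'END { print NR }' {} +)

# Same idea with xargs: NUL-separated names survive spaces and newlines.
count_xargs=$(find "$tmpdir" -type f -print0 | xargs -0 awk -F "," 'END { print NR }')

echo "$count_exec $count_xargs"
rm -rf "$tmpdir"
```

Both forms may still start awk more than once when the list of files is very long.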

# 4  
Old 06-12-2014
Okay, <dirname>/* can be used to process multiple files.

Suppose I want to process files stored in Hadoop HDFS. Can I process them directly with an awk script like the one below, as you suggested?
Code:
awk '{ 
stuff
}'  "hadoop fs -ls /user/user/data/file*"  

# 5  
Old 06-12-2014
Code:
awk '{ 
stuff
}'  $(hadoop fs -ls /user/user/data/file*)

should pass the list of filenames to awk. I'm not familiar with hadoop though - would you need to execute a hadoop command to access the file contents?
# 6  
Old 06-12-2014
If you have many files and ARG_MAX is exceeded (you'll see a message like: awk: arg list too long), then whether xargs can be used depends on the script. xargs may need to call the awk script multiple times, depending on the number of files, so the outcome will be wrong if, for example, your script is calculating a grand total for which it needs the content of all those files.

Of course using cat * to concatenate the files and feeding that output into awk's stdin brings no solace either, since cat has the same restrictions.

But these restrictions could be circumvented with a construct like this:

Code:
for i in *
do
  cat "$i"
done |
awk -F "," 'BEGIN { 
...
...
...
}'
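
To illustrate the difference, here is a minimal sketch with four made-up files holding one number each. `xargs -n 2` is used only to force several awk invocations, imitating what happens when the argument list is too long for a single run:

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
# Four sample files with one number each (illustrative data).
for n in 1 2 3 4; do
    echo "$n" > "$tmpdir/f$n"
done

# xargs starts awk once per two files here, so each run prints its own partial sum.
split_totals=$(find "$tmpdir" -type f -print0 |
    xargs -0 -n 2 awk '{ s += $1 } END { print s }' | wc -l | tr -d ' ')

# Streaming everything through one pipe gives a single awk run all the data.
grand_total=$(for f in "$tmpdir"/*; do cat "$f"; done |
    awk '{ s += $1 } END { print s }')

echo "partial totals printed: $split_totals"
echo "grand total: $grand_total"
rm -rf "$tmpdir"
```

The trade-off of the pipe is that awk reads everything from standard input, so per-file information such as FILENAME and a per-file FNR is no longer available.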


Last edited by Scrutinizer; 06-12-2014 at 09:14 AM..
# 7  
Old 06-13-2014
@CarloM,

I tried with the code:
Code:
awk '{ 
stuff
}' file1.txt $(hadoop fs -ls /user/user/data/file2.txt)

where file1.txt -> read from the local directory
file2.txt -> read from HDFS (hadoop)

But it's saying:
Quote:
awk: cmd. line:12: (FILENAME=file1.txt FNR=5) fatal: cannot open file `Found' for reading (No such file or directory)
I don't understand why it gives an error for file1.txt, although file1.txt exists in the local directory. When I removed the $(hadoop fs -ls /user/user/data/file2.txt) part, copied file2.txt to the local directory and gave the local path, like }' file1.txt file2.txt, it worked fine.

So it doesn't seem possible to take the file directly from HDFS and process it with the awk command. How can I do it? Is there any alternative?
NOTE: I don't want to copy the Hadoop files to a local directory to achieve this.

Thanks
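
The error above is consistent with how `hadoop fs -ls` behaves: it prints a directory listing (beginning with a line such as "Found 1 items"), not file contents, so the shell hands awk the word `Found` as if it were a file name. One possible way to avoid a local copy, sketched here under the assumption that `hadoop fs -cat` is available on the client, is to stream the HDFS file into awk's standard input and name it on the awk command line as `-`. In this runnable sketch the hypothetical function `stream_hdfs_file` uses printf as a stand-in so that it works without a cluster:

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf 'local,1\nlocal,2\n' > "$tmpdir/file1.txt"

# Stand-in for the HDFS stream; on a real cluster the function body would be
#   hadoop fs -cat /user/user/data/file2.txt
# (assumption: the hadoop client is installed and the path is readable).
stream_hdfs_file() {
    printf 'remote,3\nremote,4\n'
}

# "-" tells awk to read standard input at that position in the file list,
# so file1.txt is read from disk first and the streamed data second.
summary=$(stream_hdfs_file | awk -F "," '
{ sum += $2 }
END { print NR, sum }
' "$tmpdir/file1.txt" -)

echo "$summary"
rm -rf "$tmpdir"
```

Because the streamed data arrives on standard input, the usual NR==FNR trick for telling the first file from the second still works for a two-file script like this one.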