Passing multiple files to awk for processing in bash script

Post #8, 06-16-2014
I imagine the output of hadoop fs -ls isn't just a list of filenames, so you'd need to pre-process it to get just the names.
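
As a rough sketch of that pre-processing: the listing below is an assumed, made-up sample of what a long-format listing might look like (a "Found N items" header, then one line per entry with the path last). Check it against what your hadoop version actually prints before relying on it. It also breaks on paths containing spaces.

```shell
#!/bin/bash
# ASSUMPTION: hypothetical sample listing, not captured from a real cluster.
# On a cluster you would pipe the real listing command into awk instead.
listing='Found 2 items
-rw-r--r--   3 user group   1234 2014-06-16 10:00 /user/user/data/file1
-rw-r--r--   3 user group    567 2014-06-16 10:05 /user/user/data/file2'

# Keep only lines for regular files (leading "-") and print the last field,
# which in this assumed layout is the path.
names=$(printf '%s\n' "$listing" | awk '/^-/ { print $NF }')
echo "$names"
```

Filtering on the leading "-" also drops the header line and any directory entries.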

However, a quick look at a hadoop man page seems to imply that awk can't open the files directly, so to process them file by file you would need a local copy of each anyway.

If you're not using the filename (or other per-file processing) then you could cat the files into awk. Something like:
Code:
hadoop fs -cat '/user/user/data/file*' | awk '{stuff}'

(assuming hadoop cat doesn't add anything to the output, else you'd need to pre-process it)
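
To show what awk sees in that case, here is a small local sketch. Plain cat stands in for the hadoop cat so it runs without a cluster, and the temp files and their contents are made up for illustration.

```shell
#!/bin/bash
# Local stand-in: `cat` plays the role of the hadoop cat command here.
tmpdir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$tmpdir/file1"
printf 'c 3\n'      > "$tmpdir/file2"

# awk receives one continuous stream: NR counts across both files and
# there is no per-file boundary to hook into.
total=$(cat "$tmpdir"/file* | awk '{ sum += $2 } END { print NR, sum }')
echo "$total"

rm -rf "$tmpdir"
```

That's fine for totals over all the data, but anything keyed on the filename or on FNR resetting won't work this way.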

You could also adapt Scrutinizer's suggestion if you need per-file processing.
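
One possible shape for per-file processing (not necessarily what Scrutinizer posted) is a loop that runs awk once per file and passes the name in as a variable. In this sketch fetch is a hypothetical wrapper and the files are local stand-ins; on a cluster fetch would call the hadoop cat instead.

```shell
#!/bin/bash
# fetch() is a hypothetical helper. On a cluster it might be:
#   fetch() { hadoop fs -cat "$1"; }
# Here it reads local files so the sketch runs anywhere.
fetch() { cat "$1"; }

tmpdir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$tmpdir/file1"
printf 'c 3\n'      > "$tmpdir/file2"

# One awk run per file. The name is passed with -v because awk is reading
# from a pipe and cannot learn the source name by itself.
report=$(
  for f in "$tmpdir/file1" "$tmpdir/file2"; do
    fetch "$f" | awk -v name="${f##*/}" '{ sum += $2 } END { print name, NR, sum }'
  done
)
echo "$report"

rm -rf "$tmpdir"
```

This starts one awk (and one fetch) per file, so it will be slower than a single stream when there are many small files.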

Last edited by CarloM; 06-16-2014 at 11:21 AM..