Extract multiple columns from multiple files; skip rows and include filenames; awk
Hello,
I am trying to write a bash shell script that does the following:
1. finds all *.txt files within my directory of interest
2. reads each of the files (25 files) one by one; all are tab-delimited and share the same data format
3. skips the first 10 rows of each file
4. extracts columns 2, 14 and 15 and prints them into one output file
5. adds a new column to the final output file with the name of the .txt file the data was extracted from
I have written a shell script, but it is not working properly and does not yet include the code to skip the first 10 rows.
Below I have pasted a sample input file, the desired output file and my code.
Input file format:
Code:
TYPE text text text text integer float float text text text integer integer integer integer
FEPARAMS Protocol_Name Protocol_date Scan_Date Scan_ScannerName Scan_NumChannels Scan_MicronsPerPixelX Scan_MicronsPerPixelY Scan_OriginalGUID Grid_Name Grid_Date Grid_NumSubGridRows Grid_NumSubGridCols Grid_NumRows Grid_NumCols
DATA miRNA-v1_95_May07 (Read Only) 5/2/2007 12:14 1/26/2008 11:25 Agilent Technologies Scanner G2505B US45102930 1 5 5 a18d8bd4-628a-4054-b2ba-45c7a66de583 016436_D_20070426 4/26/2007 0:00 1 1 192 82
*
TYPE float float float integer integer float integer float float float integer float float integer
STATS gDarkOffsetAverage gDarkOffsetMedian gDarkOffsetStdDev gDarkOffsetNumPts gSaturationValue gAvgSig2BkgNegCtrl gNumSatFeat gLocalBGInlierNetAve gLocalBGInlierAve gLocalBGInlierSDev gLocalBGInlierNum gGlobalBGInlierAve gGlobalBGInlierSDev gGlobalBGInlierNum
DATA 26.709 27 5.44777 1000 1201320 1.27225 0 34.3689 61.0779 3.34618 12016 61.0779 3.34618 12016
*
TYPE integer integer integer text integer text integer integer text text text text float float
FEATURES FeatureNum Row Col chr_coord SubTypeMask SubTypeName ProbeUID ControlType ProbeName GeneName SystematicName Description PositionX PositionY
DATA 1 1 1 0 0 1 miRNABrightCorner30 miRNABrightCorner30 miRNABrightCorner30 9949.06 229.667
DATA 2 1 2 66 Structural 2 1 DarkCorner DarkCorner DarkCorner 9975.77 229.279
DATA 3 1 3 chr14:100595916-100595897 0 3 0 A_25_P00010115 hsa-miR-154* hsa-miR-154* NA 10000.3 229.909
DATA 4 1 4 chr8:135881995-135882010 0 5 0 A_25_P00010390 hsa-miR-30b hsa-miR-30b NA 10025.1 229.908
DATA 5 1 5 chr14:100558179-100558161 0 7 0 A_25_P00010956 hsa-miR-379 hsa-miR-379 NA 10050.6 229.313
Output format: tab delimited file. The last column shows the filename from which the data was extracted
1 6774.29 228.723 ABC.txt
2 6800.2 229.421 ABC.txt
3 6826.51 228.385 DEF.txt
4 6850.48 228.853 DEF.txt
5 6875.37 228.408 XYZ.txt
6 6900.98 229.321 XYZ.txt
My incomplete code:
Code:
find -name '*.txt' |
while read filename
do
awk -F"\t" -v name="$file"'
BEGIN {OFS="|"}
{print $2,$14,$15,name}
' $filename > output.txt
done
Thanks in advance for your help.
Last edited by manishabh; 08-18-2009 at 11:41 PM..
Reason: displayed wrong format of tables; needed to add tags
Your data sample is useless as posted; try to copy/paste it again and use [code] tags, not [table] tags!
From your snippet:
Code:
for filename in *.txt   # a glob is enough for files in the current directory; no find needed
do
    awk -F"\t" '        # awk provides the built-in FILENAME variable (see the manual)
    BEGIN {OFS="|"}
    FNR > 10 {print $2,$14,$15,FILENAME}   # skip the first 10 rows of each file
    ' "$filename"
done > output.tsv       # redirect once; a name outside *.txt keeps the loop from reading its own output
Not tested but should work if that's what you want.
Thanks a lot for posting the code. I apologise for your frustrating experience with trying to understand the tables. I ran the code; however, it throws an error:
'test1.sh: line 3: syntax error near unexpected token `do
'test1.sh: line 3: `do
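A possible cause, offered as an assumption since the thread does not confirm it: a quote printed at the start of the message instead of the end is what bash typically shows when the script was saved with Windows line endings, so the token it rejects is really `do` followed by a carriage return. The sketch below strips carriage returns into a copy of the script; strip_cr is a hypothetical helper name.

```shell
# Sketch only: strip_cr is a hypothetical helper, not something from the thread.
strip_cr() {
    # $1 = script that may contain CRLF line endings
    # $2 = path for the cleaned copy (must differ from $1)
    tr -d '\r' < "$1" > "$2"
}
```

For example, strip_cr test1.sh test1_unix.sh followed by bash test1_unix.sh would show whether the syntax error goes away.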
Also, I forgot to mention that each of my files is in a subdirectory. So the directory hierarchy is as follows:
root_folder-->ABC-->ABC.txt
-->CDF-->CDF.txt
So I changed the code a bit as follows:
Code:
for filename in $(find -iname '*.txt')
do
awk -F"\t" '
BEGIN {OFS="|"} {print $2,$14,$15,FILENAME}
' $filename > output.txt
done
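Note that in the loop above the redirection sits inside the loop, so output.txt is overwritten for every file and only the last one survives. Here is a minimal, untested-on-your-data sketch of the whole task, assuming the files really are tab-delimited and that the first 10 rows are the header blocks shown in the sample. The names extract_columns, root_dir and out_file are made up for illustration. FNR restarts at 1 for every input file, so FNR > 10 skips the first 10 rows of each one, and the sub() call trims the directory part so only the file name (e.g. ABC.txt) is printed.

```shell
# Sketch only: extract_columns, root_dir and out_file are hypothetical names.
extract_columns() {
    local root_dir=$1 out_file=$2

    # find hands all *.txt files under root_dir (subdirectories included) to awk.
    # The redirection is outside awk, so the output file is written once, not per file.
    find "$root_dir" -type f -name '*.txt' -exec awk -F'\t' '
        BEGIN { OFS = "\t" }             # tab-delimited output
        FNR > 10 {                       # FNR is the row number within the current file
            name = FILENAME
            sub(/.*\//, "", name)        # drop the directory part of the path
            print $2, $14, $15, name
        }
    ' {} + > "$out_file"
}
```

It could be called as extract_columns root_folder output.tsv. The output file should not be a *.txt file under root_folder, otherwise find would feed it back into awk.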