extract multiple columns from multiple files; skip rows and include filenames; awk


 
# 1  
Old 08-18-2009

Hello,

I am trying to write a bash shell script that does the following:

1. Finds all *.txt files within my directory of interest.
2. Reads each of the files (25 files) one by one (they are tab-delimited and all share the same data format).
3. Skips the first 10 rows of each file.
4. Extracts columns 2, 14 and 15 and prints them into one output file.
5. Adds a new column to the final output file with the name of the .txt file from which the data was extracted.

I have written a shell script, but it is not working properly and does not yet have the code to skip the first 10 rows.

Below I have pasted a sample input file, the output file and my code.


Input file format:

TYPEtexttexttexttextintegerfloatfloattexttexttextintegerintegerintegerintegerFEPARAMSProtocol_NamePr otocol_dateScan_DateScan_ScannerNameScan_NumChannelsScan_MicronsPerPixelXScan_MicronsPerPixelYScan_O riginalGUIDGrid_NameGrid_DateGrid_NumSubGridRowsGrid_NumSubGridColsGrid_NumRowsGrid_NumColsDATAmiRNA-v1_95_May07 (Read Only)####################Agilent Technologies Scanner G2505B US45102930155a18d8bd4-628a-4054-b2ba-45c7a66de583016436_D_20070426############1119282* TYPEfloatfloatfloatintegerintegerfloatintegerfloatfloatfloatintegerfloatfloatintegerSTATSgDarkOffset AveragegDarkOffsetMediangDarkOffsetStdDevgDarkOffsetNumPtsgSaturationValuegAvgSig2BkgNegCtrlgNumSatF eatgLocalBGInlierNetAvegLocalBGInlierAvegLocalBGInlierSDevgLocalBGInlierNumgGlobalBGInlierAvegGlobal BGInlierSDevgGlobalBGInlierNumDATA26.709275.44777100012031791.11899038.717365.42632.954291202965.426 32.9542912029* TYPEintegerintegerintegertextintegertextintegerintegertexttexttexttextfloatfloatFEATURESFeatureNumRo wColchr_coordSubTypeMaskSubTypeNameProbeUIDControlTypeProbeNameGeneNameSystematicNameDescriptionPosi tionXPositionYDATA111 0 01miRNABrightCorner30miRNABrightCorner30miRNABrightCorner30 6774.29228.723DATA212 66Structural21DarkCornerDarkCornerDarkCorner 6800.2229.421DATA313chr14:100595916-1005958970 30A_25_P00010115hsa-miR-154*hsa-miR-154*NA6826.51228.385DATA414chr8:135881995-1358820100 50A_25_P00010390hsa-miR-30bhsa-miR-30bNA6850.48228.853DATA515chr14:100558179-1005581610 70A_25_P00010956hsa-miR-379hsa-miR-379NA6875.37228.408DATA616chr19:058916206-0589161860 80A_25_P00011941hsa-miR-517bhsa-miR-517bNA6900.98229.321

Output format: tab delimited file. The last column shows the filename from which the data was extracted

1	6774.29	228.723	ABC.txt
2	6800.2	229.421	ABC.txt
3	6826.51	228.385	DEF.txt
4	6850.48	228.853	DEF.txt
5	6875.37	228.408	XYZ.txt
6	6900.98	229.321	XYZ.txt


My incomplete code:
Code:
find -name '*.txt' |
while read filename
do
awk -F"\t" -v name="$file"'
BEGIN {OFS="|"}
{print $2,$14,$15,name}
' $filename > output.txt
done

thanks in advance for your help.

Last edited by Yogesh Sawant; 06-01-2010 at 10:48 AM.. Reason: added code tags
# 2  
Old 08-18-2009
Your problem can be solved using awk but first please edit your first post and add [code] tags.
# 3  
Old 08-18-2009
re: extract multiple columns from multiple files; skip rows and include filenames; aw

Hello,

I am trying to write a bash shell script that does the following:

1. Finds all *.txt files within my directory of interest.
2. Reads each of the files (25 files) one by one (they are tab-delimited and all share the same data format).
3. Skips the first 10 rows of each file.
4. Extracts columns 2, 14 and 15 and prints them into one output file.
5. Adds a new column to the final output file with the name of the .txt file from which the data was extracted.

I have written a shell script, but it is not working properly and does not yet have the code to skip the first 10 rows.

Below I have pasted a sample input file, the output file and my code.


Input file format:
Code:
 
TYPE	text	text	text	text	integer	float	float	text	text	text	integer	integer	integer	integer
FEPARAMS	Protocol_Name	Protocol_date	Scan_Date	Scan_ScannerName	Scan_NumChannels	Scan_MicronsPerPixelX	Scan_MicronsPerPixelY	Scan_OriginalGUID	Grid_Name	Grid_Date	Grid_NumSubGridRows	Grid_NumSubGridCols	Grid_NumRows	Grid_NumCols
DATA	miRNA-v1_95_May07 (Read Only)	5/2/2007 12:14	1/26/2008 11:25	Agilent Technologies Scanner G2505B US45102930	1	5	5	a18d8bd4-628a-4054-b2ba-45c7a66de583	016436_D_20070426	4/26/2007 0:00	1	1	192	82
*														
TYPE	float	float	float	integer	integer	float	integer	float	float	float	integer	float	float	integer
STATS	gDarkOffsetAverage	gDarkOffsetMedian	gDarkOffsetStdDev	gDarkOffsetNumPts	gSaturationValue	gAvgSig2BkgNegCtrl	gNumSatFeat	gLocalBGInlierNetAve	gLocalBGInlierAve	gLocalBGInlierSDev	gLocalBGInlierNum	gGlobalBGInlierAve	gGlobalBGInlierSDev	gGlobalBGInlierNum
DATA	26.709	27	5.44777	1000	1201320	1.27225	0	34.3689	61.0779	3.34618	12016	61.0779	3.34618	12016
*														
TYPE	integer	integer	integer	text	integer	text	integer	integer	text	text	text	text	float	float
FEATURES	FeatureNum	Row	Col	chr_coord	SubTypeMask	SubTypeName	ProbeUID	ControlType	ProbeName	GeneName	SystematicName	Description	PositionX	PositionY
DATA	1	1	1		0		0	1	miRNABrightCorner30	miRNABrightCorner30	miRNABrightCorner30		9949.06	229.667
DATA	2	1	2		66	Structural	2	1	DarkCorner	DarkCorner	DarkCorner		9975.77	229.279
DATA	3	1	3	chr14:100595916-100595897	0		3	0	A_25_P00010115	hsa-miR-154*	hsa-miR-154*	NA	10000.3	229.909
DATA	4	1	4	chr8:135881995-135882010	0		5	0	A_25_P00010390	hsa-miR-30b	hsa-miR-30b	NA	10025.1	229.908
DATA	5	1	5	chr14:100558179-100558161	0		7	0	A_25_P00010956	hsa-miR-379	hsa-miR-379	NA	10050.6	229.313

Output format: tab delimited file. The last column shows the filename from which the data was extracted

 
1 6774.29 228.723 ABC.txt
2 6800.2 229.421 ABC.txt
3 6826.51 228.385 DEF.txt
4 6850.48 228.853 DEF.txt
5 6875.37 228.408 XYZ.txt
6 6900.98 229.321 XYZ.txt
My incomplete code:

Code:
 
find -name '*.txt' |
while read filename
do
awk -F"\t" -v name="$file"'
BEGIN {OFS="|"}
{print $2,$14,$15,name}
' $filename > output.txt
done


thanks in advance for your help.

Last edited by manishabh; 08-18-2009 at 11:41 PM.. Reason: displayed wrong format of tables; needed to add tags
# 4  
Old 08-18-2009
Quote:
Originally Posted by manishabh
Code:
 
find -name '*.txt' |
while read filename
do
awk -F"\t" -v name="$file"'
BEGIN {OFS="|"}
{print $2,$14,$15,name}
' $filename > output.txt
done

Your data sample is useless :) Try to copy/paste it again and use [code] tags, not [table] tags!

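From your snippet:
Code:
for filename in *.txt	# no need for find: the files are in the current directory anyway
do
	[ "$filename" = output.txt ] && continue	# do not feed the result file back in
	awk -F"\t" '	# awk has a built-in FILENAME variable (read the manual)
				BEGIN {OFS="|"} {print $2,$14,$15,FILENAME}
				' "$filename"
done > output.txt	# redirect once, after the loop, so one file does not overwrite the previous one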

Not tested but should work if that's what you want.
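
The snippet above does not yet cover two parts of the question: skipping the first 10 rows and writing tab-separated output. A minimal sketch of both follows. It assumes every file has exactly 10 header rows, as in the posted sample. The scratch directory, the make_sample helper, the placeholder field values and the name result.tsv are all made up for the demo.

```shell
# Hypothetical demo setup: a scratch directory with two small sample files.
demo_dir=$(mktemp -d)

# Helper (made up for this demo): 10 header rows, then one 15-field data row.
make_sample() {   # $1 = file path, $2 = FeatureNum, $3 = PositionX, $4 = PositionY
    {
        for i in 1 2 3 4 5 6 7 8 9 10; do printf 'header row %s\n' "$i"; done
        printf 'DATA\t%s\t1\t1\t\t0\t\t0\t1\tprobe\tgene\tname\t\t%s\t%s\n' "$2" "$3" "$4"
    } > "$1"
}
make_sample "$demo_dir/ABC.txt" 1 9949.06 229.667
make_sample "$demo_dir/DEF.txt" 2 9975.77 229.279

# One awk call over all files: FNR restarts at 1 for every input file,
# so FNR > 10 skips the first 10 rows of each file, not just of the first.
# The result goes to result.tsv so that *.txt does not match the output itself.
(
    cd "$demo_dir" || exit 1
    awk -F'\t' 'BEGIN { OFS = "\t" } FNR > 10 { print $2, $14, $15, FILENAME }' *.txt > result.tsv
)

cat "$demo_dir/result.tsv"
```

Because awk is given all the files at once, no shell loop is needed, and the single redirection means no file's rows overwrite another's.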
# 5  
Old 08-18-2009
Hi Danmero,

Thanks a lot for posting the code. I apologise for your frustrating experience with trying to understand the tables. I ran the code; however, it throws an error:

'test1.sh: line 3: syntax error near unexpected token `do
'test1.sh: line 3: `do

Also, I forgot to mention that each of my files is in its own subdirectory. The directory hierarchy is as follows:
root_folder-->ABC-->ABC.txt
-->CDF-->CDF.txt

So I changed the code a bit, as follows:

Code:
for filename in $(find -iname '*.txt') 
do
 awk -F"\t" ' 
    BEGIN {OFS="|"} {print $2,$14,$15,FILENAME}
    ' $filename > output.txt
done
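
For the subdirectory layout described above, one possible approach is to let find hand every matching file to a single awk call. This is only a sketch and has not been run on the real data. It assumes 10 header rows per file. The scratch directory, the make_sample helper, the placeholder field values and the name combined.tsv are hypothetical. The directory part of FILENAME is stripped so that only names like ABC.txt end up in the last column.

```shell
# Hypothetical demo tree mirroring root_folder/ABC/ABC.txt and root_folder/CDF/CDF.txt.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/ABC" "$demo_root/CDF"

# Helper (made up for this demo): 10 header rows, then one 15-field data row.
make_sample() {   # $1 = file path, $2 = PositionX
    {
        for i in 1 2 3 4 5 6 7 8 9 10; do printf 'header row %s\n' "$i"; done
        printf 'DATA\t1\t1\t1\t\t0\t\t0\t1\tprobe\tgene\tname\t\t%s\t229.667\n' "$2"
    } > "$1"
}
make_sample "$demo_root/ABC/ABC.txt" 9949.06
make_sample "$demo_root/CDF/CDF.txt" 6774.29

# find passes all matching files to awk in one go ("{} +").
# On the first line of each file the path is reduced to its base name;
# rows 1-10 are skipped; the redirection happens once, outside any loop.
# sort only makes the order deterministic, since find's order is not guaranteed.
find "$demo_root" -type f -iname '*.txt' -exec awk -F'\t' '
    BEGIN    { OFS = "\t" }
    FNR == 1 { name = FILENAME; sub(/.*\//, "", name) }
    FNR > 10 { print $2, $14, $15, name }
' {} + | sort -t "$(printf '\t')" -k 4,4 > "$demo_root/combined.tsv"

cat "$demo_root/combined.tsv"
```

Unlike `for filename in $(find ...)`, this form also copes with paths that contain spaces. As for the `syntax error near unexpected token` message quoted above: an error of that shape, with a stray quote at the start of the line, is often caused by Windows (CRLF) line endings in the script file. That is only a guess here, since the script itself was not posted.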
