Shell Programming and Scripting BSD, Linux, and UNIX shell scripting — Post awk, bash, csh, ksh, perl, php, python, sed, sh, shell scripts, and other shell scripting languages questions here.

extract multiple columns from multiple files; skip rows and include filenames; awk

#1  08-18-2009  manishabh (Registered User)

Hello,

I am trying to write a bash shell script that does the following:

1. Finds all *.txt files within my directory of interest
2. Reads each of the files (25 files) one by one (they are tab-delimited and all have the same data format)
3. Skips the first 10 rows of the file
4. Extracts and prints out columns 2, 14, and 15 into one output file
5. Adds a new column to the final output file with the name of the txt file from which the data was extracted

I have written a shell script which is not working properly and does not have the code for the part to skip 10 rows.

Below I have pasted a sample input file, output file and my code


Input file format:

TYPEtexttexttexttextintegerfloatfloattexttexttextintegerintegerintegerinteger
FEPARAMSProtocol_NameProtocol_dateScan_DateScan_ScannerNameScan_NumChannelsScan_MicronsPerPixelXScan_MicronsPerPixelYScan_OriginalGUIDGrid_NameGrid_DateGrid_NumSubGridRowsGrid_NumSubGridColsGrid_NumRowsGrid_NumCols
DATAmiRNA-v1_95_May07 (Read Only)####################Agilent Technologies Scanner G2505B US45102930155a18d8bd4-628a-4054-b2ba-45c7a66de583016436_D_20070426############1119282
*
TYPEfloatfloatfloatintegerintegerfloatintegerfloatfloatfloatintegerfloatfloatinteger
STATSgDarkOffsetAveragegDarkOffsetMediangDarkOffsetStdDevgDarkOffsetNumPtsgSaturationValuegAvgSig2BkgNegCtrlgNumSatFeatgLocalBGInlierNetAvegLocalBGInlierAvegLocalBGInlierSDevgLocalBGInlierNumgGlobalBGInlierAvegGlobalBGInlierSDevgGlobalBGInlierNum
DATA26.709275.44777100012031791.11899038.717365.42632.954291202965.426 32.9542912029
*
TYPEintegerintegerintegertextintegertextintegerintegertexttexttexttextfloatfloat
FEATURESFeatureNumRowColchr_coordSubTypeMaskSubTypeNameProbeUIDControlTypeProbeNameGeneNameSystematicNameDescriptionPositionXPositionY
DATA111 0 01miRNABrightCorner30miRNABrightCorner30miRNABrightCorner30 6774.29228.723
DATA212 66Structural21DarkCornerDarkCornerDarkCorner 6800.2229.421
DATA313chr14:100595916-1005958970 30A_25_P00010115hsa-miR-154*hsa-miR-154*NA6826.51228.385
DATA414chr8:135881995-1358820100 50A_25_P00010390hsa-miR-30bhsa-miR-30bNA6850.48228.853
DATA515chr14:100558179-1005581610 70A_25_P00010956hsa-miR-379hsa-miR-379NA6875.37228.408
DATA616chr19:058916206-0589161860 80A_25_P00011941hsa-miR-517bhsa-miR-517bNA6900.98229.321

Output format: tab delimited file. The last column shows the filename from which the data was extracted

1	6774.29	228.723	ABC.txt
2	6800.2	229.421	ABC.txt
3	6826.51	228.385	DEF.txt
4	6850.48	228.853	DEF.txt
5	6875.37	228.408	XYZ.txt
6	6900.98	229.321	XYZ.txt


My incomplete code:

Code:
find -name '*.txt' |
while read filename
do
awk -F"\t" -v name="$file"'
BEGIN {OFS="|"}
{print $2,$14,$15,name}
' $filename > output.txt
done

Thanks in advance for your help.

Last edited by Yogesh Sawant; 06-01-2010 at 09:48 AM. Reason: added code tags
#2  08-18-2009  danmero (Forum Advisor)
Your problem can be solved using awk, but first please edit your first post and add [code] tags.
#3  08-18-2009  manishabh (Registered User)

Hello,

I am trying to write a bash shell script that does the following:

1. Finds all *.txt files within my directory of interest
2. Reads each of the files (25 files) one by one (they are tab-delimited and all have the same data format)
3. Skips the first 10 rows of the file
4. Extracts and prints out columns 2, 14, and 15 into one output file
5. Adds a new column to the final output file with the name of the txt file from which the data was extracted

I have written a shell script which is not working properly and does not have the code for the part to skip 10 rows.

Below I have pasted a sample input file, output file and my code


Input file format:

Code:
 
TYPE	text	text	text	text	integer	float	float	text	text	text	integer	integer	integer	integer
FEPARAMS	Protocol_Name	Protocol_date	Scan_Date	Scan_ScannerName	Scan_NumChannels	Scan_MicronsPerPixelX	Scan_MicronsPerPixelY	Scan_OriginalGUID	Grid_Name	Grid_Date	Grid_NumSubGridRows	Grid_NumSubGridCols	Grid_NumRows	Grid_NumCols
DATA	miRNA-v1_95_May07 (Read Only)	5/2/2007 12:14	1/26/2008 11:25	Agilent Technologies Scanner G2505B US45102930	1	5	5	a18d8bd4-628a-4054-b2ba-45c7a66de583	016436_D_20070426	4/26/2007 0:00	1	1	192	82
*														
TYPE	float	float	float	integer	integer	float	integer	float	float	float	integer	float	float	integer
STATS	gDarkOffsetAverage	gDarkOffsetMedian	gDarkOffsetStdDev	gDarkOffsetNumPts	gSaturationValue	gAvgSig2BkgNegCtrl	gNumSatFeat	gLocalBGInlierNetAve	gLocalBGInlierAve	gLocalBGInlierSDev	gLocalBGInlierNum	gGlobalBGInlierAve	gGlobalBGInlierSDev	gGlobalBGInlierNum
DATA	26.709	27	5.44777	1000	1201320	1.27225	0	34.3689	61.0779	3.34618	12016	61.0779	3.34618	12016
*														
TYPE	integer	integer	integer	text	integer	text	integer	integer	text	text	text	text	float	float
FEATURES	FeatureNum	Row	Col	chr_coord	SubTypeMask	SubTypeName	ProbeUID	ControlType	ProbeName	GeneName	SystematicName	Description	PositionX	PositionY
DATA	1	1	1		0		0	1	miRNABrightCorner30	miRNABrightCorner30	miRNABrightCorner30		9949.06	229.667
DATA	2	1	2		66	Structural	2	1	DarkCorner	DarkCorner	DarkCorner		9975.77	229.279
DATA	3	1	3	chr14:100595916-100595897	0		3	0	A_25_P00010115	hsa-miR-154*	hsa-miR-154*	NA	10000.3	229.909
DATA	4	1	4	chr8:135881995-135882010	0		5	0	A_25_P00010390	hsa-miR-30b	hsa-miR-30b	NA	10025.1	229.908
DATA	5	1	5	chr14:100558179-100558161	0		7	0	A_25_P00010956	hsa-miR-379	hsa-miR-379	NA	10050.6	229.313

Output format: tab delimited file. The last column shows the filename from which the data was extracted

 
1 6774.29 228.723 ABC.txt
2 6800.2 229.421 ABC.txt
3 6826.51 228.385 DEF.txt
4 6850.48 228.853 DEF.txt
5 6875.37 228.408 XYZ.txt
6 6900.98 229.321 XYZ.txt
My incomplete code:


Code:
 
find -name '*.txt' |
while read filename
do
awk -F"\t" -v name="$file"'
BEGIN {OFS="|"}
{print $2,$14,$15,name}
' $filename > output.txt
done


Thanks in advance for your help.

Last edited by manishabh; 08-18-2009 at 10:41 PM. Reason: displayed wrong format of tables; needed to add tags
#4  08-18-2009  danmero (Forum Advisor)
Quote:
Originally Posted by manishabh
Code:
 
find -name '*.txt' |
while read filename
do
awk -F"\t" -v name="$file"'
BEGIN {OFS="|"}
{print $2,$14,$15,name}
' $filename > output.txt
done

Your data sample is useless; try to copy/paste it again and use [code] tags, not [table] tags!

Based on your snippet:

Code:
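: > output.txt	# start with an empty output file
for filename in *.txt	# no need for find: the glob matches the .txt files in the current directory
do
	[ "$filename" = output.txt ] && continue	# never read the output file itself
	awk -F"\t" '	# awk has a built-in FILENAME variable (see the manual)
		BEGIN {OFS="|"} {print $2,$14,$15,FILENAME}
	' "$filename" >> output.txt	# append, so earlier files are not overwritten
done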

Not tested but should work if that's what you want.
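
The loop above does not yet skip the first 10 rows that the original question asks about. Here is a minimal sketch of one way to add that, assuming every file has exactly 10 leading lines to drop. awk's built-in FNR counts lines within the current file, so the condition FNR > 10 passes only the later rows. The sample files, the make_sample helper, and the output name combined.tsv are hypothetical and exist only so the example runs on its own.

```shell
#!/bin/bash
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical helper: writes 10 header lines plus 2 tab-delimited data rows
# with 15 fields each. $1 = file name, $2 = first feature number.
make_sample() {
  awk -v first="$2" 'BEGIN {
    OFS = "\t"
    for (i = 1; i <= 10; i++) { print "HEADER", "line " i }
    for (n = first; n < first + 2; n++) {
      print "DATA", n, 1, n, "", 0, "", n, 0, "p", "g", "s", "NA", 1000 + n, 200 + n
    }
  }' > "$1"
}
make_sample ABC.txt 1
make_sample DEF.txt 3

# One awk run over all files: FNR restarts at 1 for each input file,
# so FNR > 10 skips the first 10 lines of every file.
# OFS is a tab because the question asks for tab-delimited output.
awk -F'\t' 'BEGIN { OFS = "\t" } FNR > 10 { print $2, $14, $15, FILENAME }' *.txt > combined.tsv

cat combined.tsv
```

The output name deliberately does not end in .txt, so a second run in the same directory will not pick up the previous output through the *.txt glob.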
#5  08-18-2009  manishabh (Registered User)
Hi Danmero,

Thanks a lot for posting the code. I apologise for your frustrating experience with trying to understand the tables. I ran the code; however, it throws an error:

'test1.sh: line 3: syntax error near unexpected token `do
'test1.sh: line 3: `do
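
The stray quote at the start of each error line is a typical sign that test1.sh was saved with DOS/Windows (CRLF) line endings: the shell then sees `do` followed by a carriage return, which is not a keyword. This is an assumption about the cause, since the thread does not confirm it. A minimal sketch to check for carriage returns and strip them, using a hypothetical copy of the script created in a temporary directory:

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Recreate the suspected situation: a small script saved with CRLF line endings
# (hypothetical content, not the script from the thread).
printf 'for f in a b\r\ndo\r\n  echo "$f"\r\ndone\r\n' > "$workdir/test1.sh"

# Count lines that end in a carriage return; a non-zero count means DOS endings.
cr=$(printf '\r')
grep -c "${cr}\$" "$workdir/test1.sh"

# Strip every carriage return into a clean copy, then run the clean copy.
tr -d '\r' < "$workdir/test1.sh" > "$workdir/test1_unix.sh"
bash "$workdir/test1_unix.sh"
```

Where it is installed, the dos2unix utility does the same conversion in place.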

Also, I forgot to mention that each of my files is in a subdirectory. So the directory hierarchy is as follows:
root_folder-->ABC-->ABC.txt
           -->CDF-->CDF.txt

So I changed the code a bit as follows:


Code:
for filename in $(find -iname '*.txt') 
do
 awk -F"\t" ' 
    BEGIN {OFS="|"} {print $2,$14,$15,FILENAME}
    ' $filename > output.txt
done
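
For the subdirectory layout described above, one possible sketch (not from the thread) lets find hand every matching file to awk via -exec ... {} +, which avoids the word splitting that `for filename in $(find ...)` applies to paths containing spaces. It writes one output file instead of overwriting output.txt on each pass. It also assumes tab-delimited output is wanted, skips the first 10 lines per file with FNR > 10, and strips the directory part from FILENAME so only the file name is printed. The sample tree, the make_sample helper, and the combined.tsv output name are illustrative.

```shell
#!/bin/bash
workdir=$(mktemp -d)
mkdir -p "$workdir/root_folder/ABC" "$workdir/root_folder/CDF"

# Hypothetical helper: 10 header lines plus 2 tab-delimited data rows of 15 fields.
# $1 = path of the file to create, $2 = first feature number.
make_sample() {
  awk -v first="$2" 'BEGIN {
    OFS = "\t"
    for (i = 1; i <= 10; i++) { print "HEADER", "line " i }
    for (n = first; n < first + 2; n++) {
      print "DATA", n, 1, n, "", 0, "", n, 0, "p", "g", "s", "NA", 1000 + n, 200 + n
    }
  }' > "$1"
}
make_sample "$workdir/root_folder/ABC/ABC.txt" 1
make_sample "$workdir/root_folder/CDF/CDF.txt" 3

# find passes the matching paths to awk; the redirection applies to the whole
# find command, so all rows land in one file that sits outside the searched tree.
find "$workdir/root_folder" -type f -name '*.txt' -exec awk -F'\t' '
  BEGIN { OFS = "\t" }
  FNR == 1 { name = FILENAME; sub(/.*\//, "", name) }   # keep only the file name
  FNR > 10 { print $2, $14, $15, name }                 # skip the 10 header lines
' {} + > "$workdir/combined.tsv"

# find does not guarantee an order, so sort for a stable listing.
sort "$workdir/combined.tsv"
```

Keeping the output file outside root_folder (or giving it a different extension) prevents find from feeding the output back in as input on a later run.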
