Shell Programming and Scripting: post 302655653 by fozrun on Wednesday 13th of June 2012 12:57:02 PM
Creating a new percentage summary file

Hello Forumites.

You guys really helped me out in the past with manipulating some files with awk commands. Now, the output from the analysis program has changed and I would like to rework the data.

The output now looks something like the file below: the top row contains 5 initial columns, followed by one column for each sample that has been analysed. The file is tab delimited. I would like to express the sequence counts as proportions, using the Bacteria row of each sample as the denominator, keep only the rows that are above a certain threshold (say 0.005, i.e. half a percent) in at least one sample, and write the result to a new file.

Code:
taxlevel	 rankID	 taxon	 daughterlevels	 total	D1	D13	D17	D19	
0	0	Root	1	167944	3323	4018	4704	3634	
1	0.1	Bacteria	25	167944	3323	4018	4704	3634

5	0.1.7.4.1.18	Prevotellaceae	3	38447	923	1198	1727	1267
6	0.1.7.4.1.18.2	Prevotella	1	24834	658	915	1235	734
7	0.1.7.4.1.18.2.1	unclassified	0	24834	658	915	1235	734
6	0.1.7.4.1.18.3	Xylanibacter	1	756	3	2	41	3
7	0.1.7.4.1.18.3.1	unclassified	0	756	3	2	3	0
6	0.1.7.4.1.18.5	uncultured	1	12857	262	281	451	533	
7	0.1.7.4.1.18.5.1	unclassified	0	12857	262	281	451	533	
5	0.1.7.4.1.19	RF16	1	2196	14	39	77	58

Would become something like (calc errors possible, I did it by hand):

Code:
taxlevel	 rankID	 taxon	 daughterlevels	 total	D1	D13	D17	D19	
0	0	Root	1	167944	3323	4018	4704	3634	
1	0.1	Bacteria	25	167944	3323	4018	4704	3634

5	0.1.7.4.1.18	Prevotellaceae	3	0.2289	0.2778	0.2982	0.3671	0.3487
6	0.1.7.4.1.18.2	Prevotella	1	0.1479	0.1980	0.2277	0.2625	0.2020
7	0.1.7.4.1.18.2.1	unclassified	0	0.1479	0.1980	0.2277	0.2625	0.2020
6	0.1.7.4.1.18.3	Xylanibacter	1	0.0045	0.0009	0.0005	0.0087	0.0008
6	0.1.7.4.1.18.5	uncultured	1	0.0766	0.0788	0.0699	0.0959	0.1467
7	0.1.7.4.1.18.5.1	unclassified	0	0.0766	0.0788	0.0699	0.0959	0.1467
5	0.1.7.4.1.19	RF16	1	0.0131	0.0042	0.0097	0.0164	0.0160

Xylanibacter would stay in the table, as sample D17 is above the threshold, but 0.1.7.4.1.18.3.1 unclassified would not, as none of its samples is.
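
For anyone who wants to check the arithmetic rather than do it by hand, here is a rough Python sketch of the rule described above. It is only one reading of the request, not a tested solution for the real files. The function name percentage_summary and its parameters are made up for illustration. It assumes that the denominator is the first row whose taxon column is exactly "Bacteria", that rows up to and including that row are passed through as raw counts, that the threshold is tested on the sample columns only (not on "total"), and that "above" means strictly greater than.

```python
def percentage_summary(lines, threshold=0.005, denom_taxon="Bacteria"):
    """Return tab-delimited output lines (hypothetical helper, see assumptions above)."""
    # rstrip() drops the newline and any trailing tab; each field is stripped
    # because the sample header has a space after some of the tabs.
    rows = [[f.strip() for f in ln.rstrip().split("\t")]
            for ln in lines if ln.strip()]
    header, body = rows[0], rows[1:]

    # Position of the denominator row: first row whose taxon column matches.
    # Raises StopIteration if the file has no such row.
    d = next(i for i, r in enumerate(body) if r[2] == denom_taxon)
    denoms = [float(x) for x in body[d][4:]]  # "total" plus one value per sample

    # Header, Root and Bacteria rows are kept as raw counts.
    out = [header] + body[:d + 1]
    for r in body[d + 1:]:
        fracs = [float(x) / t if t else 0.0 for x, t in zip(r[4:], denoms)]
        # fracs[0] belongs to the "total" column, so only fracs[1:] is tested.
        if any(f > threshold for f in fracs[1:]):
            out.append(r[:4] + [f"{f:.4f}" for f in fracs])
    return ["\t".join(r) for r in out]


# A few rows taken from the sample file in the post.
sample = (
    "taxlevel\t rankID\t taxon\t daughterlevels\t total\tD1\tD13\tD17\tD19\t\n"
    "0\t0\tRoot\t1\t167944\t3323\t4018\t4704\t3634\t\n"
    "1\t0.1\tBacteria\t25\t167944\t3323\t4018\t4704\t3634\n"
    "6\t0.1.7.4.1.18.3\tXylanibacter\t1\t756\t3\t2\t41\t3\n"
    "7\t0.1.7.4.1.18.3.1\tunclassified\t0\t756\t3\t2\t3\t0\n"
)
result = percentage_summary(sample.splitlines())
print("\n".join(result))
```

With these rows the Xylanibacter line is kept (D17 is 41/4704, about 0.0087) and its unclassified child is dropped. To run it on a real file, pass an open file handle instead of sample.splitlines() and write the returned lines to the new file. Rounding to four decimals is a choice made here; the post does not specify one.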

Any ideas greatly appreciated!
 
