Sponsored Content
Top Forums Programming awk to count occurrence of strings and loop for multiple columns Post 302919042 by Don Cragun on Saturday 27th of September 2014 04:20:00 AM
Old 09-27-2014
Quote:
Originally Posted by iling14
Hi Don,

Thanks for your response. I'm currently using redhat. The line max is about 100k lines, while the number of fields are 100.

Yup, my desired output shall be what you have corrected. I manage to do the count for column by column, but wonder how could i perform a for loop one shot for 100 columns.Smilie

Smilie
You didn't answer most of my questions. But, now that I know that you're using a Linux system, I know that your version of awk will handle pretty much unlimited line lengths.

I'm still trying to get a feel for what the input and output data is going to look like. I'm assuming that the 100k number you gave is the number of lines in your input file (not the number of bytes in the longest line in your input file). Assuming that the numbers in the 1st column of your input are unique and that there are 6 distinct values in the other columns (A, G, C, T, D, and E) that means that we are converting 100k input rows with a maximum line length of about 210 bytes each into six output data lines (plus one header line) with a maximum line length approaching (3 * 99 spaces between fields +1 * 99 single letters for the col2 through col100 data + 99 * (6 digits + 1 comma) * 100k col1 values + 99 * 5 digits for the count values) 69.3 million bytes and an average line length approaching 11.5 million bytes.

Am I in the right ballpark here, or are my assumptions off? If my assumptions are off, where am I guessing wrong? Are there values other than A, G, C, T, D, and E that will appear in col2 through col100 in the input file?

Once you have created this file, do you have something that is going to be able to use this data?
 

10 More Discussions You Might Find Interesting

1. Linux

To find multiple strings count in a file

I need to find the line count of multiple strings in a particular file. The strings are as follows: bmgcc bmgccftp bsmsftp bulkftp cctuneftp crbtftp crmpos cso gujhr I am doing manual grep for each of the string to find the line count. The command i am using right now is: grep mark... (3 Replies)
Discussion started by: salaathi
3 Replies

2. Shell Programming and Scripting

Count occurance of multiple strings using grep command

How to grep multiple string occurance in input file using single grep command? I have below input file with many IDP, RRBE messages. Out put should have count of each messages. I have used below command but it is not working grep -cH "(sent IDP Request)(Recv RRBCSM)" *.txt ... (5 Replies)
Discussion started by: sushmab82
5 Replies

3. Shell Programming and Scripting

Copying of multiple columns of one table to another by mapping with particular strings.

Hi, I would like to copy some columns from a particular file by mapping with the string names. i am using the .csv file format. my one file consist of 100 of columns but i want only particular 4 columns such as ( First_name, Middle_name,Last_name & Stlc). but they are listed in many files... (15 Replies)
Discussion started by: dsh007
15 Replies

4. Shell Programming and Scripting

Count no of occurrence of the strings based on column value

Can anyone help me to count number of occurrence of the strings based on column value. Say i have 300 files with 1000 record length from which i need to count the number of occurrence string which is existing from 213 to 219. Some may be unique and some may be repeated. (8 Replies)
Discussion started by: zooby
8 Replies

5. Shell Programming and Scripting

How to count the number of occurrence of words from multiple files?

File 1 aaa bbb ccc File 2 aaa xxx zzz bbb File 3 aaa bbb xxx Output: (4 Replies)
Discussion started by: Misa-Misa
4 Replies

6. UNIX for Dummies Questions & Answers

Split a column into multiple columns at certain character count

Hey everyone, I have an issue with a client that is passing me a list of values in one column, and occasionally the combination of all the values results in more than an 255 character string. My DB has a 255 character limit, so I am looking to take the column (comma delimited file), and if it... (1 Reply)
Discussion started by: perekl
1 Replies

7. Shell Programming and Scripting

Count occurrence of string in a column using awk

Hi, I want to count the occurrences of strings in a column and display as in example below: Input: get1 345 789 098 get2 567 982 090 fet4 777 610 632 get1 800 544 230 get1 600 788 451 get2 892 321 243 get1 673 111 235 fet3 789 220 278 fet4 768 222 341 output: 4 get1 345 789... (7 Replies)
Discussion started by: aydj
7 Replies

8. UNIX for Dummies Questions & Answers

[Solved] Awk: count occurrence of each character for every field

Hi, let's say an input looks like: A|C|C|D A|C|I|E A|B|I|C A|T|I|B as the title of the thread explains, I am trying to get something like: 1|A=4 2|C=2|B=1|T=1 3|I=3|C=1 4|D=1|E=1|C=1|B=1 i.e. a count of every character in each field (first column of output) independently, sorted... (4 Replies)
Discussion started by: beca123456
4 Replies

9. UNIX for Dummies Questions & Answers

Count occurrence of string (based on type) in a column using awk

Hello, I have a table that looks like what is shown below: AA BB CC XY PQ RS AA BB CC XY RS I would like the total counts depending on the set they belong to: if search pattern is in {AA, BB, CC} --> count them as Type1 | wc -l (3 Replies)
Discussion started by: Gussifinknottle
3 Replies

10. UNIX for Beginners Questions & Answers

Count multiple columns and print original file

Hello, I have two tab files with headers File1: with 4 columns header1 header2 header3 header4 44 a bb 1 57 c ab 4 64 d d 5 File2: with 26 columns header1.. header5 header6 header7 ... header 22...header26 id1 44 a bb id2 57 ... (6 Replies)
Discussion started by: nans
6 Replies
RS(1)							    BSD General Commands Manual 						     RS(1)

NAME
rs -- reshape a data array SYNOPSIS
rs [-[csCS][x] [kKgGw][N] tTeEnyjhHmz] [rows [cols]] DESCRIPTION
The rs utility reads the standard input, interpreting each line as a row of blank-separated entries in an array, transforms the array accord- ing to the options, and writes it on the standard output. With no arguments it transforms stream input into a columnar format convenient for terminal viewing. The shape of the input array is deduced from the number of lines and the number of columns on the first line. If that shape is inconvenient, a more useful one might be obtained by skipping some of the input with the -k option. Other options control interpretation of the input col- umns. The shape of the output array is influenced by the rows and cols specifications, which should be positive integers. If only one of them is a positive integer, rs computes a value for the other which will accommodate all of the data. When necessary, missing data are supplied in a manner specified by the options and surplus data are deleted. There are options to control presentation of the output columns, including transposition of the rows and columns. The following options are available: -cx Input columns are delimited by the single character x. A missing x is taken to be `^I'. -sx Like -c, but maximal strings of x are delimiters. -Cx Output columns are delimited by the single character x. A missing x is taken to be `^I'. -Sx Like -C, but padded strings of x are delimiters. -t Fill in the rows of the output array using the columns of the input array, that is, transpose the input while honoring any rows and cols specifications. -T Print the pure transpose of the input, ignoring any rows or cols specification. -kN Ignore the first N lines of input. -KN Like -k, but print the ignored lines. -gN The gutter width (inter-column space), normally 2, is taken to be N. -GN The gutter width has N percent of the maximum column width added to it. -e Consider each line of input as an array entry. -n On lines having fewer entries than the first line, use null entries to pad out the line. Normally, missing entries are taken from the next line of input. -y If there are too few entries to make up the output dimensions, pad the output by recycling the input from the beginning. Normally, the output is padded with blanks. -h Print the shape of the input array and do nothing else. The shape is just the number of lines and the number of entries on the first line. -H Like -h, but also print the length of each line. -j Right adjust entries within columns. -wN The width of the display, normally 80, is taken to be the positive integer N. -m Do not trim excess delimiters from the ends of the output array. -z Adapt column widths to fit the largest entries appearing in them. With no arguments, rs transposes its input, and assumes one array entry per input line unless the first non-ignored line is longer than the display width. Option letters which take numerical arguments interpret a missing number as zero unless otherwise indicated. EXAMPLES
The rs utility can be used as a filter to convert the stream output of certain programs (e.g., spell, du, file, look, nm, who, and wc(1)) into a convenient ``window'' format, as in % who | rs This function has been incorporated into the ls(1) program, though for most programs with similar output rs suffices. To convert stream input into vector output and back again, use % rs 1 0 | rs 0 1 A 10 by 10 array of random numbers from 1 to 100 and its transpose can be generated with % jot -r 100 | rs 10 10 | tee array | rs -T > tarray In the editor vi(1), a file consisting of a multi-line vector with 9 elements per line can undergo insertions and deletions, and then be neatly reshaped into 9 columns with :1,$!rs 0 9 Finally, to sort a database by the first line of each 4-line field, try % rs -eC 0 4 | sort | rs -c 0 1 SEE ALSO
jot(1), pr(1), sort(1), vi(1) BUGS
Handles only two dimensional arrays. The algorithm currently reads the whole file into memory, so files that do not fit in memory will not be reshaped. Fields cannot be defined yet on character positions. Re-ordering of columns is not yet possible. There are too many options. BSD
December 30, 1993 BSD
All times are GMT -4. The time now is 06:31 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy