Awk to Count Multiple patterns in a huge file
Post 302652807 by new_item on Thursday 7th of June 2012 07:17:54 PM
Code:
[ok@x60 ~]$ cat xxx | awk '{a[$1]++} END{for (x in a) {print x "\t" a[x]}}'

|site2|MAP 4
|site2|LINK 2
|site1|LINK 2
|site1|MAP 6
|site1|MODAL 2
|site2|MODAL 2
[ok@x60 ~]$ cat xxx
|site1|MAP
|site1|MAP
|site1|MAP
|site1|MAP
|site1|MAP
|site2|MAP
|site1|MODAL
|site2|MAP
|site2|MODAL
|site2|LINK
|site1|LINK
|site1|MAP
|site2|MAP
|site1|MODAL
|site2|MAP
|site2|MODAL
|site2|LINK
|site1|LINK
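
A possible variant of the one-liner above, as a minimal sketch: awk reads its input directly instead of going through cat, blank lines are skipped, and the result is piped through sort because the order of `for (x in a)` is unspecified in awk. The function name count_patterns is made up for illustration; it is not from the original post.

```shell
# count_patterns [FILE...]: print each distinct first field and how often it occurs.
# Hypothetical helper name; with no FILE argument awk reads standard input.
count_patterns() {
    awk 'NF { count[$1]++ }    # NF is 0 on blank lines, so they are skipped
         END { for (key in count) print key "\t" count[key] }' "$@" |
    sort    # make the output order deterministic
}

# Small demo on three inline lines instead of the real file.
printf '%s\n' '|site1|MAP' '|site2|MAP' '|site1|MAP' | count_patterns
```

On the sample file this would be called as `count_patterns xxx`. Since no line contains whitespace, `$1` is the whole line, and the array holds one entry per distinct line, so memory use depends on the number of distinct patterns rather than on the size of the file.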

Last edited by new_item; 06-07-2012 at 08:24 PM.
 
