Full Discussion: Command for non-unique text
Forum: Shell Programming and Scripting
Post 302906301 by Corona688 on Wednesday 18th of June 2014 03:11:19 PM
I'm afraid it's not a one-liner anymore, but it is the shortest even marginally-compliant parser I've written:

Code:
$ cat uniqxml.awk

BEGIN {
        FS=">"
        RS="<"
        OFS="\t"
}

NR==1 { next } # The first "line" is blank when RS=<
/^[!?]/ {       next    }               # Skip XML specification junk
{       gsub(/[\r\n]*$/, " ");  }       # Clean up newlines

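# Handle open-tags: push the upper-cased tag name onto the front of TAGS.
# ">" is excluded so text following the tag is not taken as part of its name.
match($0, /^[^\/> \r\n\t]+/) {
        TAG=substr(toupper($0), RSTART, RLENGTH);
        TAGS=TAG "%" TAGS;
}

# Handle close-tags: drop everything in TAGS up to and including this tag name
/^[\/]/ {
        sub(/^\//, "", $1);
        sub("^.*" toupper($1) "%", "", TAGS);
        next;
}

# Print tag and text when the innermost entries of TAGS match one of these paths
TAGS ~ /^(TESTNAME|OFFERER|LINE1|CITY|STATE|STRING%METHODLIST%CATEGORY)%/ {
        print $1, $2
}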

$ awk -f uniqxml.awk input.xml

TestName        UBE3A sequencing
Offerer Genetic Services Laboratory University of Chicago
Line1   5841 S. Maryland Ave. Rm G701, MC0077
City    Chicago
State   Illinois
string  Bi-directional Sanger Sequence Analysis

$

It processes tag-by-tag instead of line-by-line, and keeps a list of the tags it's seen, innermost first. "<html><body><h1>" would put "H1%BODY%HTML%" in TAGS, for example. Then you can check what tags you're inside, and print accordingly.
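
To watch the tag list grow and shrink, here is a stripped-down sketch of the same idea that prints TAGS after every open tag. The function name show_tags is made up for this demo, and it matches the tag name against $1 (the part before ">") rather than $0. Treat it as an illustration of the stack behaviour, not a replacement for the script above.

```shell
# Hypothetical helper: print the TAGS stack after each open tag is pushed.
show_tags() {
    awk '
    BEGIN { RS = "<"; FS = ">" }
    NR == 1 { next }                 # empty record before the first "<"
    /^[!?]/ { next }                 # skip <?xml ...?> and <!...> records
    /^\// {                          # close tag: pop it off the front
        sub(/^\//, "", $1)
        sub("^.*" toupper($1) "%", "", TAGS)
        next
    }
    match($1, /^[^\/ \t\r\n]+/) {    # open tag: push the upper-cased name
        TAGS = toupper(substr($1, RSTART, RLENGTH)) "%" TAGS
        print TAGS
    }'
}

printf '<html><body><h1>Title</h1></body></html>' | show_tags
# HTML%
# BODY%HTML%
# H1%BODY%HTML%
```

Because the innermost tag always sits at the front, a test such as TAGS ~ /^CITY%/ means "directly inside CITY", while a longer prefix such as STRING%METHODLIST%CATEGORY% pins down the enclosing tags as well.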
 
