Get group of consecutive uppercase words using gawk
Post 302645337 by neutronscott on Wednesday 23rd of May 2012 08:55:58 AM
Quote:
Originally Posted by louisJ
But anchal_khare, you are right: I have an uppercase word at the beginning of sentences. Is there a way to exclude it in neutronscott's command?
What do you mean? Mine would only match what comes after 'The Thing'. His checks the line for 'The Thing', but then looks for capitalized words from the beginning of the line rather than from the point where 'The Thing' occurs.


Quote:
Originally Posted by louisJ
And another thing, how to get all the matching expressions in the line, not only the first one?
This would require a loop. Continuing with the match method:

Code:
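#!/usr/bin/awk -f

{
  offset = 0   # number of characters of $0 already scanned
  # keep searching the unscanned remainder of the line until nothing matches
  while (match(substr($0, offset + 1), /The Thing([[:space:]]*[[:upper:]][^[:space:]]*)*/))
  {
    # RSTART is relative to the substring, so add offset to index into $0
    print substr($0, RSTART + offset, RLENGTH)
    # resume at the character right after the match just printed
    offset += RSTART + RLENGTH - 1
  }
}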

Tested with this input I made up:

Code:
[mute@geek ~/temp/louisJ]$ cat input
some would say The Thing He Wants and The Thing She Gives Him are not The Thing That Matters Most. :(
This Thing and The Thing like Another Thing
I HAVE NOT The Thing TO DO WITH IT! The Thing Is Not Here it is there
 
[mute@geek ~/temp/louisJ]$ ./script input
The Thing He Wants
The Thing She Gives Him
The Thing That Matters Most.
The Thing
The Thing TO DO WITH IT! The Thing Is Not Here

If you want 'The Thing' to also match variants like 'THE THING', you can write the letters after each capital T as bracket expressions, like so: T[Hh][Ee] T[Hh][Ii][Nn][Gg]

Another thing (heh): in the last example the longest match is taken, so the second 'The Thing' stays part of one match. Do you want it split at the second occurrence of 'The Thing' within another 'The Thing' match?

Last edited by neutronscott; 05-23-2012 at 10:01 AM..
 
