awk and regex of wikisource data


 
# 8  
06-21-2015
Everything is working now, as mentioned earlier. If you are interested in the full application, it is here:

https://en.wikipedia.org/wiki/User:G...cebandwref.awk

WIKIPEDIA2TEXT(1)					      General Commands Manual						 WIKIPEDIA2TEXT(1)

NAME
       wikipedia2text -- displays Wikipedia entries on the command line

SYNOPSIS
       wikipedia2text [-BCnNoOpPsSuU] [-b prog] [ {-c | -i | -I } patt] [-l lang] [-W base-url] Query
       wikipedia2text -o [-b prog] [-l lang] Query
       wikipedia2text [-h]
       wikipedia2text -v
       wikipedia2text -r

DESCRIPTION
       This manual page briefly documents the wikipedia2text command.

       wikipedia2text fetches and renders Wikipedia articles using a text-mode web browser (currently recognises elinks, links2, links, lynx and w3m) and displays the text of the article on STDOUT or in a pager.

OPTIONS
       The program recognizes the following command line options:

       -b prog
              Use program prog as browser.

       -B
              Do not use browser configured via configuration file or environment.

       -c patt, -I patt
              Colorize case-sensitive pattern patt in output.

       -C, -N
              Colorize output.

       -h
              Show help and a summary of options.

       -i patt
              Colorize case-insensitive pattern patt in output.

       -l lang
              Use Wikipedia in language lang. See the Wikipedia Languages entry elsewhere in this document.

       -n
              Do not colorize output.

       -o
              Open the Wikipedia page in the browser.

       -O
              Do not open the Wikipedia page in the browser.

       -p
              Use a pager (set by default).

       -P
              Don't use a pager.

       -r
              Display a random Wikipedia article.

       -s
              Display only the summary of the Wikipedia article.

       -S
              Display the full content of the Wikipedia article and not only the summary.

       -u
              Just print the URL of the Wikipedia page and exit.

       -U
              Display the full content of the Wikipedia article and not only print the URL of the page.

       -v
              Show version number.

       -W base-url
              Use base-url as the base URL for Wikipedia (e.g. to use a different wiki). This URL is queried by appending the search term.

ENVIRONMENT
       The following environment variables are recognized:

       ABROWSER
              Browser to use as default instead of the found or configured web browser.

       IGNCASE
              Default value for case-sensitivity of colorizing the output. Can be set to "true" or "false".

       LOCAL
              Default value for language in which Wikipedia should be used. See the Wikipedia Languages entry elsewhere in this document.

       OUTPUTURL
              Determines whether wikipedia2text should display only the URL of the Wikipedia article by default. Can be set to "true" or "false".

       PAGER
              Default value for pager to use. Can also be set to "true", in which case wikipedia2text tries to figure out the appropriate pager, or "false", which means not to use a pager at all.

       SHORT
              Determines whether wikipedia2text should display only the summary of the Wikipedia article by default. Can be set to "true" or "false".

       USEBROWSER
              Determines whether wikipedia2text should open the Wikipedia page via openurl in the globally set default browser, e.g. Firefox or Konqueror, by default. Can be set to "true" or "false".

FILES
       $HOME/.wikipedia2textrc
              Will be sourced by wikipedia2text on startup. Should contain variable assignments. The same variables as for the environment are recognised.

WIKIPEDIA LANGUAGES
       wikipedia2text currently supports the following Wikipedia languages:

       af      Cape Dutch (Afrikaans)
       als     Alemannic
       ca      Catalan
       cs      Czech
       da      Danish
       de      German
       en      English
       eo      Esperanto
       es      Spanish
       fi      Finnish
       fr      French
       hu      Hungarian
       ia      Interlingua
       is      Icelandic
       it      Italian
       la      Latin
       lb      Luxembourgian
       nds     Low German
       nl      Dutch
       nn, no  Norwegian (Nynorsk and Bokmal)
       pl      Polish
       pt      Portuguese
       rm      Rhaeto-Romanic
       ro      Romanian
       simple  Simple English
       sk      Slovak
       sl      Slovenian
       sv      Swedish
       tr      Turkish

SEE ALSO
       elinks(1), links(1), links2(1), lynx(1), w3m(1)

AUTHOR
       wikipedia2text was written by Christian Brabandt <cb@256bit.org>. Patches also from Axel Beckert <abe@deuxchevaux.org>.

       This manual page was written by Axel Beckert <abe@deuxchevaux.org> for the Debian system (but may be used by others). Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License (GPL), Version 2 or any later version published by the Free Software Foundation.

       On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL.

HISTORY
       wikipedia2text was first released by Christian Brabandt in his blog (http://blog.256bit.org/archives/126-Wikipedia-in-der-Shell.html) as a script named wiki. Some users may find it useful to create an alias with that name to speed up typing a wikipedia2text command, if no other command of that name is present.

OTHER INFO
       The current version of wikipedia2text should be available on Christian Brabandt's website (http://www.256bit.org/~chrisbra/wiki).

WIKIPEDIA2TEXT(1)
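
As a rough illustration of the FILES, ENVIRONMENT and HISTORY sections of the manual page above, the sketch below writes a sample settings file and defines a `wiki` shortcut. The chosen values, the scratch path held in `rc_sample`, and the use of a shell function instead of an alias (aliases are not expanded in non-interactive shells) are assumptions made for this example; only the variable names and their "true"/"false" values come from the manual page.

```shell
# Sketch only: sample settings and a `wiki` shortcut for wikipedia2text.
# The values below are illustrative assumptions, not recommended defaults.

# Write the sample to a scratch file rather than the real
# $HOME/.wikipedia2textrc, so no existing configuration is overwritten.
rc_sample="${TMPDIR:-/tmp}/wikipedia2textrc.sample"
cat > "$rc_sample" <<'EOF'
# Variable names are the ones listed under ENVIRONMENT.
LOCAL=de        # use the German Wikipedia by default
SHORT=true      # show only the article summary by default
PAGER=false     # do not use a pager
EOF

# Define the shortcut only if no other command named `wiki` exists,
# as the HISTORY section advises.
if ! command -v wiki >/dev/null 2>&1; then
    wiki() { wikipedia2text "$@"; }
fi
```

After reviewing the sample, it could be copied to `$HOME/.wikipedia2textrc`. With wikipedia2text installed, a call such as `wiki -s -l en awk` would then request only the summary of the English article on awk, with `-l` overriding the `LOCAL` default.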