find & Replace text using two non-unique delimiters. Post 303013954 by bedtime on Friday 2nd of March 2018 12:11:18 PM
Quote:
Originally Posted by Don Cragun
We assume that you know that exactly the same thing works in awk:
Yes, I found that out. How satisfying it is to drag and drop the regex parameters into the C++ code and have them just work! :)

Quote:
Good to see you made another improvement to your code. Applying what you learned in some of your other threads (gsub (tagIn "[^" tagOut "]*" tagOut, ""), post7, post2), you'd get what you requested in post#1 by setting tagIn first to <garbage, then to just <.
The code has been updated. Please do not feel obligated to respond, though I very much appreciate and welcome advice from all of you! Nothing urgently needs fixing; I'm just 'putting it out there.' :)

If anyone would like to peruse and comment, they are welcome to:
Code:
// This program parses an XML dictionary file and prints a formatted result.
//
// NOTE: The required XML dictionary (16mb) will be downloaded to this
//       machine if it is not found! It will be stored in: ~/.config/latin/
//
// The goals of this project:
//
//	1. < 100 lines code
//	2. Simple & elegant coding
//	3. Fast & efficient execution.
//
//		"Do one thing,
//		 and do it well."
//
//		—Linux Credo
//
// Compile with:
// $ g++ -O -Wall lat.cpp -o lat
//
// Run with:
// $ lat amo sum totus
//
// Where 'amo', 'sum', and 'totus' are the words to be searched
//
// Gather online possibilities and pipe output into 'less'
// ('latc' script required for this functionality!!!):
//
// $ lat $(latc quam totus amor)
//
// Where 'quam', 'totus', and 'amor' are your search terms
//
// For testing. Completely clear terminal to not confuse with other text.
// $ reset; g++ -O -Wall lat.cpp -o lat; sleep 2; lat amo sum totus | less
//

#include<iostream>
#include<string>
#include<regex>
#include<fstream>
#include<cstdlib>	// system()
#include<unistd.h>	// getuid()
#include<sys/types.h>
#include<pwd.h>		// getpwuid()

using namespace std;

int main(int argc, char* argv[])
{
	// No search term entered. Bye!
	if (argc < 2) return 1;

	std::string line;					// Used for file input
	std::string charToStr(argv[1]);				// Search word as a std::string, so it can be concatenated
	std::string keyStart	("key=\"" + charToStr + "\"");	// Attribute that marks the start of the word's entry in the XML file
	std::string keyEnd	("</entry>");			// Tag that closes the entry
	std::string text;
	struct passwd *pw = getpwuid(getuid());			// Set up to get ~/
	if (!pw) return 1;					// No passwd entry for this user
	std::string homeDir = pw->pw_dir;
	std::string XMLfile	(homeDir + "/.config/latin/Perseus_text_1999.04.0060.xml");
	std::string XMLfileDlURL="http://www.perseus.tufts.edu/hopper/dltext?doc=Perseus:text:1999.04.0060";

	//ifstream myFileTest (XMLfile);
	ifstream myFile(XMLfile);

	// Download dictionary if not found
	if (myFile.fail())
	{

		std::cout << "\nNote: The XML dictionary file " << XMLfile << " has not been found.\n\nDownloading and preparing XML file...\n\n";

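		string dlCmd=("mkdir -p " + homeDir  + "/.config/latin/ && cd " + homeDir + "/.config/latin/ && wget -O- " +  XMLfileDlURL  +  " | tr -d '\\r' > " +  XMLfile);

		// system() takes a C string (const char*), not a std::string
		const char * sysCharCmd = dlCmd.c_str();

		system(sysCharCmd);

		// Reopen the file to see whether the download actually produced it.
		// An empty file counts as a failure, because the shell redirection
		// creates the file even when wget fails.
		myFile.clear();
		myFile.open(XMLfile);

		if (!myFile.is_open() || myFile.peek() == std::ifstream::traits_type::eof())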
		{
			std::cout << "Could not download or find file!\n\nExiting...\n\n";
			return 2;
		}else{
			std::cout << "Finished downloading!\n\nRestart program to use new dictionary.\n\n";
			return 0;
		}
	}


	// Go through all given keys from command line parameters
	for(int keyNum = 1; keyNum < argc; keyNum++ )
	{
		charToStr=argv[keyNum];				// Current search word as a std::string
		keyStart="key=\"" + charToStr + "\"";
		text="";					// Start fresh; do not append to the previous result

		myFile.clear();					// Reset the EOF/fail flags left by the previous search
		myFile.seekg(0, ios::beg);			// Go to beginning of file

		// Find search key and save result in 'text' string
		while (getline (myFile,line) && text == "")
			if (line.find(keyStart) != std::string::npos)	// We found a key!
				do					// Grab keys text
					text += line;
				while (getline (myFile,line) && line.find(keyEnd) == std::string::npos);

		// Don't waste time—go to next iteration!
		if (text == "")
		{
			std::cout << "Search key '" << charToStr << "' not found.\n" << endl;
			continue;
		}

		/* User may want to define an entire paragraph of words
		   at one time, so do string modification right after
		   each key to allow first results to be shown instantly. */

		// Pairs of strings: each regex pattern is followed by its replacement text.
		// The replacement text is taken literally (only '$' is special there),
		// so parentheses need no backslash on the replacement side.
		std::string tReplace[] = {
			"<orth>", "[",
			"</orth>", ",",
			"</gen>", ".",
			"<sense id.*><etym lang=\"la\" opt=\"n\">", "[",
			"<etym lang=\"la\" opt=\"n\">", "[",
			"</etym>, <trans opt=\"n\">|</etym>\\.—", "]\n\n • ",
			"(</etym>\\. —</sense>|</etym>\\.)", "]",
			"</etym>\\. </sense>", "",
			"(\\.|</usg>) ?— ?</sense>", ".",
			"<sense[^>]*>", "\n\n",
			"<[^>]*>", "",
			" — ", "\n\n • ",
			"\\. *—", ".\n\n • ",
			" +", " ",
			". ?—", "\n\n",
			" ,", ",",
			" \\.", ".",
			" :", ":",
			"‘ ", "‘",
			" ’", "’",
			"^ ", "",
			"\\( ", "(",
			" \\)", ")"
		};

		// Now manipulate that text string and make it pretty.
		signed int repSize = (sizeof(tReplace) / sizeof(tReplace[0]));
		for (signed int i = 0; i < repSize; i += 2)
		{
			regex reg(tReplace[i]);
			text = regex_replace(text, reg, tReplace[i + 1]);
		}

		// Give lots of space to easily distinguish between definitions
		std::cout << text << "\n\n\n";

	}

	myFile.close();

	return 0;

}


Last edited by bedtime; 03-02-2018 at 07:13 PM.. Reason: WOOT! All bugs fixed.
 
