Replacing stopwords based on a list (UNIX for Dummies Questions & Answers). Post 302939652 by drl on Friday 27th of March 2015 11:30:02 AM
Hi.

Here is an alternative solution in perl. The stop words are kept in a list, one pair per line, with the old and new strings separated by TAB (the code splits on the first run of whitespace, so the replacement itself may contain spaces). If a line gives no replacement string, a "general string" is supplied instead. This driver script lists the perl code, the data file, the replacements file, and the results:
Code:
#!/usr/bin/env bash

# @(#) s2	Demonstrate easy replacement with perl.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f "$C" ] && . "$C" perl

FILE=${1-data1}

pl " perl code:"
cat p2

pl " Input data file $FILE:"
cat "$FILE"

pl " Replacement pairs, old TAB new, unpaired get \"general string\":"
cat replacements.txt

pl " Results:"
./p2 "$FILE"

exit 0

producing:
Code:
$ ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
perl 5.10.0

-----
 perl code:
#!/usr/bin/env perl

# @(#) p2	Demonstrate replacement from hashed pairs.

use strict;
use warnings;
use Carp;

my ($f);
my $rf = "replacements.txt";
my $gs;    # general string, used if no replacement found.
$gs = "...";

# Read pairs into hash.
my (%hp);
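open( $f, "<", $rf ) || croak "cannot open pairs file \"$rf\": $!";
my ( $t1, $t2 );
while (<$f>) {
  chomp;
  next unless /\S/;    # skip blank lines
  # Split on the first run of whitespace (TAB in the sample file).
  ( $t1, $t2 ) = split( /\s+/, $_, 2 );
  $hp{$t1} = defined($t2) ? $t2 : $gs;
}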
close $f;

# Print hash if debugging needed.
# foreach $t1 ( keys %hp ) { print "$t1 => $hp{$t1}\n" };

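# Read and replace. \Q...\E quotes regex metacharacters in the key, and
# \b...\b matches whole words only, so "is" does not hit "this".
while (<>) {
  foreach $t1 ( keys %hp ) {
    s/\b\Q$t1\E\b/$hp{$t1}/gi;
  }
  print;
}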

-----
 Input data file data1:
Now is the time
for all good men
to come to the aid
of their country.
aa aa cc bb cc bb aa AA Aa aA aa

-----
 Replacement pairs, old TAB new, unpaired get "general string":
Now	Then
is	was
good	manly
their	< yours, mine, ours >
aid	bandage
to
aa	xx
bb	yy
cc	zz

-----
 Results:
Then was the time
for all manly men
... come ... the bandage
of < yours, mine, ours > country.
xx xx zz yy zz yy xx xx xx xx xx

If you are interested, there is a perl module, String::Replace (https://metacpan.org/release/String-Replace), which may be a bit faster (this is claimed for its object-oriented mode), but I do not think it supports case-insensitive matching.

Best wishes ... cheers, drl
 
