Problems with deleting punctuation and apostrophes
UNIX for Dummies Questions & Answers, post 302656769 by A-V on Friday 15th of June 2012 10:54:55 AM

Dear All,

I have a file for which I want a frequency list of each word, ignoring a list of stop words, and I am now having problems with punctuation and " 's ".

What I am doing is:

Code:
sed 's/[^a-zA-Z ]//g' file01.txt > file01-clear.txt
cat file01-clear.txt | tr "[:upper:]" "[:lower:]"| tr ' ' '\012' |sort |uniq -c |sort -n -r -k 1 > file01-FQ.txt
grep -v -F -f rejectfile.txt file01-FQ.txt > file01-results.txt

I have realized the sed command is deleting some of my words and I don't know why.
For example, my file contains the word "general" 26 times, but in file01-clear I get only one.
Because my file01-clear is wrong, I can't tell whether the final command to delete stop words is right or wrong either.
Moreover, no matter what I did, the 's was not deleted from the file, so I have to do it manually.

I don't really know what I am doing wrong.

Can you please help me?

A-V
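
One possible way to approach the pipeline described above, offered as a sketch under assumptions rather than a definitive fix. Two things in the posted commands may be contributing to the problem: the sed expression deletes punctuation without leaving a separator, so "general's" turns into "generals" and words joined only by punctuation run together; and grep -F without -x matches stop words as substrings, so a stop word such as "a" removes every line that contains the letter a. The sketch below strips the possessive 's first, turns every run of non-letters into a line break, and filters stop words before counting. The function name word_freq is hypothetical; it assumes plain ASCII English text and a stop-word file with one lowercase word per line.

```shell
#!/bin/sh
# Sketch only: word_freq is a hypothetical helper name.
#   $1 = text file, $2 = stop-word file (assumed: one lowercase word per line)
word_freq() {
    tr -d '\r' < "$1" |                      # drop carriage returns (DOS-style files)
    tr '[:upper:]' '[:lower:]' |             # fold to lower case
    sed "s/'s\([^a-z]\)/\1/g; s/'s\$//" |    # strip a possessive 's at the end of a word
    tr -cs 'a-z' '\n' |                      # each run of non-letters becomes one newline
    grep -v '^$' |                           # discard empty lines
    grep -v -x -F -f "$2" |                  # drop lines that are exactly a stop word
    sort | uniq -c | sort -k1,1nr            # count, highest frequency first
}
```

With the file names from the post, this would be called as word_freq file01.txt rejectfile.txt > file01-results.txt. Filtering the stop words before uniq -c lets grep -x compare whole lines, which is not possible once the counts have been prepended.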
 

deroff(1)							   User Commands							 deroff(1)

NAME
deroff - remove nroff/troff, tbl, and eqn constructs

SYNOPSIS
deroff [ -m [m | s | l] ] [-w] [-i] [ filename...]

DESCRIPTION
deroff reads each of the filenames in sequence and removes all troff(1) requests, macro calls, backslash constructs, eqn(1) constructs (between .EQ and .EN lines, and between delimiters), and tbl(1) descriptions, perhaps replacing them with white space (blanks and blank lines), and writes the remainder of the file on the standard output. deroff follows chains of included files (.so and .nx troff commands); if a file has already been included, a .so naming that file is ignored and a .nx naming that file terminates execution. If no input file is given, deroff reads the standard input.

OPTIONS
-m   The -m option may be followed by an m, s, or l. The -mm option causes the macros to be interpreted so that only running text is output (that is, no text from macro lines.) The -ml option forces the -mm option and also causes deletion of lists associated with the mm macros.

-w   If the -w option is given, the output is a word list, one ``word'' per line, with all other characters deleted. Otherwise, the output follows the original, with the deletions mentioned above. In text, a ``word'' is any string that contains at least two letters and is composed of letters, digits, ampersands (&), and apostrophes ('); in a macro call, however, a ``word'' is a string that begins with at least two letters and contains a total of at least three letters. Delimiters are any characters other than letters, digits, apostrophes, and ampersands. Trailing apostrophes and ampersands are removed from ``words.''

-i   The -i option causes deroff to ignore .so and .nx commands.

ATTRIBUTES
See attributes(5) for descriptions of the following attributes:

+-----------------------------+-----------------------------+
|       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
+-----------------------------+-----------------------------+
|Availability                 |SUNWdoc                      |
+-----------------------------+-----------------------------+

SEE ALSO
eqn(1), nroff(1), tbl(1), troff(1), attributes(5)

NOTES
deroff is not a complete troff interpreter, so it can be confused by subtle constructs. Most such errors result in too much rather than too little output. The -ml option does not handle nested lists correctly.

SunOS 5.10                          14 Sep 1992                          deroff(1)
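
The word rule given for -w can be approximated with standard tools when deroff itself is not installed. This is only a rough sketch based on the wording of the description above, not deroff's actual implementation: it ignores macro calls and all troff handling, and the function name approx_words is hypothetical.

```shell
#!/bin/sh
# Sketch only: approx_words is a hypothetical name. It mimics the plain-text
# "word" rule described for deroff -w and does no troff processing at all.
approx_words() {
    tr -cs "A-Za-z0-9&'" '\n' |     # everything except letters, digits, & and ' is a delimiter
    sed "s/['&]*\$//" |             # remove trailing apostrophes and ampersands
    grep '[A-Za-z].*[A-Za-z]'       # keep only strings containing at least two letters
}
```

Used as approx_words < file01.txt, this keeps the apostrophe inside a word such as general's, so a possessive 's would still have to be stripped in a separate step if it is not wanted.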