Full Discussion: [awk]Chinese words!!
Programming forum, post 302948075 by Scrutinizer on Thursday 25th of June 2015 12:45:44 PM
I suppose that if you knew which part of the code set holds the Chinese characters, you could try deleting the other characters. For starters, see what something like this brings:
Code:
tr -d '[:alnum:][:punct:]' < file
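To see what that command does on a small sample, here is a minimal sketch. It assumes the input is UTF-8 and forces LC_ALL=C so that tr classifies single bytes; the function name strip_ascii and the sample string are made up for illustration. Under those assumptions the bytes of the Chinese characters (all 0x80 or above) belong to neither class and pass through untouched, as does whitespace.

```shell
# Hypothetical helper: delete ASCII letters, digits and punctuation from stdin.
# LC_ALL=C makes tr work on single bytes, so the bytes of UTF-8 encoded
# Chinese characters are in neither class and are left alone.
strip_ascii() {
    LC_ALL=C tr -d '[:alnum:][:punct:]'
}

# Sample line (made up): ASCII text mixed with two Chinese characters.
printf 'abc 123, 中文!\n' | strip_ascii
```

Note that full-width Chinese punctuation is multibyte as well, so it would also survive this filter. Spaces and newlines are kept because they are in neither class.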

 

ISWPUNCT(3)                Linux Programmer's Manual                ISWPUNCT(3)

NAME
       iswpunct - test for punctuation or symbolic wide character

SYNOPSIS
       #include <wctype.h>

       int iswpunct(wint_t wc);

DESCRIPTION
       The iswpunct() function is the wide-character equivalent of the
       ispunct(3) function. It tests whether wc is a wide character
       belonging to the wide-character class "punct".

       The wide-character class "punct" is a subclass of the wide-character
       class "graph", and therefore also a subclass of the wide-character
       class "print".

       The wide-character class "punct" is disjoint from the wide-character
       class "alnum" and therefore also disjoint from its subclasses
       "alpha", "upper", "lower", "digit", "xdigit".

       Being a subclass of the wide-character class "print", the
       wide-character class "punct" is disjoint from the wide-character
       class "cntrl".

       Being a subclass of the wide-character class "graph", the
       wide-character class "punct" is disjoint from the wide-character
       class "space" and its subclass "blank".

RETURN VALUE
       The iswpunct() function returns nonzero if wc is a wide character
       belonging to the wide-character class "punct". Otherwise, it returns
       zero.

ATTRIBUTES
       For an explanation of the terms used in this section, see
       attributes(7).

       +-----------+---------------+----------------+
       |Interface  | Attribute     | Value          |
       +-----------+---------------+----------------+
       |iswpunct() | Thread safety | MT-Safe locale |
       +-----------+---------------+----------------+

CONFORMING TO
       POSIX.1-2001, POSIX.1-2008, C99.

NOTES
       The behavior of iswpunct() depends on the LC_CTYPE category of the
       current locale.

       This function's name is a misnomer when dealing with Unicode
       characters, because the wide-character class "punct" contains both
       punctuation characters and symbol (math, currency, etc.) characters.

SEE ALSO
       ispunct(3), iswctype(3)

COLOPHON
       This page is part of release 4.15 of the Linux man-pages project. A
       description of the project, information about reporting bugs, and
       the latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

GNU                               2015-08-08                        ISWPUNCT(3)
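The point in NOTES, that "punct" covers symbols as well as punctuation, can be seen from the shell too. The sketch below uses tr's [:punct:] class under LC_ALL=C as a single-byte analogue of iswpunct(); it does not call iswpunct() itself. The function name strip_punct and the sample string are made up for illustration.

```shell
# Hypothetical helper: delete every byte in the "punct" class from stdin.
# Under LC_ALL=C this class is the printable ASCII characters that are
# neither alphanumeric nor the space, so it includes + = $ ( ) as well.
strip_punct() {
    LC_ALL=C tr -d '[:punct:]'
}

# Sample line (made up): the math and currency symbols go, as do the brackets.
printf 'a+b=$5 (ok)\n' | strip_punct
```

The space in the sample is kept, which matches the statement that "punct" is disjoint from "space".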