Post 302302462 by Ilja on Tuesday 31st of March 2009 04:37:28 AM
PHP: preg_match_all with multibyte characters?

Hi! I'm trying to separate text into sentences, like this:
Code:
$pattern = "/[A-Z]([a-z]|[[:space:]]|,)*[\.\!\?:]*/";
preg_match_all($pattern, $text, $matches);

This works fine unless the text contains multibyte characters, like "åäö". How can I make this work with these exotic characters?
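One commonly suggested approach, given here as a sketch and not as the thread's actual reply: assuming both the script and $text are UTF-8, add PCRE's u modifier so the pattern and subject are read as UTF-8 characters instead of single bytes, and replace the ASCII ranges [A-Z] and [a-z] with the Unicode properties \p{Lu} (uppercase letter) and \p{Ll} (lowercase letter). The Swedish sample sentence is made up for illustration.

```php
<?php
// Sketch only: assumes this source file and $text are both UTF-8.
// The /u modifier makes PCRE treat pattern and subject as UTF-8;
// \p{Lu} and \p{Ll} match any Unicode uppercase / lowercase letter,
// so Å, Ä, Ö and å, ä, ö are handled like A-Z and a-z.
$pattern = '/\p{Lu}(?:\p{Ll}|\s|,)*[.!?:]*/u';

// Hypothetical sample text, used only for illustration.
$text = "Åsa gick hem. Öl är gott, sa hon!";

// preg_match_all() returns the number of full matches, or false on
// error (for example, invalid UTF-8 in $text while /u is in effect).
$count = preg_match_all($pattern, $text, $matches);

// $matches[0] holds the complete matches, i.e. the sentences.
foreach ($matches[0] as $sentence) {
    echo $sentence, "\n";
}
```

If the input is ISO-8859-1 rather than UTF-8, convert it first, e.g. with mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1'), because with /u an invalid UTF-8 subject makes preg_match_all() return false.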
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

split string with multibyte delimiter

Hi, I need to split a string using awk, cut or other basic Unix commands (no programming), with a multibyte character as the delimiter. Ex: abcd-efgh-ijkl split by -efgh- to get two segments, abcd and ijkl. Is it possible? Thanks A.H.S (1 Reply)
Discussion started by: azmathshaikh
1 Replies

2. Shell Programming and Scripting

Multibyte characters to ASCII

Hello, is there any UNIX utility/command/executable that will convert multibyte characters to standard single-byte ASCII characters in a given file? And is there any UNIX utility/command/executable that will recognize multibyte characters in a given file name? The typical multibyte... (8 Replies)
Discussion started by: jerardfjay
8 Replies

3. Shell Programming and Scripting

PHP: preg_match_all with multibyte characters?

Hi! I'm trying to separate text into sentences, like this: $pattern = "/[A-Z]([a-z]|[[:space:]]|,)*[\.\!\?:]*/"; preg_match_all($pattern, $text, $matches); This works fine unless the text contains multibyte characters, like "åäö". How can I make this work with these exotic characters? An example phrase that doesn't match:... (1 Reply)
Discussion started by: Ilja
1 Replies

4. Shell Programming and Scripting

How to replace characters with random characters

I've got a file (numbers.txt) filled with numbers and I want to replace each one of those numbers with a new random number between 0 and 9. This is my script so far: #!/bin/bash rand=$(($RANDOM % 9)) sed -i s//$rand/g numbers.txt The problem that I have is that it replaces each number with just... (2 Replies)
Discussion started by: hellocatfood
2 Replies

5. Programming

How will the behaviour of multibyte char differ because of different LC_CTYPE locale?

I am comparing two multibyte characters on two different platforms that have different LC_CTYPE settings, and they return different values. One of the variables is sigma, initialised to "\317\203", and the other one is an empty string, i.e. "". Below is the scenario of the two platforms: In... (4 Replies)
Discussion started by: baig_1988
4 Replies

6. Shell Programming and Scripting

Replace special characters with Escape characters?

I need to replace any special characters with escaped characters, like below. test!=123 -> test\!\=123 !@#$%^&*()-= to be replaced by \!\@\#\$\%\^\&\*\(\)\-\= (8 Replies)
Discussion started by: laknar
8 Replies

7. Shell Programming and Scripting

sed replacing specific characters and control characters by escaping

sed -e "s// /g" old.txt > new.txt While I do know some control characters need to be escaped, can normal characters also be escaped and still work the same way? Basically I do not know all control characters that have a special meaning, for example, ?, ., % have a meaning and have to be escaped... (11 Replies)
Discussion started by: ijustneeda
11 Replies

8. Shell Programming and Scripting

Positional insertion for multibyte characters

Hi, I have a requirement to insert a dot "." after a given position in each line, say the 110th. For this, I have written the command below. cat filename | sed 's/./&\./110' > new_filename The code is working fine, but when we have multibyte (2- or 3-byte) characters in the input file, the... (3 Replies)
Discussion started by: tostay2003
3 Replies

9. Shell Programming and Scripting

Remove first 2 characters and last two characters of each line

Here's what I'm trying to do. I have a file containing lines similar to this: data.txt: 1hsRmRsbHRiSFZNTTA1dlEyMWFkbU5wUW5CSlIyeDFTVU5SYjJOSFRuWmpia0ZuWXpKV2FHTnRU 1lKUnpWMldrZFZaMG95V25oYQpSelEyWTBka2QyRklhSHBrUjA1b1kwUkJkd3BOVXpWM1lVaG5k... (5 Replies)
Discussion started by: SkySmart
5 Replies

10. Shell Programming and Scripting

Outputting characters after a given string and reporting the characters in the row below --sed

I have this fastq file: @M04961:22:000000000-B5VGJ:1:1101:9280:7106 1:N:0:86 GGGGGGGGGGGGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCA +test-1 GGGGGGGGGGGGGGGGGCCGGGGGFF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;;8... (10 Replies)
Discussion started by: Xterra
10 Replies
MB_EREG_REPLACE_CALLBACK(3)						 1					       MB_EREG_REPLACE_CALLBACK(3)

mb_ereg_replace_callback - Perform a regular expression search and replace with multibyte support using a callback

SYNOPSIS
string mb_ereg_replace_callback (string $pattern, callable $callback, string $string, [string $option = "msr"])

DESCRIPTION
Scans $string for matches to $pattern, then replaces the matched text with the output of the $callback function. The behavior of this function is almost identical to mb_ereg_replace(3), except that instead of a $replacement parameter you specify a $callback.

PARAMETERS
o $pattern - The regular expression pattern. Multibyte characters may be used in $pattern.
o $callback - A callback that will be called and passed an array of the matched elements in $string. The callback should return the replacement string. You'll often need the $callback function for mb_ereg_replace_callback(3) in just one place. In this case you can use an anonymous function to declare the callback within the call to mb_ereg_replace_callback(3). This way all the information for the call is in one place, and the function namespace is not cluttered with the name of a callback that is not used anywhere else.
o $string - The string being checked.
o $option - The matching condition can be set with the $option parameter. If i is specified, case is ignored. If x is specified, white space is ignored. If m is specified, the match is executed in multiline mode and line breaks are matched by '.'. If p is specified, the match is executed in POSIX mode and line breaks are treated as normal characters. Note that e cannot be used for mb_ereg_replace_callback(3).

RETURN VALUES
The resultant string on success, or FALSE on error.

NOTES
Note: The internal encoding, or the character encoding specified by mb_regex_encoding(3), is used as the character encoding for this function.

EXAMPLES
Example #1 mb_ereg_replace_callback(3) example

<?php
// this text was used in 2002
// we want to get this up to date for 2003
$text = "April fools day is 04/01/2002\n";
$text.= "Last christmas was 12/24/2001\n";
// the callback function
function next_year($matches)
{
  // as usual: $matches[0] is the complete match
  // $matches[1] the match for the first subpattern
  // enclosed in '(...)' and so on
  return $matches[1].($matches[2]+1);
}
echo mb_ereg_replace_callback(
            "(\d{2}/\d{2}/)(\d{4})",
            "next_year",
            $text);
?>

The above example will output:

April fools day is 04/01/2003
Last christmas was 12/24/2002

Example #2 mb_ereg_replace_callback(3) using an anonymous function (supported in PHP 5.3.0 or later)

<?php
// this text was used in 2002
// we want to get this up to date for 2003
$text = "April fools day is 04/01/2002\n";
$text.= "Last christmas was 12/24/2001\n";

echo mb_ereg_replace_callback(
            "(\d{2}/\d{2}/)(\d{4})",
            function ($matches) {
              return $matches[1].($matches[2]+1);
            },
            $text);
?>

SEE ALSO
mb_regex_encoding(3), mb_ereg_replace(3), Anonymous functions, information about the callback type.

PHP Documentation Group					MB_EREG_REPLACE_CALLBACK(3)
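Tying the man page back to the question, here is a minimal sketch of the multibyte behaviour it describes. It assumes the mbstring extension is loaded with its regex (Oniguruma) support and that the script is saved as UTF-8. The sample text and the "uppercase å, ä, ö" task are hypothetical, chosen only to show that, once mb_regex_encoding('UTF-8') is set, [åäö] is a set of three characters rather than a set of bytes.

```php
<?php
// Sketch only: assumes mbstring with regex support, and a UTF-8 source file.
// Per the NOTES section, this encoding is used by mb_ereg_replace_callback().
mb_regex_encoding('UTF-8');

// Hypothetical sample text, used only for illustration.
$text = "Mårten åker över ån";

// Each of å, ä, ö is matched as one whole character inside the brackets.
$result = mb_ereg_replace_callback(
    '[åäö]',
    function ($matches) {
        // $matches[0] is the complete match, as in the examples above.
        return mb_strtoupper($matches[0], 'UTF-8');
    },
    $text
);

echo $result, "\n";
```

Running it should print the sample sentence with every å, ä and ö uppercased.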
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.