Sponsored Content
Top Forums Shell Programming and Scripting Regex to hunt for a string in the right hand column Post 302995175 by gimley on Sunday 2nd of April 2017 10:55:32 PM
Old 04-02-2017
Regex to hunt for a string in the right hand column

I have a database which has the following structure
Code:
English word=IPA notation

as in the example below
Code:
huckleberry=ˈhʌkəl-bəriː
huddling=ˈhʌd-lɪŋ
huffish=ˈhʌ-fɪʃ
hugger-mugger=ˈhʌgər-mʌgər
hulling=ˈhʌ-lɪŋ
human=ˈhjuː-mən
humanitarian=hjuːˌ-mænɪˈteərɪyən
humanitarianism=hjuːˌ-mænɪˈteərɪənɪzəm
humanitarians=hjuːˌ-mænɪˈteərɪyənz
humbler=ˈhʌmb-lər
humblest=ˈhʌmb-lest
humbling=ˈhʌmb-lɪŋ
humid=ˈhjuː-mɪd
humming-top=ˈhʌmɪŋ-tɒp
humming-tops=ˈhʌmɪŋ-tɒps
humourless=ˈhjuːmər-læs
hunchback=ˈhʌnʧ-bæk
hunchbacks=ˈhʌnʧ-bæks
hundredfold=ˈhʌndred-fold
hunger-march=ˈhʌŋər-mɑːrʧ
hunger-marcher=ˈhʌŋər-ˌmɑːrʧər
hunger-marchers=ˈhʌŋər-ˌmɑːrʧərz

I want to identify all instances of a hyphen in the right-hand column i.e. IPA and which are missing in the left-hand column i.e. the English Words. A small example is given below
Code:
humourless=ˈhjuːmər-læs
hunchback=ˈhʌnʧ-bæk
hunchbacks=ˈhʌnʧ-bæks
hundredfold=ˈhʌndred-fold

I work under windows but a Perl or even Unix regex could help. Many thanks
 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to get the most left hand string ??

Hi, I remember once seeing a way to get the left most string in a word. Let's say: a="First.Second.Third" (separated by dot) echo ${a#*.} shows --> Second.Third echo ${a##*.} shows --> Third How do I get the the left most string "First" Or "First.Second" ??? Tried to replace #... (2 Replies)
Discussion started by: jfortes
2 Replies

2. UNIX for Dummies Questions & Answers

regex on first string in a variable.

Hi all, this is driving me nuts. I need to evaluate if a variable in a shell script has a heading s or m character e.g. s92342394 or m9233489 if so then I need to get rid of them. I'm quite familiar with PERL and could do it there in 3 mins but I have not found a decent way to do this in a shell.... (1 Reply)
Discussion started by: Endo
1 Replies

3. Shell Programming and Scripting

Replace if regex on specific column matches expression?

I am attempting to convert rewrite rules to Nginx, and since due to the mass amount of rewrites we must convert, I've been trying to write a script to help me on a specific part, easily. So far I have this: rewrite ^action/static/(+)/$ staticPage.php?pg=$1&%$query_string; What I want done... (5 Replies)
Discussion started by: EXT3FSCK
5 Replies

4. Shell Programming and Scripting

filtering out duplicate substrings, regex string from a string

My input contains a single word lines. From each line data.txt prjtestBlaBlatestBlaBla prjthisBlaBlathisBlaBla prjthatBlaBladpthatBlaBla prjgoodBlaBladpgoodBlaBla prjgood1BlaBla123dpgood1BlaBla123 Desired output --> data_out.txt prjtestBlaBla prjthisBlaBla... (8 Replies)
Discussion started by: kchinnam
8 Replies

5. Shell Programming and Scripting

Merge left hand strings mapping to different right hand strings

Hello, I am working on an Urdu to Hindi dictionary which has the following structure: a=b a=c n=d n=q and so on. i.e. Headword separated from gloss by a = I am giving below a live sample بتا=बता بتا=बित्ता بتا=बुत्ता بتان=बतान بتان=बितान بتانا=बिताना I need the following... (3 Replies)
Discussion started by: gimley
3 Replies

6. Shell Programming and Scripting

How to use regex on particular column (Removing comma from particular column)?

Hi, I have pipe separated file which contains some data having comma(,) in it. I want to remove the comma(,) only from particular column without changing data in other columns. Below is the sample data file, I want to remove the comma(,) only from 5th column. $ cat file1 ABC | DEF, HIJ|... (6 Replies)
Discussion started by: Prathmesh
6 Replies

7. Shell Programming and Scripting

Grep with regex containing one string but not the other

Hi to you all, I'm just struggling with a regex problem and I'm pretty sure that I'm missing sth obvious... :confused: I need a regex to feed my grep in order to find lines that contain one string but not the other. Here's the data example: 2015-04-08 19:04:55,926|xxxxxxxxxx| ... (11 Replies)
Discussion started by: stresing
11 Replies

8. Shell Programming and Scripting

Search Replace Specific Column using RegEx

Have Pipe Delimited File: > BRYAN BAKER|4/4/2015|518 VIRGINIA AVE|TEST > JOE BAXTER|3/30/2015|2233 MockingBird RD|ROW2On 3rd column where the address is located, I want to add a space after every numeric value - basically doing a "s//&\ / ": > BRYAN BAKER|4/4/2015|5 1 8 VIRGINIA AVE|TEST > JOE... (5 Replies)
Discussion started by: svn
5 Replies

9. UNIX for Dummies Questions & Answers

Regex matching column awk

Hi all, I want to extract rows with the pattern ALPHANUMERIC/ALPHANUMNERIC in the 2nd column. I dont wan rows with more than 1 slash or without any slash in 2nd column. a a/b b a/b/c c a/b//c d t/y e r f /f I came up with the regex grep '\/$' file a a/b b a/b/c d t/y (3 Replies)
Discussion started by: jianp83
3 Replies
Locale::Codes::LangFam(3pm)				 Perl Programmers Reference Guide			       Locale::Codes::LangFam(3pm)

NAME
Locale::Codes::LangFam - standard codes for language extension identification SYNOPSIS
use Locale::Codes::LangFam; $lext = code2langfam('apa'); # $lext gets 'Apache languages' $code = langfam2code('Apache languages'); # $code gets 'apa' @codes = all_langfam_codes(); @names = all_langfam_names(); DESCRIPTION
The "Locale::Codes::LangFam" module provides access to standard codes used for identifying language families, such as those as defined in ISO 639-5. Most of the routines take an optional additional argument which specifies the code set to use. If not specified, the default ISO 639-5 language family codes will be used. SUPPORTED CODE SETS
There are several different code sets you can use for identifying language families. A code set may be specified using either a name, or a constant that is automatically exported by this module. For example, the two are equivalent: $lext = code2langfam('apa','alpha'); $lext = code2langfam('apa',LOCALE_LANGFAM_ALPHA); The codesets currently supported are: alpha This is the set of three-letter (lowercase) codes from ISO 639-5 such as 'apa' for Apache languages. This is the default code set. ROUTINES
code2langfam ( CODE [,CODESET] ) langfam2code ( NAME [,CODESET] ) langfam_code2code ( CODE ,CODESET ,CODESET2 ) all_langfam_codes ( [CODESET] ) all_langfam_names ( [CODESET] ) Locale::Codes::LangFam::rename_langfam ( CODE ,NEW_NAME [,CODESET] ) Locale::Codes::LangFam::add_langfam ( CODE ,NAME [,CODESET] ) Locale::Codes::LangFam::delete_langfam ( CODE [,CODESET] ) Locale::Codes::LangFam::add_langfam_alias ( NAME ,NEW_NAME ) Locale::Codes::LangFam::delete_langfam_alias ( NAME ) Locale::Codes::LangFam::rename_langfam_code ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangFam::add_langfam_code_alias ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangFam::delete_langfam_code_alias ( CODE [,CODESET] ) These routines are all documented in the Locale::Codes::API man page. SEE ALSO
Locale::Codes The Locale-Codes distribution. Locale::Codes::API The list of functions supported by this module. http://www.loc.gov/standards/iso639-5/id.php ISO 639-5 . AUTHOR
See Locale::Codes for full author history. Currently maintained by Sullivan Beck (sbeck@cpan.org). COPYRIGHT
Copyright (c) 2011-2013 Sullivan Beck This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. perl v5.18.2 2013-11-04 Locale::Codes::LangFam(3pm)
All times are GMT -4. The time now is 08:36 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy