Replacing space with hyphen in a pattern.


 
# 1  
Old 08-26-2016
[Solved] Replacing space with hyphen in a pattern.

I have a huge text file, about 52 GB. In the file, there are patterns like these:
Code:
[[Renaissance humanism|renaissance]]
[[Taoism|Taoist]]
[[Foundation for Economic Education]]
[http://www.theanarchistlibrary.org/HTML/Daniel_Guerin__Anarchism__From_Theory_to_Practice.html#toc2 ''Anarchism: From Theory to Practice'']
[[self-governance|self-governed]]

As you can see, text appears inside both [ ] and [[ ]], and I am only interested in [[ ]]. There is also ordinary text before and after these patterns, for example:
Code:
''Anarchism''' is a [[political philosophy]] that advocates [[self-governance|self-governed]] societies with voluntary institutions.

My aim is to replace every space with a hyphen in the text enclosed within [[ ]]. For example, the expected output is:
Code:
[[Renaissance-humanism|renaissance]]
[[Taoism|Taoist]]
[[Foundation-for-Economic-Education]]
[http://www.theanarchistlibrary.org/HTML/Daniel_Guerin__Anarchism__From_Theory_to_Practice.html#toc2 ''Anarchism: From Theory to Practice'']
[[self-governance|self-governed]]

I have written a C program (pasted below) to solve this problem, but there seem to be two issues with it: first, it does not place the hyphens correctly, and second, it seems to crash when it encounters a non-ASCII character. I was wondering whether the same problem can be solved with sed, awk, or something similar in Bash. The reason I want to move to a regular-expression tool is that even if I spend a lot of time editing and perfecting this code, it might still crash on non-ASCII characters, since I found it difficult to remove all of them from the file.

Code:
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<ctype.h>   /* for isspace() */

int main ( int argc , char ** argv )
{
    FILE *text_file = NULL;
    text_file = fopen ( argv [ 1 ] , "r" );
    if(text_file == NULL )
    {
        fprintf(stderr,"file open error\n");
        exit(1);
    }
    char ch = '\0';
    
    while ( !feof ( text_file ))
    {
        ch = fgetc ( text_file );
        printf ("%c" , ch );
        if ( ch == '[')
        {
            ch = fgetc ( text_file );
            if ( ch == '[')
            {
                ch = fgetc ( text_file );
                while ( ch != ']' )
                {
                    printf("%c" , ch );
                    if ( isspace(ch))
                    {
                        ch='-';
                    }
                }
            }
            else
            {
                ungetc(ch,text_file);
            }
        }
    }
    
    fclose(text_file);
    return ( EXIT_SUCCESS);
}


Last edited by shoaibjameel123; 08-26-2016 at 11:53 AM.. Reason: Issue solved
# 2  
Old 08-26-2016
How about
Code:
sed ':L;s/\(\[\[[^ ]*\) \([^]]*\]\)/\1-\2/;tL' file
[[Renaissance-humanism|renaissance]]
[[Taoism|Taoist]]
[[Foundation-for-Economic-Education]]
[http://www.theanarchistlibrary.org/HTML/Daniel_Guerin__Anarchism__From_Theory_to_Practice.html#toc2 ''Anarchism: From Theory to Practice'']
[[self-governance|self-governed]]
''Anarchism''' is a [[political-philosophy]]-that-advocates-[[self-governance|self-governed]] societies with voluntary institutions.

# 3  
Old 08-26-2016
Thanks, and sorry for not being very clear in my initial post. The output that I am trying to get is:
Code:
'''Anarchism''' is a [[political-philosophy]] that advocates [[self-governance|self-governed]] societies with voluntary institutions.

Another example,
Code:
The first political philosopher to call himself an anarchist was [[Pierre-Joseph-Proudhon]], marking the formal birth of anarchism in the mid-nineteenth century.

Some noisy text:
Code:
[[Online-etymology-dictionary]].&lt;/ref&gt; The first known use of this word was in 1539.&lt;ref&gt;&quot;Origin of ANARCHY

Therefore, I only want to replace spaces with '-' inside the pattern "[[ ]]" and nowhere else in the text file. The two solutions suggested above put hyphens everywhere.
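
To pin the requirement down, here is a minimal awk sketch that edits only the text between [[ and ]] and copies everything else through unchanged. The function name hyphenate_links_awk and the sample line are made up for illustration. It assumes a link never spans more than one line and never contains a ] inside it, and LC_ALL=C is an assumption to make awk work on plain bytes.

```shell
#!/bin/sh
# hyphenate_links_awk: hypothetical helper name, not from the thread.
# Reads text on stdin; replaces spaces with hyphens inside [[...]] only.
hyphenate_links_awk() {
    # LC_ALL=C (assumption): treat input as bytes so odd non-ASCII
    # sequences are passed through rather than interpreted.
    LC_ALL=C awk '{
        out = ""      # part of the line already processed
        rest = $0     # part of the line still to scan
        # Find the next [[...]] that has no "]" inside it.
        while (match(rest, /\[\[[^]]*\]\]/)) {
            link = substr(rest, RSTART, RLENGTH)
            gsub(/ /, "-", link)              # hyphenate this link only
            out = out substr(rest, 1, RSTART - 1) link
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest                        # text after the last link
    }'
}

# Sample line (made up): two links, plain text around them.
printf '%s\n' 'was [[Pierre Joseph Proudhon]], see [[Online etymology dictionary]]. The end' |
    hyphenate_links_awk
# prints: was [[Pierre-Joseph-Proudhon]], see [[Online-etymology-dictionary]]. The end
```

Text inside single brackets, such as [http://... some title], is never matched because the pattern requires two opening brackets.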
# 4  
Old 08-26-2016
Add a closing square bracket to the first bracket expression of the search pattern, so that the first group can no longer run past the closing ]] of a link:
Code:
sed ':L;s/\(\[\[[^] ]*\) \([^]]*\]\)/\1-\2/;tL' file

and try again.
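
For reference, the corrected command can be wrapped as below. This is only a sketch under a few assumptions: the function name hyphenate_wikilinks is hypothetical, the script is split into separate -e parts because some non-GNU seds may not accept a label followed by ';', and LC_ALL=C is an assumption to make sed treat the input as plain bytes, which may help with the non-ASCII data mentioned in the first post.

```shell
#!/bin/sh
# hyphenate_wikilinks: hypothetical helper name, not from the thread.
# Reads text on stdin; replaces spaces with hyphens inside [[...]] only.
hyphenate_wikilinks() {
    # :L   label marking the start of the loop
    # s    replace ONE space that follows "[[" plus characters that are
    #      neither "]" nor space, provided a "]" comes later without
    #      another "]" in between
    # tL   jump back to L if the s command changed something
    LC_ALL=C sed -e ':L' \
                 -e 's/\(\[\[[^] ]*\) \([^]]*\]\)/\1-\2/' \
                 -e 'tL'
}

# Sample input taken from the examples in this thread.
hyphenate_wikilinks <<'EOF'
'''Anarchism''' is a [[political philosophy]] that advocates [[self-governance|self-governed]] societies with voluntary institutions.
[[Foundation for Economic Education]]
EOF
# prints:
# '''Anarchism''' is a [[political-philosophy]] that advocates [[self-governance|self-governed]] societies with voluntary institutions.
# [[Foundation-for-Economic-Education]]
```

On the large file this would be run as hyphenate_wikilinks < file > file.new. sed works line by line, so memory use should depend on the longest line rather than on the file size.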
This User Gave Thanks to RudiC For This Post: