Change a character based on its position number | Unix Linux Forums | UNIX for Dummies Questions & Answers

#1  10-08-2012  a_bahreini

Hi, I have a text file in which I want to change some of the characters based on their position. My file contains multiple lines, and characters should be counted continuously, line by line. For example, I want to convert the 150th T to C. What can I do? Here is a portion of my file:

Code:
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG
GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT
CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA
AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT
GTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAAAATTTCCACCA
AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGC
CAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAAT
TTTATCTTTAGGCGGTATGCACTTTTAACAGTCACCCCCCAACTAACACA
TTATTTTCCCCTCCCACTCCCATACTACTAATCTCATCAATACAACCCCC
GCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAAC
CAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCCTCA
AAGCAATACACTGAAAATGTTTAGACGGGCTCACATCACCCCATAAACAA
ATAGGTTTGGTCCTAGCCTTTCTATTAGCTCTTAGTAAGATTACACATGC
AAGCATCCCCGTTCCAGTGAGTTCACCCTCTAAATCACCACGATCAAAAG
GGACAAGCATCAAGCACGCAGCAATGCAGCTCAAAACGCTTAGCCTAGCC
ACACCCCCACGGGAAACAGCAGTGATTAACCTTTAGCAATAAACGAAAGT
TTAACTAAGCTATACTAACCCCAGGGTTGGTCAATTTCGTGCCAGCCACC
GCGGTCACACGATTAACCCAAGTCAATAGAAGCCGGCGTAAAGAGTGTTT
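
To see which character currently sits at a given absolute position, with line breaks not counted, something like the following may help. This is only a sketch: `char_at` is a made-up helper name, and it assumes the file holds plain single-byte characters. In the sample above the first three lines are 50 characters each, so position 150 is the last character of the third line, which is a T.

```shell
# Hypothetical helper: print the character at absolute position N in FILE,
# counting continuously across lines (newlines are removed first).
# Usage: char_at N FILE
char_at() {
    tr -d '\n' < "$2" | cut -c "$1"
}
```

Running `char_at 150 file` before and after an edit is a cheap way to confirm that the intended position changed.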


Last edited by Scott; 10-08-2012 at 01:17 PM.. Reason: Code tags, please...
#2  10-08-2012  pamu
Try this:


Code:
# Assumes an awk that splits on an empty FS (e.g. gawk); changes position p only if it holds a T.
awk -v p=150 'BEGIN{FS=OFS=""} max<p && max+NF>=p {i=p-max; if($i=="T") $i="C"} {max+=NF} 1' file

#3  10-08-2012  rdrtx1

Code:
awk -v p=150 -v l="C" '{o=$0; for (i=1; i<=length($0); i++) {++c; if (c==p) o=substr($0,1,i-1) l substr($0,i+1)} print o}' infile

#4  10-08-2012  Don Cragun
Here is a much longer alternative way to do this:

Code:
#!/bin/ksh
# Usage: tester [count [from [to]]]
#       Change the "count"th occurrence of the character specified by "from"
#       to the character specified by "to".  If not given on the command line,
#               "count" defaults to 150,
#               "from" defaults to "T", and
#               "to" defaults to "C".
awk -v cnt="${1:-150}" -v from="${2:-T}" -v to="${3:-C}" 'BEGIN{tmpc = "\a"}
cnt>0 {
        # See if changing every "from" character on this line will go too far.
        if((n = gsub(from, from)) < cnt) {
                # No.  Reduce cnt by the number of "from" characters found and
                # print the unchanged line.
                cnt -= n
        } else {
                # The "from" character we need to change is on this line.
                # Change cnt - 1 "from" characters to "tmpc" characters.
                for(i = 1; i < cnt; i++) sub(from, tmpc)
                # Change the desired "from" character to the "to" character.
                sub(from, to)
                # Change the "tmpc" characters inserted above back to "from"
                # characters.
                gsub(tmpc, from)
                # Note that we are done looking.
                cnt = 0
        }
        # Fall through to next action to print the processed line.
}
1
END {   if(cnt) {
                printf("Still looking for %d %s characters when EOF found.\n",
                        cnt, from)
                exit 1
        }
        exit 0
}' in

#5  10-09-2012  a_bahreini
Hi Guys,
Thanks for the code you sent. However, it doesn't work properly when I change the number of the letter. I've attached the file, which may make it a little easier to work on. Thank you again for putting time and effort into my problem.
Attached Files
File Type: txt mtDNA_GATK_reference_letters.txt (16.8 KB, 15 views)
#6  10-09-2012  Don Cragun
Which letter do you want to change?
Which letter do you want to replace it with?
Which occurrence of that letter do you want to change?

You said my script doesn't work properly. What does that mean? Did it produce a diagnostic message? Was some other character in the file changed? What system are you using?

When I try to change the 150th occurrence of T to C in the file you attached using the script I provided, the T in column 20 of line 13 does change from T to C just as you said you wanted.
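
One way to check which line and column a given occurrence falls on is a small awk loop like the one below. It is a sketch rather than anything posted in this thread: `locate_nth` is a made-up name, and it matches the character exactly, so lowercase letters are not counted.

```shell
# Hypothetical helper: print the line and column of the Nth occurrence of a
# single character. Exits with status 1 if there are fewer than N occurrences.
# Usage: locate_nth N CHAR FILE
locate_nth() {
    awk -v n="$1" -v ch="$2" '
    {
        # Walk the line one character at a time, counting exact matches.
        for (i = 1; i <= length($0); i++) {
            if (substr($0, i, 1) == ch && ++seen == n) {
                printf "line %d, column %d\n", NR, i
                found = 1
                exit
            }
        }
    }
    END { if (!found) exit 1 }
    ' "$3"
}
```

For example, `locate_nth 150 T infile` reports where the 150th uppercase T is, which can then be compared against the output of the replacement script.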

Looking more closely at your input file I see that there are 4998 As, 5095 Cs, 2108 Gs, 3997 Ts, and 332 newlines in a file that contains 17,235 bytes. Could the problem be that there are four sequences of lowercase letters in the big file you attached, but only uppercase letters in the sample that you said was representative of your entire file?
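
A tally like the one above can be reproduced with a short awk loop. The sketch below is an assumption about how one might do it, not the command actually used for those numbers; `count_chars` is a made-up name. Uppercase and lowercase letters are reported separately, which makes stray lowercase runs easy to spot. The newline count is simply the number of lines awk reads, so it is one too high if the last line has no trailing newline.

```shell
# Hypothetical helper: count how often each distinct character occurs in FILE.
# Usage: count_chars FILE
count_chars() {
    awk '
    {
        # Tally every character on the line; awk has already stripped the newline.
        for (i = 1; i <= length($0); i++) count[substr($0, i, 1)]++
        newlines++
    }
    END {
        for (c in count) printf "%s %d\n", c, count[c]
        printf "newline %d\n", newlines
    }
    ' "$1" | sort
}
```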

Last edited by Don Cragun; 10-09-2012 at 09:58 PM.. Reason: The sample data was not representative of the actual data???
#7  10-09-2012  rdrtx1

Code:
awk -v p=150 -v r="T" -v l="C" '{for (i=1; i<=length($0); i++) {s=substr($0,i,1); if (s~r && ++n==p) s=l; printf "%s", s} print ""}' infile
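
The thread does not settle whether the lowercase stretches in the attached file should be counted. If they should (that is an assumption, not something the original poster confirmed), a case-insensitive variant could look like the sketch below. `replace_nth_nocase` is a made-up name; the replacement character is written exactly as given, and all other characters keep their original case.

```shell
# Hypothetical helper: replace the Nth occurrence of FROM, matched without
# regard to case, with TO. Usage: replace_nth_nocase N FROM TO FILE
replace_nth_nocase() {
    awk -v n="$1" -v from="$2" -v to="$3" '
    BEGIN { from = toupper(from) }
    !done {
        for (i = 1; i <= length($0); i++) {
            # Compare in uppercase so that "t" and "T" both count.
            if (toupper(substr($0, i, 1)) == from && ++seen == n) {
                $0 = substr($0, 1, i - 1) to substr($0, i + 1)
                done = 1
                break
            }
        }
    }
    { print }
    ' "$4"
}
```

Usage would be along the lines of `replace_nth_nocase 150 T C infile > outfile`.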
