Selective Replacements: Using sed or awk to replace letters with numbers in a very specific way
Post 302665959 by shamrock on Tuesday 3rd of July 2012 12:27:55 PM
Quote:
Originally Posted by Mince
I'll try it out and get back to you. I am unfamiliar with awk, do you think you could give me a bit of an idea of what each part of the script is for?
All the awk script does is convert a string of letter codes, read as a number in the base 4 positional system, into a decimal number, much as you would convert a hexadecimal number. A string of letter codes like T or CA or GCT can be viewed as a base 4 number in which the letters A C G T encode the digits 0 1 2 3. Converting such a string of base 4 letter codes into a decimal number is all that the awk script I posted does.

So, for example, to convert TGC into a decimal number you would do...
Code:
TGC = T * (4^2) + G * (4^1) + C * (4^0)
TGC = 3 * (4^2) + 2 * (4^1) + 1 * (4^0)  #  since T==3 G==2 and C==1
TGC = 57           #  decimal value of the base 4 number
TGC = 58 (57 + 1)  #  actual value, offset by 1 so that A==1 C==2 G==3 and T==4
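
The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the conversion described above, not the awk script posted earlier in the thread; the names DIGITS, base4_value and offset_value are made up for this example, and the +1 offset is assumed to be applied once to the whole number, as in the TGC calculation.

```python
# Digit values follow the post: A, C, G, T stand for 0, 1, 2, 3.
DIGITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def base4_value(codes: str) -> int:
    """Return the decimal value of a string of base 4 letter codes."""
    value = 0
    for letter in codes:
        # Shift the running total one base 4 place left, then add the new digit.
        value = value * 4 + DIGITS[letter]
    return value


def offset_value(codes: str) -> int:
    """Add 1 to the base 4 value (assumed offset, as in 57 + 1 for TGC)."""
    return base4_value(codes) + 1


if __name__ == "__main__":
    for codes in ("T", "CA", "GCT", "TGC"):
        print(codes, base4_value(codes), offset_value(codes))
```

For TGC this gives 57 before the offset and 58 after it, matching the calculation above. A letter outside A C G T raises a KeyError, which is probably what you want for unexpected input.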

Quote:
Originally Posted by Mince
Also, is it necessary to have A set as 0? I forgot to mention that 0 in the format I am converting it to means "no data."

Thanks again!
The reason for setting A to 0 is to create an encoded base 4 number system. Can you clarify what you mean by posting a sample of the input that means "no data"?
 
