

Extracting fixed length number from a text file


 
# 1  
03-20-2017

Hi,

I have a text file with sample records such as:

Code:
CASE ID: 20170218881083  
Original presentment record for ARN  [24013935350549886999873] not found
for Re-presentment

I want to extract the 23-digit number from this file. I thought of using grep, but at first I couldn't extract the required number. After some googling, I found these options:
Code:
 -P, --perl-regexp: Interpret PATTERN as a Perl regular expression.

and
Code:
-o, --only-matching: Show only the part of a matching line that matches PATTERN.

I did manage to extract the 23-digit number from the sample text above using grep -Po, as suggested on a forum, but I'm confused about what these options are for. Can someone please explain them and suggest other commands that do the same job?

Code:
$ echo 'Original 123 presentment record for ARN  [24013935350549886999873] not found'|grep  -Po "\d{23}"

Code:
$ uname -a
Linux 2.6.18-417.el5 #1 SMP Sat Nov 19 14:54:59 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

# 2  
03-20-2017
Code:
echo 'Original presentment record for ARN  [24013935350549886999873] not found' | awk -F'[][]' '/^Original presentment/ {print $(NF-1)}'

This User Gave Thanks to vgersh99 For This Post:
# 3  
03-20-2017
Hi, that command will extract all 23-digit numbers from the text. The -o option is a GNU and BSD grep extension.

You do not need the Perl extension (-P). With an extended regular expression (-E):
Code:
grep  -Eo "[0-9]{23}" file

should work as well.
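A small sketch of what -E and -o do together, using a made-up two-line input: -E lets {23} be written without backslashes, and -o prints each matching part on its own line instead of the whole matching line.

```shell
#!/bin/sh
# Made-up input: two 23-digit numbers on one line, and a line with no match.
sample='ARN [24013935350549886999873] and ARN [11112222333344445555666]
a line without any long number'

# -E: extended regex, so {23} needs no backslashes.
# -o: print only the matched part, one match per output line.
# Expected: the two numbers on separate lines; the second input line prints nothing.
printf '%s\n' "$sample" | grep -Eo '[0-9]{23}'
```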

The problem with this command is that it will also return parts of numbers that are longer than 23 digits, if present.

So it should be:
Code:
grep  -Eo '\<[0-9]{23}\>' file
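To see the difference, here is a sketch that runs both patterns on one made-up line containing a 25-digit number and a bracketed 23-digit number. It assumes GNU grep, where \< and \> are word-boundary operators; they are an extension, not part of POSIX ERE.

```shell
#!/bin/sh
# Made-up line: a 25-digit number followed by a bracketed 23-digit number.
line='REF 1234567890123456789012345 ARN [24013935350549886999873]'

echo 'Without boundaries:'
# Also prints the first 23 digits of the 25-digit number.
printf '%s\n' "$line" | grep -Eo '[0-9]{23}'

echo 'With word boundaries:'
# Prints only the number that is exactly 23 digits long.
printf '%s\n' "$line" | grep -Eo '\<[0-9]{23}\>'
```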

An equivalent awk command would be:
Code:
awk '$1~/^[0-9]{23}$/{print $1}' RS=\[ FS=\] file
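The awk command splits the input into records at every [ (RS) and into fields at every ] (FS), so $1 is the text between a [ and the next ]. Some older awk implementations (for example gawk 3.x without --re-interval) do not support the {23} interval expression, so the sketch below expresses the same check with a length test instead. The file name sample.txt is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample file holding the record from the first post.
cat > sample.txt <<'EOF'
CASE ID: 20170218881083
Original presentment record for ARN  [24013935350549886999873] not found
for Re-presentment
EOF

# RS="[": a new record starts after every "[".
# FS="]": $1 is the text up to the next "]".
awk 'BEGIN { RS = "["; FS = "]" }
     # keep $1 only if it is all digits and exactly 23 characters long
     $1 ~ /^[0-9]+$/ && length($1) == 23 { print $1 }' sample.txt
```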


---
( grep -Eo '\<\d{23}\>' file will also work with BSD grep )
This User Gave Thanks to Scrutinizer For This Post:
# 4  
03-20-2017
Sed alternative

Code:
sed -n 's/.*\[\([0-9]\{23\}\)\].*/\1/p' filename
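A sketch of how the sed one-liner behaves on the sample record: -n turns off automatic printing, and the trailing p prints a line only when the substitution succeeded, so lines without a bracketed 23-digit number produce no output. Because the pattern requires [ immediately before and ] immediately after the digits, a bracketed number of any other length is not printed either; the flip side is that unbracketed numbers are not found at all.

```shell
#!/bin/sh
# The three lines of the sample record from the first post.
printf '%s\n' \
  'CASE ID: 20170218881083' \
  'Original presentment record for ARN  [24013935350549886999873] not found' \
  'for Re-presentment' |
# Replace the whole line with the captured digits, and print only on success.
sed -n 's/.*\[\([0-9]\{23\}\)\].*/\1/p'
```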

This User Gave Thanks to andy391791 For This Post:
# 5  
03-20-2017
@vgersh99: as far as I know, awk -F is for defining delimiters. What does this part of your command mean?
Code:
awk -F'[][]'

I understand that the next part matches the start of the string, and I think $NF would be the number of fields, but why are you subtracting 1 from it and then printing that?

@Scrutinizer: true, my command also printed parts of numbers longer than 23 digits.

Code:
grep  -Eo '\<[0-9]{23}\>' file

The above command should work perfectly. The square brackets won't always be present around the number, so the awk command won't work in every case.

I'm a bit confused about why we use single quotes around a grep expression. I read the other day that single quotes remove the meaning of special characters. Shouldn't we use double quotes?

Also, you have used \< and \>. Do they act as boundaries so that only numbers of exactly 23 digits are extracted?

---------- Post updated at 04:33 PM ---------- Previous update was at 04:25 PM ----------

@andy391791

I tweaked your command a bit, because the square brackets may or may not be present around the 23-digit number in future files. The tweak seems to work fine:

Code:
$ echo 'Original presentment record for ARN  24013935350549886999873 not found'|sed -n 's/.*\([0-9]\{23\}\).*/\1/p'
24013935350549886999873
$ echo 'Original presentment record for ARN  [24013935350549886999873] not found'|sed -n 's/.*\([0-9]\{23\}\).*/\1/p'
24013935350549886999873

But when I make the number longer than 23 digits and run the command again, it still extracts 23 digits from the longer numeric string and prints them, which is not what I want. I only want to print the number if it is exactly 23 digits long.
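The tweaked command prints 23 digits out of a longer number because nothing stops the digits in the group from having more digits on either side (the greedy leading .* leaves the last 23 for the group). A minimal sketch of one fix, assuming GNU sed with -E support (older GNU sed releases may require -r instead): require a non-digit or the start of the line before the digits, and a non-digit or the end of the line after them. The function name exact23 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a number only when it is exactly 23 digits long,
# with or without surrounding brackets.
#   (^|.*[^0-9])   start of line, or anything ending in a non-digit
#   ([0-9]{23})    the 23 digits we keep (\2)
#   ([^0-9].*|$)   a non-digit followed by anything, or end of line
exact23() {
  sed -En 's/(^|.*[^0-9])([0-9]{23})([^0-9].*|$)/\2/p'
}

# The first two lines print the number; the 25-digit line prints nothing.
printf '%s\n' \
  'ARN  24013935350549886999873 not found' \
  'ARN  [24013935350549886999873] not found' \
  'ARN  1234567890123456789012345 not found' |
exact23
```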
# 6  
03-20-2017
Quote:
Originally Posted by dsid
@vgersh99: as far as I know, awk -F is for defining delimiters. What does this part of your command mean?
Code:
awk -F'[][]'

I understand that the next part matches the start of the string, and I think $NF would be the number of fields, but why are you subtracting 1 from it and then printing that?

-F defines the field separator. In this particular case it is the regular expression [][], a bracket expression that matches either [ or ], because the 23-digit string is surrounded by [ and ].
We subtract 1 from NF because NF is the number of fields and the LAST field ($NF) is the text following the ]. The field next to last, $(NF-1), is your 23-digit string.
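A sketch that prints every field, to show how -F'[][]' splits the sample line: the line is cut at both brackets, giving three fields, so NF is 3, $NF is the text after the ], and $(NF-1) is the number.

```shell
#!/bin/sh
line='Original presentment record for ARN  [24013935350549886999873] not found'

# Print NF, then each field in quotes, then the next-to-last field.
printf '%s\n' "$line" | awk -F'[][]' '{
  print "NF = " NF
  for (i = 1; i <= NF; i++)
    printf "$%d = \"%s\"\n", i, $i
  print "$(NF-1) = " $(NF-1)
}'
```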
This User Gave Thanks to vgersh99 For This Post:
# 7  
03-20-2017
Quote:
Originally Posted by dsid
[..]

@Scrutinizer: true, my command also printed parts of numbers longer than 23 digits.

Code:
grep  -Eo '\<[0-9]{23}\>' file

The above command should work perfectly. The square brackets won't always be present around the number, so the awk command won't work in every case.

I'm a bit confused about why we use single quotes around a grep expression. I read the other day that single quotes remove the meaning of special characters. Shouldn't we use double quotes?

Also, you have used \< and \>. Do they act as boundaries so that only numbers of exactly 23 digits are extracted?
[..]
Hi dsid

Single quotes protect the regular expression from the shell better than double quotes do, which is why I prefer them: inside double quotes the shell still interprets $, ` and \, whereas inside single quotes it interprets nothing. When you read that they remove the meaning of special characters, that meant shell special characters, not regex special characters.
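A sketch of what the shell actually hands to a command after quote removal, using printf so nothing else touches the string. The variable name num is made up for this illustration.

```shell
#!/bin/sh
# Made-up variable, only to show parameter expansion inside double quotes.
num=42

printf '%s\n' "[0-9]{23}$num"    # $num is expanded:  [0-9]{23}42
printf '%s\n' '[0-9]{23}$num'    # passed literally:  [0-9]{23}$num
printf '%s\n' "\\<[0-9]{23}\\>"  # \\ becomes \:      \<[0-9]{23}\>
printf '%s\n' '\\<[0-9]{23}\\>'  # passed literally:  \\<[0-9]{23}\\>
```

With single quotes, what you type is exactly what grep receives, so there is no need to reason about which characters the shell will rewrite first.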

\< and \> are word-boundary operators that match the empty string at the beginning and end of a word, respectively.

So the 23 digits match only if they are surrounded by non-word characters (anything other than [0-9A-Za-z_], or more precisely [[:alnum:]_]) or by the start or end of the line.
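A sketch of which neighbours count as boundaries, assuming GNU grep: letters, digits and _ are word characters and block the match, while brackets, punctuation, spaces and the start or end of the line do not. The input lines are made up.

```shell
#!/bin/sh
# Only the first three lines should produce output:
#   line 1: start and end of line are boundaries
#   line 2: [ and ] are non-word characters
#   line 3: : and . are non-word characters
#   line 4: _ is a word character, so \< fails
#   line 5: X is a word character, so \< fails
printf '%s\n' \
  '24013935350549886999873' \
  'ARN [24013935350549886999873]' \
  'ARN:24013935350549886999873.' \
  'ARN_24013935350549886999873' \
  'X24013935350549886999873' |
grep -Eo '\<[0-9]{23}\>'
```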

Last edited by Scrutinizer; 03-20-2017 at 02:08 PM..
This User Gave Thanks to Scrutinizer For This Post:
