Extracting fixed length number from a text file


 
# 8  
Old 03-20-2017
sed command posted originally by @andy391791
Can someone please tell me why the sed command in the output below prints the last 23 digits (counting from right to left) when the number has more than 23 digits?

Code:
$ echo 'Original presentment record for ARN 24013935362551925806644 not found'|sed -n 's/.*\([0-9]\{23\}\).*/\1/p'
24013935362551925806644
[dsiddiqui@lxserv01 scripts]$ echo 'Original presentment record for ARN 24013935362551925806644123123 not found'|sed -n 's/.*\([0-9]\{23\}\).*/\1/p'
35362551925806644123123
[dsiddiqui@lxserv01 scripts]$

# 9  
Old 03-20-2017

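This is because the leading .* is greedy: it consumes as many characters as possible, digits included, while still leaving exactly 23 digits for the capture group. The trailing .* then matches everything else up to the end of the record/line. The capture therefore ends up holding the rightmost 23 consecutive digits on the line, not the first 23.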
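
The backtracking is easier to see on a short input. The following is only an illustrative sketch: the function name last_three_digits and the sample text are made up for this example, and the capture is cut down to 3 digits.

```shell
# Hypothetical helper: same pattern shape as in the thread, but with a
# 3-digit capture so the effect of the greedy leading .* is easy to follow.
last_three_digits() {
    sed -n 's/.*\([0-9]\{3\}\).*/\1/p'
}

# The leading .* first swallows the whole line, then gives characters back
# one at a time until three digits can follow it. That happens at "567".
echo 'id 1234567 end' | last_three_digits
# prints: 567
```

The leading .* ends up matching "id 1234", the group captures "567", and the trailing .* matches " end".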

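To overcome this, keep .* greedy but anchor it with a trailing space, so that it has to stop at the space just before the number. Note that \{23,\} matches 23 or more digits, so the whole run of digits is printed:
Code:
echo 'Original presentment record for ARN 11124013935362551925806644 not found'|sed -n 's/.* \([0-9]\{23,\}\).*/\1/p'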
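
As a rough sketch of how the quantifier changes the result once the leading part ends in a space: the function names whole_run and first_23 and the variable line are hypothetical, and the input is the 29-digit example from the earlier post.

```shell
# Sample line from the thread (the number is 29 digits long).
line='Original presentment record for ARN 24013935362551925806644123123 not found'

# \{23,\} means "23 or more", so the capture takes the entire digit run.
whole_run() {
    sed -n 's/.* \([0-9]\{23,\}\).*/\1/p'
}

# \{23\} means "exactly 23"; the space pins the start of the capture to the
# beginning of the run, and the trailing .* absorbs the leftover digits.
first_23() {
    sed -n 's/.* \([0-9]\{23\}\).*/\1/p'
}

printf '%s\n' "$line" | whole_run   # prints: 24013935362551925806644123123
printf '%s\n' "$line" | first_23    # prints: 24013935362551925806644
```

Both variants assume the number is preceded by a space; a number at the very start of a line, or one preceded by a bracket, would not match.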


Last edited by vgersh99; 03-20-2017 at 02:57 PM..
# 10  
Old 03-20-2017
Quote:
Originally Posted by dsid

[...]
Code:
CASE ID: 20170218881083  
Original presentment record for ARN  [24013935350549886999873] not found
for Re-presentment

I want to extract the 23 digit number from this file. I thought of using grep but initially couldn't extract the required number. However, after googling, I found out the usage of
Code:
 -P, --perl-regexp: Interpret PATTERN as a Perl regular expression.

and
Code:
-o, --only-matching: Show only the part of a matching line that matches PATTERN.

[...]
I suggest using Perl itself instead, which is what that option is named after.
Code:
perl -nle '/\[(\d+)\]/ and print $1' dsid.file
24013935350549886999873

Code:
perl # Perl binary.
-n # loop through the lines of the file dsid.file without printing them automatically.
-l  # strip the newline from each input line and add one back when printing.
-e # execute what follows as Perl code.
/\[(\d+)\]/  # capture one or more digits, as long as they are enclosed in opening and closing square brackets.
and print $1 # if the match succeeded on this line, print what was captured.
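
For comparison, the grep -o option mentioned in the quoted post can do a similar job without a capture group. This is a minimal sketch, assuming a grep that supports -o (GNU and BSD grep do, but it is not required by POSIX); the function name extract_23_digits is made up for illustration.

```shell
# Hypothetical helper: print every run of exactly 23 consecutive digits
# that grep finds, one per output line. Arguments, if any, are file names;
# with no arguments it reads standard input.
extract_23_digits() {
    grep -oE '[0-9]{23}' "$@"
}

# The 14-digit CASE ID is too short to match; only the ARN is printed.
printf '%s\n' \
    'CASE ID: 20170218881083' \
    'Original presentment record for ARN  [24013935350549886999873] not found' |
    extract_23_digits
# prints: 24013935350549886999873
```

Unlike the Perl one-liner above, this does not require the brackets, and on a digit run longer than 23 it would print the first 23 digits (and further 23-digit chunks, if any) rather than skipping the run.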

# 11  
Old 03-20-2017
To prevent the leading pattern from consuming digits, you can exclude digits from it:
Code:
sed -n 's/[^0-9]*\([0-9]\{23\}\).*/\1/p'
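
A small sketch of how this pattern behaves, including one caveat. The function name first_23_digits and the second sample line are assumptions made up for this illustration.

```shell
# Hypothetical wrapper around the sed command above.
first_23_digits() {
    sed -n 's/[^0-9]*\([0-9]\{23\}\).*/\1/p'
}

# [^0-9]* cannot swallow digits, so the capture starts at the first digit.
echo 'Original presentment record for ARN 24013935362551925806644123123 not found' |
    first_23_digits
# prints: 24013935362551925806644

# Caveat: the pattern is not anchored with ^. If a shorter digit run comes
# first, the match starts after it, and the unmatched prefix stays in the output.
echo 'CASE 123 ARN 24013935362551925806644 not found' | first_23_digits
# prints: CASE 12324013935362551925806644
```

On the sample file in this thread the CASE ID and the ARN are on separate lines, so the caveat only matters for input that mixes shorter numbers and the ARN on one line.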

This User Gave Thanks to MadeInGermany For This Post:
# 12  
Old 03-21-2017
Quote:
Originally Posted by Aia
I suggest using Perl itself instead, which is what that option is named after.
Code:
perl -nle '/\[(\d+)\]/ and print $1' dsid.file
24013935350549886999873

Code:
perl # Perl binary.
-n # loop through the lines of the file dsid.file without printing them automatically.
-l  # strip the newline from each input line and add one back when printing.
-e # execute what follows as Perl code.
/\[(\d+)\]/  # capture one or more digits, as long as they are enclosed in opening and closing square brackets.
and print $1 # if the match succeeded on this line, print what was captured.

I tweaked your code a little, for two reasons: I may get a text file in which the 23 digits are surrounded by alphanumeric characters rather than brackets, and I am only looking for exactly 23 digits.

Code:
$ cat ARNs.txt
AD. 16.03.

[adfasdfasdfa82401393536255192580664asdfjkadhfa]

CASE ID: 20170218881083
Original presentment record for ARN  [24013935350549886999873] not found
for Re-presentment

CASE ID: 20170218881444
Original presentment record for ARN  [24013935361551920891659] not found
for Re-presentment

CASE ID: 20170218881447
Original presentment record for ARN  [24013935356550908226927] not found
for Re-presentment

CASE ID: 20170221894303
Original presentment record for ARN  [24013936003600942122783] not found

CASE ID: 20170221894378
Original presentment record for ARN  [24013935362551925806644] not found
for Re-presentment

Tweaked code:

Code:
$ perl -nle '/(\d{23})/ and print $1' ARNs.txt

Could you please comment on the new code?
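
One point worth checking with a pattern like (\d{23}): it also matches the first 23 digits of a longer run. A possible way to insist on exactly 23 digits, even when the number is embedded in letters, is to use lookarounds. This is only a sketch; the function name exactly_23_digits and the sample inputs are made up for illustration.

```shell
# Hypothetical helper: print a 23-digit number only if it is not part of a
# longer digit run. (?<!\d) means "no digit just before", (?!\d) means
# "no digit just after". Letters on either side are allowed.
exactly_23_digits() {
    perl -nle '/(?<!\d)(\d{23})(?!\d)/ and print $1' "$@"
}

# 23 digits surrounded by letters: printed.
echo '[adfasdfasdfa82401393536255192580664asdfjkadhfa]' | exactly_23_digits
# prints: 82401393536255192580664

# 29 digits: nothing is printed, because no 23-digit window is free of
# neighbouring digits.
echo 'ARN 24013935362551925806644123123 not found' | exactly_23_digits
```

Like the tweaked one-liner, this prints at most one number per line.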
# 13  
Old 03-21-2017
With Perl you could also add word-boundary assertions (\b):
Code:
perl -nle '/\b(\d{23})\b/ and print $1'

Note that this code differs from the other approaches in that it only prints the first occurrence on each line.
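
If every occurrence on a line is wanted, one option is to loop over the matches with the /g modifier. This is a minimal sketch; the function name all_23_digit_numbers and the sample line are hypothetical.

```shell
# Hypothetical helper: print every 23-digit number on each line, one per
# output line. In scalar context, /g resumes after the previous match, so
# the while loop visits each occurrence in turn.
all_23_digit_numbers() {
    perl -nle 'print $1 while /\b(\d{23})\b/g' "$@"
}

echo '[24013935350549886999873] [24013935361551920891659]' | all_23_digit_numbers
# prints:
# 24013935350549886999873
# 24013935361551920891659
```

Keep in mind that \b only matches between a word character and a non-word character. Letters, digits and underscore are all word characters, so a number embedded directly in letters (as in the first bracketed line of ARNs.txt) is not matched by this pattern.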
# 14  
Old 03-21-2017
Quote:
Originally Posted by Scrutinizer
With Perl you could also add word-boundary assertions (\b):
Code:
perl -nle '/\b(\d{23})\b/ and print $1'

Note that this code differs from the other approaches in that it only prints the first occurrence on each line.
True, I got your point. I tried it on a sample file by adding a new 23-digit number next to one that was already there: it does not print the new number, but the earlier code does. Thanks for pointing that out.