|
cmd sequence to find & cut out a specific string
A developer of mine has this requirement. I couldn't tell her quickly how to do it with UNIX commands or a quick script, so she's writing a quick program to do it, but that got my curiosity up and I thought I'd ask here for advice.
In a text file, about half of the records contain a specific string, say "ABC", followed by a 15-digit number that always has at least 2 leading zeros. In rows that have it, it appears twice, identically.
I essentially want to cut out these 18 chars into a file of their own. But, they are not in a fixed column position within the file.
Logically, the task is:
a) find the rows with ABC00
b) get the position of that first A
c) cut starting at that position for 18 characters and write to a new file.
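One possible way to express steps a) to c) with standard tools is a short awk one-liner. This is only a sketch: the function name extract_ids and the file names input.txt and output.txt are made up for illustration, and it assumes that any row containing "ABC00" has the full 18-character value starting at that position (it does not check that the remaining characters are digits).

```shell
# Hypothetical helper: reads rows from the named files (or stdin) and
# prints the 18 characters starting at the first "ABC00" on each row.
extract_ids() {
    awk '{
        p = index($0, "ABC00")      # position of the first A, 0 if absent
        if (p > 0)                  # only rows that contain the marker
            print substr($0, p, 18) # ABC plus the 15-digit number
    }' "$@"
}

# Example usage (file names are placeholders):
# extract_ids input.txt > output.txt
```

Because index() returns the position of the first match, the second, identical copy on the row is simply ignored.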
Example data:
ab cdefgABC000000000012345ABC000000000012345sadlfk
abcde fgABC000000000012346ABC000000000012346sadlfk
abc defgghi jklmn1349d5sadlfk
abcdef sldkfdgABC000000000056789ABC000000000056789abcdlkdfj134239d
and so on.
Desired output:
ABC000000000012345
ABC000000000012346
ABC000000000056789
Thanks for having a look.
Lisa
|