10-20-2008
cmd sequence to find & cut out a specific string
One of my developers has this requirement. I couldn't quickly tell her how to do it with UNIX commands or a short script, so she's writing a small program instead, but it piqued my curiosity, so I thought I'd ask here for advice.
In a text file, some records (about half of them) contain a specific string: "ABC" followed by a 15-digit number that always has at least 2 leading zeros. In rows that have it, the string appears twice, identically.
I want to cut these 18 characters out into a file of their own. However, they are not at a fixed column position within the line.
Logically, the task is:
a) find the rows with ABC00
b) get the position of that first A
c) cut 18 characters starting at that position and write them to a new file.
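Steps a) to c) map almost one-to-one onto awk's `match()` and `substr()`: `match()` finds the first "ABC00" and sets `RSTART` to the position of that first "A", and `substr()` cuts 18 characters from there. This is a minimal sketch assuming a POSIX awk; the function name `extract_ids` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads lines on stdin and, for every line that
# contains "ABC00", prints the 18 characters starting at the first "A"
# of that match (steps a, b and c). Lines without "ABC00" print nothing.
extract_ids() {
    # match() sets RSTART to the 1-based position of the first match;
    # substr() then cuts 18 characters starting at that position.
    awk 'match($0, /ABC00/) { print substr($0, RSTART, 18) }'
}

# Usage (file names are placeholders):
#   extract_ids < input.txt > ids.txt
```

Because `match()` stops at the first occurrence, each qualifying row produces one output line even though the string appears twice in it.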
Example data:
ab cdefgABC000000000012345ABC000000000012345sadlfk
abcde fgABC000000000012346ABC000000000012346sadlfk
abc defgghi jklmn1349d5sadlfk
abcdef sldkfdgABC000000000056789ABC000000000056789abcdlkdfj134239d
and so on.
Desired output:
ABC000000000012345
ABC000000000012346
ABC000000000056789
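If a row could contain "ABC00" without the full number after it, a stricter variant can match the whole 18-character pattern ("ABC00" followed by 13 more digits) and print only that. This is a sketch assuming a POSIX sed with basic-regex interval support (`\{13\}`); `extract_ids_strict` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical stricter helper: only rows where "ABC00" is followed by
# 13 more digits (18 characters in total) produce output.
extract_ids_strict() {
    # -n suppresses default printing; the p flag prints only rows where the
    # substitution succeeded. The leading greedy .* means the LAST match in
    # the row is kept, which is fine here because both copies are identical.
    sed -n 's/.*\(ABC00[0-9]\{13\}\).*/\1/p'
}

# Usage (file names are placeholders):
#   extract_ids_strict < input.txt > ids.txt
```

The trade-off versus a plain "ABC00" search is that malformed rows are silently skipped rather than producing a partial 18-character cut.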
Thanks for having a look.
Lisa