Post 302299847 by netfreighter on Saturday 21st of March 2009, 05:42:33 PM
Isolate and Extract a Pattern Substring (Digits Only)

Hi guys,

I have a text file report generated from egrepping multiple files.
The text files themselves were obtained after many successive refinements, so they already contain the desired number. However, the number is surrounded by unwanted characters, newlines and spaces, and it is not always at the start of the line, as the sample below shows:
$ egrep '[0-9]{7}' dte--*.txt
dte--0072.txt:1596223
dte--0073.txt:1560379
dte--0075.txt:!!! !!�!!!!!! !!�! !�� !!!!!!!!!! !!?! 1623749
dte--0076.txt:1596014
dte--0077.txt: 1791213
dte--0078.txt: 1767933
dte--0079.txt:_____1777023
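egrep alone cannot produce a clean report because it prints every matching line in full. A minimal sketch of one possible fix follows. It assumes GNU or BSD grep, since -o (print only the matched text) and -H (always print the "filename:" prefix) are common extensions rather than POSIX options. The function name extract_ids is hypothetical.

```shell
# Sketch only: print just the 7-digit run instead of the whole matching line.
# -o prints only the matched text; -H forces the "filename:" prefix even
# when a single file is given. Both are GNU/BSD grep extensions.
# LC_ALL=C treats every byte as a character, a precaution against GNU grep
# treating lines with invalid UTF-8 junk as binary data.
extract_ids() {
    LC_ALL=C grep -EoH '[0-9]{7}' "$@"
}
```

Example (the glob is an assumption based on the filenames in the sample): extract_ids dte--*.txt > report.txt. Because the pattern is not anchored, an 8-digit run would be cut to its first seven digits, and a line holding two numbers produces two output lines.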

What I need to generate is a clean report that looks like this:
Desired clean report:
dte--0072.txt:1596223
dte--0073.txt:1560379
dte--0075.txt:1623749
dte--0076.txt:1596014
dte--0077.txt:1791213
dte--0078.txt:1767933
dte--0079.txt:1777023

How can I do this? I am too new to regex to work it out myself, so I was hoping someone could help with negating the expression, or with a sed one-liner.
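For the sed one-liner asked about here, one possible substitution that cleans the report egrep has already produced is sketched below. It rests on assumptions not stated in the post: filenames contain no colon, and the first digits after the colon are the wanted 7-digit number. Lines that do not fit that shape pass through unchanged. The name clean_report is hypothetical, and sed -E is supported by GNU and BSD sed.

```shell
# Sketch only: clean a report that already has the "filename:junk" shape.
# Assumed (not from the post): filenames contain no ':' and the first
# digits after the colon are the wanted 7-digit number.
#   \1      = filename, i.e. everything before the first ':'
#   [^0-9]* = the junk (spaces, '!', '_', ...) before the number
#   \2      = the 7-digit number; anything after it is dropped
# Lines that do not match are printed unchanged, as sed s/// always does.
# LC_ALL=C lets [^0-9] and . match stray non-UTF-8 bytes as well.
clean_report() {
    LC_ALL=C sed -E 's/^([^:]*):[^0-9]*([0-9]{7}).*$/\1:\2/' "$@"
}
```

Example (file names hypothetical): egrep '[0-9]{7}' dte--*.txt | clean_report > report.txt, or clean_report old_report.txt to clean a report that was saved earlier.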

Note: the string always follows the same pattern:
- digits only
- always the same number of digits (in this report, a 7-digit number)
- no spaces or other characters between the digits; it looks like 1234567
- I only need the digits (nothing before or after the number)
- I would prefer to run the command directly on the multiple files, as with egrep, so that the report keeps each filename on the same line as the number it contains
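To follow the notes above more strictly (exactly seven digits, not part of a longer digit run, filename kept on the same line, run directly on the files), awk is one option because it exposes the current file name in its FILENAME variable. This is a sketch under those assumptions, not an answer taken from the thread. The name extract_exact is hypothetical, and [0-9] is spelled out seven times because some older awks do not accept {7} intervals (older gawk needed --re-interval for them).

```shell
# Sketch only: read the files directly and keep each filename, like egrep.
# A match needs exactly 7 digits with a non-digit (or line edge) on each
# side, so longer digit runs are ignored. [0-9] is written out 7 times
# because some older awks lack {7} interval expressions.
# The line is padded with a space at both ends so that numbers at the very
# start or end of the line still have a non-digit neighbour to match.
extract_exact() {
    LC_ALL=C awk '{
        s = " " $0 " "
        if (match(s, /[^0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]/))
            print FILENAME ":" substr(s, RSTART + 1, 7)
    }' "$@"
}
```

Example (glob assumed): extract_exact dte--*.txt > report.txt. Only the first qualifying number on each line is printed, lines without one are skipped, and an 8-digit run such as 12345678 is not reported at all.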


Can you please help?

Thanks!

Last edited by netfreighter; 03-22-2009 at 04:58 AM.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to pattern match on digits and then increment?

I have a log file that ends in a ".xxx" where xxx are digits but I don't necessarily know what digits they are. The log file rotates automatically and is auto-incrementing - starting at .001. So the example would be: file-name.005 If the file ends in .005 and the log rotates, it logically... (2 Replies)
Discussion started by: sdutto01
2 Replies

2. Shell Programming and Scripting

Extract digits at end of string

I have a string like xxxxxx44. What's the best way to extract the digits (one or more) in a ksh script? Thanks (6 Replies)
Discussion started by: offirc
6 Replies

3. Shell Programming and Scripting

Need Help... to extract the substring

> tnsping $TWO_TASK | grep HOST Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.12.10.212)(PORT = 1540)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = OMTST15))) I want to extract like this HOST = 10.12.10.212 PORT = 1540 SERVICE_NAME = OMTST15 I... (4 Replies)
Discussion started by: dashok.83
4 Replies

4. Shell Programming and Scripting

Extract a substring.

I have a shell script that uses wget to grab a bunch of html from a url. URL_DATA=`wget -qO - "$URL1"` I now have a string $URL_DATA that I need to pull a substring out of..say I had the following in my string <p><a href="/scooby/929011567.html">Dog pictures check them out! -</a><font... (3 Replies)
Discussion started by: shellpower
3 Replies

5. UNIX for Dummies Questions & Answers

sed to isolate file paths separated by a pattern

Hi, I've been searching for a quick way to do this with sed, but to no avail. I have a file containing a long series of (windows) file paths that are separated by the pattern '@'. I would like to extract each file path so that I can later assign a variable to each path. Here is the file:... (2 Replies)
Discussion started by: nixjennings
2 Replies

6. Shell Programming and Scripting

extract digits from a string in unix

Hi all, i have such string stored in a variable var1 = 00000120 i want the o/p var1 = 120 is it possible to have such o/p in ksh/bash ... thanx in advance for the help sonu (3 Replies)
Discussion started by: sonu_pal
3 Replies

7. UNIX for Advanced & Expert Users

Regex pattern for multiple digits

Hello, I need to construct a pattern to match the below string (especially the timestamp at the beginning) 20101222100436_temp.dat The below pattern works _temp.dat However I am trying find if there are any other better representations. I tried {14}, but it did not work. I am on... (5 Replies)
Discussion started by: krishmaths
5 Replies

8. Shell Programming and Scripting

awk extract certain digits from file with index substr

I would like to extract a digit from $0 starting 2,30 to 3,99 or 2.30 to 3.99 Can somebody fix this? awk --re-interval '{if($0 ~ /{1}{2}/) {print FILENAME, substr($0,index($0,/{1}{2}/) , 4)}}'input abcdefg sdlfkj 3,29 g. lasdfj alsdfjasl 2.86 gr. slkjds sldkd lskdjfsl sdfkj kdjlksj 3,34 g... (4 Replies)
Discussion started by: sdf
4 Replies

9. Shell Programming and Scripting

Extract n-digits from string in perl

Hello, I have a log file with logs such as 01/05/2017 10:23:41 : file.log.38: database error, MODE=SINGLE, LEVEL=critical, STATE: 01170255 (mode main how can i use perl to extract the 8-digit number below from the string 01170255 Thanks (7 Replies)
Discussion started by: james2009
7 Replies

10. Shell Programming and Scripting

How can I extract digits at the end of a string in UNIX shell scripting?

How can I extract digits at the end of a string in UNIX shell scripting or perl? cat file.txt abc_d123_4567.txt A246_B789.txt B123cc099.txt a123_B234-012.txt a13.txt What can I do here? Many thanks. cat file.txt | sed "s/.txt$//" | ........ 4567 789 099 012 13 (11 Replies)
Discussion started by: mingch
11 Replies
gensprep(8)                       ICU 50.1.2 Manual                      gensprep(8)

NAME
       gensprep - compile StringPrep data from files filtered by filterRFC3454.pl

SYNOPSIS
       gensprep [ -h, -?, --help ] [ -v, --verbose ] [ -c, --copyright ] [ -s, --sourcedir source ] [ -d, --destdir destination ]

DESCRIPTION
       gensprep reads filtered RFC 3454 files and compiles their information into a binary form. The resulting file, <name>.icu, can then be read directly by ICU, or used by pkgdata(8) for incorporation into a larger archive or library.

       The files read by gensprep are described in the FILES section.

OPTIONS
       -h, -?, --help
              Print help about usage and exit.

       -v, --verbose
              Display extra informative messages during execution.

       -c, --copyright
              Include a copyright notice in the binary data.

       -s, --sourcedir source
              Set the source directory to source. The default source directory is specified by the environment variable ICU_DATA.

       -d, --destdir destination
              Set the destination directory to destination. The default destination directory is specified by the environment variable ICU_DATA.

ENVIRONMENT
       ICU_DATA
              Specifies the directory containing ICU data. Defaults to /usr/share/icu/50.1.2/. Some tools in ICU depend on the presence of the trailing slash. It is thus important to make sure that it is present if ICU_DATA is set.

FILES
       The following files are read by gensprep. The rfc3454_*.txt files are looked for in source/misc, and NormalizationCorrections.txt in source/unidata.

       rfc3454_A_1.txt
              Contains the list of unassigned codepoints in Unicode version 3.2.0....

       rfc3454_B_1.txt
              Contains the list of code points that are commonly mapped to nothing....

       rfc3454_B_2.txt
              Contains the list of mappings for casefolding of code points when Normalization form NFKC is specified....

       rfc3454_C_X.txt
              Contains the list of code points that are prohibited for IDNA.

       NormalizationCorrections.txt
              Contains the list of code points whose normalization has changed since Unicode Version 3.2.0.

VERSION
       50.1.2

COPYRIGHT
       Copyright (C) 2000-2002 IBM, Inc. and others.

SEE ALSO
       pkgdata(8)

ICU MANPAGE                       18 March 2003                          gensprep(8)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.