Isolate and Extract a Pattern Substring (Digits Only)
Post 302303715 by netfreighter on Friday 3rd of April 2009 10:20:05 AM
SED command or REGEX to extract only the number from a textfile

Hello and thank you both for the answers.
I apologise for the delay in replying; I have been overwhelmed with tasks lately. It is now time to return to this project and finish it off.

I have tried both commands, but they only partially remove the unwanted characters.
I also discovered that the characters are not actually question marks. The terminal shows them as replacement characters because it cannot display the actual symbols.

So, first I run
Code:
$ egrep '[0-9]{9}' *.txt > repfile

then
Code:
$ sed 's/\(.*txt:\)[^0-9]*\([0-9]\{7\}\).*/\1\2/' repfile


and the output file is showing matching lines that are only cleaned BEFORE the number, while after the number trailing characters remain:
repfile:dte--0055.txt:?! !?!!!!?! !!?! !!?! !!?!! !!!!!!!!! 001431616
repfile:dte--0056.txt:?? !?!!!!?! !???!!!?! 001548532______
repfile:dte--0057.txt:0015817
repfile:dte--0058.txt:!!!! ??!?? !!?! !!? )??? !!!!!!?! 001438615
repfile:dte--0059.txt:0016327
repfile:dte--0060.txt:!)!> !?!!!!?? ??!? !!!! ??! ??!? !?! 001467161

I opened the file in TextEdit.app on Mac OS X and I see strange characters like
"ª ´!!! ´ï)!ª´´´´(ª ´? ´!´ ______ " (these may not display correctly; they are symbols that look like superscripts and foreign letters)

Since these numbers are going to be extracted from multilingual files containing non-English writing, it is hard to predict which symbols will be encountered. To clean up the number, I guess I would need a SED command or REGEX that extracts only the number from a text file.

In other words, a command that removes anything that is NOT the 9-digit number.

I have a feeling that egrep or a regular expression could do that, but I do not know where to look.
Maybe some character classes like [[:punct:]] could be used?
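
One possible approach, sketched here rather than taken from the thread, is to have grep print only the matching digits instead of deleting everything else. The sample files and the `workdir` variable below are hypothetical stand-ins for the real dte--NNNN.txt files, and the `-o` and `-a` options are GNU/BSD grep extensions rather than POSIX, so this assumes one of those implementations.

```shell
# Hypothetical sample data standing in for the real dte--NNNN.txt files.
workdir=$(mktemp -d)
printf '\252 \264!!! 001581743______\n' > "$workdir/dte--0057.txt"
printf '!!!! ??!?? 001438615\n' > "$workdir/dte--0058.txt"

# LC_ALL=C makes grep treat the input as plain bytes, so unexpected
# non-ASCII symbols cannot interfere with matching.
# -o prints only the matched text, -a forces text mode, and
# -E enables the {9} interval syntax.
result=$(cd "$workdir" && LC_ALL=C grep -aoE '[0-9]{9}' *.txt)
printf '%s\n' "$result"
```

With more than one file on the command line, grep prefixes each match with the file name and a colon. Note that a run of more than nine digits would still yield its first nine digits, so this relies on each file containing a single 9-digit number.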

What the final result should look like:
repfile:0057.txt:001581743
repfile:0058.txt:001438615
repfile:0059.txt:001632790
repfile:0060.txt:001467161

That number is the only such number in each text file.
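
If the lines have already been collected into repfile, a sed substitution can keep the file-name prefix and the 9-digit number while dropping everything before and after it. This is only a sketch: the two sample lines and the `repfile` temporary file are made up for illustration, and it assumes each line holds exactly one 9-digit number after the first colon.

```shell
# Hypothetical repfile contents in the "name.txt:junk number junk" layout.
repfile=$(mktemp)
printf 'dte--0056.txt:?? !?\264!! 001548532______\n' > "$repfile"
printf 'dte--0057.txt:001581743\n' >> "$repfile"

# Group 1 captures the file name up to the first colon, group 2 the
# 9-digit number; the surrounding .* swallow the unwanted characters.
# LC_ALL=C lets . match any byte, including non-ASCII symbols.
# -n with the p flag prints only lines where the substitution succeeded.
cleaned=$(LC_ALL=C sed -n 's/^\([^:]*:\).*\([0-9]\{9\}\).*$/\1\2/p' "$repfile")
printf '%s\n' "$cleaned"
```

Because the first `.*` is greedy, a line with several 9-digit runs would keep the last one, which is harmless when there is only one number per file.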

Thanks!

Last edited by netfreighter; 04-03-2009 at 11:31 AM..
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.