Remove characters from text
# 1  
Old 08-30-2011

I have a file that looks like this. I show only the first 11 lines of the file, followed by some text that appears at the end of every file.

Code:
1. file:///path1/path2/path3/path4/251192.dat (score 3.849384, docid 142923)
2. file:///path1/path2/path3/path4/173859.dat (score 3.831033, docid 75365)
3. file:///path1/path2/path3/path4/120815.dat (score 3.770058, docid 19773)
4. file:///path1/path2/path3/path4/178982.dat (score 3.643835, docid 80750)
5. file:///path1/path2/path3/path4/188878.dat (score 3.599508, docid 91201)
6. file:///path1/path2/path3/path4/160844.dat (score 3.582469, docid 61641)
7. file:///path1/path2/path3/path4/153125.dat (score 3.581480, docid 53507)
8. file:///path1/path2/path3/path4/209083.dat (score 3.576286, docid 108580)
9. file:///path1/path2/path3/path4/189989.dat (score 3.556764, docid 92369)
10. file:///path1/path2/path3/path4/169844.dat (score 3.528978, docid 71137)
11. file:///path1/path2/path3/path4/157264.dat (score 3.505972, docid 57873)

2000 people took part and  of 144690 shown (took 0.021768 time)
21813 people included (excluding loading/unloading)


I want my output to be like this:

Code:
/path1/path2/path3/path4/251192.dat 3.849384
/path1/path2/path3/path4/173859.dat 3.831033
/path1/path2/path3/path4/120815.dat 3.770058
/path1/path2/path3/path4/178982.dat 3.643835
/path1/path2/path3/path4/188878.dat 3.599508
/path1/path2/path3/path4/160844.dat 3.582469
/path1/path2/path3/path4/153125.dat 3.581480
/path1/path2/path3/path4/209083.dat 3.576286
/path1/path2/path3/path4/189989.dat 3.556764
/path1/path2/path3/path4/169844.dat 3.528978
/path1/path2/path3/path4/157264.dat 3.505972

This is what I have tried:

Code:
sed 's/^\s*[0-9]\ file:////' FILENAME

I think this will remove the number and the "file://" characters from each line, but I am not sure how to chop off the rest of the text.
I am using Linux with Bash.
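
For reference, the attempt above fails for two reasons: the pattern contains unescaped `/` characters while `/` is also the `s` delimiter, so sed cannot parse the command, and `[0-9]` followed by a space matches a single digit with no dot after it. A minimal sketch of a corrected prefix-stripping step, assuming a POSIX sed and using `|` as the delimiter (the function name `strip_prefix` is made up for illustration):

```shell
# Hypothetical helper: strip the leading "N. file://" from each line on stdin.
strip_prefix() {
    # '|' is the s-command delimiter, so the slashes in "file://" need no escaping.
    # [0-9][0-9]* matches one or more digits; \. matches the literal dot.
    sed 's|^[0-9][0-9]*\. file://||'
}

# Sample line taken from the question.
printf '%s\n' '10. file:///path1/path2/path3/path4/169844.dat (score 3.528978, docid 71137)' | strip_prefix
```

This only removes the prefix; the `(score ..., docid ...)` tail is still there and has to be handled separately.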
# 2  
Old 08-30-2011
Code:
sed -n '/file:/s/^.*file:\/\/\(.*dat\).*(score \(.*\),.*/\1 \2/p' FILENAME
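
The same idea can also be written with extended regular expressions, which drops most of the backslashes. This is only a sketch, assuming a sed that supports `-E` (GNU and BSD sed both do); `extract_path_score` is an illustrative name, not something from the thread:

```shell
# Hypothetical helper: print "path score" for every result line.
extract_path_score() {
    # -n plus the trailing p prints only lines where the substitution succeeded,
    # so blank lines and the trailing summary lines are dropped.
    # Group 1: everything after "file://" up to ".dat"; group 2: the score.
    sed -n -E 's|^[0-9]+\. file://(.*\.dat) \(score ([0-9.]+),.*|\1 \2|p' "$@"
}

# With no file arguments the function reads standard input.
extract_path_score <<'EOF'
1. file:///path1/path2/path3/path4/251192.dat (score 3.849384, docid 142923)

2000 people took part and  of 144690 shown (took 0.021768 time)
EOF
```

Using `|` as the delimiter avoids escaping the slashes in `file://`, and matching the score with `[0-9.]+` keeps the trailing comma out of the result.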

This User Gave Thanks to sulti For This Post:
# 3  
Old 08-30-2011
Code:
$ ruby -ne 'print $_.gsub(/.*file:\/\/|\(score |,\s+docid.*$/,"") if /^\d+\./' file
/path1/path2/path3/path4/251192.dat 3.849384
/path1/path2/path3/path4/173859.dat 3.831033
/path1/path2/path3/path4/120815.dat 3.770058
/path1/path2/path3/path4/178982.dat 3.643835
/path1/path2/path3/path4/188878.dat 3.599508
/path1/path2/path3/path4/160844.dat 3.582469
/path1/path2/path3/path4/153125.dat 3.581480
/path1/path2/path3/path4/209083.dat 3.576286
/path1/path2/path3/path4/189989.dat 3.556764
/path1/path2/path3/path4/169844.dat 3.528978
/path1/path2/path3/path4/157264.dat 3.505972

This User Gave Thanks to kurumi For This Post:
# 4  
Old 08-30-2011
With multiple commands:
Code:
$ grep "^[0-9][0-9]*\." inputfile | awk -F"[/,]" ' { print "/"$4"/"$5"/"$6"/"$7"/"$8 } '| sed 's,(score ,,g'
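
To see why fields 4 through 8 are the ones printed, it can help to dump how awk splits a line when the field separator is `/` or `,`. A small sketch (the helper name `show_fields` is made up for illustration; the sample line is from the question):

```shell
# Hypothetical helper: print every field with its index, splitting on "/" or ",".
show_fields() {
    awk -F'[/,]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

printf '%s\n' '1. file:///path1/path2/path3/path4/251192.dat (score 3.849384, docid 142923)' | show_fields
```

Fields 2 and 3 come out empty because of the three consecutive slashes in `file:///`, fields 4 to 7 hold the directory names, and field 8 holds the file name together with `(score` and the score, which is why a final `sed` is needed to drop `(score `.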

This User Gave Thanks to jayan_jay For This Post:
# 5  
Old 08-30-2011
Might as well include an example in awk....


Code:
awk '/file:\/\// { sub("file://", "", $2); sub(",", "", $4); print $2, $4 }' filename

This User Gave Thanks to wabard For This Post: