Post 302654429 by bobylapointe on Monday 11th of June 2012 11:19:05 PM
A faster equivalent for this sed command

Hello guys,

I'm cleaning out big XML files (we're talking about 1GB at least). Most of them contain words written in a non-Latin alphabet.

The command I'm using is so slow it's not even funny:

Code:
cat $1 | sed -e :a -e 's/&lt;[^&gt;]*&gt;//g;/&lt;/N;//ba;s/</ /g;s/>/ /g;s/_//g;s/-//g;s/–//g;s/(//g;s/)//g;s/,//g' | tr " " "\n" | sort | uniq >


I've tried to use tr -d instead, but it breaks my files for some reason: some of my non-Latin characters come out completely messed up.
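
One plausible explanation, which the post itself doesn't confirm: GNU tr works on bytes rather than characters, so `tr -d '–'` deletes each of the three UTF-8 bytes of the en dash (E2 80 93) wherever they occur, including inside other characters. The sketch below compares that with sed deleting the whole byte sequence. `hex` is a made-up helper for this demo, and LC_ALL=C is set only to make the result reproducible across implementations.

```shell
# Hypothetical helper: print stdin as one unbroken string of hex byte values.
hex() { od -An -tx1 | tr -d ' \n'; }

# Input bytes: д = d0 b4, “ = e2 80 9c, – = e2 80 93, newline = 0a.
# tr treats the three bytes of – as three separate characters to delete.
byte_delete=$(printf 'д“–\n' | LC_ALL=C tr -d '–' | hex)

# sed only removes the exact three-byte sequence e2 80 93.
seq_delete=$(printf 'д“–\n' | LC_ALL=C sed 's/–//g' | hex)

echo "tr -d : $byte_delete"   # d0b49c0a      (the “ is reduced to a stray 9c)
echo "sed   : $seq_delete"    # d0b4e2809c0a  (the “ survives intact)
```

Any character that shares a byte with the one being deleted gets damaged this way, which would match the "messed up" output described above.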

Do you guys know how to optimize this command to make it faster? Could I use awk to get exactly the same result as the sed command above?

Thank you very much!
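
A rough sketch of one direction, not a benchmarked answer. It assumes GNU-style sed, tr and sort, and it only covers the single-character clean-up and the word list; the tag-stripping loop from the original command is left out and would still have to run first. `clean_words` is an invented name. The ideas are: merge the separate `s/x//g` commands into one bracket expression, run everything in the C locale (byte-wise matching and sorting is usually much cheaper than UTF-8 collation), drop the extra `cat`, and let `sort -u` replace `sort | uniq`.

```shell
# Hypothetical helper (name invented for this sketch): reads text on stdin,
# prints the unique space-separated words, one per line.
clean_words() {
  # One bracket expression replaces the separate s/_//g;s/-//g;... commands.
  # ASCII bytes never occur inside a UTF-8 multibyte character, so matching
  # them byte-wise in the C locale cannot split a non-Latin letter.
  # The en dash is removed as a whole three-byte sequence, not byte by byte.
  LC_ALL=C sed -e 's/[<>]/ /g' -e 's/[_(),-]//g' -e 's/–//g' |
    LC_ALL=C tr ' ' '\n' |
    LC_ALL=C sort -u        # same set of lines as sort | uniq, one process fewer
}

# Small made-up sample: Latin and Cyrillic words with punctuation to strip.
printf 'a_b (c) d–e\nа,б a_b\n' | clean_words
```

To use it on a file, redirect the file into the function (`clean_words < "$1"`) rather than piping it through cat. Note that the C locale sorts in plain byte order, so the lines come out in a different order than under a UTF-8 locale, although the set of unique words is the same.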

Last edited by Scrutinizer; 06-12-2012 at 12:50 AM.
 
