06-02-2009
hmmm 35Gb?? try awk.. or may be sed.. but it will put considerable load on CPU
10 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
i have a input file which does not have a delimiter
All i Need to do is to identify a line and extract the data from it and run the loop again and need to ensure that it was not extracted earlier
Input file
------------
abcd 12345 egfhijk ip 192.168.0.1 CNN.com
abcd 12345 egfhijk ip... (12 Replies)
Discussion started by: vasimm
12 Replies
2. Shell Programming and Scripting
hi,
I'm trying to sort a file which has 3.7 million records an gettign the following error...any help is appreciated...
sort: Write error while merging.
Thanks (6 Replies)
Discussion started by: greenworld
6 Replies
3. Shell Programming and Scripting
Hi,
I have a huge file say with 2000000 records. The file has 42 fields. I would like to pick randomly 1000 records from this huge file. Can anyone help me how to do this? (1 Reply)
Discussion started by: ajithshankar@ho
1 Replies
4. Shell Programming and Scripting
Hi Guys,
I have a file as follows:
a b c 1 2 3 4
pp gg gh hh 1 2 fm 3 4
g h i j k l m 1 2 3 4
d e f g h j i k l 1 2 3 f 3 4
r t y u i o p d p re 1 2 3 f 4
t y w e q w r a s p a 1 2 3 4
I am trying to extract all the 2's from each row. 2 is just an example... (6 Replies)
Discussion started by: npatwardhan
6 Replies
5. Shell Programming and Scripting
Hello gurus,
I am new to "awk" and trying to break a large file having 4 million records into several output files each having half million but at the same time I want to keep the similar key records in the same output file, not to exist accross the files.
e.g. my data is like:
Row_Num,... (6 Replies)
Discussion started by: kam66
6 Replies
6. Programming
Hi All,
I don't need any code for this just some advice. I have a large collection of heterogeneous data (about 1.3 million) which simply means data of different types like float, long double, string, ints. I have built a linked list for it and stored all the different data types in a structure,... (5 Replies)
Discussion started by: shoaibjameel123
5 Replies
7. Shell Programming and Scripting
Dear All,
I have two files both containing 10 Million records each separated by comma(csv fmt).
One file is input.txt other is status.txt.
Input.txt-> contains fields with one unique id field (primary key we can say)
Status.txt -> contains two fields only:1. unique id and 2. status
... (8 Replies)
Discussion started by: vguleria
8 Replies
8. Shell Programming and Scripting
Hello All,
I have a large file, more than 50,000 lines, and I want to split it in even 5000 records. Which I can do using
sed '1d;$d;' <filename> | awk 'NR%5000==1{x="F"++i;}{print > x}'Now I need to add one more condition that is not to break the file at 5000th record if the 5000th record... (20 Replies)
Discussion started by: ibmtech
20 Replies
9. Shell Programming and Scripting
I have a file, named records.txt, containing large number of records, around 0.5 million records in format below:
28433005 1 1 3 2 2 2 2 2 2 2 2 2 2 2
28433004 0 2 3 2 2 2 2 2 2 1 2 2 2 2
...
Another file is a key file, named key.txt, which is the list of some numbers in the first column of... (5 Replies)
Discussion started by: zenongz
5 Replies
10. Shell Programming and Scripting
Hi All!!
I have a large file containing millions of records. My purpose is to extract 8 characters immediately from the given file.
222222222|ZRF|2008.pdf|2008|01/29/2009|001|B|C|C
222222222|ZRF|2009.pdf|2009|01/29/2010|001|B|C|C
222222222|ZRF|2010.pdf|2010|01/29/2011|001|B|C|C... (5 Replies)
Discussion started by: pavand
5 Replies
PYP(1) General Commands Manual PYP(1)
NAME
pyp - The Pyed Piper: A Modern Python Alternative to awk, sed and Other Unix Text Manipulation Utilities
SYNOPSIS
pyp [options] files ...
DESCRIPTION
pyp, the Pyed Piper, is a command line tool for text manipulation. It is similar to awk and sed in functionality, but its subcommands are
Python based, and thus more familiar to many programmers.
It can operate both on a per-line base and on the complete input stream. Different features can be pipelined in a single command by using
the pipe character familiar from shell commands.
pyp backs up its input for reruns with modified commands, and can save commands as macros. On the downside, the rerun feature makes it
unsuitable for continuous pipe operation.
OPTIONS
These programs follow the usual GNU command line syntax, with long options starting with two dashes (`-'). A summary of options is
included below. For a complete description, use --manual.
-h, --help
Show this help message and exit.
-m, --manual
Prints out extended help.
-l, --macro_list
Lists all available macros.
-s MACRO_SAVE_NAME, --macro_save=MACRO_SAVE_NAME
Saves current command as macro. use "#" for adding
comments EXAMPLE:
pyp -s "great_macro # prints first letter" "p[1]".
-f MACRO_FIND_NAME, --macro_find=MACRO_FIND_NAME
Searches for macros with keyword or user name.
-d MACRO_DELETE_NAME, --macro_delete=MACRO_DELETE_NAME
Deletes specified public macro.
-g, --macro_group
Specify group macros for save and delete; default is user.
-t TEXT_FILE, --text_file=TEXT_FILE
Specify text file to load. For advanced users,
you should typically cat a file into pyp.
-x, --execute
Execute all commands.
-c, --turn_off_color
Prints raw, uncolored output.
-u, --unmodified_config
Prints out generic PypCustom.py config file.
-b BLANK_INPUTS, --blank_inputs=BLANK_INPUTS
Generate this number of blank input lines; useful for
generating numbered lists with variable 'n'.
-n, --no_input
Use with command that generates output with no input;
same as --dummy_input 1.
-k, --keep_false
Print blank lines for lines that test as False.
default is to filter out False lines from the output.
-r, --rerun
Rerun based on automatically cached data from the last run.
Use this after executing "pyp", pasting input into the shell,
and hitting CTRL-D.
SEE ALSO
awk(1), grep(1), sed(1).
AUTHOR
pyp was written by Toby Rosen <tobyrosen@gmail.com>.
This manual page was written by Khalid El Fathi <khalid@elfathi.fr>, for the Debian project (and may be used by others).
March 19, 2012 PYP(1)