I like the shell solution. It can be made a little more I/O efficient, which matters when the output files are written to a network file system, by keeping the current output file open on a file descriptor instead of reopening it for every line.
The input file must be sorted on col1. Otherwise, remove any previous output files first and append with exec 3>>"$col1".txt instead.
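A minimal sketch of the idea described above, assuming a whitespace-delimited input file whose first column names the output file. The function name split_by_col1 and its argument are hypothetical; only the exec 3>"$col1".txt idiom comes from the post itself.

```shell
#!/bin/sh
# Sketch: split a file into <col1>.txt files in the current directory.
# Assumes the input is sorted on column 1, so each output file is opened
# once and stays open on fd 3 until the key changes.
split_by_col1() {
    input=$1        # hypothetical parameter: path to the sorted input file
    prev=
    opened=
    while read -r col1 rest
    do
        [ -n "$col1" ] || continue      # skip blank lines
        if [ -z "$opened" ] || [ "$col1" != "$prev" ]
        then
            # Key changed: point fd 3 at the new file (this truncates it).
            exec 3>"$col1".txt
            prev=$col1
            opened=1
        fi
        # read splits on whitespace, so fields are rejoined with one space.
        printf '%s\n' "$col1${rest:+ $rest}" >&3
    done < "$input"
    # Release fd 3 once the input is exhausted.
    [ -z "$opened" ] || exec 3>&-
}
```

For unsorted input, the variant from the post applies: delete old output files first and change the redirection to exec 3>>"$col1".txt, so a key that reappears later appends instead of truncating.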
Last edited by MadeInGermany; 08-24-2016 at 04:42 PM..
Reason: Removed comment about awk - close() releases the file descriptors!
I have one large file. After every 200 lines I have to split the file and then add a header and footer to each small file.
Is it possible to add a different header and footer to each file? (1 Reply)
This may sound like a trivial problem, but I still need some help:
I have a file with ids and I want to split it 'n' ways (could be any number) into files:
1
1
1
2
2
3
3
4
5
5
Let's assume 'n' is 3, and we cannot have the same id in two different partitions. So the partitions may... (8 Replies)
Hi All,
I am trying to extract data from a large text file. I want to extract lines which contain a five-digit number followed by a hyphen, like
12345-. I tried with egrep, e.g.: egrep "+" text.txt
but that returns all the lines which contain any number of digits followed by a hyphen,... (19 Replies)
I have an extremely large csv file in which I need to search the second field and, upon matches, update the last field...
I can pull the line with awk, but apparently you can't use awk to directly update the file? So I'm curious if I can use sed to do this... The good news is the field I want to... (5 Replies)
I have a large directory of web pages. I am doing a search through the web pages using grep and would like to get a list of unique file names of search results. The following command works fine to give me a list of file names where term appears:
grep -l term *.html
However, since these are... (3 Replies)
Hi,
I have a data file xyz.dat similar to the one given below,
2345|98|809||x|969|0
2345|98|809||y|0|537
2345|97|809||x|544|0
2345|97|809||y|0|651
9685|98|809||x|321|0
9685|98|809||y|0|357
9685|98|709||x|687|0
9685|98|709||y|0|234
2315|98|809||x|564|0
2315|98|809||y|0|537... (2 Replies)
Hi all,
I'm pretty new to Shell scripting and I need some help to split a source text file into multiple files. The source has a row with pattern where the file needs to be split, and the pattern row also contains the file name of the destination for that specific piece. Here is an example:
... (2 Replies)
I have 84 files with the following names splitseqs.1, spliseqs.2 etc.
and I want to change the .number to a unique filename.
E.g.
change splitseqs.1 into splitseqs.7114_1#24
and
change spliseqs.2 into splitseqs.7067_2#4
So all the current file names are unique, so are the new file names.... (1 Reply)
Hi All,
I have a very large single record file.
abc;date||bcd;efg|......... pqr;stu||record_count;date
When I do wc -l on this file it gives me "0" records, because of the missing line feed.
My problem is that there is an extra pipe coming at the end of this record
like... (6 Replies)
Hi,
Can anyone help? I have one large text file, and I need to split it into multiple files at each ^L (form feed).
My textfile
==========
abc company
abc address
abc contact
^L
my company
my address
my contact
my skills
^L
your company
your address
========== (3 Replies)
Discussion started by: fspalero
PYP(1) General Commands Manual PYP(1)
NAME
pyp - The Pyed Piper: A Modern Python Alternative to awk, sed and Other Unix Text Manipulation Utilities
SYNOPSIS
pyp [options] files ...
DESCRIPTION
pyp, the Pyed Piper, is a command line tool for text manipulation. It is similar to awk and sed in functionality, but its subcommands are
Python based, and thus more familiar to many programmers.
It can operate both on a per-line base and on the complete input stream. Different features can be pipelined in a single command by using
the pipe character familiar from shell commands.
pyp backs up its input for reruns with modified commands, and can save commands as macros. On the downside, the rerun feature makes it
unsuitable for continuous pipe operation.
OPTIONS
These programs follow the usual GNU command line syntax, with long options starting with two dashes (`--'). A summary of options is
included below. For a complete description, use --manual.
-h, --help
Show this help message and exit.
-m, --manual
Prints out extended help.
-l, --macro_list
Lists all available macros.
-s MACRO_SAVE_NAME, --macro_save=MACRO_SAVE_NAME
Saves current command as macro. Use "#" for adding
comments. EXAMPLE:
pyp -s "great_macro # prints first letter" "p[1]"
-f MACRO_FIND_NAME, --macro_find=MACRO_FIND_NAME
Searches for macros with keyword or user name.
-d MACRO_DELETE_NAME, --macro_delete=MACRO_DELETE_NAME
Deletes specified public macro.
-g, --macro_group
Specify group macros for save and delete; default is user.
-t TEXT_FILE, --text_file=TEXT_FILE
Specify text file to load. For advanced users,
you should typically cat a file into pyp.
-x, --execute
Execute all commands.
-c, --turn_off_color
Prints raw, uncolored output.
-u, --unmodified_config
Prints out generic PypCustom.py config file.
-b BLANK_INPUTS, --blank_inputs=BLANK_INPUTS
Generate this number of blank input lines; useful for
generating numbered lists with variable 'n'.
-n, --no_input
Use with command that generates output with no input;
same as --dummy_input 1.
-k, --keep_false
Print blank lines for lines that test as False.
Default is to filter out False lines from the output.
-r, --rerun
Rerun based on automatically cached data from the last run.
Use this after executing "pyp", pasting input into the shell,
and hitting CTRL-D.
SEE ALSO
awk(1), grep(1), sed(1).
AUTHOR
pyp was written by Toby Rosen <tobyrosen@gmail.com>.
This manual page was written by Khalid El Fathi <khalid@elfathi.fr>, for the Debian project (and may be used by others).
March 19, 2012 PYP(1)