Huge files manipulation Post: 302255554

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Comparing two huge files

Hi, I have two files, file A and file B. File A is an error file and file B is the source file. In the error file, the first line is the actual error and the second line gives information about the record (client ID) that throws the error. I need to compare the first field (which doesn't start with '//') of...

2. UNIX for Dummies Questions & Answers

Difference between two huge files

Hi, as per my requirement, I need to take the difference between two big files (around 6.5 GB) and write it to an output file without any line numbers or '<' or '>' in front of each new line. As the diff command won't work for big files, I tried to use bdiff instead. I am getting incorrect...

3. High Performance Computing

Huge Files to be Joined on Ux instead of ORACLE

We have one file (11 million lines) that is being matched against another (10 billion lines). The proof of concept we are trying is to join them on Unix. All files are delimited and they have composite keys. Could Unix be faster than Oracle in this regard? Please advise.

4. Shell Programming and Scripting

Split a huge data into few different files?!

Input file data contents: >seq_1 MSNQSPPQSQRPGHSHSHSHSHAGLASSTSSHSNPSANASYNLNGPRTGGDQRYRASVDA >seq_2 AGAAGRGWGRDVTAAASPNPRNGGGRPASDLLSVGNAGGQASFASPETIDRWFEDLQHYE >seq_3 ATLEEMAAASLDANFKEELSAIEQWFRVLSEAERTAALYSLLQSSTQVQMRFFVTVLQQM ARADPITALLSPANPGQASMEAQMDAKLAAMGLKSPASPAVRQYARQSLSGDTYLSPHSA...

5. Shell Programming and Scripting

Splitting the Huge file into several files...

Hi, I have to write a script to split a huge file into several pieces. The file's columns are pipe (|) delimited. A data sample: 6625060|1420215|07308806|N|20100120|5572477081|+0002.79|+0000.00|0004|0001|.........

6. Shell Programming and Scripting

Compare 2 folders to find several missing files among huge amounts of files.

Hi, all: I've got two folders, say, "folder1" and "folder2". Under each, there are thousands of files. It's quite obvious that some files are missing from each. I would just like to find them. I believe this can be done with the "diff" command. However, if I change the above question a...

7. Shell Programming and Scripting

Comparing 2 huge text files

I have these 2 files: k5login sanwar@systems.nyfix.com jjamnik@systems.nyfix.com nisha@SYSTEMS.NYFIX.COM rdpena@SYSTEMS.NYFIX.COM service/backups-ora@SYSTEMS.NYFIX.COM ivanr@SYSTEMS.NYFIX.COM nasapova@SYSTEMS.NYFIX.COM tpulay@SYSTEMS.NYFIX.COM rsueno@SYSTEMS.NYFIX.COM...

8. Shell Programming and Scripting

Compression - Exclude huge files

I have a DB folder of approximately 60GB. It has logs ranging in size from 500MB to 1GB. I have an installation which would update the DB. I need to back up this DB folder, just in case my installation fails. But I do not need the logs in my backup. How do I exclude them during compression (tar)? ...

9. UNIX for Dummies Questions & Answers

File comparison of huge files

Hi all, I hope you are well. I am very happy to see your contributions and am eager to become part of this community. I have the following question. I have two huge files to compare (almost 3GB each). The files are simulation outputs. The format of the files is as below. For a clear picture, please see...

10. Shell Programming and Scripting

Aggregation of Huge files

Hi Friends !! I am facing a hash total issue while processing a set of files of huge volume. Command used: tail -n +2 <File_Name> |nawk -F"|" -v '%.2f' qq='"' '{gsub(qq,"");sa+=($156<0)?-$156:$156}END{print sa}' OFMT='%.5f' The file is pipe delimited and column 156 is used for hash totalling....
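For the aggregation question in item 10, the following is a minimal Python sketch of what the quoted nawk command appears to be aiming for: skip the header line, strip double quotes, and sum the absolute value of one pipe-delimited column. The function name `hash_total` and its parameters are hypothetical, and the use of `Decimal` instead of floating point is an assumption of this sketch, not something taken from the thread. It also assumes every non-empty value in the column is numeric.

```python
import io
from decimal import Decimal


def hash_total(lines, column=156, delimiter="|"):
    """Sum the absolute values of one 1-based column, skipping the header line."""
    total = Decimal("0")
    rows = iter(lines)
    next(rows, None)  # skip the header line, like `tail -n +2`
    for line in rows:
        # Remove double quotes (like gsub(qq, "")) and split on the delimiter.
        fields = line.rstrip("\n").replace('"', "").split(delimiter)
        if len(fields) < column or not fields[column - 1].strip():
            continue  # assumption: short rows and empty values are ignored
        total += abs(Decimal(fields[column - 1].strip()))
    return total


if __name__ == "__main__":
    # Tiny in-memory sample; column 2 stands in for column 156.
    sample = io.StringIO('id|amount\n1|"-2.50"\n2|+0002.79\n3|0.01\n')
    print(hash_total(sample, column=2))
```

For a real file, something like `with open(path) as f: print(hash_total(f))` reads one line at a time, so memory use stays flat regardless of file size.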

LEARN ABOUT DEBIAN

pyp

PYP(1)							      General Commands Manual							    PYP(1)

NAME

       pyp - The Pyed Piper: A Modern Python Alternative to awk, sed and Other Unix Text Manipulation Utilities

SYNOPSIS

       pyp [options] files ...

DESCRIPTION

       pyp,  the  Pyed Piper, is a command line tool for text manipulation. It is similar to awk and sed in functionality, but its subcommands are
       Python based, and thus more familiar to many programmers.

       It can operate both on a per-line basis and on the complete input stream.  Different features can be pipelined in a single command by using
       the pipe character familiar from shell commands.

       pyp  backs  up  its  input  for reruns with modified commands, and can save commands as macros. On the downside, the rerun feature makes it
       unsuitable for continuous pipe operation.
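The difference between per-line and whole-stream operation can be pictured with a small plain-Python sketch. This is not pyp's implementation or its command syntax; the helper names `per_line` and `whole_stream` are hypothetical and exist only to illustrate the two modes described above.

```python
import io


def per_line(stream, transform):
    """Apply `transform` to each line independently, yielding results lazily."""
    for line in stream:
        yield transform(line.rstrip("\n"))


def whole_stream(stream, transform):
    """Read every line first, then apply `transform` to the complete list."""
    lines = [line.rstrip("\n") for line in stream]
    return transform(lines)


if __name__ == "__main__":
    text = "b 2\na 1\nc 3\n"
    # Per-line: upper-case the first whitespace-separated field of each line.
    print(list(per_line(io.StringIO(text), lambda p: p.split()[0].upper())))
    # Whole-stream: sorting needs to see all lines at once.
    print(whole_stream(io.StringIO(text), sorted))
```

Per-line work can stream with constant memory, while whole-stream operations such as sorting have to hold the full input, which matters for very large files.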

OPTIONS

       This program follows the usual GNU command line syntax, with long options starting with two dashes (`--').  A summary of options is
       included below.  For a complete description, use --manual.

       -h, --help
	      Show this help message and exit.

       -m, --manual
	      Prints out extended help.

       -l, --macro_list
	      Lists all available macros.

       -s MACRO_SAVE_NAME, --macro_save=MACRO_SAVE_NAME
	      Saves current command as macro. Use "#" to add
	      comments. EXAMPLE:
	      pyp -s "great_macro # prints first letter" "p[1]"

       -f MACRO_FIND_NAME, --macro_find=MACRO_FIND_NAME
	      Searches for macros with keyword or user name.

       -d MACRO_DELETE_NAME, --macro_delete=MACRO_DELETE_NAME
	      Deletes specified public macro.

       -g, --macro_group
	      Specify group macros for save and delete; default is user.

       -t TEXT_FILE, --text_file=TEXT_FILE
	      Specify text file to load. Advanced users should
	      typically cat the file into pyp instead.

       -x, --execute
	      Execute all commands.

       -c, --turn_off_color
	      Prints raw, uncolored output.

       -u, --unmodified_config
	      Prints out generic PypCustom.py config file.

       -b BLANK_INPUTS, --blank_inputs=BLANK_INPUTS
	      Generate this number of blank input lines; useful for
	      generating numbered lists with variable 'n'.

       -n, --no_input
	      Use with a command that generates output with no input;
	      same as --blank_inputs 1.

       -k, --keep_false
	      Print blank lines for lines that test as False.
	      The default is to filter out False lines from the output.

       -r, --rerun
	      Rerun based on automatically cached data from the last run.
	      Use this after executing "pyp", pasting input into the shell,
	      and hitting CTRL-D.
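The filtering behaviour described for -k, --keep_false can be sketched in plain Python as follows. This only illustrates the documented description (falsy results are dropped by default, or printed as blank lines when the flag is set); it is not taken from pyp's source, and the function name `render_output` is hypothetical.

```python
def render_output(results, keep_false=False):
    """Return output lines for per-line results, mimicking the -k description."""
    output = []
    for result in results:
        if result:
            output.append(str(result))
        elif keep_false:
            # With keep_false, a falsy result still occupies a (blank) line.
            output.append("")
        # Otherwise the falsy result is filtered out entirely.
    return output


if __name__ == "__main__":
    results = ["alpha", "", "gamma", None]
    print(render_output(results))
    print(render_output(results, keep_false=True))
```

Keeping the blank lines preserves the one-to-one correspondence between input and output line numbers, which can be useful when the output is later compared against the input.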

SEE ALSO

       awk(1), grep(1), sed(1).

AUTHOR

       pyp was written by Toby Rosen <tobyrosen@gmail.com>.

       This manual page was written by Khalid El Fathi <khalid@elfathi.fr>, for the Debian project (and may be used by others).

								  March 19, 2012							    PYP(1)
