Unique values from a Terabyte File
Post 302246851 by matrixmadhan on Tuesday 14th of October 2008 02:17:14 PM
Not really.

Running a plain sort again on a terabyte-sized file won't scale well, and it isn't needed anyway.

Problems like this, where the computational cost grows with the number of records to be processed, can be handled with the map-reduce approach. That would probably mean splitting the file into 'n' chunks, processing each chunk on its own, and then combining the results of the processed chunks.
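
To make the idea concrete, here is a minimal single-machine sketch in Python. It is only one way to do the split, not necessarily what was meant above: it assumes the chunks are built by hashing each line, so that all copies of a line land in the same chunk and each chunk can be de-duplicated without looking at the others. The function name `unique_lines`, the `n_buckets` parameter and the use of CRC32 as the hash are all illustrative choices.

```python
import os
import tempfile
import zlib


def unique_lines(input_path, output_path, n_buckets=16):
    """Write each distinct line of input_path to output_path exactly once.

    Illustrative sketch: n_buckets and the CRC32 partitioning are assumptions.
    """
    with tempfile.TemporaryDirectory() as tmp:
        bucket_paths = [os.path.join(tmp, f"bucket_{i}") for i in range(n_buckets)]
        buckets = [open(p, "wb") for p in bucket_paths]
        try:
            # "Map" step: route every line to a bucket chosen by its hash, so
            # all copies of a given line end up in the same bucket file.
            with open(input_path, "rb") as src:
                for line in src:
                    key = line.rstrip(b"\r\n")
                    buckets[zlib.crc32(key) % n_buckets].write(key + b"\n")
        finally:
            for b in buckets:
                b.close()

        # "Reduce" step: de-duplicate each bucket independently, then append
        # its distinct lines to the output. Only one bucket's distinct lines
        # are held in memory at a time.
        with open(output_path, "wb") as out:
            for p in bucket_paths:
                seen = set()
                with open(p, "rb") as bucket:
                    for line in bucket:
                        if line not in seen:
                            seen.add(line)
                            out.write(line)
```

Called as `unique_lines("big.txt", "unique.txt", n_buckets=256)` (file names hypothetical), it reads the input once and each bucket once. The output is not in the original line order, the bucket count is limited by how many files the process may keep open, and the buckets could just as well be handed to separate processes or machines, since no bucket depends on another.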
 
