Planning on writing a Guide to Working with Large Datasets


 
# 1  
Old 10-17-2008
Planning on writing a Guide to Working with Large Datasets: Need some feedback

In a recent research experiment, I faced the task of managing huge amounts of data, on the order of terabytes, and with the help of many people here I managed to learn quite a lot in the process. I am sure many people keep facing these situations, so I am planning to write a general-purpose guide on how to go about handling large amounts of data. Please note the following before reading further:
  1. This guide is not intended for any specific dataset, but one or two of the tips might definitely be of use to you.
  2. Some (or most, depending on your level) of the tips may apply to the absolute beginner.
  3. If you have feedback, please don't hesitate to give your suggestions, because I realized that if not for the tricks I learnt on this forum, I would've wasted hundreds of man-hours.
  4. I will try my level best to provide concrete examples whenever possible, but if you find an error somewhere, kindly let me know.
  5. Lastly, as I said, not all of this information is my own: some of it was collected from various sources during my work, some was gained with the kind help of people here, and some came from my own experience.

The following is an excerpt of the Table of Contents that I am planning to have in the guide (a sample of the kind of tip I have in mind follows the list):

Table of Contents
1. Introduction
2. Meet your friends - Discover the purpose of each tool
  • PuTTY
  • Screen
  • Bash Scripting
  • Awk
  • Sed
  • Perl
  • PHP
3. Extremely Useful Commands
4. Some Concepts you ought to know
5. Know your enemies - Have the constraints in mind
6. Downloading and Storing Huge Amounts of Data - Do it carefully or you'll be banned!
7. Database or not? - Is all the effort really worth it?
8. Parsing the Mammoth - The time has finally come
9. Last Minute tips for a Multiprocessor Environment
10. Things to Avoid - Bust the common myths
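To give you a flavour of what I have in mind for sections 3 and 8, here is a rough sketch of the kind of tips the guide would contain. The filenames, column numbers and session names below are made up purely for illustration:

# Count occurrences of each value in column 3 of a multi-gigabyte
# tab-separated file, without loading the whole file into memory:
awk -F'\t' '{ count[$3]++ } END { for (k in count) print k, count[k] }' huge_data.tsv

# Sort a file far larger than RAM; -T points the temporary files
# at a scratch disk that has enough free space:
sort -T /scratch/tmp -k1,1 huge_data.tsv > huge_data.sorted.tsv

# Run a long job inside screen so that a dropped PuTTY/SSH session
# does not kill it (detach with Ctrl-A d, reattach with screen -r bigjob):
screen -S bigjob

The idea is that each section would pair a short explanation, like the comments above, with a copy-paste-ready command.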

I am pretty much open to suggestions and would really love some feedback on adding or deleting topics from the above list.

# 2  
Old 10-18-2008
Any use of Python?
# 3  
Old 10-18-2008
Actually, I was thinking even PHP wasn't necessary, but since that's my core expertise, I thought I'd cover where it would be useful. Perl is more regex-centric, and so it seems to suffice for most large-dataset processing, but if anyone is kind enough to explain the power of Python, that would be great too!
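For instance, a simple regex filter over a large log file looks like this in both languages (a quick sketch; big.log and the ERROR pattern are just placeholders):

# Perl: a one-liner that reads line by line, so memory use stays
# constant no matter how large big.log is:
perl -ne 'print if /ERROR: \d+/' big.log > errors.log

# Python: the same filter; more verbose, but it grows more
# gracefully into a full program (the file is iterated lazily,
# one line at a time):
python -c '
import re, sys
pat = re.compile(r"ERROR: \d+")
for line in open("big.log"):
    if pat.search(line):
        sys.stdout.write(line)
' > errors.log

Both stream the input a line at a time, so either should hold up on terabyte-scale files; Python mainly pays off once the script outgrows a one-liner.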