The UNIX and Linux Forums  
Closed Thread
#1 (permalink)
10-17-2008
Legend986
Registered User
Join Date: Sep 2007
Posts: 171
Planning on writing a Guide to Working with Large Datasets: Need some feedback

In a recent research experiment, I had to manage huge amounts of data, on the order of terabytes, and with the help of many people here I learned quite a lot in the process. I am sure many people face these situations quite often, so I am planning to write a general-purpose guide on handling large amounts of data. Please note the following before reading further:
  1. This guide is not intended for a specific dataset, but one or two tips may well be of use to you.
  2. Some (or most, depending on your level) of the tips may be aimed at the absolute beginner.
  3. If you have feedback, please don't hesitate to share your suggestions; I realized that without the tricks I learnt in this forum, I would have wasted hundreds of man-hours.
  4. I will try my best to provide concrete examples whenever possible, but if you find an error somewhere, kindly let me know.
  5. Lastly, as I said, not all of this information is mine: some of it was collected from various sources during my work, some was gained with the kind help of people here, and some comes from my own experience.

The following is an excerpt of the Table of Contents that I am planning for the guide:

Table of Contents
1. Introduction
2. Meet your friends - Discover the purpose of each tool
  • PuTTY
  • Screen
  • Bash Scripting
  • Awk
  • Sed
  • Perl
  • PHP
3. Extremely Useful Commands
4. Some Concepts you ought to know
5. Know your enemies - Have the constraints in mind
6. Downloading and Storing Huge Amounts of Data - Do it carefully or you'll be banned!
7. Database or not? - Is all the effort really worth it?
8. Parsing the Mammoth - The time has finally come
9. Last Minute tips for a Multiprocessor Environment
10. Things to Avoid - Bust the common myths

I am open to suggestions and would really love some feedback on adding topics to, or deleting topics from, the above list.

Last edited by Legend986; 10-17-2008 at 04:07 PM..
#2 (permalink)
10-18-2008
fulat2k
Registered User
Join Date: Feb 2008
Posts: 6
Any use of Python?
#3 (permalink)
10-18-2008
Legend986
Registered User
Join Date: Sep 2007
Posts: 171
Actually, I was thinking that even PHP was not necessary, but since it is my core expertise, I thought I'd cover where it would be useful. Perl is more regex-centric, so it seems to suffice for most large-dataset processing, but if anyone is kind enough to explain the power of Python, that would be great too!