SAS or SSD for Ubuntu 14.04 and data analysis

# 1  
I am building a workstation and have a question about performance. I am a scientist who works with big data (average file size 30-50 GB). My OS is Ubuntu 14.04, and so far I have a dual Xeon E5-2630 system (6 cores each) with 128 GB of RAM. I/O buffering is an issue, so I am adding a 256/512? PCIe card and either two SSD or two SAS drives for the OS and software. Since the PCIe card will be separate and used mainly for file transfer, would SAS or SSD be a better fit for the OS? I am leaning towards SAS because of the buffering issue, but wanted to ask more knowledgeable users. I forgot to mention that there will also be a separate 1 or 2 TB drive. Any recommendations for the size of the SAS or SSD drives? Thanks.
# 2  
Do you access your data files randomly or sequentially?
# 3  
I access files sequentially. Thank you.
# 4  
SSDs are more than an order of magnitude faster than high-rpm SAS disks.
SSD capacity is limited, usually to 1-2 TB. With 128 GB of memory, you could easily use SSDs to load whatever file you want into memory; the usual term for this is a RAM disk, which Ubuntu supports. Ubuntu also caches files very effectively without much human intervention other than configuration.

Learn about pdflush: The Linux Page Cache and pdflush

There is also vmtouch, which lets you force any file to be read entirely into memory. That would definitely favor SSD. Also note the other tools on that site.

So, I would suggest: SSDs and vmtouch (or an analogous tool).
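
As a rough illustration of what "an analogous tool" could look like, here is a minimal Python sketch that reads a file sequentially so the kernel pulls its pages into the page cache. It is not vmtouch and does not reproduce its features; the function name `warm_page_cache` and the 16 MiB chunk size are assumptions made for this example.

```python
import os


def warm_page_cache(path, chunk_size=16 * 1024 * 1024):
    """Read `path` from start to end so its pages land in the page cache.

    Hypothetical helper for illustration only. Returns the number of
    bytes read. The data itself is discarded; the point is the side
    effect of the kernel caching the file.
    """
    total = 0
    # buffering=0 avoids an extra user-space buffer; we only want the reads.
    with open(path, "rb", buffering=0) as f:
        # Hint that access is sequential, where the platform supports it.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Calling `warm_page_cache` on a data file before the analysis starts means later reads are likely to be served from RAM, provided the file fits in free memory. The pages are not pinned, so the kernel may still evict them under memory pressure.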
# 5  
You might also consider an mSATA SSD:
Samsung SSD 840 EVO mSATA | Samsung SSD
# 6  
Not directly related, but I had a long workshop yesterday about our new storage system (EMC VMax 200k). EMC says they had intended the 300GB 15k SAS drives for high performance but are now phasing them out because (quoting from memory) with the development of flash SSDs it's just not worth it any more. They also claim that, because they use SLC-based hardware, they see even lower disk-replacement rates than with rotational disks, even in heavy-duty transactional storage systems. The SSDs' much lower energy consumption compared to the 15k SAS disks contributes to that: there is simply less heat involved, and that shows when you pack some ~2500 disks into a rack.

You haven't said where you are going to place the workstation, but in case it is going to be somewhere near your desk: 15k disks are awfully LOUD, in addition to being premier heating devices, while SSDs are completely silent.

I hope this helps.

# 7  
Thank you all.
