Quote: Originally Posted by DreamWarrior
I'm not convinced on the fragmentation "issue". It may speed things up because you call malloc fewer times, but...
A bit of fragmentation doesn't hurt malloc() much. A small percentage of wasted space? Boo hoo. Unless it's truly horrific you'd never notice or care.
The cost of fragmentation for repeated realloc() is quadratically-growing wasted CPU time as your existing data leapfrogs through every available heap hole in order of size: every move recopies everything you've stored so far. I think that's worth minimizing.
If you knew the sizes of your heap pools, you could aim large enough to probably be at the end of a pool but small enough to not spill into the next one. We don't, but exponential growth is a decent approximation.
To a point, anyway. For ludicrous sizes I'll admit a cap might be good.
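For concreteness, here's a minimal sketch of that policy: double the capacity until a cap, then grow by fixed chunks past it. GROWTH_CAP, vec, and vec_reserve are names and numbers I'm making up for illustration, nothing canonical.

Code:
#include <stdlib.h>

/* Illustrative cap: past this, stop doubling and grow linearly. */
#define GROWTH_CAP (64UL * 1024 * 1024)

struct vec {
    char  *data;
    size_t len;   /* bytes used */
    size_t cap;   /* bytes allocated */
};

/* Make room for `need` more bytes; 0 on success, -1 on failure. */
static int vec_reserve(struct vec *v, size_t need)
{
    if (v->cap - v->len >= need)
        return 0;

    size_t newcap = v->cap ? v->cap : 64;
    while (newcap - v->len < need)
        newcap = (newcap < GROWTH_CAP) ? newcap * 2
                                       : newcap + GROWTH_CAP;

    char *p = realloc(v->data, newcap);
    if (p == NULL)
        return -1;
    v->data = p;
    v->cap  = newcap;
    return 0;
}

Doubling keeps the number of realloc() calls, and therefore the number of full-buffer copies, logarithmic in the final size; the cap trades a little of that back for bounded worst-case slack.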
Quote:
Either way...it's a blanket statement that one is better than the other. If you had a process that was multi-threaded and read ten million items in with ten threads would you ask for it exponentially in each thread?
I think that kind of situation needs careful tuning no matter what allocation method you pick.
I might use large fixed-size blocks of memory anonymously mmap()-ed in, one to a thread. That'd let you parcel out fractions of address space pretty precisely without exhausting your memory map or wasting memory. If you knew a little about your address space you could pick an entire contiguous region for them and MAP_FIXED things into place to keep your address space from growing moth-eaten.
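A rough sketch of that reservation trick, assuming Linux-flavored mmap() (MAP_ANONYMOUS); the thread count and slice size are made-up numbers:

Code:
#include <stdio.h>
#include <sys/mman.h>

#define NTHREADS   10
#define SLICE_SIZE (256UL * 1024 * 1024)   /* illustrative: 256 MiB per thread */

int main(void)
{
    /* Reserve one contiguous PROT_NONE region so the map stays in one piece. */
    char *region = mmap(NULL, NTHREADS * SLICE_SIZE, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) {
        perror("mmap reserve");
        return 1;
    }

    /* Carve one slice per thread; MAP_FIXED replaces the PROT_NONE pages. */
    for (int i = 0; i < NTHREADS; i++) {
        void *slice = mmap(region + i * SLICE_SIZE, SLICE_SIZE,
                           PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (slice == MAP_FAILED) {
            perror("mmap slice");
            return 1;
        }
        /* hand `slice` to thread i here */
    }
    return 0;
}

MAP_FIXED is safe here precisely because it lands on pages we already reserved; MAP_FIXED over an address you merely guessed at is how you clobber your own mappings.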
You could even extend that to accommodate more items than you can hold in your address space by making the mappings file-backed... if you fill a mapping, just truncate() the file one chunk larger and move your map further into the file. This'd also prevent you from exhausting system memory and swap. To sort it, you could qsort() each chunk and then merge the sorted chunks.
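And a sketch of that sliding file-backed window, with a made-up chunk size; I've used ftruncate() on the open descriptor, which does the same job as truncate() on the path:

Code:
#include <sys/mman.h>
#include <unistd.h>

#define CHUNK (64UL * 1024 * 1024)   /* illustrative window size */

/* Unmap the current window, grow the file by one chunk, and map the
   new tail. `*file_size` tracks the current end of the file.
   Returns the new window, or NULL on failure. */
static void *advance_window(int fd, void *old_win, off_t *file_size)
{
    if (old_win != NULL && munmap(old_win, CHUNK) != 0)
        return NULL;

    /* Extend the file so the new window has backing store. */
    if (ftruncate(fd, *file_size + CHUNK) != 0)
        return NULL;

    void *win = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, *file_size);
    if (win == MAP_FAILED)
        return NULL;

    *file_size += CHUNK;
    return win;
}

qsort() each window in place before advancing and you leave sorted runs behind on disk; the final merge over those runs is just a textbook external merge sort.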