In essence, the monitor becomes nothing but a big block of memory where each 4-byte chunk represents one pixel. Set the right chunk to 0xffffffff, and one pixel turns white. The memory's in the video card, not the monitor, but the monitor's contents and the video memory's contents are directly related.
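At that level, plotting a pixel is just arithmetic on an array index. Here's a rough sketch; the 1024x768 size and the idea of simply holding a plain pointer to the pixels are assumptions for illustration, since in reality the pointer comes from the OS or driver mapping the video memory for you:

    #include <stdint.h>

    #define SCREEN_W 1024   /* assumed resolution, 4 bytes per pixel */
    #define SCREEN_H 768

    /* Write one 32-bit pixel into a linear framebuffer. */
    static void put_pixel(uint32_t *framebuffer, int x, int y, uint32_t color)
    {
        /* Row y starts y * SCREEN_W pixels into the buffer. */
        framebuffer[y * SCREEN_W + x] = color;
    }

    /* put_pixel(fb, 100, 50, 0xffffffff);  -- turns the pixel at (100, 50) white */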
In practice it's a little less direct than that. With interrupts and context switches and such constantly happening, if you wrote straight to video memory you could never guarantee you'd keep up with the screen, and it'd look like junk -- half-done updates and tearing everywhere. So they put extra memory in the video card for you to write to without altering the screen, and you can then quickly tell the card "okay, this area of memory's ready, show it next frame".
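Loosely sketched, the flow looks like this; present_frame() is a made-up stand-in for whatever call the real driver or API exposes to say "show this buffer":

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define SCREEN_W 1024
    #define SCREEN_H 768

    /* Stand-in for the driver call that makes a finished buffer visible. */
    static void present_frame(const uint32_t *buffer) { (void)buffer; }

    int main(void)
    {
        /* Draw into a back buffer the screen never sees directly... */
        uint32_t *back = malloc(SCREEN_W * SCREEN_H * sizeof *back);
        if (!back)
            return 1;

        memset(back, 0, SCREEN_W * SCREEN_H * sizeof *back);  /* clear to black */
        back[50 * SCREEN_W + 100] = 0xffffffff;               /* one white pixel */

        /* ...then hand the whole finished frame over at once. */
        present_frame(back);

        free(back);
        return 0;
    }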
Dealing with a window manager makes it even more complicated, because now you're only allowed to write to part of the screen, and other windows might be sitting on top of yours. Still, it ends up working much the same way: the video driver and window manager handle the complicated bits of which bytes go where on the screen, and you just write to your own little buffer.
libSDL is a nice interface library that lets you use fairly raw video the same way on Windows, Linux, and many other operating systems. I wrote SDL_plattest using it; check out raster.c.
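This isn't raster.c itself, just a minimal sketch against the classic SDL 1.2 interface to show the shape of it: ask for a video surface, lock it, poke pixels into its buffer, unlock, and flip the finished frame onto the screen:

    #include <SDL/SDL.h>

    int main(int argc, char *argv[])
    {
        if (SDL_Init(SDL_INIT_VIDEO) < 0)
            return 1;

        /* Ask for a 640x480, 32-bit, double-buffered surface. */
        SDL_Surface *screen = SDL_SetVideoMode(640, 480, 32, SDL_SWSURFACE | SDL_DOUBLEBUF);
        if (!screen) {
            SDL_Quit();
            return 1;
        }

        if (SDL_MUSTLOCK(screen))
            SDL_LockSurface(screen);

        /* screen->pixels is your own little buffer of raw pixel data;
           screen->pitch is the number of bytes per row. */
        Uint32 *pixels = (Uint32 *)screen->pixels;
        pixels[50 * (screen->pitch / 4) + 100] = 0xffffffff;  /* white pixel at (100, 50) */

        if (SDL_MUSTLOCK(screen))
            SDL_UnlockSurface(screen);

        SDL_Flip(screen);   /* hand the finished frame to the display */
        SDL_Delay(2000);    /* keep it on screen for a couple of seconds */

        SDL_Quit();
        return 0;
    }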