The Lounge What is on Your Mind? Google Webmaster Tools Shows Problems with Soft 404 Errors Post 303042264 by Neo on Friday 20th of December 2019 12:50:55 AM
The cron job that builds the "similar man page" lists from the man pages themselves is still running. I just tested some of the results, and the similar man pages are looking good.

FWIW, here is the PHP code I quickly put together for this caper:

Code:
<?php
include_once '/var/www/global.php';
global $vbulletin, $neo_global, $quickload;
// Batch size shrinks as $quickload rises, so a busy server does less work per run.
if ($quickload > 5) {
    $makeLimit = 1;
} elseif ($quickload > 4) {
    $makeLimit = 10;
} elseif ($quickload > 3) {
    $makeLimit = 20;
} elseif ($quickload > 2) {
    $makeLimit = 25;
} else {
    $makeLimit = 30;
}

$strlen = 2500;

$neo_conn = getManDBConn();
$sql = 'select manid, text from neo_man_page_entry where similarman = "blank" and strlen < ' . $strlen . ' order by strlen ASC LIMIT ' . $makeLimit;
$maninfo = mysqli_query($neo_conn, $sql);
$t1 = time();
mysqli_query($neo_conn, "SET sort_buffer_size = 2048000");
$counter = 0;
while ($man = mysqli_fetch_assoc($maninfo)) {
    $out = '';
    $searchstr = $man['text'];
    $searchstr = html_entity_decode($searchstr);
    $searchstr = stripslashes($searchstr);
    $searchstr = strip_tags($searchstr);
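    $searchstr = str_replace("'", " ", $searchstr);
    // Escape whatever is left (e.g. backslashes) so the text cannot break out of the SQL string literal.
    $searchstr = mysqli_real_escape_string($neo_conn, $searchstr);
    //echo $searchstr ."<p>";
    $sql2 = "select manid, MATCH(text) AGAINST ('" . $searchstr . "' IN NATURAL LANGUAGE MODE) as score,strlen FROM neo_man_page_entry where strlen > 2000 AND strlen < 2000000  ORDER BY score DESC LIMIT 15";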
    $resultMan = mysqli_query($neo_conn, $sql2);

    $counter++;
    while ($manpage_raw = mysqli_fetch_assoc($resultMan)) {
        //echo $manpage_raw['manid']."<P>";
        $out .= $manpage_raw['manid'] . ",";
    }
    $out = substr($out, 0, -1);
    //echo $out."<p>";
    // Cast the id to int so nothing but a number can reach the WHERE clause.
    $insert = "UPDATE neo_man_page_entry set similarman = '" . $out . "' where manid = " . (int) $man['manid'];
    mysqli_query($neo_conn, $insert);
    //echo $insert . "<p>";

}

$sql2 = 'select count(1) as count from neo_man_page_entry where similarman = "blank" and strlen < ' . $strlen;
$countinfo = mysqli_query($neo_conn, $sql2);
// count(1) always returns exactly one row, so a single fetch is enough.
$count_raw = mysqli_fetch_assoc($countinfo);
// Estimated hours left: rows to do / (rows per run * 60 runs per hour), assuming one run per minute.
$remaining = number_format($count_raw['count'] / ($makeLimit * 60), 1);
$t2 = time();
$td = $t2 - $t1;
error_log(time() . " Time: " . $td . " Inserts: " . $counter . " Floor: " . $strlen . " Limit: " . $makeLimit . " ToDo: " . $count_raw['count'] . " RemainingTime: " . $remaining . " Hours QLoad: " . $quickload . "\n", 3, '/var/log/apache2/debug/neo_sim_man_man_pages_timing.log');
closeManDBConn($neo_conn, $maninfo);
closeManDBConn($neo_conn, $resultMan);
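
For anyone who wants to reuse the throttling idea, here is a minimal sketch of the same thresholds pulled out into a function. The name batchLimitForLoad is hypothetical and not part of the script above; it assumes the load figure is a plain number like the QLoad value written to the log.

```php
<?php
// Hypothetical helper (not in the original script): maps a load figure to a
// batch size using the same thresholds as the if/elseif chain above.
function batchLimitForLoad(float $load): int
{
    if ($load > 5) {
        return 1;
    }
    if ($load > 4) {
        return 10;
    }
    if ($load > 3) {
        return 20;
    }
    if ($load > 2) {
        return 25;
    }
    return 30;
}

echo batchLimitForLoad(1.56), "\n"; // quiet server: full batch of 30
echo batchLimitForLoad(2.07), "\n"; // slightly busier: batch of 25
```

Keeping the thresholds in one function makes it easy to tune them without touching the query code.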

Code:
ubuntu@ tail -f neo_sim_man_man_pages_timing.log
1576821537 Time: 53 Inserts: 30 Floor: 2500 Limit: 30 ToDo: 56594 RemainingTime: 31.4 Hours QLoad: 1.56
1576821597 Time: 53 Inserts: 30 Floor: 2500 Limit: 30 ToDo: 56564 RemainingTime: 31.4 Hours QLoad: 1.59
1576821656 Time: 54 Inserts: 30 Floor: 2500 Limit: 30 ToDo: 56534 RemainingTime: 31.4 Hours QLoad: 1.59
1576821716 Time: 54 Inserts: 30 Floor: 2500 Limit: 30 ToDo: 56504 RemainingTime: 31.4 Hours QLoad: 1.79
1576821776 Time: 54 Inserts: 30 Floor: 2500 Limit: 30 ToDo: 56474 RemainingTime: 31.4 Hours QLoad: 1.77
1576821826 Time: 44 Inserts: 25 Floor: 2500 Limit: 25 ToDo: 56449 RemainingTime: 37.6 Hours QLoad: 2.07
1576821895 Time: 53 Inserts: 30 Floor: 2500 Limit: 30 ToDo: 56419 RemainingTime: 31.3 Hours QLoad: 1.83
1576821956 Time: 51 Inserts: 30 Floor: 2500 Limit: 30 ToDo: 56389 RemainingTime: 31.3 Hours QLoad: 1.84
1576822014 Time: 52 Inserts: 30 Floor: 2500 Limit: 30 ToDo: 56359 RemainingTime: 31.3 Hours QLoad: 1.53
1576822078 Time: 52 Inserts: 30 Floor: 2500 Limit: 30 ToDo: 56329 RemainingTime: 31.3 Hours QLoad: 1.12
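
The RemainingTime column can be checked by hand. The sketch below repeats the script's arithmetic in a hypothetical helper (remainingHours is not in the original code) and assumes, as the log timestamps suggest, that the cron runs about once a minute.

```php
<?php
// Hypothetical helper: rows still to do, divided by rows handled per hour
// (batch size * 60 runs), formatted to one decimal place like the script does.
function remainingHours(int $todo, int $makeLimit): string
{
    return number_format($todo / ($makeLimit * 60), 1);
}

echo remainingHours(56594, 30), "\n"; // first log line: 31.4
echo remainingHours(56449, 25), "\n"; // the Limit: 25 line: 37.6
```

This also shows why the estimate jumps to 37.6 hours on the one line where the load went over 2: the estimate assumes the current batch size holds for every remaining run.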

As a side note, I was a bit surprised to see how good the results are so far. When I checked about 20 similar man page entries, they were "spot on" and will be helpful for readers / future voyagers to the site. Of course, man pages with "nothing very similar" get more mixed results.

In a few days, I will add code to the man pages that checks the strlen of the requested man page and, if it is under 2000 (or maybe 1500), includes another man page underneath in a section called something like "Check Out this Similar Man Page". This should mitigate the "soft 404" errors that keep certain man pages from being indexed by Google. (Status Update: This Todo is DONE)
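
A rough sketch of that display-side check follows. It is only an illustration under stated assumptions, not the code that went live: the function name pickSimilarManId and the sample ids are made up, and it assumes similarman holds either "blank" or the comma-separated id list written by the cron script.

```php
<?php
// Hypothetical helper: decide which similar man page (if any) to show under
// a short one. Returns the best-scoring id, or null when the page is long
// enough on its own or no similar pages have been computed yet.
function pickSimilarManId(int $strlen, string $similarman, int $threshold = 2000): ?int
{
    if ($strlen >= $threshold || $similarman === '' || $similarman === 'blank') {
        return null;
    }
    // The cron stores ids ordered by score, best match first.
    $ids = explode(',', $similarman);
    return (int) $ids[0];
}

var_dump(pickSimilarManId(1200, '4412,87,953')); // short page: first id in the list
var_dump(pickSimilarManId(3100, '4412,87,953')); // long page: nothing to add
var_dump(pickSimilarManId(1200, 'blank'));       // not processed yet: nothing to add
```

The threshold is a parameter so that the 2000 versus 1500 question can be settled by trying both.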