

    Every so often we get people asking questions that are clearly at the level of, say, standard homework problems in calculus courses or below. These are summarily dismissed. (I think some people are a bit too harsh in the way they dismiss these questions -- I feel that we should do our best not to anger people, let they get the idea that all mathematicians are assholes. But that's a different story.)

    How are these people finding Math Overflow? Do we know? I understand what motivates them to ask their questions once they find the site -- there's a chance they may get an answer, and the worst that can happen is that some people on the Internet that they've never met tell them to go away -- but how are they getting here in the first place?

    • Comment author: Mariano
    • Comment time: Mar 25th 2010

    You probably meant to write lest instead of let :)


    Good question. My first guess would be google. But actually, I have a hard time finding a google search likely to be used by a student that shows MO on the first page. Curious. If MO is that hard to find now, imagine what it's going to be like when it becomes easier to find.

    By the way, this prompts a feature request. Could closed questions be equipped with <meta name='robots' content='none'> in the head? Then at least typical homework questions won't by themselves lead googling students to MO.


    There are places which maintain a list of links to active StackExchange sites, and MO is featured pretty prominently since we're relatively active. We probably also get a lot of referrals from SO, although that shouldn't account for most of them.

    The obvious keywords I can think of for math homework help don't bring up MO, so I don't think Google's to blame.

    • Comment author: Mariano
    • Comment time: Mar 25th 2010

    Harald, the questions in question should be deleted instead.

    Lots of questions have been closed "because there is no point in allowing more answers" (not that I think that is a sensible reason!) and I don't think you want to disallow robots seeing those and people finding them.


    Harald, I'm not sure MO will become any easier to find than it is now. Google already seems to find us very quickly, and that's basically how everyone finds things.


    Do "we" have access to the server access logs? That would tell us how they find us.


    We do not have access to the server access logs as far as I know.


    Shame. That would provide some useful and interesting information. Any chance of requesting it? (Duly sanitised, of course.)


    What information is contained in server access logs?


    @Anton: It can vary, but typically you find a time stamp, originating IP address, HTTP command (GET, POST, etc), result code, browser info, and referrer. If the user clicks on a link in a certain web page, the browser will typically include the address of that page as the referrer when asking for the linked page. For images, the referrer will be the name of the containing page.


    Here's a sample of the access log from my blog:

    ::1 - - [21/Mar/2010:15:34:28 +0100] "POST /~astacey/wordpress/wp-admin/admin-ajax.php HTTP/1.1" 200 237 "http://localhost/~astacey/wordpress/wp-admin/post-new.php" "Mozilla/5.0 (X11; U; Linux ppc; en-US; rv: Gecko/20100216 Fedora/3.5.8-1.fc11 Firefox/3.5.8"
    ::1 - - [21/Mar/2010:15:34:52 +0100] "GET /~astacey/wordpress/wp-admin/images/button-grad-active.png HTTP/1.1" 200 284 "http://localhost/~astacey/wordpress/wp-admin/css/colors-fresh.css?ver=20091217" "Mozilla/5.0 (X11; U; Linux ppc; en-US; rv: Gecko/20100216 Fedora/3.5.8-1.fc11 Firefox/3.5.8"
    ::1 - - [21/Mar/2010:15:34:52 +0100] "POST /~astacey/wordpress/wp-admin/post.php HTTP/1.1" 302 - "http://localhost/~astacey/wordpress/wp-admin/post-new.php" "Mozilla/5.0 (X11; U; Linux ppc; en-US; rv: Gecko/20100216 Fedora/3.5.8-1.fc11 Firefox/3.5.8"
    ::1 - - [21/Mar/2010:15:34:58 +0100] "POST /~astacey/wordpress/wp-cron.php?doing_wp_cron HTTP/1.0" 200 - "-" "WordPress/2.9.2; http://localhost/~astacey/wordpress"
    ::1 - - [21/Mar/2010:15:34:54 +0100] "GET /~astacey/wordpress/wp-admin/post.php?action=edit&post=7&message=6 HTTP/1.1" 200 41745 "http://localhost/~astacey/wordpress/wp-admin/post-new.php" "Mozilla/5.0 (X11; U; Linux ppc; en-US; rv: Gecko/20100216 Fedora/3.5.8-1.fc11 Firefox/3.5.8"

    Normally, you get the IP address at the start, but since I was on the machine running the server, it only shows the loopback address (::1). Then you get the time-stamp, the request, and the HTTP server code (200 good, 40X bad). I'm not sure what the next number is (time the request took, perhaps?). Then comes the referring page (if any). Finally, there is the user-agent string, from which you can see that I claimed to be using Firefox on an old Mac from the US running Linux. In fact, I appear to be running Fedora 11 and Firefox 3.5.8. Actually, you haven't a clue what I'm really using, as user-agents are customisable by the browser.
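    A minimal sketch of how one of the sample lines could be split into the fields just described, assuming the log follows the layout of Apache's combined log format (the sample looks like it, but that is an assumption). In that format the number after the status code is the response size in bytes, with `-` meaning none. The pattern, the function name `parse_line` and the field names are made up for illustration.

```python
import re

# Assumed layout: host ident user [time] "request" status size "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the size or referrer position means the value is absent.
    entry["size"] = None if entry["size"] == "-" else int(entry["size"])
    entry["referrer"] = None if entry["referrer"] == "-" else entry["referrer"]
    return entry
```

    Lines that do not fit the assumed layout simply come back as `None`, so a differently configured server would need a different pattern.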

    If you don't have access to the server logs, you may still be able to pick this up using a bit of nifty javascript. All of this information gets passed to any program and it may be accessible in the javascript. I'm not a js expert so I don't know if js has access to this, and if it does whether or not it can do anything with it. Certainly a server-side script could do it. What you would want is for the page to ajax-like call a program on the server passing it all the environment variables. That program can then log it wherever you want.
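    A rough sketch of what the server-side half of that could look like, written as a small WSGI application in Python. Everything here is an assumption for illustration: the name `make_logging_app`, the `ref` query parameter, and the tab-separated log layout are hypothetical, and nothing is known about what MO's server actually runs. One caveat: for a background request, the browser's own Referer header would normally name the MO page making the call, so the sketch assumes the page script passes the address the visitor came from (in JavaScript, `document.referrer`) explicitly as `ref`.

```python
import time
from urllib.parse import parse_qs


def make_logging_app(log_file):
    """Build a WSGI app that appends one line per request to log_file."""

    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        # "ref" is a hypothetical parameter carrying the visitor's referring page.
        came_from = query.get("ref", ["-"])[0]
        fields = [
            time.strftime("%Y-%m-%dT%H:%M:%S"),
            environ.get("REMOTE_ADDR", "-"),
            came_from,
            environ.get("HTTP_USER_AGENT", "-"),
        ]
        # Replace tabs and newlines so each request stays on one log line.
        cleaned = [f.replace("\t", " ").replace("\n", " ") for f in fields]
        log_file.write("\t".join(cleaned) + "\n")
        start_response("204 No Content", [])
        return []

    return app
```

    The app answers with an empty 204 response, since the page making the call has no use for a reply.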

    Seems a bit of an effort, though. Far simpler just to request the access logs.

    • Comment author: grp
    • Comment time: Mar 27th 2010
    Usually, the number after the return code occurs after a successful lookup, and is the number of bytes in the "core" of the response, often the size of the returned file.

    The server logs might contain some useful referrer information, but it will be at best suggestive as to how some people are finding the site. I would also use something like Google to find pages that link to MO, and see whether any of those pages also provide links to other "homework" sites.

    Gerhard "Ask Me About System Design" Paseman, 2010.03.27

    It sounds like most of that information is collected by Google Analytics, but it's presented to me in a way that makes it hard to track individuals. Here's an example of the sort of information I can extract. Consider this question. I can ask analytics to show me how people got to this question:
    As you can see from the last column, indexing isn't costing anything!†

    But that only gives information about who visits the question after it's been asked. The real question is how the asker got to the site, and it seems like that would be non-trivial to extract even with the access logs. I guess we could find the first occurrence of the asker's IP.
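    A minimal sketch of that last idea, assuming access logs in the layout of the sample posted above and assuming the lines are in chronological order. The function name `first_visit` and the pattern are hypothetical. Since IP addresses can be shared or reassigned, the result would be suggestive at best.

```python
import re

# Assumed layout: host ident user [time] "request" status size "referrer" ...
_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def first_visit(lines, ip):
    """Return (time, referrer) of the first logged request from ip, or None."""
    for line in lines:
        match = _LINE.match(line)
        if match and match.group("host") == ip:
            return match.group("time"), match.group("referrer")
    return None
```

    A referrer of `-` in the result means the browser sent none, for example when the address was typed in directly.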

    †You should probably ignore that. In the mornings, I'm the only one who thinks I'm funny.

    • Comment author: grp
    • Comment time: Mar 27th 2010
    Some setups I can't resist.

    To Anton: I'm sure other people think you're funny in the morning, just not in the way you think you're funny.
    • Comment author: Harry Gindi
    • Comment time: Mar 27th 2010 (edited)

    As you can see from the last column, indexing isn't costing anything!†



    My wife tells me, "your jokes are funny in a very different way."


    @Anton: +1. I hope our wives never meet ...