Locklin on science

Search engines for grownups

Posted in semantic web, tools by Scott Locklin on March 15, 2013

Google is an amazing company. It is so all-pervasive it has become a verb. It also annoys the hell out of me, and I avoid it whenever I can. No matter how annoying their interface becomes, or how many weird and privacy-invading things they do, no matter how many crypto-religious fruitcakes they hire, they’re the only game in town for most people. I don’t like monopolies. I think monopolies are inherently evil and should be shunned by people with a conscience, or tamed by the judicial system. Since the US government is presently composed of ninnyhammers obsessed with irrelevant things, and geldings who have forgotten about the anti-trust laws, it falls to the individual to do something about it. Where is Teddy Roosevelt when you need him?

There are alternatives available. The problem is, nobody knows about them. Google dominates people’s thoughts about search the way Microsoft used to dominate people’s ideas about computers in general. Some of the alternatives are very much worth knowing about, even if you are happy with using Google.

For most people, the best alternative is Yandex.com. Yandex is the biggest player in the Russian market. It’s been around for longer than Google has, it is run by mature computer scientists who specialize in machine learning, and is one of the best search engines you have never heard of. The English language version of their search engine is considered experimental, but the results are very good. For general search, it is as good as or better than Google. The results are uncannily accurate, and I’m really happy with how their page looks: the clutter is practically nonexistent. The English language page is missing some “searchy” features at present: for example, there is no English language news aggregator (which means no news results in the basic search either). This feature exists in Russian, so I assume it is coming. Multimedia? Well, they’re not so hot here, but searching for funny pictures is a rare task for me. Google has a marginal win on maps for the US, mostly for the public transit option that works (Yandex seems OK for driving maps). The Russian language translation facilities at Yandex are, of course, excellent: much better than Google. As a slavophile, I find this invaluable.

One privacy advantage Yandex has which Google never will: Yandex does not do business with American intelligence agencies. I do not like the fact that Google has become an arm of US intelligence agencies. It is to their credit that Google discloses their relationship with the US government (most of Silicon Valley is in bed with the spooks, but they don’t talk about it). It is the surveillance state that I abhor. Yandex may very well be doing the same thing with the Russian government, but the FSB is a much smaller threat to American civil rights than our own spooks. While I see no imminent dangers from the all-seeing eye, and I am far from paranoid, the US is going through a weird time right now, and history is a dark and bloody subject. Do I really want the future government to know what websearches I was doing in 2010? No, thanks, tovarich.

clouseau

As a crypto-academic consultant, I end up doing a lot of searches for technical papers. Google is OK at this (I have found no utility in “Google Scholar”: the regular search results are equivalent). Yandex actually does significantly better. Of course, these kinds of searches are a broad net. If you have a decent idea of what you’re looking for, INSPEC is still the gold standard. You have to pay for INSPEC, or walk to a university library, but that is what serious people use for deep search in an academic subject.

Yandex does fail one important use case for me. One of the fundamental ways people get work done on computers is searching for error messages and bugs and “how-tos” on message boards. If you’re dealing with a computer problem, chances are good that someone else had the problem, and asked others about it on an online forum, whether it is a compiler directive or a wonky KDE feature. This is a tremendously helpful knowledge base. Google beats everyone at this at present, mostly because you can sort by date. Close behind Google for this use is duckduckgo.com.

I have high hopes for Yandex. While Google hires a lot of rock star programmers and well known computer scientists, Google also seems unfocused and adolescent (read the takimag article for more concrete criticisms). The Yandex guys: they’re grownups. They have succeeded in a country of flinty hard men.  People actually died trying to do business in Russia in the 90s; these guys made it. They’ve only been doing English for a little while, and they’re already better than Google at quite a few things. Search in Russian is much harder than search in English, as the language is strongly inflected. So, Yandex solved a much harder problem than Google did at the outset. Google wastes its time with nonsense like Google+ or attempts to bring about the “singularity” by hiring Crazy Ray Kurzweil. Meanwhile, Yandex is using its technology to assist particle physicists at CERN, which seems a bit more impressive. I’ve seen significant improvements in Yandex search results over the past few months. It is very exciting to watch a complex contraption like this improving so quickly. Consider this: they have achieved all this on revenues which are 1/60 of what Google takes in.  The flabby marshmallows at Google may not be worried now, but these guys are coming for them. If I had a bunch of steel hard brainy Russian cossacks in my rear view mirror, I’d be nervous.

Meanwhile-in-Russia

On a slightly different topic: one of the hardest things a technical or fact-oriented person looks for on the internets is data. Most search engines are completely useless for this type of thing. It’s really a different type of problem from ordinary search. I have only found two search engines which do this well.

One is Wolfram Alpha, which I made fun of at one point. I now find it indispensable for looking up simple facts and figures, using an English language query. It doesn’t have large amounts of data, but it’s easy to get to the data: just tell it what you need. Kudos to them for getting this right. It ain’t bad for doing integrals and such either; certainly more convenient than using some long-in-the-tooth open source computer algebra system like Axiom or Maxima. While it kind of sucked when it first came out, the suck is all gone: this is an excellent product every numerate individual should avail themselves of.

The other is quandl.com. I have been using it for only a few weeks, and don’t know how I lived without it. Before it, I had a lot less data to work with, and I went through a lot more trouble to obtain it. For quants, this is an indispensable tool for historical economic data. For datanauts in general; ditto. Before Quandl, you had to scrape publicly available data from myriad websites. Post-Quandl; well, it’s easy to get at, and if you register with them, you can download dynamically updated data in easily parsed CSV format all damn day. Hooray for Quandl! Please don’t sell out to gigantor corp that will make you suck. If you must, sell out to Yandex!
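
To give a flavor of why “easily parsed CSV” matters, here is a minimal Python sketch using only the standard library. Everything specific in it is an assumption for illustration: the sample rows are made up, the Date,Value column layout is a guess at a typical time series file rather than actual Quandl output, and the function names are hypothetical. It reads the text, sorts the observations oldest first, and computes period-over-period percentage changes.

```python
import csv
import io

# Hypothetical sample: made-up numbers in an assumed Date,Value layout.
SAMPLE_CSV = """Date,Value
2013-01-31,101.5
2012-12-31,100.0
2012-11-30,98.0
"""


def load_series(text):
    """Parse Date,Value CSV text into (date_string, float) pairs, oldest first."""
    rows = csv.DictReader(io.StringIO(text))
    series = [(row["Date"], float(row["Value"])) for row in rows]
    series.sort(key=lambda pair: pair[0])  # ISO dates sort correctly as strings
    return series


def pct_changes(series):
    """Percentage change between each pair of consecutive observations."""
    return [
        (later[0], 100.0 * (later[1] - earlier[1]) / earlier[1])
        for earlier, later in zip(series, series[1:])
    ]


series = load_series(SAMPLE_CSV)
changes = pct_changes(series)
for date, change in changes:
    print(date, round(change, 2))
```

With a real download you would read the file from disk instead of the inline string; the parsing step stays the same, which is the whole appeal compared to scraping a dozen differently formatted websites.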

meanwhile-in-russia-big-priest-with-big-gun

22 Responses

  1. Brian said, on March 15, 2013 at 3:53 pm

    Love the pics. Thanks for the tips. Yandex looks good. Never had luck with Wolfram. I don’t think I’m their customer target.

    • Scott Locklin said, on March 15, 2013 at 4:29 pm

      I tried Wolfram over time: if you haven’t used it lately, try it again. They seem to have curated a lot of helpful stuff. It’s mostly economic, scientific and “facts and figures” stuff. Searches I’ve used it for which win a lot: “what is the total economic output/population of Latin America/the EU/Asia.” Google failed at that. Wolfram got it perfectly. One thing I could have used it for in a recent blog post, but unfortunately didn’t: “what is the efficiency of a heat engine.” Not only does it tell you, it gives you a calculator and plots helpful charts and diagrams. That’s really helpful and is starting to look eerily like a brain in a jar.
      Of course, they have to have guessed what you’re looking for, so presumably it’s mostly going to reflect Wolfram’s customers: financial and scientific people. If I ask it, “how many bombs can an F-22 carry?” it tells me about Fluorine. So, they’re not going to beat Google or Yandex for standard search. Google and Yandex correctly point me to Wiki entries and aerospaceweb on that one.

  2. seanrwcrawford said, on March 15, 2013 at 4:37 pm

    Scott, thanks for the kind words about Quandl! For anyone who may be interested, you can find more info about our new excel add-in here: http://www.quandl.com/help/excel-add-in

    We also have an R package, and next week we’re releasing a MATLAB package as well.

    • Scott Locklin said, on March 15, 2013 at 9:14 pm

      Dude, thanks for building the goddamned thing; it is truly wonderful. Of the three or four search engines I mentioned here, Quandl is the one that actually allows for really new capabilities. My only concern is it might some day go away the way opentick did.
      FWIW, I have a cheap-o plug-in for the J language. I’ll ping you when it’s good enough to stick on github. It’s kind of an obscure language, but a port from J to Q/Kx should be fairly straightforward. Lots of data-hungry quants in that ecosystem.

  3. Rod Carvalho said, on March 16, 2013 at 2:19 am

    Yandex may beat Google at interior design. Take a look at their Saint Petersburg office.

    • Scott Locklin said, on March 16, 2013 at 5:11 am

      I saw that. Reminds me a lot of Yahoo HQ.

  4. Geoff said, on March 16, 2013 at 4:34 am

    Any suggestions on gmail alternatives? I’d like to migrate away from Google for privacy concerns, but haven’t come across acceptable solutions.

    • Scott Locklin said, on March 16, 2013 at 5:07 am

      I still use Yahoo for my long term fallback email. It works fine for me. Gmail’s UI always gave me the creeps.
      FWIW, Yandex has email; it’s what most Russians use. Didn’t check it out, but I assume it is reliable at least.

      • Sonu said, on March 24, 2013 at 7:05 am

        Wow, nice to hear that you still use Yahoo. I actually like it a lot. It has some issues with search and it probably might be useful for it to have conversations like in gmail. But for some reason if people see me use Yahoomail (although I do have multiple gmail accounts), they see me as somebody from the 12th century.

    • Petro said, on March 16, 2013 at 5:19 pm

      Spend 60 bucks a year and go with someone like http://www.01.com (there are other providers out there, YMMV)

      I’ve had my bounty.org domain hosted there since 2010, and they do a great job. They use Zimbra for the mail server, and it does most of what gmail does, and other stuff besides. The Wife and I both have addresses under that and we share calendar stuff sometimes (not that much really, but it’s possible).

      I’ve been really happy with them.

  5. brucecharlton said, on March 16, 2013 at 10:55 am

    I share your concerns, so I tried yandex… Sorry, but for a non-techie, Google is miles better…

    • Scott Locklin said, on March 16, 2013 at 7:42 pm

      If you’re looking for current events or anything on a message board, Yandex fails. Otherwise it works really well for me. What kinds of things does it fall down on for you?

      • brucecharlton said, on March 16, 2013 at 9:09 pm

        Well, it didn’t seem to do searches by time (past hour, day, week, month, year).

        Actually Google have removed (for a couple of years) one of their most valuable facilities for me, which was to do a search for ‘most recent’ web-mentions – which showed about ten items and was continuously updated.

        • Scott Locklin said, on March 16, 2013 at 9:34 pm

          I agree with that criticism. However, I only use that feature rarely, and only when searching for computer issues which must be recent. For that task, duckduckgo gives me decent answers without the filter. I am assuming Yandex will give us the filter eventually. It exists in the Russian version.
          Most of my “general academic” searches work better without the filter. Machine learning and optimization are limited fields with few useful results, and either I need something bleeding edge or I need to know what has been done before. For physics; same thing, or else I need INSPEC.

  6. Petro said, on March 16, 2013 at 6:00 pm

    There are other US search engines: duckduckgo.com, bing, and ixquick.com, for example. I don’t know how good their underlying search engines are though.

    • Scott Locklin said, on March 16, 2013 at 7:48 pm

      Bing is ‘orrible. Duckduckgo works well for me on debugging linux things, and almost nothing else. Like the rooskies say, there are only 5 nations with search engines; 20 with space programs. Getting search right is difficult.

      • Petro said, on March 16, 2013 at 9:26 pm

        As anyone who’s bought an Eastern Bloc AK knows, Russians are big on robust and short on accuracy.

        A fast search shows that there’s at LEAST 16 countries with their own search engines.

        And search engines aren’t *nearly* as good for international prestige as pretending the US didn’t beat you into space by decades.

        There are some allegations (of course there are) that Google is politicizing the results of certain searches. Dunno. I’d be glad to see them taken down a notch.

        • Scott Locklin said, on March 16, 2013 at 9:36 pm

          Might have been true when he said it. Either way, the search engines that count come from around 5 countries. Anyone can write a simple search engine which indexes some subset of internet content. I think Bram wrote one for torrents in a couple of weeks. Getting it right in the large is the hard problem.

          • Petro said, on March 17, 2013 at 7:34 pm

            True. The Anglosphere has very little use for a Korean or Chinese based search engine, even if it’s really, really good, and no matter how good it is, if it biases in favor of AU or NZ links it’s not going to do a lot of good when I’m searching for pizza in Aurora, CO.

            Getting it right is very difficult because the factors that are important when you search for a linux kernel bug are VERY different from those that matter when you search for pictures of Ashley’s Juggs, or when looking for a tweed jacket. The latter is the search that Google is most interested in, since it’s easier to convert that into a penny or two.

  7. SUNDAY EVENING LINKAGE | Iced Borscht said, on March 18, 2013 at 4:12 am

    [...] Scott Locklin: Search Engines for Grownups [...]

  8. Toddy Cat said, on May 17, 2013 at 8:00 pm

    “the US is going through a weird time right now”

    To say the least. If you had told me back in the 1980s that I would trust the Russian government more than I do my own, I’d have thought that you were nuts. But right now USG seems a lot more commie than the Russians do. Crazy world out there…

    • Scott Locklin said, on May 17, 2013 at 8:28 pm

      Someone posted one of them funny polls about “what would you do in a modern ‘Red Dawn’ situation?” in a right wing forum. “Welcome them as liberators” was a common reply.

