Locklin on science

NSA scandal notes

Posted in privacy by Scott Locklin on June 12, 2013
  • Almost 3 months ago, I wrote the following passage:

“One privacy advantage Yandex has which Google never will: Yandex does not do business with American intelligence agencies. I do not like the fact that Google has become an arm of US intelligence agencies. It is to their credit that Google discloses their relationship with the US government (most of Silicon Valley is in bed with the spooks, but they don’t talk about it). It is the surveillance state that I abhor. Yandex may very well be doing the same thing with the Russian government, but the FSB is a much smaller threat to American civil rights than our own spooks. While I see no imminent dangers from the all-seeing eye, and I am far from paranoid, the US is going through a weird time right now, and history is a dark and bloody subject. Do I really want the future government to know what websearches I was doing in 2010? No, thanks, tovarich.”

  • Breakdown of the history behind this at Takimag, along with a  proposal I reproduce here:

“I have a modest proposal. If there is truly nothing to worry about, all domestic government employees, officials, lobbyists, apologists, and contractors should be compelled by law to publish their telephone metadata records and personal Internet communications to the general public. Private-sector data-miners (AKA me) will keep track of it and report on our findings. Doubtless it will provide interesting information about the rampant corruption and foreign-agent contacts in our government. While terrorism is a problem, it seems to me that corruption and foreign agents’ influence on domestic politics is a more serious problem. If they have nothing to hide, they should have nothing to worry about. Surely the “transparency president” would agree?”

  • Credit where it is due: Google is doing damage control, and attempting to release some real data about the FISA requests:

“We therefore ask you to help make it possible for Google to publish in our Transparency Report aggregate numbers of national security requests, including FISA disclosures—in terms of both the number we receive and their scope. Google’s numbers would clearly show that our compliance with these requests falls far short of the claims being made. Google has nothing to hide.”

  • Did you know that “Boundless Informant” runs on open source platforms? Well, it does (see page 4 here). Does anyone who publishes open source software want this to happen? I certainly don’t (though I only have one item up which could possibly be used as such). Some lawyer at the Free Software Foundation or EFF needs to get to work on a license which excludes spooks and their contractors. Not that it will actually stop such people, but there is no point making it easy for them.

BTC bubbles

Posted in econophysics by Scott Locklin on April 17, 2013

Not surprisingly, Bitcoin prices are well described by the log periodic power laws describing the dynamics of bubbles. A reminder of what an LPPL model looks like; here is a simple one:

p(t) = A + B(t_c - t)^\beta + C(t_c - t)^\beta \cos( \omega \log(t_c-t)+\phi)
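For anyone who wants to poke at the formula, here is a minimal sketch of it as a Python function. The function name `lppl` and the parameter values in the demo are my own illustrative choices, not fitted to any real data; the only thing taken from the equation above is its form.

```python
import math

def lppl(t, tc, A, B, C, beta, omega, phi):
    """Evaluate the LPPL price p(t) for a time t strictly before the critical time tc."""
    if t >= tc:
        raise ValueError("LPPL is only defined for t < tc")
    dt = tc - t                      # time remaining until the critical time
    power = dt ** beta               # the power-law term (tc - t)^beta
    return A + B * power + C * power * math.cos(omega * math.log(dt) + phi)

# Illustrative parameter values only (hypothetical, not a fit to BTC prices).
params = dict(tc=100.0, A=200.0, B=-50.0, C=5.0, beta=0.5, omega=8.0, phi=0.0)
for day in (0, 50, 90, 99):
    print(day, round(lppl(day, **params), 2))
```

Note that the oscillation is periodic in log(t_c - t) rather than in t, so the wiggles bunch up as t approaches t_c.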

I didn’t profit from this. I thought of applying LPPL to the BTC bubble well before the crash during a bullshit session with a friend, but I didn’t run the analysis until after. I have better things to do with my time than play with weird monopoly money, and the “exchanges” presently offering shorts are not even close to useful. I also think anyone who trades on LPPL is basically gambling. The most interesting parameter, t_c, is the hardest to fit, and, well, with all those parameters I could fit a whole lot of elephants. Just the same, it is a useful enough concept to justify further research. No, I won’t be telling the world about that research on my blog. A man’s got to eat, after all. Doing bubble physics costs money.

If you don’t know about LPPL models, click on these two helpful links. The “hand wavey” idea is, if the price is formed by market participants looking at what other market participants are doing, as with Dutch tulips, pets.com, and market prices in various eras, the price is an irrational bubble which will eventually burst. This isn’t an original idea: Charles Mackay was writing about it more than 170 years ago. The original idea is mapping this behavior onto an Ising model, running some renormalization group theory on it, and fitting to the result to get a forecast of bubble burstings. Sornette, Ledoit, Johansen, Bouchaud and Freund did it and told the world about it; may the eternal void bless them with healthy returns for being kind enough to share this interesting idea with us.

Here’s a plot of BTC close prices from MtGox (via quandl), with the LPPL model fit 10 days before the bubble pop. I wasn’t real careful with the fit; no unit root tests were done, no probabilistic estimates were made and no Ornstein-Uhlenbeck processes were taken into account. This is just curve fitting. The result is compelling enough to talk about. As you can see, with these parameters, the out-of-sample top is fit fairly well. Amusingly, so is the decline.
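To give a flavor of what "just curve fitting" can look like, here is a rough sketch. It is not the procedure used for the plot. It assumes one common simplification: for fixed nonlinear parameters the model is linear in A, B and C, so those three drop out of an ordinary least-squares solve, and t_c can be scanned on a grid. The exponent, frequency and phase are held fixed purely to keep the example short, and the "prices" are synthetic, generated from the model itself. All function names are hypothetical.

```python
import math

def lppl_basis(t, tc, beta, omega, phi):
    """Return the three regressors (1, f, g) multiplying A, B and C."""
    dt = tc - t
    f = dt ** beta
    g = f * math.cos(omega * math.log(dt) + phi)
    return (1.0, f, g)

def solve3(M, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(M, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x

def fit_linear(ts, ps, tc, beta, omega, phi):
    """Least-squares A, B, C for fixed nonlinear parameters; returns (coeffs, sse)."""
    rows = [lppl_basis(t, tc, beta, omega, phi) for t in ts]
    # Normal equations: (X^T X) coeffs = X^T p
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xtp = [sum(r[i] * p for r, p in zip(rows, ps)) for i in range(3)]
    coeffs = solve3(XtX, Xtp)
    sse = sum((p - sum(c * x for c, x in zip(coeffs, r))) ** 2
              for r, p in zip(rows, ps))
    return coeffs, sse

def scan_tc(ts, ps, tc_candidates, beta, omega, phi):
    """Grid-search the critical time; every candidate must lie after the last sample."""
    best = None
    for tc in tc_candidates:
        coeffs, sse = fit_linear(ts, ps, tc, beta, omega, phi)
        if best is None or sse < best[2]:
            best = (tc, coeffs, sse)
    return best

# Synthetic, noiseless check: the "prices" are generated from the model itself.
true_tc, true_A, true_B, true_C = 100.0, 200.0, -50.0, 5.0
beta, omega, phi = 0.5, 8.0, 0.0  # held fixed here purely for simplicity
ts = [float(t) for t in range(90)]
ps = [true_A + true_B * f + true_C * g
      for (_, f, g) in (lppl_basis(t, true_tc, beta, omega, phi) for t in ts)]
candidates = [float(tc) for tc in range(95, 111)]
best_tc, best_coeffs, best_sse = scan_tc(ts, ps, candidates, beta, omega, phi)
print(best_tc, [round(c, 6) for c in best_coeffs], best_sse)
```

On real, noisy prices the exponent, frequency and phase would have to be searched as well, and the error surface in t_c tends to be flat and bumpy, which is a good part of why that parameter is so hard to pin down.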


What can we learn from this? You can see a “fair value” of around $20/BTC due to be hit in a few weeks, with perhaps a full mean reversion to $10/BTC.  BTC doesn’t seem to have a helpful “anti-bubble” decay; if anything, it is decaying faster than expected so far (it is possible I mis-fit the \omega). The fit parameters for this version of the model tell us a few interesting things about the herding behavior which you can read about in Sornette’s book.

I don’t have any strong opinions about using BTC as a currency. I think most of its enthusiasts  are naive and do not understand the nature of money and what it is good for. I do think BTC would work a lot better as a store of value with a properly functioning foreign exchange futures market. There are no properly functioning BTC futures exchanges at present; just an assortment of dreamers and borderline crooks cashing in on hype. This is more of an engineering and legal problem than it is an inherent problem with using BTC as a currency. The way things are presently set up, without shorts, any extra media attention will result only in people buying the damn things. Without the ability to easily short them, price discovery is impossible, and herding behavior is the rule. It ain’t a market without shorts. It’s a bubble maker. Shorts don’t guarantee there will be no bubbles; we see plenty in shortable markets, but a lack of shorts will virtually guarantee future BTC bubbles.

The largest computer ever built

Posted in big machines by Scott Locklin on March 28, 2013

While cold war jets are an old interest of mine, almost everything built to fight the cold war fascinates me. All ages are characterized by madness; only a few have that madness captured in physical objects. Consider the largest computer ever built: the “Semi-Automatic Ground Environment” or SAGE system.

The SAGE system was designed to solve a data fusion problem. Radar installations across North America kept watch against Soviet bombers. These needed to be networked together and coordinated with air defense missiles and interceptors. Seems simple, right? In the 1950s and 1960s, this was not simple. The country is big; hundreds of radar stations and sensors needed to be integrated. It wasn’t as easy as it was in England in WW-2, when enemy aircraft location was plotted by hand on maps as the radar data came in: North America is much larger, and the planes traveled much faster in the 50s and 60s. No group of people could really make the decisions in time to mount an effective defense. You needed some kind of computer to make the decisions.

These giant electric brains took up an acre or so of real estate, and were encased in huge windowless concrete pillboxes all over the country.

StewartAFBNY_SAGE-DC2_Life-8

The SAGE system had many firsts: it was the first nationwide networked computer system. While it used special leased telephone lines and some of the first modems (at a blistering 1300 baud), it was effectively the internet, long before the internet. It was the first to use CRT screens. The first to use a “touch screen interface” via the use of light pens on the CRT. It was the first to use magnetic core memory. It was the first real time, high availability computer system. It was the first computer system to use time sharing. Many people attribute the genesis of computer programming as a profession to the SAGE system. Modern air traffic control and computer booking systems, of course, descend from the SAGE system.

StewartAFBNY_SAGE-DC2-Scope_Life-7

Each of the 27 computers that made up the system was a dual core 32 bit CPU made of 60,000 vacuum tubes, 175,000 diodes, and 12,000 of those newfangled transistors. Memory: 256k of magnetic core RAM (invented for this project) clocking in every 6 microseconds. These things weighed 300 tons, consumed 3 megawatts of electrical power and ran a blistering 75,000 operations per second. The dual cores weren’t used for multiprocessing. One was kept on hot standby in case the other failed. Since our grandfathers knew about fault tolerance, they had a system to replace the tubes before they burned out: downtime was typically only a couple of hours a year.


Each one drove 50 to 150 GUI workstations, and interacted with more than a hundred radars, interceptors and missile batteries. Remember that the next time you whine that your computer ain’t fast enough. No. This thing is less powerful than even a shitty cell phone (the 386 was probably approximately equivalent), and it did significantly more than your PC does. Each one was capable of coordinating the air defense of the entire North American continent.


It is worth pointing out that these machines not only ran all that equipment, and dealt with all that data, they also guided interceptors to their target locations. The F-106 and F-102 could be directly controlled by the SAGE system after takeoff. We think of “drones” as the latest and greatest newfangled thing in warfare: they have actually existed for a very long time. In many ways the SAGE system was more impressive than, say, the Predator system.

SAGE_floorplan_large

Another interesting piece of the SAGE system was the BOMARC missile system. The BOMARC (made by Boeing and the Michigan Aeronautical Research Center, which no longer exists) was primarily ramjet powered, and carried either a small nuke, or a half ton of conventional explosives. It was entirely dependent on the SAGE system for guidance to target. It was also incredibly stupid and dangerous: the original rocket boosters used hypergolic fuels, and would occasionally spectacularly explode in their silos, spreading dangerous plutonium around.

The SAGE system started running in 1958, and didn’t stop until 1984. Was it necessary? Like many interesting cold war artifacts, SAGE was more or less made obsolete by missiles around the time it deployed. While it did cost around $90 billion in 2013 dollars, it also was responsible for a good fraction of the technological things we now take for granted. Only barbarians do not remember their history, so anyone involved in modern technological projects should study it for lessons in engineering practice on long term and large scale projects.

First lesson I take from the SAGE system: solving the right problem. SAGE solved an important problem, that of air defense from enemy aircraft. It did so beautifully. The problem was, by the time it was deployed, bomber attack was a secondary issue: the primary threat was ballistic missiles. The US probably needed something like it anyway, but it is worth noticing that long-term, large-scale projects could very well be made redundant by deploy time.

Second lesson I take from the SAGE system: assemble a team that knows how to solve similar problems. MIT already had a computer on hand which constituted half of the solution, so it made perfect sense to scale up some parts of MIT into the MITRE corporation and Lincoln labs. Let’s say the government got a bug up its ass to spend $100 billion building a computer with the same capabilities as a dog’s brain, or one that could program itself. What institution would be most qualified to do this? I can’t answer this question, because pretty much nobody knows how to do something similar.

The third and final lesson I’ll take from the development of SAGE: break down the problem into manageable pieces, and solve them. They used the technology on hand in the 50s: vacuum tubes, telephone lines and CRTs. They didn’t postulate any significant breakthroughs in order to get ‘er done. They made do with what they knew was possible. As such, the path to success was obvious. Engineering genius came along the way. If you don’t have manageable pieces, you don’t have a real project: you have a wish. What are the manageable pieces needed to make “nanotech” or controlled nuclear fusion a reality? What are the manageable pieces needed to make quantum computing or deriving all electrical power from the sun a reality? I don’t know, and I don’t know of anybody else who does: therefore, such things do not count as legitimate long term projects.

“One of the outstanding things… was the esprit de corps—the spirit that pervaded the operation. Everyone had a sense of purpose—a sense of doing something important. People felt the pressure and had the desire to solve the air defense problem, although there was often disagreement as to how to achieve that end. Energy was directed more toward solving individual problems, such as making a workable high-speed memory or a useable data link, than it was toward solving the problem of the value of the finished product. It was an engineer’s dream.” - John F. Jacobs, former Senior Vice President, The MITRE Corporation

Insanely fascinating SAGE manuals found here:

http://bitsavers.trailing-edge.com/pdf/ibm/sage/

Search engines for grownups

Posted in semantic web, tools by Scott Locklin on March 15, 2013

Google is an amazing company. It is so all-pervasive it has become a verb. It also annoys the hell out of me, and I avoid it whenever I can. No matter how annoying their interface becomes, or how many weird and privacy invading things they do, no matter how many crypto-religious fruitcakes they hire, they’re  the only game in town for most people.  I don’t like monopolies. I think monopolies are inherently evil and should be shunned by people  with a conscience, or tamed by the judicial system. Since the US government is presently composed of ninnyhammers obsessed with irrelevant things, and geldings who have forgotten about the anti-trust laws, it falls to the individual to do something about it. Where is Teddy Roosevelt when you need him?


There are alternatives available. The problem is, nobody knows about them. Google dominates people’s thoughts about search the way Microsoft used to dominate people’s ideas about computers in general. Some of the alternatives are very much worth knowing about, even if you are happy with using Google.

For most people, the best alternative is Yandex.com. Yandex is the biggest player in the Russian market. It’s been around for longer than Google has, it is run by mature computer scientists who specialize in machine learning, and is one of the best search engines you have never heard of. The English language version of their search engine is considered experimental, but the results are very good. For general search, it is as good as or better than Google. The results are uncannily accurate, and the clutter is practically nonexistent. Speaking of clutter: I’m really happy with how their page looks; no clutter. The English language page is missing some “searchy” features at present: for example, there is no English language news aggregator (which means no news results in the basic search either). This feature exists in Russian, so I assume it is coming. Multimedia? Well, they’re not so hot here, but searching for funny pictures is a rare task for me. Google has a marginal win on maps for the US, mostly for the public transit option that works (Yandex seems OK for driving maps). The Russian language translation facilities at Yandex are, of course, excellent: much better than Google. As a slavophile, I find this invaluable.

One privacy advantage Yandex has which Google never will: Yandex does not do business with American intelligence agencies. I do not like the fact that Google has become an arm of US intelligence agencies. It is to their credit that Google discloses their relationship with the US government (most of Silicon Valley is in bed with the spooks, but they don’t talk about it). It is the surveillance state that I abhor. Yandex may very well be doing the same thing with the Russian government, but the FSB is a much smaller threat to American civil rights than our own spooks. While I see no imminent dangers from the all-seeing eye, and I am far from paranoid, the US is going through a weird time right now, and history is a dark and bloody subject. Do I really want the future government to know what websearches I was doing in 2010? No, thanks, tovarich.


As a crypto-academic consultant, I end up doing a lot of searches for technical papers. Google is OK at this (I have found no utility in “Google Scholar”; the regular search results are equivalent). Yandex actually does significantly better. Of course, these kinds of searches are a broad net. If you have a decent idea of what you’re looking for, INSPEC is still the gold standard. You have to pay for INSPEC, or walk to a university library, but that is what serious people use for deep search in an academic subject.

Yandex does fail one important use case for me. One of the fundamental ways people get work done on computers is searching for error messages and bugs and “how-tos” on message boards. If you’re dealing with a computer problem, chances are good that someone else had the problem, and asked others about it on an online forum, whether it is a compiler directive or a wonky KDE feature. This is a tremendously helpful knowledge base. Google beats everyone at this at present, mostly because you can sort by date. Close behind Google for this use is duckduckgo.com.

I have high hopes for Yandex. While Google hires a lot of rock star programmers and well known computer scientists, Google also seems unfocused and adolescent (read the takimag article for more concrete criticisms). The Yandex guys: they’re grownups. They have succeeded in a country of flinty hard men.  People actually died trying to do business in Russia in the 90s; these guys made it. They’ve only been doing English for a little while, and they’re already better than Google at quite a few things. Search in Russian is much harder than search in English, as the language is strongly inflected. So, Yandex solved a much harder problem than Google did at the outset. Google wastes its time with nonsense like Google+ or attempts to bring about the “singularity” by hiring Crazy Ray Kurzweil. Meanwhile, Yandex is using its technology to assist particle physicists at CERN, which seems a bit more impressive. I’ve seen significant improvements in Yandex search results over the past few months. It is very exciting to watch a complex contraption like this improving so quickly. Consider this: they have achieved all this on revenues which are 1/60 of what Google takes in.  The flabby marshmallows at Google may not be worried now, but these guys are coming for them. If I had a bunch of steel hard brainy Russian cossacks in my rear view mirror, I’d be nervous.


On a slightly different topic: one of the hardest things a technical or fact-oriented person looks for on the internets is data. Most search engines are completely useless for this type of thing. It’s really a different type of problem from ordinary search. I have only found two search engines which do this well.

One is Wolfram Alpha, which I made fun of at one point. I now find it indispensable for looking up simple facts and figures, using an English language query. It doesn’t have large amounts of data, but it’s easy to get to the data: just tell it what you need. Kudos to them for getting this right. It ain’t bad for doing integrals and such either; certainly more convenient than using some long-in-the-tooth open source computer algebra system like Axiom or Maxima. While it kind of sucked when it first came out, the suck is all gone: this is an excellent product every numerate individual should avail themselves of.

The other is quandl.com. I have been using it for only a few weeks, and don’t know how I lived without it. I had a lot less data to work with, and I went through a lot more trouble to obtain it. For quants, this is an indispensable tool for historical economic data. For datanauts in general, ditto. Before quandl, you had to scrape publicly available data from myriad websites. Post-quandl, well, it’s easy to get at, and if you register with them, you can download dynamically updated data in easily parsed CSV format all damn day. Hooray for Quandl! Please don’t sell out to gigantor corp that will make you suck. If you must, sell out to Yandex!

