
August 31, 2007

Use handset software to build mobile coverage maps

Over the past few weeks, while researching other subjects, I've stumbled on multiple websites providing mobile signal coverage maps.  Two days ago, I hit an excellent HSDPA vs EVDO coverage map for the US that I can't seem to locate right now.  :-)   Today, I stumbled on Signal Map, a very interesting US-centric site that aggregates user experiences.


Earlier I had bookmarked Cell Phone Tower Search, a site based on the FCC's database of US towers, and Sitefinder, an official Ofcom site mapping UK basestations.

I know I have seen other sites that, like Signal Map, attempt to aggregate actual user experiences, for example, Dead Cell Zones.  But I haven't seen anyone who's taken the next logical step in acquiring accurate user data.

Why not offer mobile phone users some downloadable handset software that automatically captures as much information as possible and sends it to a common site like Signal Map?  Yes, that puts a cost on the subscriber who agrees to participate, but I know my mobile plan includes a bundle of SMS messages that I never exceed and a data plan that I don't typically exceed.  If I could configure how many SMS messages were sent or how much data went out per month, I'd be happy to participate in such a scheme.

This approach may be problematic in areas where SMS and/or data is expensive, but with generous bundles and the (relatively) low cost of mobile service in the US, it seems likely this scheme could work.  And one such phone might generate 20-50 reports per month, much more user-created content than you are likely to get by asking users to type information into your website.
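
To make the "configurable cap" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the names `SignalReport` and `ReportBudget`, the fields in a report, and the idea of counting reports per calendar month are my assumptions, not anything Signal Map offers. A real handset client would also need platform APIs to read signal strength and to send the SMS or data, which are not shown.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SignalReport:
    """One coverage observation (hypothetical fields, for illustration)."""
    timestamp: datetime
    latitude: float
    longitude: float
    signal_dbm: int
    network_type: str  # e.g. "EDGE", "HSDPA", "EVDO"


@dataclass
class ReportBudget:
    """Tracks how many reports may still go out in the current calendar month."""
    max_reports_per_month: int
    _month: tuple = field(default=(0, 0), init=False)
    _sent: int = field(default=0, init=False)

    def allow(self, report: SignalReport) -> bool:
        """Return True (and count the report) if it fits the monthly cap.

        The caller would transmit the report only when this returns True.
        """
        month = (report.timestamp.year, report.timestamp.month)
        if month != self._month:
            # A new month has started, so the counter resets.
            self._month = month
            self._sent = 0
        if self._sent >= self.max_reports_per_month:
            return False
        self._sent += 1
        return True
```

A subscriber who is happy to give up 30 messages a month would create `ReportBudget(max_reports_per_month=30)` and the client would quietly drop anything beyond that until the next month begins.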

To the folks at Signal Map, what do you think?

August 29, 2007

Best Skype outage post mortem so far

It's been more than ten days since the global Skype outage, so it's time to reconsider what actually happened.  The most credible analysis is not from Skype, but from Julian Cain in a series of comments (here, here and here) that he made to a GigaOM article about the outage (or see the single file in "References" below).  Julian is lead architect at Pando and, earlier, was head of Mac development for Kazaa at Sharman Networks.  So he knows a lot about peer-to-peer networks, and his work at Sharman put him in a position to know quite a bit about the P2P technology that's also used by Skype (and likely by Joost).

Background

Skype's P2P technology was evolved from FastTrack, originally developed for Kazaa.  Their P2P network consists of clients and supernodes.  Skype distributes client software which includes all necessary supernode software, so any client that has appropriate capacity and connectivity can be promoted to become a supernode.  Supernodes dynamically link to other supernodes to support a distributed database and distributed index (called the distributed hash table or DHT).  For Skype, the DHT layer is responsible for maintaining client presence info, contacts and icons/avatars, and handling call routing.

Root Cause

But as I pointed out in several posts during the outage, there's also a centralized component to the Skype network: the login servers.  Julian refers to them as the "authentication servers" and/or "login/connectivity servers."  They are implemented as one cluster of about 50 machines.  As for the root cause of the outage, he asserts:

Skype employees introduced code into the "login/connectivity" server farm that was not compatible with current Skype clients.

Other Issues

While that was the root cause, it was helped along by other network characteristics, notably that each client connects to only one supernode at a time.  According to Julian, there are 300+ clients per supernode and if a supernode goes off line, the 300 or so clients connected to it must reenter their "connecting" sequence, i.e., find and connect to another supernode.

A network with 8 million on-line users implies ~27K supernodes, a figure that's consistent with the ~20K supernodes estimated by Desclaux and Kortchinsky in 2005-2006 (see their June 2006 Recon presentation, PDF here).  The other point from measurements by Desclaux and Kortchinsky is that each supernode attempts to maintain a list of all other supernodes which means there is a substantial amount of traffic between supernodes.  This clearly contributed to the slow recovery, during which Julian commented:

Right now there are approximately 10,000 Skype networks instead of one single "in sync" network.

Scaling?

So I wonder, apart from the login server cluster as a single point of failure, is there also a scaling issue?  FastTrack's breakthrough was the use of supernodes to make the system more scalable.  But was that just one layer of scalability?  If so, what happens when there are 300 million on-line users and one million supernodes?  Perhaps Julian (or another P2P expert) could comment...
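
The arithmetic behind these figures is easy to check. The Python sketch below assumes exactly 300 clients per supernode (Julian's "300+" taken at face value) and, as a worst case, that every supernode keeps one entry for every other supernode, as the Desclaux and Kortchinsky measurements suggest. The function names are mine, for illustration only.

```python
CLIENTS_PER_SUPERNODE = 300  # assumption: Julian's "300+" taken as exactly 300


def supernodes_needed(online_users: int) -> int:
    """Supernodes required if each serves CLIENTS_PER_SUPERNODE clients (rounded up)."""
    return -(-online_users // CLIENTS_PER_SUPERNODE)


def total_list_entries(supernodes: int) -> int:
    """Total entries network-wide if every supernode lists every other supernode."""
    return supernodes * (supernodes - 1)


for users in (8_000_000, 300_000_000):
    n = supernodes_needed(users)
    print(f"{users:>11,} users -> {n:>9,} supernodes, "
          f"{total_list_entries(n):,} list entries in total")
```

Under these assumptions, going from 8 million to 300 million users multiplies the supernode count by roughly 37, but the total number of supernode list entries by roughly 1,400, because the latter grows with the square of the supernode count. That is the sense in which a single layer of supernodes might stop scaling.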

References:

I've extracted and assembled a complete copy of Julian's relevant comments.

Vanilla Skype part 1 and part 2, by Fabrice Desclaux and Kostya Kortchinsky, Recon, June 17th, 2006.

Skype traffic during the week of the outage, captured by Phil Wolff of Skype Journal.


August 22, 2007

Patent infringement defense just got a little easier

On Monday, the US Court of Appeals for the Federal Circuit (CAFC) issued a decision that substantially eases the burden on defendants in patent cases. Patently-O has a brief posting here which has attracted numerous comments.

For those who don't follow patent matters closely, the CAFC is the court that handles appeals in US patent cases, and this was a rare unanimous en banc opinion, i.e. all ten circuit court judges participated and all agreed.  The opinion was on a procedural matter brought by Seagate Technology, but the key results were:

  1. A new "willfulness standard" which now requires clear and convincing evidence of "objectively reckless" conduct.  "Willfulness" determines whether a defendant can be liable for "enhanced damages" (more money than the patent holder's actual damages).  This new standard makes it easier for defendants to avoid being liable for "enhanced damages."
  2. If a defendant relies on a pre-litigation "opinion of counsel" as proof they were not "willfully reckless," the use of that pre-litigation opinion no longer constitutes a waiver of attorney-client privilege with their trial counsel.

Small steps perhaps, but encouraging nonetheless.  On the other hand, I really enjoyed Duncan Bucknell's comment to the Patently-O posting:

If the US routinely had the losing party liable for (at least some of) the winning party's attorney's fees, then I think this would have an equally punitary effect. The award in each case will (presumably? - sorry to take a shot at US litigation expense) be lower, but the overall award of $ would be much higher.

Would also clean up the 'patent troll' problem, ...

Well, we can dream...

August 21, 2007

Skype's centralized control of P2P network parameters

The blogosphere is abuzz with reaction to Skype's second attempt to explain what caused the recent crash of their entire "peer-to-peer" network, but I haven't seen any comment on the one thing that struck me (in their 4th paragraph):

Once we found the algorithmic fix to ensure continued operation in the face of high numbers of client reboots, the efforts focused squarely on stabilising the P2P core. The fix means that we’ve tuned Skype’s P2P core so that it can cope with simultaneous P2P network load and core size changes similar to those that occurred on August 16.

As I commented earlier, we know from presentations by Desclaux & Kortchinsky at Blackhat Europe (PDF) in March 2006 and at Recon in June 2006 (PDF in 2 files: one and two), that there is substantial traffic between the (3rd-party-owned, distributed, P2P) supernodes that form the core of the Skype P2P network and Skype's (centralized) login servers.

If Skype's explanation is correct, it's clear Skype also has a way of distributing parameters to supernodes that tune their behavior.  I'm not surprised.  It's logical to design in both measurement and tuning capabilities.

But such centralized capabilities also represent a potential vulnerability.  What would happen if a black hat got access to those tuning capabilities...

August 20, 2007

Understanding the Skype Outage

Skype has posted its official explanation, and Phil Wolff has a good set of interpolated comments on it.  There are two things to add.

1.  As the Register points out, last Tuesday was Microsoft’s monthly patch day and those patches required a re-boot.  If we believe Skype that their problem started with excessive login attempts, this is the only plausible explanation on the table.

2.  There was no patch for the Skype client (i.e. the most recent client release was routine and hasn't been widely adopted) so either:

  • we face another problem next time Microsoft issues patches requiring a reboot, or
  • they fixed something on their servers. 

I suggest the latter.  As I pointed out during the outage, Skype generates a lot of traffic between the login servers and supernodes (see slide 16 in Desclaux and Kortchinsky's presentation).  I suggest Skype has patched something on the login servers.  It's well known (e.g. Desclaux & Kortchinsky) that Skype login is a centralized function.

Meanwhile, it will be interesting to see if any additional comments or new client releases appear from Skype in the coming days.  I suspect not, as their approach to security has always included both encryption and obfuscation.

August 16, 2007

Skype crash probably not DOS

Skype says it's a software issue that they are working on.  For me, it's six hours now and I haven't had more than a few minutes of "connectivity" in the past hour.  And, when connected, the on-line count was at a new low:

[Image: Skype client showing about 65K users on-line]

Another glance through DESCLAUX and KORTCHINSKY part 2 suggests, in addition to traffic from clients to login servers, Skype generates a lot of traffic between the login servers and supernodes (see slide 16 for example).  This makes the login servers a doubly critical piece of Skype infrastructure.  Of course, if they merely introduced a bug in their login servers, it shouldn't take 12-24 hours to do a roll back.

Skype network crash - could it have been a DOS attack?

At BlackHat Europe, March 2nd and 3rd, 2006 (PDF here) and then at Recon, June 17th, 2006, Fabrice DESCLAUX and Kostya KORTCHINSKY gave a rather detailed analysis (in two pdfs, part 2 here) of how Skype obfuscates what they are doing. 

One of their conclusions that really struck me at the time was this bullet:

Fully trusts anyone who speaks Skype

As is made clear in their analysis, Denial of Service attacks are possible.  I wonder...

Skype Network Crashes

As others are reporting, Skype clients have been disconnecting and reconnecting around the world.  Here in Boston, I've been off line and back again at least four times in the past hour.  And when I've reconnected, an amazingly small number of others are seen as on-line:

[Image: Skype client showing the on-line user count during the crash, August 16, 2007]

In recent weeks, I've been seeing over 9 million users on line at this time of day, so 615K suggests very little of the global Skype network is accessible to me, if they are on-line at all.  A few minutes ago, Jan was seeing 773K other users from his site in Malaysia, so this really is global.

We've known, at least since 2004, that Skype's peer-to-peer network wasn't strictly P2P.  The vast majority of traffic (control and media) is P2P, but every time a client comes on line (well, at least at startup and each login), it interrogates the Skype Login Server at skype.com.  We also know that a Skype client must establish a connection to a supernode to successfully log in.  But I don't know if there is anything about supernodes that causes them to crash if they can't reach a centralized or semi-centralized Skype server.

It will be interesting to see how this develops.  Hopefully Skype will be forthcoming, but if not, I'm sure third parties will piece together an answer.

August 10, 2007

No Quechup please

Quechup - a social networking site that starts each new subscription with anti-social spamming!

I'm interested in social networks and community sites, so I've joined many such services, only a few of which I actually use with any regularity.  A few minutes ago, I got an invitation to Quechup and went ahead and signed up.  Unfortunately, I didn't Google their name and check other people's comments in advance. [I'm just back from vacation and not thinking???]   Worse, I blasted through their sign up procedure without my usual caution.

During the signup process, Quechup.com suggests it search your address book to check if some of your email contacts have already signed up as well, so as to give the networking process a head start.  I've seen this before and I'm usually very suspicious, but this time I acted like a total newbie.  I let them see one of my address books, in which they found only the person who had invited me.  What they didn't mention is they immediately spam each of the addresses they got access to.

If you got such spam, I deeply apologize.  I've been on-line for years.  I should know better.  I do know better!  What else can I say?  I'm sorry.

Latency and throughput using iPhone on EDGE

The following represents just one set of tests on just one evening at just one location in the Boston suburbs, but perhaps someone will find it useful.

Thanks to Jacob Barss-Bailey who made the measurements (on August 1st) and gave me permission to reprint them.  From Jacob's email:

The latency I was able to see last night, which was likely on an unloaded network (it was quite late, and I'm out in the country) was between 500ms and 2500ms, about what I was expecting.  About 75% of the time, I was seeing latency between 500ms and 700ms though, so it seems that latency is usually pretty good, but when its not, its pretty bad.

FYI the bandwidth I was seeing downloading random data was between 60kb/s and 120kb/s, heavily skewed towards the top of the range.

Obviously, as it's an iPhone, these are measurements on AT&T's EDGE network.

Well it's consistently better than dial up...
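
For anyone who wants to repeat this kind of test, here is a rough Python sketch. It is not how Jacob made his measurements (his email doesn't say how he did it). It times TCP connection setup as a stand-in for round-trip latency, and the host, port and the 500-700ms band are parameters you would choose yourself. Note that the timing includes the DNS lookup unless you pass an IP address.

```python
import socket
import time


def connect_latency_ms(host: str, port: int = 80, timeout: float = 5.0) -> float:
    """Time one TCP connection setup to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it straight away
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms, low_ms=500.0, high_ms=700.0):
    """Return (min, max, fraction of samples within [low_ms, high_ms])."""
    if not samples_ms:
        raise ValueError("no samples")
    in_band = sum(1 for s in samples_ms if low_ms <= s <= high_ms)
    return min(samples_ms), max(samples_ms), in_band / len(samples_ms)
```

Collecting, say, twenty samples with `connect_latency_ms` against a server of your choice and passing the list to `summarize` gives the same kind of summary as above: the best case, the worst case, and the share of samples that fall in the "usually pretty good" band.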

Copyright 2007 NMS Communications
