June 03, 2008

Education due for radical disruption

The Internet has disrupted music publishing, encyclopedias, classified advertising and news, and is poised to disrupt both television and the movies.  What else?

Education is overdue, but most of today's academics still don't get it.

For decades, even centuries, we've seen increased productivity in agriculture, in industry and most recently in services.  Productivity gains show up as more useful output per person employed.  One place where this has not happened is education, as the graphs below illustrate.  Indeed, in education, schools compete on the basis of lower student-faculty ratios, i.e. lower faculty productivity.

[Figures: Education productivity comparison; Educational productivity graph]

Individuals vary in how they best learn things but, in most cases, direct interaction with a teacher is a significant benefit.  So if the Internet is just about better access to information or remote video interactions with teachers, it may be wonderful, but it's hardly disruptive. What's needed is some way to multiply the number of teachers, dramatically.

A suggestion of how this might come about, i.e. of how the Internet might disrupt the traditional academic process, comes from italki.com.  To quote Gang Lu,

italki is a language social network providing free language learning content. Unlike the traditional one-on-one education approach, italki is building an online community where any user can play two roles, tutor and learner. In italki, users can find language partners, post their foreign language questions which can be answered by other users, and join groups for language learning. The new version introduced a new feature called knowledge which is basically a Wiki service. Now users can not only share the language learning materials they uploaded but also collaborate on creating free language learning textbooks by their own. So italki is building probably the largest Wiki for language learning content!

This is not some academics opening their university course materials to the world (although I'm sure that's useful).  These are individuals contributing course materials and, more importantly, tutoring each other.  This approach tackles the hard problem, the student-teacher ratio.

I'm sure there are other examples that I haven't stumbled on as yet.  In any event, it will be interesting to see how this sector evolves.  italki.com started in Shanghai and is already dealing with 16 languages and has global participation.

May 30, 2008

SIP revolution, massively delayed — but there's hope

The SIP Center asked for an article which I finally wrote the weekend before last.  My article was actually rather negative, but they published it anyway.  Now I'm feeling a little guilty as there is an optimistic note I could have used as my conclusion.  So let me try again...

First let me summarize my problem.  When SIP emerged in 1996, its support for direct connections from one user to another was extremely compelling.  This was the VoIP protocol which would lead to a complete revolution in communications.  Yes, you might refer to a directory service, but you wouldn't need an operator to make a phone call.  You could do it yourself, directly.  Unfortunately, that revolution never happened.

So far, no revolution

The biggest change in telecommunications in the past 12 years has been the global deployment of three billion mobile phones, all based on conventional circuit-switching and Intelligent Network technology — nothing to do with SIP. And arguably, the most interesting telephony service enhancement, after mobility, came from Skype with its seamless integration of presence, instant messaging, wideband audio and video. But Skype is based on proprietary protocols, not SIP. Finally, VoIP technology has helped drive down the cost of international calling, but using MGCP, H.248 &/or H.323 protocols much more than SIP, at least so far.

SIP has been adopted by PBX manufacturers in recent years, but this doesn’t seem to have changed business practices at all. The IT department still buys the PBX and the telephone sets from a single vendor and then contracts with a service provider to handle calls outside the enterprise.

And then there's IMS

SIP has been adopted for use in the IP Multimedia Subsystem (IMS), but this completely warps the original SIP vision.  IMS is a centralized system — a next generation network for mobile and fixed operators.  It's the complete opposite of the original vision for SIP.

Why have things gone so far astray?

SIP assumed an end-to-end Internet

SIP assumes it's possible to make end-to-end connections over the Internet and therefore a SIP session can know about and use globally valid IP addresses.  That was a naive assumption, even in 1996-1999 when SIP was being defined.  The real Internet contains firewalls, network address translators (NATs) and other "middle boxes."  They are not going away; if anything, the problem is getting worse over time.  Today, applications must be aware of and able to work around middle boxes and other network problems.

Many middlebox issues can be overcome with the help of client software and central servers implementing Interactive Connectivity Establishment (ICE), a recently completed IETF proposed standard that in turn relies on STUN, TURN and/or RSIP.  A continuing obstacle for direct user-to-user connections is the need for central servers for STUN, etc.
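To make the middlebox workaround a little more concrete, here is a minimal sketch of how an ICE client might rank its candidate addresses.  It uses the candidate priority formula and the recommended type preferences from the ICE specification as published in RFC 5245.  The Candidate class, the order_candidates helper and the example addresses (documentation ranges) are illustrative assumptions, not part of any real SIP stack.

```python
from dataclasses import dataclass

# Recommended type preferences from the ICE specification (RFC 5245):
# direct host addresses are preferred, relayed (TURN) addresses are the last resort.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one ICE candidate (names are illustrative)."""
    kind: str                 # "host", "srflx" (learned via STUN) or "relay" (via TURN)
    address: str
    port: int
    component_id: int = 1     # 1 = RTP in the usual case
    local_preference: int = 65535

    def priority(self) -> int:
        # priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
        return (
            (TYPE_PREFERENCE[self.kind] << 24)
            + (self.local_preference << 8)
            + (256 - self.component_id)
        )


def order_candidates(candidates):
    """Return candidates from most to least preferred."""
    return sorted(candidates, key=lambda c: c.priority(), reverse=True)


if __name__ == "__main__":
    gathered = [
        Candidate("relay", "203.0.113.20", 49152),   # allocated on a TURN server
        Candidate("host", "10.0.0.5", 5004),          # local interface behind the NAT
        Candidate("srflx", "198.51.100.7", 62000),    # public mapping reported by STUN
    ]
    for c in order_candidates(gathered):
        print(c.kind, c.address, c.priority())
```

Note that two of the three candidate types only exist because a server (STUN or TURN) was consulted, which is exactly the dependence on central servers described above.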

So is there no chance for the original SIP vision of direct user-to-user communication?

P2PSIP — a reason for optimism

Actually, there is some reason for optimism.  The advent and widespread adoption of Skype showed what was possible and suggested how one might distribute central services among peers, potentially avoiding the need for an explicit service provider.  The past few years have seen rising interest in peer-to-peer SIP which has resulted in an IETF working group under the name p2psip.  Their goal is "to leverage the distributed nature of P2P to allow for distributed resource discovery in a SIP network, eliminating (or at least reducing) the need for centralized servers."

Assuming this is completed (during 2008 & 2009), we'll have the elements with which one could make a SIP-based open peer-to-peer communications system.  It will be interesting to see actual software implementing the ideas of the p2psip group.  We may yet see a revolution!

January 27, 2008

How tiered Internet pricing could actually facilitate P2P

Time Warner Cable's planned experiment with tiered charging for Internet access has generated a flurry of coverage in the blogosphere, but no new insights (at least that I've seen).

The primary problem ISPs complain about is that 5% of their customers use 90% of the available bandwidth and when they examine this traffic, it's mostly peer-to-peer file sharing.  A reasonable question is how to allow as much of this traffic as possible without increasing an ISP's variable costs or slowing down their other users.

This may not be as difficult as it appears.   Indeed if Internet access was as competitive as mobile telephony, we might already have seen what I'm about to propose — a combination of bundled pricing equivalent to mobile's "free nights and weekends" and "free on-net calls" with a way to facilitate P2P traffic that leverages exactly these "free" periods.

An ISP's costs

ISPs have some costs which are relatively fixed and others that are tied to usage.  A network is a relatively fixed cost and when it's not full, the incremental cost of adding traffic is zero!  This is the reason mobile operators give away free nights and weekends.  They've built their mobile network for the peak daytime traffic, so it costs them nothing to run promotions that add incremental traffic at off hours.  Peak hours and off hours may be different for an ISP, but the concept is the same. When a data pipe is lightly loaded the ISP's cost of adding incremental traffic is zero.

On the other hand, some ISP costs are usage based, for example "IP Transit" or more properly, Internet Transit.  This is the ISP's upstream cost to send and receive traffic to/from the rest of the Internet.  However, even here, usage-based costs occur at heavy usage.  Light usage periods don't save money.  To understand what's happening, it's worth a digression on Internet Transit.

Internet Transit

Internet access is a monopoly, a duopoly or a heavily regulated industry.  The middle mile connections from the local network to the Internet backbone may or may not be competitive depending on where you are.  But the Internet backbone itself is extremely competitive.  If you can get to a major Internet Exchange Point in the US or Europe, there are many providers offering extremely competitive rates for Internet Transit.  Typically these services are priced on a megabit per second per month basis (Mbit/s/Month) with lower rates for higher volume commitments.  The other key idea is that charges are based on the 95th percentile of all the five minute data rate samples taken during the month.  So an ISP can have a few bursts above their typical rate, as long as they represent less than 5% of the sampled intervals.

But this also means there is no extra cost to run at or near the typical rate at all times.
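The 95th percentile rule is easy to check with a few lines of code.  This is a sketch under stated assumptions: a 30-day month of five-minute samples (8,640 of them), a single combined rate per sample, and a hypothetical helper name billable_rate_mbps.  Real transit contracts differ in details, such as whether inbound and outbound traffic are measured separately.

```python
def billable_rate_mbps(samples_mbps, percentile=95):
    """Sort the samples, ignore the top (100 - percentile)% and bill the highest one left."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Integer ceiling of len * percentile / 100, converted to a zero-based index.
    index = (len(ordered) * percentile + 99) // 100 - 1
    return ordered[max(index, 0)]


if __name__ == "__main__":
    samples_per_month = 30 * 24 * 12          # 8,640 five-minute samples

    # Bursts to 900 Mbit/s in under 5% of the intervals are not billed.
    short_bursts = [100] * 8300 + [900] * 340
    print(billable_rate_mbps(short_bursts))    # billed at the typical rate

    # Once bursts exceed 5% of the intervals, the burst rate becomes the bill.
    long_bursts = [100] * 8140 + [900] * 500
    print(billable_rate_mbps(long_bursts))

    # Filling every quiet interval up to the typical rate costs nothing extra.
    always_busy = [100] * samples_per_month
    print(billable_rate_mbps(always_busy))
```

The last case is the one that matters for file sharing: traffic that fills otherwise idle intervals, without pushing any interval above the typical rate, leaves the bill unchanged.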

Local traffic

Even more important, if file sharing is done with other computers on the same ISP's network, then there is no need to pay for Internet Transit at all.  The question is how to figure out which potential peers are "on-net" and which are "off-net."

Sending signals to P2P software

Most P2P file sharing software has relatively little knowledge of locality.  Some P2P software practices "prefix awareness"; for example, Joost gives preference to peers in the same /24 IP address block when they are available.  But if a major operator provided an automatic way for P2P client software to determine whether a prospective peer's IP address was currently reachable "for free", it seems likely the file sharing community would leap on it, and if there's money to be saved, active file sharers would download the new clients immediately.

A standard way to present such information might be via an extension to the XML-based response codes in one of the whois information exchange proposals, e.g. from ICANN or from APNIC.  Also, while what I'm proposing might start as a pricing plan rather like a mobile operator's "free nights and weekends" and "free on-net calling," it's not hard to see extensions where an ISP could offer dynamic access to underused capacity to those programs that were prepared to regularly interrogate an ISP's server and use just the advertised off-hours capacity.
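Here is a rough sketch of what locality-aware peer selection could look like on the client side.  Everything about the signalling is assumed: the ON_NET_PREFIXES list stands in for whatever an ISP might publish, the addresses are documentation ranges, and the function names are hypothetical.  Only the /24 preference comes from the description of Joost above.

```python
import ipaddress

# Hypothetical list of prefixes an ISP might advertise as "on-net" (free to reach).
ON_NET_PREFIXES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_on_net(peer_ip, prefixes=ON_NET_PREFIXES):
    """True if the peer's address falls inside any advertised on-net prefix."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in prefixes)


def same_slash24(my_ip, peer_ip):
    """True if both addresses share the same /24 block ("prefix awareness")."""
    my_block = ipaddress.ip_network(f"{my_ip}/24", strict=False)
    return ipaddress.ip_address(peer_ip) in my_block


def rank_peers(my_ip, peers):
    """Order peers: same /24 first, then other on-net peers, then everyone else."""
    def cost(peer_ip):
        if same_slash24(my_ip, peer_ip):
            return 0
        if is_on_net(peer_ip):
            return 1
        return 2
    return sorted(peers, key=cost)


if __name__ == "__main__":
    candidates = ["192.0.2.7", "203.0.113.9", "198.51.100.77"]
    print(rank_peers("198.51.100.10", candidates))
```

A client that also knew the advertised off-peak hours could apply the same ranking to decide when to fetch from off-net peers.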

In closing

People like fixed-price deals.  Unlimited is great, but there's plenty of experience with bundles of minutes and the idea of data bundles has already shown up in 3G mobile data plans.  The combination of several tiered data bundle prices with the availability of "free" connectivity for "on-net" peers and during off peak intervals is likely to appeal to file sharers and produce better results for both the sponsoring ISPs and file sharers alike.

January 03, 2008

Joost entering world's most advanced P2P TV market, i.e. China

Gang Lu reports rather specific rumors that Joost is going to enter the China market on or around the Chinese New Year (Feb 7th) by partnering with the Chinese portal Tom.com.  That's not surprising as Skype partnered with Tom.com when they entered the Chinese market.  What's different is Skype was innovative everywhere in the world.

Joost will have to play catch up in China, as the Chinese are the world leaders in P2P TV and P2P streaming media.  Well established Chinese firms like PPLive and PPStream pre-date Joost by nearly two years (see my earlier comments).  And today, in China, the P2P market is clogged with many more players like UUSee, Vakaka and Vatata.

Early entrants like PPLive focused on live TV in 4:3 ratio with simple user controls, perhaps for those familiar with TV but not with TiVo; however, this is changing rapidly.  What's more, friends report performance has been excellent, even in early 2005.

Joost has not worked so well within China.  From one friend and from Google translations of Chinese reviews, it appears there are many places in China where the bandwidth requirements of Joost cause performance problems, even while PPLive works well.  Presumably this will be cured by local Joost support within China. In other words, I suspect a lack of local peers within China means, for now, too much Joost content must flow over clogged international links.  A local presence should cure that.

Meanwhile, Gang Lu describes a hybrid system (streaming servers and P2P bandwidth sharing) from Vatata which:

"... supports most of the video formats, including Microsoft, Real, Flash, Apple, MPEG1/2/4, OGG/MKV etc and H.264. Vatata system consists of two sub-system: Vata, the back-end streaming platform and Tata the front-end player. Tata is absolutely fascinating. It supports On Screen Display (OSD) and allows plugins, which means you can run multiple modules (e.g. instant-messenger, channel list, etc) on top of the video screen, which just sounds like what Joost does."


So it's clear Joost is moving into a very advanced market.  It will be interesting to see the resulting cross fertilization.

December 16, 2007

Emerging Communications Conference 2008

I'll be in California quite a bit in March and April, but the highlight is my first week, when I'll be speaking at a new conference, eComm 2008, March 12-14.  While the conference is new, the community is established and fascinating.  eComm 2008 is being put together by Lee Dryburgh, who was on the program committee for O'Reilly's eTel conferences.  When O'Reilly cancelled eTel 2008, Lee took the initiative to keep that incredible community alive.  He was soon joined by many others.

[Image: eComm 2008 logo]

Click through the logo at the left for conference info.  Right now there's a board of advisors, an incredible list of speakers with more on the way, a wiki and a Facebook group with 170 friends!

The first thing I look for in a conference is interesting people, then new ideas.  eComm promises an abundance of each.  The focus is next generation personal communications and the schedule is set up for rapid fire delivery including many 5 minute and 15 minute sessions.  As far as new ideas go, this will be a fire hose!

*** Correction: 12/21 ***

The conference is being held in the Computer History Museum in Mountain View.  This easily beats the typical conference facility, but it means there are only 300 paid admissions available.  Registration has opened, here.  If you register before the end of 2007, the $1495 registration fee is marked down to $1195.

I look forward to seeing you there.

December 02, 2007

Managed Storage Futures

Recently I wrote about differences in the exponential growth rates of computing and networking and promised to say more about how these differences cause substantial shifts in the technology landscape.  Managed storage is one example.  The relevant doubling rates (from that earlier post) are:

Doubling Rates

Technology Measure        Months
Computing performance     18
Storage capacity          12
Networking performance    15
Access connectivity       20-26

The increase in storage capacity per dollar has been phenomenal and is one of the reasons that Google can offer Gbytes of free storage for email and that Amazon can offer their Simple Storage Service (Amazon S3) at extremely low rates.

But it's also caused headaches for IT directors, as installed equipment becomes obsolete long before it's fully depreciated, and employees and department heads gripe about inflated internal billing rates for storage.  Pity the IT staffer who sends a broadcast message justifying corporate email storage limits because "it costs the company X cents per megabyte per month."  I've seen such messages, and the employee ridicule they engender.

This sounds like a perfect opportunity for a managed service: provide an interface that looks like a storage area network or network attached storage, using multiple (for reliability and arbitrage) Internet-based storage services to provide the actual storage.  But now differential growth rates become a factor.

Storage costs decline a bit more rapidly than the cost of Internet transit.  So, already it's the case that network-based storage is extremely low cost for backup but less affordable for transactions.

But the real problem is the cost of access connectivity.  If you're selling managed services to IT departments, you need to provide services at their premises.  Local connectivity is not fast, cheap or reliable, and the pace at which it improves is glacial in comparison with storage or Internet transit.

Has this prevented the emergence of managed storage solutions?  Of course not.  But most existing solutions focus on remotely managing equipment that's physically on the enterprise premises.

Is there an opportunity for network-based managed services?  Also, yes.  But you will need considerable focus on local connectivity, both for the numbers you use in your business plan and for the specifics of how you implement the service.  Some thoughts:  interface your managed service via a remotely managed on-premise box that includes caching?  use a dedicated access link to guarantee QoS?  ???

In any event, three years from now, you can count on disk storage being ~8X more affordable, Internet transit being perhaps 4x more affordable, but local connectivity only 2x or 2.5x.  Don't give up your great business idea, but plan accordingly.
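The three-year outlook follows from the doubling rates in the table: a quantity that doubles every d months improves by a factor of 2^(m/d) after m months.  The sketch below applies that formula directly; the function and dictionary names are illustrative.  The raw formula gives roughly 5.3x for networking and 2.6x to 3.5x for access, so the round numbers quoted above are somewhat more conservative than the table alone would suggest.

```python
# Doubling times in months, taken from the table above.
# Access connectivity is given as a range, so both ends are kept.
DOUBLING_MONTHS = {
    "storage capacity": (12, 12),
    "networking performance": (15, 15),
    "access connectivity": (26, 20),   # (slow end, fast end)
}


def improvement_factor(months_elapsed, doubling_months):
    """Growth factor after months_elapsed for a quantity doubling every doubling_months."""
    return 2 ** (months_elapsed / doubling_months)


def outlook(months_elapsed=36):
    """Return {measure: (low factor, high factor)} for the given horizon."""
    return {
        name: (
            improvement_factor(months_elapsed, slow),
            improvement_factor(months_elapsed, fast),
        )
        for name, (slow, fast) in DOUBLING_MONTHS.items()
    }


if __name__ == "__main__":
    for name, (low, high) in outlook(36).items():
        if low == high:
            print(f"{name}: about {low:.1f}x in three years")
        else:
            print(f"{name}: about {low:.1f}x to {high:.1f}x in three years")
```

Whichever rounding you prefer, the ordering is what matters for planning: storage pulls ahead of transit, and both pull well ahead of local access.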

October 22, 2007

Availability -- more than presence and a nice implementation to boot

I’ve never liked the term presence or the way the function is implemented in instant messaging systems.  I want to indicate my availability, something that, at any given moment, may be different for my wife, my co-workers or my friends in the blogosphere.  And, if I check my PC for messages at 6am, just before walking the dog, that doesn’t mean I’m planning to respond to those messages or accept calls or chats at that moment.  My dog is desperate and she’s letting me know it!

Now there’s a new kid on the block, EnThinnai, that’s launched the beta of an information sharing site featuring privacy and control.  They also include a concept of availability that looks very much as I desire.

In addition, they’ve done a peer-to-peer implementation with a choice of query (you only ask when you’re interested in knowing my availability) or subscribe (you want to be notified when I transition to a specific state).  This makes a lot more sense to me than a central server farm monitoring everything I do and continuously broadcasting it to people who only contact me once or twice a year.
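To illustrate the difference between the two modes, here is a small sketch of per-audience availability with both query and subscribe.  It is not EnThinnai's design or API; the class, method names and state strings are all assumptions made for illustration, and a real peer-to-peer version would answer these calls over the network rather than in-process.

```python
from collections import defaultdict


class AvailabilityPeer:
    """Hypothetical peer that holds one person's availability, per audience."""

    def __init__(self, owner, default_state="unavailable"):
        self.owner = owner
        self._default_state = default_state
        self._state_for = {}                       # audience -> current state
        self._subscribers = defaultdict(list)      # (audience, state) -> callbacks

    def set_state(self, audience, state):
        """Owner updates availability for one audience; notify only on a real transition."""
        previous = self._state_for.get(audience, self._default_state)
        self._state_for[audience] = state
        if state != previous:
            for callback in self._subscribers[(audience, state)]:
                callback(self.owner, state)

    def query(self, audience):
        """Query mode: answer only when someone actually asks."""
        return self._state_for.get(audience, self._default_state)

    def subscribe(self, audience, state, callback):
        """Subscribe mode: call back when the owner transitions to a specific state."""
        self._subscribers[(audience, state)].append(callback)


if __name__ == "__main__":
    peer = AvailabilityPeer("alice")
    peer.subscribe("family", "available", lambda who, s: print(f"{who} is now {s}"))

    peer.set_state("coworkers", "busy")        # no notification: nobody subscribed
    print(peer.query("family"))                # still the default state
    peer.set_state("family", "available")      # triggers the family subscriber only
```

Nothing is broadcast unless someone has asked for it, which is the property that distinguishes this from a central server pushing every change to every contact.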

Aswath Rao has more info at the EnThinnai blog.

 

September 16, 2007

Architecturally induced denial of service susceptibility

Fundamental design decisions have a big impact on how a network behaves under stress.  I just noticed two otherwise unrelated posts both touching on network design issues that result in (or protect against) network collapse under stress.

Yesterday, I wrote about August's Skype outage where the system took over 36 hours to recover from a collapse.  The issue was too many clients attempting to reconnect at once.  Instead of clients “backing off” when supernodes weren’t immediately available, they kept hammering away, trying to log in.
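The "backing off" behavior mentioned above is commonly implemented as exponential backoff with random jitter, so that a crowd of clients does not retry in lockstep.  The sketch below shows that general technique, not Skype's actual fix; the function name, parameters and default values are assumptions.

```python
import random
import time


def connect_with_backoff(try_connect, sleep=time.sleep, base=1.0, cap=300.0,
                         max_attempts=8, rng=random.random):
    """Retry try_connect(), waiting a random time up to an exponentially growing ceiling.

    try_connect: callable returning True on success (hypothetical login attempt).
    sleep, rng:  injectable so the behavior can be tested without real delays.
    """
    for attempt in range(max_attempts):
        if try_connect():
            return True
        if attempt < max_attempts - 1:
            ceiling = min(cap, base * 2 ** attempt)   # 1s, 2s, 4s, ... up to cap
            sleep(rng() * ceiling)                    # jitter spreads clients out
    return False


if __name__ == "__main__":
    outcomes = iter([False, False, True])             # fail twice, then succeed
    waits = []
    ok = connect_with_backoff(lambda: next(outcomes), sleep=waits.append)
    print(ok, [round(w, 2) for w in waits])
```

Each failed attempt doubles the window a client may wait in, so the aggregate retry load on the supernodes falls off quickly instead of staying constant.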

Last week I wrote about the ways the Internet and the Telecoms networks respond to congestion in the backbone.  In December 2006 an earthquake off the coast of Taiwan broke multiple fiber cables causing a congestion collapse on the Internet backbone in significant parts of Asia.  The issue was too many computers attempting to reestablish TCP sessions, with the result that there was no capacity left for any session to actually send data.

In both cases, there were externally induced problems but, rather than recovering, there was the equivalent of a denial of service attack, self inflicted!

Both of these failures have a fairly simple solution, at least architecturally.  Under conditions of severe overload, the system must be able to restrict new attempts (new TCP sessions, new Skype logins, etc.) to some small percentage of the available capacity.  This allows the rest of the capacity to serve the logins, sessions or calls that do get through, with the result that what capacity remains is put to good use.

As I commented last week, this is one place where the telecoms industry has the correct architecture.  When disaster strikes, subjecting some part of the network to overload, it's easy to restrict new call attempts on trunks into the congested area, for example by call gapping. This limits the amount of new traffic to that which the network can handle.  Thus, if only 30% capacity is available, at least the network handles 30% of the calls, not 3% or zero.
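Call gapping itself is simple to express: after admitting one new attempt, reject all further attempts until a fixed gap has elapsed.  The sketch below is a minimal illustration with an assumed class name and integer millisecond timestamps; real switches apply the gap per destination or trunk group and adjust it as conditions change.

```python
class CallGap:
    """Admit at most one new attempt per gap interval; reject the rest immediately."""

    def __init__(self, gap_ms):
        self.gap_ms = gap_ms
        self._next_allowed_ms = 0

    def admit(self, now_ms):
        """Return True if a new attempt arriving at now_ms may proceed."""
        if now_ms >= self._next_allowed_ms:
            self._next_allowed_ms = now_ms + self.gap_ms
            return True
        return False   # rejected cheaply, without consuming downstream capacity


if __name__ == "__main__":
    gap = CallGap(gap_ms=1000)
    arrivals = range(0, 3000, 100)            # ten attempts per second for three seconds
    admitted = [t for t in arrivals if gap.admit(t)]
    print(len(admitted), "of", len(arrivals), "attempts admitted at", admitted)
```

The rejected attempts cost almost nothing to refuse, so the capacity that remains is spent on the attempts that were admitted.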

Here's one place where network architects can learn from established telecoms practice.

September 15, 2007

Skype outage: post mortem evolves

Since my last summary, some new information has emerged.  Gerry Blackwell interviewed Skype's Director of Operations, Michael Jackson, got some comment from Martin Geddes, and wrote an interesting article.

I've also had an email exchange with Julian Cain, which I reprint below, and a brief exchange with Philippe Biondi of SecDev.org who referred me to an interesting blog post (in French) by his colleague Cédric Blancher at EADS France, Innovation Works (Suresnes).

New information from Michael Jackson (in Gerry Blackwell's article) includes the idea of five supernodes per cell (of 300 users):

Each supernode handles about 300 nearby users. Skype configures five in each cell for redundancy. So with upwards of nine million users online, it takes something like 150,000 supernodes to make Skype work.

The triggering event is still attributed to massive computer reboots after Microsoft's Patch Tuesday.  Everything else in the article is consistent with the best earlier explanations, including Julian's which I summarized here, i.e. as Blackwell puts it:

... the real culprit, Skype now says—was a resource allocation algorithm in the client software that could not adapt to such a set of circumstances. Instead of clients “backing off” on their attempts to validate on the network when supernodes weren’t immediately available and waiting for the ship to right itself, they kept hammering away, trying to log in.

And the solution, having clients back off when supernodes aren't immediately responsive, is obvious.  What's left to understand?

1.  I still haven't seen a plausible explanation of why Microsoft's "Patch Tuesday" resulted in problems on Thursday morning, only a lot of questions.  If the problem was induced by massive reboots, why didn't it happen on Wednesday morning?

2.  I still haven't seen a reasonable discussion of scaling.  As I wrote back in August,

I wonder, apart from the login server cluster as a single point of failure, is there also a scaling issue?  FastTrack's breakthrough was the use of supernodes to make the system more scalable.  But was that just one layer of scalability?  If so, what happens when there are 300 million on-line users and one million supernodes?  Perhaps Julian (or another P2P expert) could comment...

Indeed I emailed Julian about scaling and also about Joost.  This was his reply.
On 09/05/2007 05:06 PM, Julian Cain wrote:

Brough,

Skype and Joost are utterly different however Skype is more like Fasttrack(Kazaa). Joosts' Network architecture is mainly "Centralized", they have their own server farm of Supernodes as well as Authentication and Jabber servers. The nature of Joost is less dependent on peer to peer routing as it's basis is tuned towards QoS. Joost peers route traffic and relay UDP based payload as media data streams as well as keep a small cache of what they have recently viewed, however currently every Joost peer is directly connected back to the Joost home servers unlike Skype and Kazaa where once authentication occurs it's "out of our hands".

I agree on the extent of Skype scalability being very limited because of the nature of the Supernodes. At any one time the Supernodes hold ~300-500 child nodes and maintain an "Overlay" network which consists of another several hundred Supernode to Supernode connections. Ie* The Supernode network is very dense in order to provide for best means routing of least cost, however the flaw in this architecture is where the "Overlay" network reaches a capacity and is unable to reliably route traffic. I do not currently have any statistics on how the "Overlay" layer is not scalable but as more Supernodes arise the management of the "Decentralized Data Store" becomes a very hard task as well as keeping this "Overlay" in one single "in sync" network. This was proven with the Skype outage as the network was "trying to heal" it had to start from many 10s of thousands of "Overlay" networks which very slowly were able to sync again as a "single" network however is still an issue today with presence.

For the current Skype "Overlay" network to scale indefinitely while maintaining  a "Single" network infrastructure it needs in place an organizational hierarchy of Supernodes and a level of Service for each of these Supernodes. *Ie. If Skype Supernodes worked in a way such as in Fasttrack then when the network reached 100 million users it would began to crawl. This is due to the dense nature of the upper "Overlay". I can only assume that Skype has thought of this and that when the Supernode ratio is beginning to "bottle neck" then there would be some ordered Hierarchy as to what role each Supernode was playing, otherwise the more Supernodes the more dense the "Overlay" the more the data is relayed back and forth before considering the "Supernode Overlay" into it's own Denial of Service attack.

I hope this helps to some degree, let me know if you have any other questions.

~Julian

So to the extent I have time to look into P2P technology further, I plan to explore what's been written about hierarchy in P2P networks.  Here are some references (which I've found but have not read as yet):

RFC 4981 on Survey of Research towards Robust Peer-to-Peer Networks: Search Methods

Hierarchical Peer-to-peer Systems by L. Garces-Erice, E.W. Biersack, P.A. Felber, K.W. Ross, and G. Urvoy-Keller.

An efficient peer-to-peer file sharing exploiting hierarchy and asymmetry, by G. Kwon and K. D. Ryu in the Proceedings of the 2003 Symposium on Applications and the Internet, 27-31 Jan. 2003 Page(s): 226 - 233.
< unfortunately only available on an IEEE pay-for site >

August 29, 2007

Best Skype outage post mortem so far

It's been more than ten days since the global Skype outage, so it's time to reconsider what actually happened.  The most credible analysis is not from Skype, but from Julian Cain in a series of comments (here, here and here) that he made to a GigaOM article about the outage (or see the single file in "References" below).  Julian is lead architect at Pando and, earlier, was head of Mac development for Kazaa at Sharman Networks.  So he knows a lot about peer-to-peer networks and his work at Sharman put him in a position to know quite a bit about the P2P technology that's also used by Skype (and likely by Joost).

Background

Skype's P2P technology was evolved from FastTrack, originally developed for Kazaa.  Their P2P network consists of clients and supernodes.  Skype distributes client software which includes all necessary supernode software, so any client that has appropriate capacity and connectivity can be promoted to become a supernode.  Supernodes dynamically link to other supernodes to support a distributed database and distributed index (called the distributed hash table or DHT).  For Skype, the DHT layer is responsible for maintaining client presence info, contacts and icons/avatars, and handling call routing.

Root Cause

But as I pointed out in several posts during the outage, there's also a centralized component to the Skype network.  That's the login servers.  Julian refers to them as the "authentication servers" and/or "login/connectivity servers."  They are implemented as one cluster of about 50 machines.  As for the root cause of the outage, he asserts:

Skype employees introduced code into the "login/connectivity" server farm that was not compatible with current Skype clients.

Other Issues

While that was the root cause, it was helped along by other network characteristics, notably that each client connects to only one supernode at a time.  According to Julian, there are 300+ clients per supernode and if a supernode goes off line, the 300 or so clients connected to it must reenter their "connecting" sequence, i.e., find and connect to another supernode.

A network with 8 million on-line users implies ~27K supernodes, a figure that's consistent with the ~20K supernodes estimated by Desclaux and Kortchinsky in 2005-2006 (see their June 2006 Recon presentation, PDF here).  The other point from measurements by Desclaux and Kortchinsky is that each supernode attempts to maintain a list of all other supernodes which means there is a substantial amount of traffic between supernodes.  This clearly contributed to the slow recovery, during which Julian commented:

Right now there are approximately 10,000 Skype networks instead of one single "in sync" network.

Scaling?

So I wonder, apart from the login server cluster as a single point of failure, is there also a scaling issue?  FastTrack's breakthrough was the use of supernodes to make the system more scalable.  But was that just one layer of scalability?  If so, what happens when there are 300 million on-line users and one million supernodes?  Perhaps Julian (or another P2P expert) could comment...
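A back-of-the-envelope calculation shows why the question matters.  The sketch assumes 300 clients per supernode (Julian's figure) and takes at face value the observation that each supernode tries to keep a list of all the others; the function names are illustrative, and the real protocol may well prune these lists.

```python
def supernodes_needed(online_users, clients_per_supernode=300):
    """Ceiling of users / clients-per-supernode."""
    return -(-online_users // clients_per_supernode)


def supernode_list_entries(supernodes):
    """Total list entries across the network if every supernode tracks every other one."""
    return supernodes * (supernodes - 1)


if __name__ == "__main__":
    for users in (8_000_000, 300_000_000):
        n = supernodes_needed(users)
        print(f"{users:,} users -> {n:,} supernodes, "
              f"{supernode_list_entries(n):,} list entries network-wide")
```

Going from 8 million to 300 million users multiplies the supernode count by about 37, but the network-wide bookkeeping by roughly 1,400, which is why a second layer of hierarchy looks necessary if the full-list assumption holds.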

References:

I've extracted and assembled a complete copy of Julian's relevant comments.

Vanilla Skype part 1 and part 2, by Fabrice Desclaux and Kostya Kortchinsky, Recon, June 17th, 2006.

Skype traffic during the week of the outage, captured by Phil Wolff of Skype Journal.

Copyright 2007 NMS Communications
