Microsoft “eScience in the Cloud” 2014

I had the honor of representing the project at Microsoft’s “eScience in the Cloud” workshop.  This was a small conference, held on the Microsoft campus, for domain scientists hoping to use cloud data storage and processing in their research projects.

Building 99 (Microsoft Research)

The main themes of the conference were:

  • Impacts of the cloud on research communities
  • Streaming environmental data and urban science
  • Perspectives on data science
  • The leading edge of cloud design
  • Deriving information from data
  • Data analytics and simulation

The first day of the workshop was an Azure training session.  My fellow students and I were led through a series of trainings on the various features of Azure.  Microsoft did a great job of tailoring the training to researchers, with several examples revolving around R, MATLAB, and IPython Notebook.

Not to come across as too much of a Microsoft fanboy, but I have to say, Azure is pretty awesome.  The ability to spin up a virtual server or cluster of your choice on demand unlocks a lot of potential for data scientists.  Need a Hadoop cluster for a massive MapReduce job?  You can have it up and running in minutes, spin up a website to display your findings, and store all your data in a CKAN data repository in the cloud for reference by collaborators.  All through Azure and all very quickly.  It’s pretty amazing tech.

The second and third days consisted of a series of talks by Microsoft personnel as well as domain scientists and researchers.  Too much was covered to really go into detail here.  I particularly enjoyed some of the talks in the “Leading Edge of the Cloud” segment, which covered some of the techniques Microsoft is using to improve cloud performance (e.g. micro server farms called “cloudlets” located geographically close to your research facility, as well as software networking approaches for better bandwidth within a data center).  I also really enjoyed a talk by Gabriel Antoniu entitled “Scalable Data-Intensive Processing for Science on Azure Clouds: A-Brain and Z-CloudFlow”, covering his work on problems whose processing demands are so large that they require multiple geographically diverse data centers to work together.

All in all, it was a great workshop, and I’m glad to have had the opportunity to see some cutting-edge work by fellow researchers!