HLS Redmond

Information about HLS Redmond

Published on January 24, 2008

Author: Urban

Source: authorstream.com

Microsoft Research and Big Databases: Information at your fingertips. Jim Gray & Tom Barclay (gray@microsoft.com & TBarclay@Microsoft.com), Microsoft Research. Presentation to US Dept. Homeland Security, 7 April 2004.

Outline: Overview of Microsoft Research; Big-Database Research; TerraServer: Geospatial app; SkyServer: data mining app; Q&A.

Microsoft Is Different: It is a software company: almost entirely an IP company; margins on successful products are enormous; the cost of failure is enormous (missed market). It is BIG and so must look for BIG bets. High-velocity business: product mix shifts every decade. If you miss a shift, you are dead.

Most R&D Is D. How to Do Basic Research in Industry? Critical questions (from Rick Rashid): How can I create and maintain a world class research organization in an industrial setting? How do I keep the lines of communication open between product teams and researchers? How do I get new technology into products quickly?

Approach: Adapt the Academic Model. Organizational goal: advance state of the art. University organizational model: flat structure, critical mass groups; open research environment; aggressive publication in peer-reviewed literature; frequent visitors, daily seminars. Strong ties to university research: nearly 15% of basic research budget directly invested in universities; lab grants, research grants, fellowships, etc.
Hundreds of interns and visitors.

Microsoft Research: Founded in 1991. Staff of over 700 in over 55 areas. Internationally recognized research teams. Lab locations: Redmond, Washington, USA 75%; Cambridge, United Kingdom 10%; Beijing, People’s Republic of China 10%; Mountain View, California, USA 5%; San Francisco, California, USA 1%.

Microsoft Research: Expanding the State of the Art. Thousands of peer-reviewed publications; 10% to 30% of papers at our focus conferences (graphics, programming, systems, data management, ...). Community leadership; professional societies; journals; conferences; mentoring; interns; hosting academic summers and sabbaticals; special workshops.

How To Build A Group: Identify a promising area. Hire the leader (internal or external). Support her/him. Build team around senior researcher. Look for people who want to have impact and have passion for their ideas. Same template works for whole labs: Cambridge, Beijing, Silicon Valley.

Keeping Open The Lines Of Communication To Product Teams: Co-location helps: 75% “on campus”. “How can I help?” attitude demonstrates willingness to “get dirty” to help product succeed. Product group spin-offs build strong ties: over time a number of product groups evolved from research (e.g., Windows Media). Researchers involved in all corporate product reviews.

MSR Relationship To MS Products: Virtually every research group actively engaged with product groups, e.g., Windows, Office, streaming media, SQL, Exchange, IIS, Commerce Server, Visual Studio, consumer products, MSN, etc.
Tech transfer: ideas, code, people, contacts, recruiting.

Focused Technology Transfer: Quickly getting technology into products. Program management team with sole focus on tech transfer. Researchers on product “advisory” boards. “Mind-swaps”: joint product/research off-sites. Joint product/research teams, e.g., ClearType (Windows XP), Datamining (SQL 2000), Natural Language & Speech (Office), TabletPC, Smart Personal Objects (SPOT). Encourage and recognize contributions.

MSR Techfest: Internal open house for Microsoft Research. Annual event since 2001. ~7000 attendees. 170 demos, 26 lectures. “Research in progress”: breadboard demos; this is research idea/prototype. Great networking event: breaks down barriers; serendipitous connections.

Examples Of Technology Transfer: Critical support technologies: Memory Optimization Technology enabled sim-ship of Win95/Office95; automated bug detection in Windows 2000. Key technologies that drive products, e.g., MS Audio 4.0, ClearType, intelligent search, collaborative filtering, Intellimirror, etc. Incubated major products: Windows streaming media; Windows CE, TabletPC, eBook; Ecommerce, Datamining; natural language and speech technologies, etc.

MSR Mission Statement: Expand the state of the art in each of the areas in which we do research. Rapidly transfer innovative technologies into Microsoft products. Ensure that Microsoft products have a future.

BARC’s Research Agenda: Scalable Servers: TerraServer (US map online); SkyServer (all astronomy data online). Databases: advancing databases and data storage. Media Management: organizing your digital shoebox.

How Can HLS & MSR Cooperate?: Lots of research at MSR on HLS-relevant areas: data mining and visualization; distributed systems; cryptography, security, etc. Invite MS researchers to HLS workshops and study groups.
HLS visiting scientists at MSR?

Outline: Overview of Microsoft Research; Big-Database Research; TerraServer: Geospatial app; SkyServer: data mining app; Q&A.

Numbers: Terabytes and Gigabytes are BIG! Mega: a house in California. Giga: a very rich person (billionaire). Tera: ~ the national debt. Peta: more than all the money in the world. A Gigabyte: the Human Genome. A Terabyte: 150 mile long shelf of books.

How much information is there?: Soon everything can be recorded and indexed. Most bytes will never be seen by humans. Data summarization, trend detection, and anomaly detection are key technologies. See Mike Lesk: How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html See Lyman & Varian: How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/ (Figure: a scale of prefixes Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, marked with A Photo, A Book, A Movie, All Books (words), All Books MultiMedia, and Everything Recorded. Small prefixes by exponent: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli.)

e-Science Has BIG DATA: Data captured by instruments, or data generated by simulator. Processed by software. Placed in files or a database. Scientist analyzes files/database. Virtual laboratories: networks connecting e-Scientists; strong support from funding agencies; better use of resources; primitive today.

The eScience Big Picture: (Diagram labels: facts, questions, answers.) The Big Problems: data ingest; managing a petabyte; common schema; how to organize it; how to reorganize it; how to coexist with others; query and vis tools; support/training; performance: execute queries in a minute, batch query scheduling.

e-Science is Data Mining: There are LOTS of data; people cannot examine most of it. Need computers to do analysis.
Manual or automatic exploration. Manual: person suggests hypothesis, computer checks hypothesis. Automatic: computer suggests hypothesis, person evaluates significance. Given an arbitrary parameter space: data clusters; points between data clusters; isolated data clusters; isolated data groups; holes in data clusters; isolated points. Nichol et al. 2001. Slide courtesy of and adapted from Robert Brunner @ CalTech.

Data Analysis: Looking for needles in haystacks: the Higgs particle. Haystacks: dark matter, dark energy. Needles are easier than haystacks. Global statistics have poor scaling: correlation functions are N^2, likelihood techniques N^3. As data and computers grow at the same rate, we can only keep up with N log N. A way out? Discard notion of optimal (data is fuzzy, answers are approximate). Don’t assume infinite computational resources or memory. Requires combination of statistics & computer science.

Outline: Overview of Microsoft Research; Big-Database Research; TerraServer: Geospatial app; SkyServer: data mining app; Q&A.

TerraServer/TerraService: http://terraService.Net/ http://TerraServer-USA.com/ US Geological Survey photo (DOQ) & topo (DRG) images. On Internet since June 1998. Operated by Microsoft. Cross-indexed with demographics. A web service. 20 TB data source. 10 M web hits/day.

USGS Image Data: Digital OrthoQuads: 15 TB, 280,000 files uncompressed; digitized aerial imagery; 96% coverage of conterminous US; 1 meter resolution; < 15 years old. Digital Raster Graphics: 1 TB compressed TIFF, 65,000 files; scanned topo maps; 100% U.S. coverage; 1:24,000, 1:100,000 and 1:250,000 scale maps; maps vary in age. Urban Area: 1 foot resolution; natural color; 133 major U.S.
cities; 30 available 2004; 2001 or later; produced by NIMA for Homeland Security.

Image Coverage: 100% U.S., topo maps (light green), 2m to 1024m resolution. 96% of 48 conterminous states (dark green), ortho imagery, 1m to 1024m resolution. Urban Area cities: Seattle, Portland, Stockton, Modesto, Fresno, Sacramento, Chicago, Orlando, Atlanta, Amarillo, Houston, Lubbock, Springfield, Birmingham, Dallas, Albuquerque, Oklahoma City, El Paso, Lincoln, Lexington, Tampa, Washington DC, Mobile, Ft Wayne, Colorado Springs, Baton Rouge, …

User Interface Concept: Display imagery: 316 M 200 x 200 pixel images; 7 level image pyramid; resolution 1 meter/pixel to 64 meter/pixel. Navigation tools: 1.5 M place names; “Click-on” coverage map; longitude and latitude search; U.S. address search. External geo-spatial links to: USGS on-line stream gauges; Home Advisor demographics; Home Advisor real estate; Encarta articles; stream flow gauges. Concept: user navigates an ‘almost seamless’ image of earth. Buttons to pan NW, N, NE, W, E, SW, S, SE. Click on image to zoom in. Links to switch between Topo, Imagery, and Relief data. Links to Print, Download and view meta-data information.

New “Urban Area” Data: “Redundant Bunch 1”. Microsoft Campus at 4 meter resolution. Ball field at .25 meter resolution.

Software Architecture: (Diagram labels: ADO.NET 1.1.)

TerraServer Becomes a Web Service: TerraServer.net -> TerraService.Net. Web server is for people. Web Service is for programs. The end of screen scraping. No faking a URL: pass real parameters. No parsing the answer: data formatted into your address space. Hundreds of users but a specific example: US Department of Agriculture Lighthouse app.
USDA has internal TerraServer.

Web Service Methods: Place Search: GetPlaceFacts, GetPlaceList, GetPlaceListInRect, CountPlacesInRect. Projection: ConvertLonLatPtToUtmPt, ConvertUtmPtToLonLatPt, ConvertLonLatTo NearestPlace, GetTheme, GetLatLonMetrics. Tile: GetAreaFromPt, GetAreaFromRect, GetAreaFromTileId, GetTileMetaFromLonLatPt, GetTileMetaFromTileId, GetTile (Image). Landmark: GetLandmarkTypes, CountOfLandmarkPointsByRect, GetLandmarkPointsByRect, CountOfLandmarkShapesByRect, GetLandmarkShapesByRect. http://terraservice.net

TerraServer Web Services: Get image meta-data; query TS Gazetteer; retrieve TS ImageTiles; projection conversions. Web Map Client: OpenGIS “like”. Landmarks layered on TerraServer imagery: geo-coded data of well-known objects (points), e.g. Schools, Golf Courses, Hospitals, etc.; polygons of well-known objects (shapes), e.g. Zip Codes, Cities, etc. Fat Map Client: Visual Basic / C# Windows Form; access Web Services for all data. (Diagram labels: Terra-Tile-Service; Landmark-Service; http://terraservice.net; Sample Apps.)

Web Services: Web SERVER: given a URL + parameters, returns a web page (often dynamic). Web SERVICE: given an XML document (SOAP msg), returns an XML document. Tools make this look like an RPC: F(x,y,z) returns (u, v, w). Distributed objects for the web, plus naming, discovery, security, ... Internet-scale distributed computing. (Diagram labels: Your program; Data in your address space; Web Service; soap; object in xml; Your program; Web Server; http; Web page.)

TerraServer Schema.

Load System Flow: (Diagram labels: Copy To Load Server; TerraLoad Tile Job(s); TerraLoad Pyramid Job(s); Admin/Backup Server; Load Server.) 1. Data arrives from the source. 2. Copy to the Load Server. 3. Tile Job is executed, copying tiles to Admin Svr; Admin Db stores new full res tiles. 4.
Pyramid Job copies full res tiles to online db(s) & creates image pyramid. Pyramiding copies to all online Dbs.

Hardware Evolution:
1998-2000: DEC Alpha 8400, StorageWorks DAS; 1 x 8 x 440 MHz RISC processor, 2 GB RAM; 2.5 TB RAID-5, 9 GB SCSI drives; 7 racks; $2.1M (World’s Largest PC); “Single Server Scale Up”.
2000-2003: 4-node Compaq Windows 2000 DataCenter Cluster, StorageWorks SAN; 4 x 8 x 700 MHz Intel (Xeon) processor, 4 GB RAM each; 18 TB RAID-10 (triple mirrored), 73 GB drives; 4 racks; $1.6M; “High Availability Large Scale Cluster”.
2004-...: “White-box Storage Bricks”; low cost availability; 4 copies of the data; RAID1 SATA mirroring; 2 redundant “Bunches”; spare brick to repair failed brick; 2N+1 design; web application “bunch aware”: load balances between redundant databases, fails over to surviving database on failure; ~$100K capital expense.

Outline: Overview of Microsoft Research; Big-Database Research; TerraServer: Geospatial app; SkyServer: data mining app; Q&A.

Virtual Observatory: http://www.astro.caltech.edu/nvoconf/ http://www.voforum.org/ Premise: most data is (or could be) online. So, the Internet is the world’s best telescope: it has data on every part of the sky, in every measured spectral band: optical, x-ray, radio, ... As deep as the best instruments (2 years ago). It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no ...). It’s a smart telescope: links objects and data to literature on them.

Why Astronomy Data?
It has no commercial value: no privacy concerns; can freely share results with others; great for experimenting with algorithms. It is real and well documented: high-dimensional data (with confidence intervals); spatial data; temporal data. Many different instruments from many different places and many different times. Federation is a goal. The questions are interesting: how did the universe form? There is a lot of it (petabytes).

Time and Spectral Dimensions: The Multiwavelength Crab Nebula. X-ray, optical, infrared, and radio views of the nearby Crab Nebula, which is now in a state of chaotic expansion after a supernova explosion first sighted in 1054 A.D. by Chinese astronomers. Slide courtesy of Robert Brunner @ CalTech. (Figure label: Crab star 1053 AD.)

SkyServer.SDSS.org: A modern archive: raw pixel data lives in file servers; catalog data (derived objects) lives in database; online query to any and all. Also used for education: 150 hours of online astronomy; implicitly teaches data analysis. Interesting things: spatial data search; client query interface via Java applet; query interface via Emacs; popular: 1% of TerraServer; cloned by other surveys (a template design); web services are core of it.
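
The spatial data search listed among SkyServer's interesting features can be pictured as a radius search around a sky position. What follows is a minimal Python sketch under stated assumptions, not SkyServer's implementation: the sample catalog rows, the field names ra and dec (in degrees), and both function names are hypothetical and chosen only for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for very small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra, dec, radius_deg):
    """Return the catalog rows that lie within radius_deg of the position (ra, dec)."""
    return [row for row in catalog
            if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= radius_deg]


# Hypothetical catalog rows, invented for this example.
CATALOG = [
    {"objId": 1, "ra": 181.30, "dec": -0.76},
    {"objId": 2, "ra": 181.31, "dec": -0.75},
    {"objId": 3, "ra": 185.00, "dec": 2.00},
]

hits = cone_search(CATALOG, ra=181.3, dec=-0.76, radius_deg=0.1)
print([row["objId"] for row in hits])
```

This scans every row, which is fine for a handful of objects; an archive holding a large catalog would need a spatial index so that only nearby rows are examined.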
Demo of SkyServer: Shows standard web server; pixel/image data; point and click; explore one object; explore sets of objects (data mining).

Data Federations of Web Services: Massive datasets live near their owners: near the instrument’s software pipeline; near the applications; near data knowledge and curation. Super Computer centers become Super Data Centers. Each Archive publishes a web service: Schema: documents the data; Methods on objects (queries). Scientists get “personalized” extracts. Uniform access to multiple Archives: a common global schema. (Diagram label: Federation.)

Federation: SkyQuery.Net: Combine 4 archives initially. Just added 10 more. Send query to portal; portal joins data from archives. Problem: want to do multi-step data analysis (not just single query). Solution: allow personal databases on portal. Problem: some queries are monsters. Solution: “batch schedule” on portal server; deposits answer in personal database.

SkyQuery Structure: Each SkyNode publishes: Schema Web Service; Database Web Service. Portal: plans query (2 phase); integrates answers; is itself a web service.

SkyQuery: http://skyquery.net/ Distributed query tool using a set of web services. Four astronomy archives from Pasadena, Chicago, Baltimore, Cambridge (England).
Feasibility study, built in 6 weeks: Tanu Malik (JHU CS grad student), Tamas Budavari (JHU astro postdoc), with help from Szalay, Thakar, Gray. Implemented in C# and .NET. Allows queries like:

    SELECT o.objId, o.r, o.type, t.objId
    FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
    WHERE XMATCH(o,t)<3.5 AND AREA(181.3,-0.76,6.5)
      AND o.type=3 AND (o.I - t.m_j)>2

SkyNode Basic Web Services: Metadata information about resources: waveband; sky coverage; translation of names to universal dictionary (UCD). Simple search patterns on the resources: cone search; image mosaic; unit conversions. Simple filtering, counting, histogramming. On-the-fly recalibrations.

Portals: Higher Level Services: Built on atomic services. Perform more complex tasks. Examples: automated resource discovery; cross-identifications; photometric redshifts; outlier detections; visualization facilities. Goal: build custom portals in days from existing building blocks (like today in IRAF or IDL).

MyDB added to SkyQuery: Let users add personal DB: 1 GB for now. Use it as a workbook. Online and batch queries. Moves analysis to the data. Users can cooperate (share MyDB). Still exploring this. (Diagram label: MyDB.)

The Big Picture: (Diagram labels: facts, questions, answers.) The Big Problems: data ingest; managing a petabyte; common schema; how to organize it; how to reorganize it; how to coexist with others; query and vis tools; support/training; performance: execute queries in a minute, batch query scheduling.

Outline: Overview of Microsoft Research; Big-Database Research; TerraServer: Geospatial app; SkyServer: data mining app; Q&A.

Grid and Web Services Synergy: I believe the Grid will be many web services. Share data (computrons are free). IETF standards provide: naming; authorization / security / privacy; distributed objects: discovery, definition, invocation, object model. Higher level services: workflow, transactions, DB, ...
Synergy: commercial Internet & Grid tools
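
The Web Services slide says that a web service takes an XML document (a SOAP message) and returns an XML document, and that tools make this look like an RPC: F(x,y,z) returns (u, v, w). A minimal Python sketch of that idea follows. It is not the TerraService or SkyQuery interface: the service namespace, the in-process fake_service that stands in for an HTTP POST, and the arithmetic it performs are all hypothetical. Only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 envelope namespace
SVC_NS = "http://example.org/demo-service"             # hypothetical service namespace


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return element.tag.split("}", 1)[1]


def build_message(name, fields):
    """Wrap a named element and its fields in a SOAP envelope, returned as text."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    payload = ET.SubElement(body, f"{{{SVC_NS}}}{name}")
    for key, value in fields.items():
        ET.SubElement(payload, f"{{{SVC_NS}}}{key}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_message(xml_text):
    """Return (element name, {field: text}) for the first element inside the Body."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_NS}}}Body")
    payload = body[0]
    return local_name(payload), {local_name(child): child.text for child in payload}


def fake_service(request_xml):
    """Hypothetical stand-in for the remote service; a real client would POST over HTTP."""
    method, args = read_message(request_xml)
    x, y, z = (float(args[key]) for key in ("x", "y", "z"))
    # Arbitrary arithmetic, only so that the response carries three values.
    return build_message(f"{method}Response", {"u": x + y, "v": y * z, "w": x})


def call(method, transport, **params):
    """Looks like a local function call; actually exchanges two XML documents."""
    _, result = read_message(transport(build_message(method, params)))
    return result


print(call("F", fake_service, x=1, y=2, z=3))
```

The caller never touches the XML: it passes real parameters and gets named values back, which is the contrast with screen scraping drawn earlier in the deck. In this sketch the returned values arrive as text; generated client stubs would also convert them to typed objects.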
