Published on April 8, 2014
BETTER DECISIONS THROUGH BIG DATA
EXECUTIVE BRIEFING
To enable the correct business decisions to be made quickly, large quantities of structured and unstructured data now have to be analyzed. Analytics using big data technologies helps us to find the right answers.
It used to be that retailers could consider themselves customer friendly if they put charcoal on the shelf next to beer and ketchup in the summer. Naturally enough, customers who bought one of these items would tend to buy the others as well. Today, retailers have to do a bit more than that in the battle for customers' favor. In e-commerce it is now common practice for retailers to know what their customers like, their order history, their discount preferences and, wherever possible, where they live. But only if retailers know how to interpret this data and link it intelligently can they present customized offerings to their customers and reach them wherever they currently are: online, at home on the sofa or on the road. To do this, they need the right analytics tools as well as the right data. In an era of global competition and volatile markets, this data is essential to decision-makers seeking to optimize their business. It gives retailers, for instance, specific information about the purchasing behavior of their customers. For companies, data is the key to understanding their markets better, uncovering hidden trends and identifying new business opportunities in good time. Decisions can thus be made more quickly and with greater precision, with the aim of understanding customers better and meeting their requirements more effectively. Data analytics enables marketing departments, for example, to create fine-grained demographic or customer segments and to tailor products and services to them. Detailed segmentation makes target groups easier to address, reduces wasted effort and thus cuts the cost of marketing campaigns. A telecommunications provider, for example, can use data analytics to find out why customers are leaving and counter this with targeted measures.
The role of data
Many decision-makers and executives now recognize the strategic value of data. They exploit relevant data sources that tell them about their products and customers, and they use business intelligence tools to analyze, for example, purchasing frequency for different products or changes in stock levels. According to a study by software vendor Artegic, 75 percent of companies believe that they can be significantly more successful if they make use of personal data obtained from online marketing. Business intelligence tools allow them to adapt and control their business and to take a well-targeted approach. A company's management benefits significantly from the information obtained. It can use this information as a strategic compass to identify changes in the market and in customer behavior in good time and so act proactively.
Data becomes big data
But however many dashboards, charts and tables executives have, they cannot simply sit back and relax. In recent years, the sheer quantity of data has shaken up the world of business intelligence. Not long ago, the amount of information available for business decision-making was relatively easy to grasp, but in the last few years it has ballooned. Practically everything is now digitized, and new types of transaction data and real-time data are emerging. Machines and computers also produce enormous quantities of data, which can be stored and analyzed on hardware that is becoming ever cheaper and more flexible. A modern aircraft, for example, generates up to 10 terabytes of data for every 30 minutes of flying time. With 25,000 flights a day, petabytes of data are generated. The transition toward digital business models and new applications is also contributing to data growth. Technologies such as cloud computing, RFID, transactional systems, data warehouses, document management systems and enterprise content management systems are important developments in the context of big data. Many of these systems continuously generate new data streams. However, the critical factors in this explosion of data are the Internet, the increasing number of mobile devices and, above all, social media such as Facebook, Twitter and YouTube. Facebook alone, for example, generates 2.7 billion “likes” and 300 million photos a day and scans 105 TB of data every half hour. Moreover, the volumes of data generated today are not only huge. The data is also significantly less structured than the typical business data generated in ERP systems.
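The aircraft and Facebook figures above can be put on a common daily scale with a back-of-the-envelope calculation. The calculation rests on two assumptions, made purely for illustration. First, each of the 25,000 daily flights produces the quoted upper figure of 10 TB for a single 30-minute block. Second, decimal units apply (1 PB = 1,000 TB). Because the text says "up to", the result is an order-of-magnitude illustration, not a measurement.

```python
# Back-of-the-envelope daily data volumes, using figures quoted in the text.
# Assumption (not from the source): every flight generates the quoted upper
# figure of 10 TB for exactly one 30-minute block.
TB_PER_PB = 1_000  # decimal units: 1 PB = 1,000 TB
HALF_HOURS_PER_DAY = 48


def aircraft_tb_per_day(flights: int, tb_per_half_hour: float, half_hours_per_flight: float) -> float:
    """Daily volume in TB if every flight produces tb_per_half_hour for each half hour flown."""
    return flights * tb_per_half_hour * half_hours_per_flight


def rate_tb_per_day(tb_per_half_hour: float) -> float:
    """Convert a constant rate quoted per half hour into TB per day."""
    return tb_per_half_hour * HALF_HOURS_PER_DAY


aircraft = aircraft_tb_per_day(flights=25_000, tb_per_half_hour=10, half_hours_per_flight=1)
facebook = rate_tb_per_day(105)  # data scanned by Facebook, per the text

print(f"Aircraft: {aircraft:,.0f} TB/day = {aircraft / TB_PER_PB:,.0f} PB/day")
print(f"Facebook scans: {facebook:,.0f} TB/day = {facebook / TB_PER_PB:,.2f} PB/day")
```

Under these assumptions the aircraft figure comes to 250 PB per day, and Facebook's scanning rate comes to about 5 PB per day. Both are consistent with the text's point that daily volumes are now measured in petabytes.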
Social media information such as text, photographs, audio files or videos can no longer be allocated tidily to rows and columns, as the relational database model requires: this data is unstructured. According to an IDC study of data storage in Germany in 2013, 90 percent of data is now unstructured and has to be captured and analyzed using entirely new techniques. (Source: IDC Storage*) This means that companies now have to deal with large volumes of unwieldy structured, semi-structured and unstructured data from many different sources. In particular, companies can no longer ignore unstructured data from social networks. A great deal can be learned from emails, feedback forms, comments and ratings in social networks and from discussions in forums. The huge volume of tweets generated every day, currently around 12 terabytes of data, provides a solid basis for trend research and product development.
Typical types of data today
Structured data: data that fits the tables and structures of relational databases
Semi-structured data: data that is often generated by data interchange between companies and is therefore often based on XML
Unstructured data: data from text files, speech-to-text applications, PDFs, scanned mail, presentations, photographs, videos and audio files
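The sketch below makes the semi-structured category concrete. It uses Python's standard `xml.etree.ElementTree` module to flatten an XML order message into relational rows, one per item. The message format, element names and values are hypothetical; the charcoal and ketchup items simply echo the retail example at the start. Real interchange formats are usually governed by a schema agreed between the trading partners.

```python
import xml.etree.ElementTree as ET

# Hypothetical order message of the kind exchanged between companies.
ORDER_XML = """
<order id="A-1001">
  <customer>C-42</customer>
  <item sku="CHARCOAL-5KG" qty="2" price="7.99"/>
  <item sku="KETCHUP-500ML" qty="1" price="1.49"/>
</order>
"""


def order_to_rows(xml_text: str) -> list[tuple[str, str, str, int, float]]:
    """Flatten one semi-structured order into relational rows (one row per item)."""
    root = ET.fromstring(xml_text.strip())
    order_id = root.get("id")
    customer = root.findtext("customer")
    rows = []
    for item in root.findall("item"):
        # Attributes arrive as strings, so convert them to typed columns.
        rows.append((order_id, customer, item.get("sku"),
                     int(item.get("qty")), float(item.get("price"))))
    return rows


for row in order_to_rows(ORDER_XML):
    print(row)
```

Once flattened, each row fits the table of a relational database. That is the sense in which XML sits between structured and unstructured data.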
Which industries benefit from big data?
Depending on the technology at their disposal, companies can get relatively easy access to large volumes of useful market and customer data, and they want to extract as much value as they can from it. According to an international IDC study commissioned by T-Systems, half of all companies have already implemented big data projects or have concrete plans to do so. In an SAS survey, three out of every four companies that had launched big data projects described business analytics as an effective aid to decision-making. (Source: SAS Decision Making*) According to the study, the main benefits are increased profitability, reduced costs, more targeted risk management, process optimization, faster decision-making and performance improvements.
According to McKinsey, the outlay associated with big data pays off in hard cash. If big data is analyzed correctly and in good time, retailers, for example, can improve their operating margins by up to 60 percent, and European public authorities can save 250 billion euros a year through more efficient processes, according to the consulting firm. If companies knew more about the locations of their customers, they would be able to sell additional products worth 600 billion dollars. (Source: McKinsey Big Data*)
Until recently, only banks, financial services companies and selected large corporations (the typical users of data warehousing and business intelligence) had given any thought to automated decision-making processes. Now, according to the Experton Group, retailers, utility companies and companies in the life sciences, the healthcare industry and many other markets are also increasingly recognizing that data is an important business asset. (Source: Experton Big Data*)
In terms of departments within companies, the benefits are felt above all in research and development; sales and marketing; production; distribution and logistics; and finance and risk management.
In these five areas, the business benefits of big data are particularly marked.
Analyzing big data
Despite the undisputed benefits, converting the data collected into useful information is still a challenge for many companies. According to market research company Gartner, through 2015 over 85 percent of Fortune 500 companies will not be in a position to use big data effectively to secure a competitive advantage. “In terms of technology and administration, most companies are poorly prepared for the challenges associated with big data,” say the Gartner analysts. “Consequently, only a few of them will be able to exploit this trend effectively and secure themselves a competitive advantage.” (Source: Gartner PI*)
Compared with conventional data processing and analysis, three factors present a major challenge: the sheer volume of data involved, the heterogeneity of the data and the processing speed required. Given their origins and architecture, relational databases can only be used efficiently for applications involving frequent transactions at the level of individual data records, or for scenarios with low to moderate volumes of data. They are not designed for processing and analyzing data quantities measured in petabytes or exabytes. Above all, unstructured data is very difficult, if not impossible, to store in table-based relational database systems. Given the increasing volumes of data available for analysis, companies need new approaches and technologies, according to Gartner in its study Big Data Opportunities, New Answers and New Questions. (Source: Gartner Big Data*) New “big data systems” not only have to cope with these huge quantities of data; they also have to analyze unstructured data reliably and as quickly as possible.
These real-time analyses require systems with extremely fast database access and efficient parallelization, so that tasks can be distributed across large numbers of computers. In the past, this approach was known as grid computing.
Google pioneered big data tools for analyzing unstructured data. With its MapReduce programming model, the company subdivides the processing of huge volumes of data so that the infrastructure can be scaled flexibly to match the volumes of data involved. This inspired the popular open-source project Hadoop. Together with in-memory databases and NoSQL databases for unstructured data, Hadoop is now the standard for big data technology. In the context of business applications, SAP set things in motion with its SAP HANA database (High-Performance Analytic Appliance), which is based on in-memory technology.
Source: How Organisations are approaching Big Data, IDG, September 2013 (200 decision-makers from companies with over 100 employees in the USA, Brazil, the Netherlands, Austria, South Africa and Switzerland)
Figure: Business goals related to decision-making capabilities and agility/speed are significantly connected to a majority of respondents’ big data strategies and initiatives.
To what extent is your organization’s big data strategy/big data initiatives connected to each of the following business goals? Base: 155 qualified respondents who have implemented or have plans to implement big data projects. Figures in percent; scale from 5 (to a significant extent) through 3 (to a moderate extent) to 1 (to a limited extent).
Business goal | 5 | 4 | 3 | 2 | 1
Increasing speed of decision-making | 35 | 34 | 23 | 5 | 3
Increasing business agility | 35 | 32 | 26 | 5 | 3
Improving the quality of decision-making | 31 | 37 | 28 | 2 | 2
Improving the speed of response to IT security issues | 31 | 31 | 29 | 6 | 3
Improving planning and forecasting capabilities | 29 | 35 | 28 | 4 | 3
Meeting regulatory/compliance requirements | 26 | 33 | 30 | 8 | 3
New customer acquisition/retention | 26 | 33 | 27 | 8 | 5
Using immediate market feedback to improve customer satisfaction | 26 | 32 | 32 | 6 | 4
Building new business partnerships | 25 | 34 | 32 | 6 | 4
Improving internal communication | 25 | 32 | 35 | 5 | 3
Developing new products/services and revenue streams | 25 | 32 | 34 | 6 | 3
Strengthening existing business partnerships | 25 | 29 | 35 | 6 | 4
Improving finance/accounting and procurement processes | 23 | 30 | 33 | 9 | 5
Reducing CAPEX | 19 | 23 | 41 | 12 | 5
Reducing OPEX | 18 | 28 | 41 | 8 | 5
Figure: About half of all respondents have either already deployed or are in the process of implementing big data projects at their organizations.
At what stage is your organization currently with the planning and rollout of big data projects? Base: 200 qualified respondents (figures in percent)
Already deployed/implemented big data initiatives | 25
In the process of implementing big data projects | 23
Planning to implement big data projects over the next 12 months | 21
Planning to implement big data projects within the next 13 to 24 months | 10
We have no immediate plans to implement big data projects | 23
Big data analytics relies on models and algorithms designed to search through mountains of data to find connections and identify patterns and similarities. These predictive or business analytics solutions quickly give an accurate picture of the current situation, and they also permit predictions and forecasts about future developments. This is done on the basis of statistical and stochastic methods, data models and simulations with best- and worst-case scenarios. This entirely new set of activities requires people with job titles such as “data scientist”.
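Simulations with best- and worst-case scenarios are often built with Monte Carlo methods. The sketch below is one such approach, not necessarily what any particular analytics product does. It draws hypothetical uncertain inputs many times: a customer count, a churn rate and a spend per customer. For each draw it computes revenue, then reports the 5th and 95th percentiles as worst and best case. All distributions and numbers are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_revenue(runs: int = 10_000, seed: int = 42) -> list[float]:
    """Monte Carlo simulation of revenue under uncertain inputs.

    All distributions and parameters here are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    results = []
    for _ in range(runs):
        customers = max(rng.gauss(10_000, 1_500), 0)  # uncertain customer count
        churn = rng.uniform(0.05, 0.20)               # uncertain churn rate
        spend = rng.triangular(20, 60, 35)            # low, high, most likely spend
        results.append(customers * (1 - churn) * spend)
    return results


outcomes = simulate_revenue()
cut_points = statistics.quantiles(outcomes, n=20)  # 19 cut points in 5% steps
worst_case, best_case = cut_points[0], cut_points[-1]  # 5th and 95th percentiles
expected = statistics.mean(outcomes)

print(f"worst case (P5): {worst_case:,.0f}")
print(f"expected:        {expected:,.0f}")
print(f"best case (P95): {best_case:,.0f}")
```

Defining the worst and best case as percentiles, rather than as the single lowest and highest run, keeps the scenarios stable as the number of runs grows. The percentile levels themselves are a modelling choice.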
How is meaningful information obtained from large quantities of unstructured Twitter and Facebook text, video and consumer data?
A lot of work has to be done before the data that finds its way into a company can be turned into information on which executives can base their decisions. Countless selection, processing and analysis steps are involved. Analytics expert Ken McLaughlin has analyzed numerous case studies. On that basis, he suggests six concrete steps for data-driven decision-making using business analytics in his blog “Data to Decisions”.
Step 1: Establish a goal. A clearly defined goal must meet two requirements: it must be both achievable and measurable. “Reduce product shipping costs by 15 percent” would be a clearly formulated goal, for example.
Step 2: Model alternatives. The goal determines the direction, the alternatives and how the goal is to be achieved. Example: “costs of a reasonably priced shipper” versus “costs of an automated handling process” would be possible alternatives.
Step 3: Identify the required data. Identify the data and metrics required to model each alternative. In the example, these are previous shipping costs and the software and hardware costs of automated processes.
Step 4: Collect and organize data. Before the models can be evaluated, the data has to be collected and organized.
Step 5: Analyze data. To evaluate the data, first select the appropriate analytical techniques, then select the best alternative.
Step 6: Decide and execute. Finally, execute the action that delivers the best results and observe the actual results.
What are the risks?
A central question in connection with big data is that of data quality. Does data occur more than once, does it contain errors or inconsistencies, or are entire records missing?
Users are generally aware of the importance of this question, as a study by Omikron Data Quality shows. Thirty-nine percent of those surveyed said they believed that a big data approach is doomed to failure if the data is of poor quality. “It is clear that, when there is a larger volume of data, statistical significance increases and the results of BI analytics are more reliable,” according to the study. “However, if the initial data is incorrect, duplicated or inconsistent, this significance is misleading: in the worst-case scenario, you get apparently clear results that are mathematically sound but in fact incorrect. If actions are then taken based on the results of analytics, which is, of course, the goal of BI, negative consequences are inevitable.” (Source: Omikron Data Quality*)
If the analyses and forecasts are to be accurate, their foundation (i.e., the data) must be correct. In typical BI, the ETL (extract, transform, load) process offers proven methods for tidying up data before the information is stored in the data warehouse. These include profiling, cleansing, enriching and comparing with reference data.
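The six steps and the data-quality warning can be tied together in a small sketch. It follows McLaughlin's shipping-cost example. The shipment records, the two alternatives and their costs are hypothetical and do not come from McLaughlin or Omikron. A simple cleansing pass (drop duplicate IDs and records without a cost) runs before the analysis. The cheapest alternative is then checked against the 15 percent goal.

```python
from dataclasses import dataclass

# Hypothetical historical shipping records: (shipment_id, cost in EUR).
# A duplicate and an incomplete record are included on purpose.
RAW_SHIPMENTS = [
    ("S1", 12.0), ("S2", 15.0), ("S2", 15.0),  # S2 was loaded twice
    ("S3", None), ("S4", 13.0), ("S5", 20.0),  # S3 has no cost recorded
]


def cleanse(records):
    """Step 4 with basic ETL-style cleansing: keep the first record per ID, drop records without a cost."""
    seen, clean = set(), []
    for shipment_id, cost in records:
        if shipment_id in seen or cost is None:
            continue
        seen.add(shipment_id)
        clean.append((shipment_id, cost))
    return clean


@dataclass
class Alternative:
    """Step 2: one way of reaching the goal, with hypothetical cost parameters."""
    name: str
    cost_per_shipment: float
    fixed_cost: float = 0.0

    def total(self, shipments: int) -> float:
        return self.fixed_cost + self.cost_per_shipment * shipments


def decide(baseline_total: float, alternatives: list, shipments: int, target_saving: float = 0.15):
    """Steps 5 and 6: choose the cheapest alternative and check it against the goal from step 1."""
    best = min(alternatives, key=lambda alt: alt.total(shipments))
    saving = 1 - best.total(shipments) / baseline_total
    return best.name, saving, saving >= target_saving


clean = cleanse(RAW_SHIPMENTS)               # step 4: collect and organize
baseline = sum(cost for _, cost in clean)    # step 3 data: previous shipping costs
options = [
    Alternative("Cheaper shipper", cost_per_shipment=12.5),
    Alternative("Automated handling", cost_per_shipment=9.0, fixed_cost=20.0),
]
result = decide(baseline, options, shipments=len(clean))
print(f"{result[0]}: saving {result[1]:.1%}, 15% goal met: {result[2]}")
# prints "Cheaper shipper: saving 16.7%, 15% goal met: True"
```

The cleansing rules here are deliberately minimal. A real ETL pipeline would also profile the data, enrich it and compare it with reference data, as described above.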
The challenge of data silos
A further fundamental challenge in dealing with big data is the distribution of data across parallel systems. On the one hand, for historical reasons, data silos (from CRM, ERP or other systems, for example) have dominated the architecture of data storage, and they increasingly also have to handle the archiving of historical data. On the other hand, given rising data volumes, many companies simply allocate the incoming flood of data to different storage locations without first processing or transforming it. These distributed and heterogeneous data processing and storage structures are neither cost-effective nor suitable for data analysis. They prevent the exchange and integration of data and make it difficult to maintain a holistic view of data management.
Modern integration technologies can help here by making structured, unstructured and semi-structured data from a variety of sources an integral part of the enterprise-wide data management strategy. To this end, software solutions tap data sources throughout the company, extract the data and load it into the designated storage system. In the next step, this data is loaded into data models, enriched with further data from other sources and then analyzed. Cloud-based systems help to provide storage capacity for large volumes of data.
No big data without skilled staff
Successful big data analytics requires not just suitable technologies but also skilled staff. It can only be implemented with the help of highly qualified specialists. These specialists must be able to handle the relevant tools and technologies, understand the requirements of specific departments and ensure that the technology put in place meets these requirements.
For some time now, many US companies have included a chief data officer (CDO) among their C-level executives. The CDO's activities focus on managing data as an asset and converting it into something of concrete business value. Capital One appointed the first CDO in the industry in 2003. Since then, CDOs have become increasingly common among top executives, above all in large public institutions that are overwhelmed with data. According to Gartner, 2 percent of companies around the world and 6 percent of large companies have a CDO. This share is forecast to rise to 20 percent of large companies by 2017. In Europe the CDO is still relatively unknown. Whether it is really necessary to establish a CDO is a matter of debate, particularly since the role is not precisely defined.
However, there is an urgent need for big data experts who are able to work with data effectively. These IT experts need different skills from those required for conventional IT systems. In addition to meeting the technical requirements, these specialists must be able to work with statistical and stochastic methods as well as analytical models, and they must have sound industry expertise. The Experton Group therefore calls for the creation of new types of jobs with titles such as data scientist or data artist. The data scientist is the data expert who selects the analytical methods and analyzes the data. A data scientist requires a good general education with knowledge of mathematics and stochastics, programming fundamentals, SQL and databases, information technology and networks. The data artist then handles the presentation and visualization of the data; a data artist's training includes graphic design, psychology, some mathematics, IT and communications. These jobs form what you might call the core of big data staff. Other new jobs are being added to this core group, and the following table lists all of them.
Big data job descriptions
Position | Responsibilities | Required expertise
Data scientist | Decides which forms of analysis are most suitable and which raw data is required, and then analyzes it | Mathematics, stochastics, programming, SQL and databases, information technology and networks
Data artist | Presents the analyses clearly in the form of charts and graphics | Graphic design, psychology, mathematics, IT and communications
Data architect | Creates data models and decides which analytical tools are to be used | Databases, data analysis, BI
Data engineer | Looks after the hardware and software, in particular the analytical systems and the network components | Hardware and software knowledge, programming
Information broker | Obtains information and makes it available, for example by providing customer data or in-house data from a variety of sources | Databases, communications, psychology
Who is going to train big data specialists?
Until now, however, companies have hardly ever been able to call on staff like these. “Data scientist and data artist are jobs for which a two- to three-year period of training would be required, but due to the cross-cutting nature of the work, they scarcely exist today,” says Holm Landrock, a senior advisor at the Experton Group. Only a few companies and organizations are committed to training data scientists and data artists in any way, and what they offer is far from a comprehensive training program. IT companies such as SAS, EMC and Oracle do offer training in this direction. The Fraunhofer-Gesellschaft also offers training for data scientists.
But short courses like these are just a drop in the ocean. The Experton Group therefore recommends that the ICT industry team up with education providers such as vocational academies, technical colleges, industry associations and chambers of industry and commerce, to create new job profiles as quickly as possible. Training staff as data scientists or for one of the other new job types is not an act of charity. It is a foundation stone for future big data projects and the sustainable business success they bring.
What big data solutions exist?
There is no standard solution. However, some processing methods have emerged in recent years that serve as the basis for big data analytics today and will continue to do so in the next few years. The ideal way to come to grips with huge volumes of data is the old principle of “divide and conquer”: computations are subdivided into many small calculations and distributed to multiple servers. Google's MapReduce programming model has emerged as the de facto standard for distributed computing. A typical MapReduce application processes multiple terabytes of data on thousands of machines. In practice, MapReduce is most widely used through Apache Hadoop, an open-source software library that implements it. Hadoop subdivides the data into smaller chunks and processes them in parallel on standard computers, and this has made it the current industry standard for big data environments.
The Chinese mobile phone provider China Mobile, for example, was able to use Hadoop to analyze the phone usage of all of its customers and the probability of them churning. The “scale-up” solution it used before enabled the company to analyze the data of only around ten percent of its customers. Now, however, all customer data can be taken into account, and targeted marketing measures have been introduced to reduce churn.
In-memory permits real-time analytics
However, a Hadoop cluster is not capable of handling all big data tasks. If the data is on a hard disk, slow database accesses cannibalize the gains made through parallelization. This is why in-memory databases have established themselves for the accelerated processing of extremely large quantities of data. These databases store the data in working memory (RAM) and read it from there. That makes them faster than databases that use conventional disk technology by a factor of around 1,000.
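The divide-and-conquer principle behind MapReduce and Hadoop can be sketched on a single machine with the classic word-count task. This is not Hadoop's API: the function names are hypothetical, and a thread pool merely stands in for the servers in a cluster. In CPython, threads will not speed up this CPU-bound work. What the sketch does show is the structure:

- the input is divided into chunks;
- each chunk is mapped independently to (key, value) pairs;
- the pairs are grouped by key (the shuffle);
- each group is reduced to one result.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(chunk: list[str]) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one chunk of input lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce_word_count(lines: list[str], workers: int = 4) -> dict[str, int]:
    # Divide: assign lines to chunks round-robin, one chunk per worker.
    chunks = [lines[i::workers] for i in range(workers)]
    # Threads stand in for cluster nodes; each maps its own chunk independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_phase, chunks)
        grouped = shuffle_phase(chain.from_iterable(mapped))
    # Conquer: reduce each key's values to one count.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


tweets = [
    "big data needs big clusters",
    "Hadoop splits big jobs",
    "in-memory data is fast",
]
counts = map_reduce_word_count(tweets)
print(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
# prints [('big', 3), ('data', 2), ('clusters', 1)]
```

Because each map call sees only its own chunk, the chunks could just as well live on different machines. Only the shuffle needs to move data between them, which is why the model scales with the number of servers.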
To obtain maximum performance, in-memory databases therefore load the entire volume of data, together with the database applications, into main memory wherever possible, so main memory has to be large enough to cope. Business data analytics can thus be carried out virtually in real time rather than taking days or weeks. One example is SAP's highly popular HANA (High-Performance Analytic Appliance), a database system based on in-memory technology. Hasso Plattner and SAP technology boss Vishal Sikka unveiled it in mid-2010 as a high-performance platform for the analysis of large volumes of data. Database specialist Oracle now also offers a database system based on in-memory technology: Exadata.
Source: How Organisations are approaching Big Data, IDG, September 2013
Figure: About two-thirds of respondents are extremely/very likely to consider using or to continue to use in-memory databases.
How likely are you to consider using or to continue to use each of the following big data solutions? Base: 155 qualified respondents who have implemented or have plans to implement big data projects. Figures in percent; 5 = extremely likely, 4 = very likely, 3 = somewhat likely, 2 = not very likely, 1 = not at all likely.
Solution | 5 | 4 | 3 | 2 | 1 | Not familiar with this type of solution
In-memory databases (e.g., SAP HANA, Oracle Exadata) | 28 | 38 | 15 | 9 | 3 | 6
Log file analysis software | 20 | 32 | 26 | 10 | 3 | 9
NoSQL databases | 20 | 31 | 26 | 9 | 7 | 6
Columnar databases | 17 | 28 | 28 | 12 | 4 | 11
Hadoop/MapReduce | 15 | 25 | 26 | 12 | 6 | 15
In-memory databases are no longer a niche product. According to a study by TNS Infratest commissioned by T-Systems, 43 percent of German companies are already using in-memory technologies for data analytics or plan to do so in the near future. Ninety percent of users say their experience with the technology has been good or very good. (Source: T-Systems New Study*) However, as things stand, the majority of German companies regard in-memory technology as a complement for time-critical analytics. Yet almost 20 percent of companies see it as an important response to the challenges of big data and expect in-memory systems to become a central element of data analytics environments.
In addition, there are technologies such as NoSQL databases for unstructured data. NoSQL is the collective term for non-relational database systems. It also describes a shift away from relational databases toward new or forgotten database models. NoSQL database systems are an efficient way to store and process unstructured data such as text, audio files, videos and photographs.
Source: How Organisations are approaching Big Data, IDG, September 2013
Figure: Overall, respondents believe that in-memory databases best address big data's challenges, but there are significant differences by region. Respondents in EMEA are significantly more likely to favor in-memory databases (60%), compared with only 22% in the US and 14% in Brazil.
Which of the following solutions do you believe would best address the challenges associated with big data? Base: 147 qualified respondents who are familiar with two or more big data solutions shown in Q.3 (figures in percent)
In-memory databases (e.g., SAP HANA, Oracle Exadata) | 30
NoSQL databases | 19
Log file analysis software | 15
Columnar databases | 12
Hadoop/MapReduce | 11
Not sure | 14
Make or buy?
The current market situation for big data solutions presents a final challenge on the way to big data success. Numerous providers offer software tools based on Hadoop. These include Cloudera, Hortonworks, Datameer and HStreaming as well as big names such as IBM, Intel and EMC. But they all come up against the same limitation: none of them has standardized industry solutions that can be customized quickly to suit customers' requirements. They often have to develop these systems specially in joint projects with their customers. Companies wanting to use the technology face a typical “make or buy” decision. When analytics is carried out on a one-off basis, or when data volumes or demand for analytics fluctuate widely, it pays to use cloud-based infrastructures rather than invest in your own hardware. The Munich data center of T-Systems currently boasts the largest Hadoop cluster in Germany. There, companies are offered big data or analytics as a service, as and when they need it.
In the medium to long term, however, companies should manage their data themselves, since otherwise much of the valuable information that can be obtained is lost. Only by working with data continuously, testing hypotheses and observing changes can the potential of big data be fully exploited.
Conclusion
In order to prevail and grow in highly competitive markets with ever shorter production cycles, companies have to secure market share and sales success. What separates the winners from the losers is their use of IT. Only companies that understand their customers well, know their precise needs and make the most effective use of IT in their business will be able to hold their own against the competition over the long term. At times, companies also have to be able to take well-prepared decisions and adopt a proactive approach. They thus face the challenge of processing rapidly growing volumes of data from an increasing variety of sources at ever shorter intervals. This data has to be analyzed in order to underpin business decision-making with better figures, data and facts.
Big data solutions are thus becoming critical to success. However, they also require a certain shift toward cloud-based computing, so that the required big data technologies can be integrated seamlessly with the existing infrastructure. Companies also have to be ready to stay on the ball and integrate new developments. Even in the future, there is no prospect of a single reservoir of data that allows big data challenges to be met centrally and with ease. According to Experton Group analyst Andreas Zilch, it will only ever be possible to come up with partial solutions: “There will be no one big data super solution.” But in the near future, we can expect to see huge parallel data-crunching systems that will analyze even larger volumes of data even more quickly than is possible with current methods.
*Sources:
– Artegic, “Marketing in the Digital Age”, 2013
– Experton Group, “Die Entwicklung von Big Data im Jahr 2012” (The Development of Big Data in the Year 2012) (Experton Big Data)
– Gartner, “Big Data Opportunities, New Answers and New Questions”, April 2013 (Gartner Big Data)
– Gartner press release, “Gartner Reveals Top Predictions for IT Organizations and Users for 2012 and Beyond” (Gartner PI)
– IDC, “Storage in Deutschland 2013” (Storage in Germany 2013) (IDC Storage)
– McKinsey, “Big data: The next frontier for innovation, competition, and productivity” (McKinsey Big Data)
– Omikron, “Datenqualität wird zur Herausforderung von Big-Data-Strategien” (Data Quality Becomes a Challenge for Big Data Strategies) (Omikron Data Quality)
– SAS study, “Most firms say business analytics boosts decision-making processes” (SAS Decision Making)
– T-Systems study, “Quo vadis Big Data” (T-Systems Big Data)
– T-Systems press release, “Neue Studie: Big Data im Fokus der ICT-Entscheider” (New Study: Big Data the Focus of ICT Decision-Makers) (T-Systems New Study)
©IDG Business Media GmbH, Germany 1/2014