Big Data is about extracting valuable information from data and using it intelligently, for example to revolutionize decision-making in business, science and society.
Big Data analytics is able to handle data volume (large data sets), velocity (data arriving at high frequency), variety (heterogeneous and unstructured data) and veracity (data uncertainty), the so-called four Vs of Big Data. Research on software analytics and mining software repositories has delivered promising results, mainly focusing on data volume. However, novel opportunities may arise when leveraging the remaining three Vs of Big Data. Examples include using streaming data (velocity), such as monitoring data from services and things, and combining a broad range of heterogeneous data sources (variety) to make decisions about dynamic software adaptation.
BIGDSE’16 aims to explore opportunities that Big Data technology offers to software engineering, both in research and practice (“big data for software engineering”). In addition, BIGDSE’16 will look at the software engineering challenges imposed by building Big Data software systems (“software engineering for big data”).
BIGDSE’16 seeks contributions of different types, including theoretical foundations, practical techniques, empirical studies, experience, and lessons learned. Potential and relevant research directions that BIGDSE’16 plans to explore include, but are not limited to:
Big Data for run-time monitoring and adaptation of software systems. Big Data taps into the wealth of online data available during the operation of software systems. Monitoring of services, things, cloud infrastructures, users, etc. will deliver an unprecedented range of information, which is available with low latency. Such real-time data offers novel opportunities for real-time planning and decision-making and thus supports new directions for software adaptation. As an example, based on changes in user profiles, Big Data techniques may deliver actionable insights into which concrete adaptation actions to perform in response to those changes.
Big Data for software quality assurance and diagnosis. Software analytics, i.e., the automated analysis of software artefacts, has been explored for some time. Now, with the significant increase in data volumes as well as in analytics capabilities for large volumes of structured and unstructured data, software analytics faces new opportunities in the Big Data area. As an example, monitoring logs of complex systems may easily reach sizes of gigabytes or terabytes in short periods of time. Detecting failure patterns and deviations may thus require Big Data analytics to handle such massive amounts of log data. As a further example, deep learning techniques may be applied to perform root cause analysis of software failures.
Software architectures and languages for Big Data. NoSQL and MapReduce are predominant when it comes to efficient storage, representation and querying of Big Data. However, apart from large, long-running batch jobs, many Big Data queries involve small, short and increasingly interactive jobs. Supporting such jobs may require new architectures and languages that, for instance, apply classical RDBMS techniques for storage and querying on top of NoSQL and MapReduce paradigms. In addition, as the number of Big Data stores grows, so does the number of available CPUs. As a result, analytics solutions that were computationally infeasible 10 years ago are now becoming possible. Ultimately, this may lead to a new generation of software architectures and languages that optimize Big Data querying and retrieval.
Quality and cost-benefit of Big Data software. Assuring the quality of Big Data software requires adopting and extending proven quality assurance techniques from software engineering. For example, testing Big Data software may require new ways of generating “test” data that is sufficient and representative. However, due to the size of the data, exhaustive testing may quickly become infeasible, thus requiring (formal) verification techniques to generate assurances for Big Data software. Further, not all data sources may be relevant for a given Big Data analysis task. As these data sources often come with a cost (e.g., queries may need to be run across distributed data pools), the cost-benefit of Big Data software should be assessed a priori and not only as an afterthought.
Curriculum for Big Data. One emerging area of concern in practice is the lack of skilled Big Data experts who develop, deploy and exploit techniques, processes, tools and methods for developing applications that actually turn Big Data into helpful insights. With a particular focus on Big Data software engineering, BIGDSE’16 invites contributions that provide a critical view on how software engineering curricula may be extended to deliver such experts.