Panels

PANEL 1: Data science education: We’re missing the boat, again
3:30PM-5:00PM, Thursday, April 20, San Marino

Moderator: Bill Howe, University of Washington
Panelists:

  • Michael Franklin, University of Chicago
  • Laura Haas, IBM Almaden
  • Tim Kraska, Brown University
  • Jeff Ullman, Stanford University

Description:
Data science has had a transformative influence on post-secondary STEM education: hundreds of new programs in the area have been created in the past few years nationally and internationally. While data management is usually name-dropped in some form in the description of these programs, the specific algorithms, models, and techniques that the database and data engineering communities are known for are rarely prioritized as learning objectives in the curricula.  This omission is a missed opportunity for our community to influence the next generation of STEM students, but more importantly, it’s a problem for the students themselves.

It has become routine to cite anecdotal evidence that data work is 80% pre-processing, wrangling, cleaning, etc. But few of these programs even attempt to teach students what any of these colloquialisms might mean in practice, or which principles apply to the problems they describe. And although the products of our community have been enormously impactful in practice among data scientists, most programs consider them "just tools" (something to be explored as part of a lab or assignment, if at all) rather than manifestations of a set of underlying principles that every student should understand.

In this panel, we’ll discuss the role of data engineering in data science education, exploring the following questions:

  • If data scientists are spending 80% of their time grappling with data, what are they doing wrong? What are we doing wrong? What can we teach them to reduce this cost?
  • What should a practicing data scientist learn about systems engineering? What’s the difference between a data engineer and a data scientist?
  • Scale is at the heart of what we do, and it’s a daily source of friction for data scientists. How can we teach fundamental principles of scalability (randomized algorithms, for example) in the context of data systems?
  • Perhaps data scientists are just consumers of our technology — how much do they really need to know about how things work? Empirically, it appears to be more than we think. There is a black art to making our systems sing and dance at scale, even though we like to pretend everything happens automatically. How can we stop pretending and start teaching the black art in a principled way?
  • How can we address emerging issues in reproducibility, provenance, and curation in a principled yet practical way, as a core part of data engineering and data systems? Consider that the ML community has a vibrant workshop on fairness, accountability, and transparency. These topics are at least as relevant from a database perspective as they are from an ML perspective, maybe more so. Can we incorporate these issues into what we teach?
  • How much math do we need to teach in our database-oriented data science courses? How can we expose the underlying rigor while remaining practical for people seeking professional degrees?

PANEL 2: Small data
1:30PM-3:00PM, Friday, April 21, San Marino

Moderator: Oliver Kennedy, University at Buffalo
Panelists:

  • D. Richard Hipp, Hwaci
  • Stratos Idreos, Harvard University
  • Amelie Marian, Rutgers University
  • Arnab Nandi, Ohio State University
  • Carmela Troncoso, IMDEA
  • Eugene Wu, Columbia University

Description:
Over a decade ago, challenges to assumptions like “Distributed systems failures are outliers”, “We can’t collect everything”, and “There isn’t enough data to distinguish signal from noise” led us into the big data era. Now, fundamental assumptions are changing again. Smart devices are making data more personal. Intelligence is moving closer to the edge with low-cost embedded computing platforms. Tools like D3 are making interactive visualizations a key part of news reporting. Interfaces like Wolfram Alpha and Siri are putting complex question-answering within easy reach. In short, we are transitioning to an era where the data management bottleneck is personal and per-device interactions, rather than scalability — an era of “Small Data”. 

This panel will facilitate a discussion of small data and encourage participants to challenge long-held data management assumptions. Topics for discussion will include:

  • What is small data and why should the database community care?
  • How do human factors affect data management systems and how data is accessed?
  • As edge computing devices like smart sensors, embedded Linux systems, phones, and watches become pervasive, what bottlenecks will DBMSes have to contend with?
  • Is SQL the right language for a landscape dominated by imperative programming?
  • What tools are required to help individuals leverage open public data?
  • How should new small data technologies be evaluated?
  • What resources are available for new research on small data?

PHD SYMPOSIUM PANEL: Doing a good PhD and getting a job too
3:30PM-4:30PM, Thursday, April 20, Capri

Panelists:

  • Amol Deshpande, University of Maryland, College Park
  • Arun Kumar, University of California, San Diego
  • Alexandros Labrinidis, University of Pittsburgh
  • Peter Triantafillou, University of Glasgow

Description:

This panel covers the perennial questions, tricks, and tips for completing a successful PhD and for the job hunt and career that follow, broadly within the area of data management. Questions will include:

  • What steps should a student take from starting a PhD to finishing it, in the 1st, 2nd, 3rd, 4th, and 5th or later years?
  • How to choose an area of research within data management? How to find a problem within the chosen area?
  • How to get collaborators?
  • How to find an internship? How to connect to researchers outside one's institution?
  • When to start looking for a job?
  • What factors should one keep in mind when looking for positions in industry and academia? What about doing a postdoc?
  • How to achieve a good work-life balance, and work successfully with advisors, fellow students, and collaborators over the course of a PhD?

There will also be an open Q&A session.