A request to propose a new TIP-supported collaboration called “Linkable Interactive Visualization and Exploration (LIVE) Environments”
The best data exploration, visualization and analysis tools in use today are typically open-source and highly modular. The range of discoveries they enable is breathtaking. But creators of today’s open-source data-science tools, especially those close to the science, are well aware that discoveries would be even more rapid and numerous if only an investment could be made in purposefully coordinating the interoperability of the very best open-source tools, so that researchers could easily create their own cutting-edge data exploration environments without the constant pressure of funding redundant and expensive infrastructure.
In a November 2023 discussion of translational data science research and tools, Erwin Gianchandani, Assistant Director of NSF’s Directorate for Technology, Innovation and Partnerships (TIP), suggested that Alyssa Goodman and her colleagues write him a letter explaining why TIP would be the perfect mechanism for supporting a new cross-disciplinary, community-driven effort aimed at interconnecting the world’s best open-source online data exploration and visualization tools.
Today, our collaboration¹ is happy to offer this short letter in response to Dr. Gianchandani’s November suggestion. The letter is supported by a new website, LIVE-env.org, created to showcase examples of what we would like to propose, using demonstrations of what our team has already created and descriptions of what we have planned. To keep this letter short, we focus it on why LIVE would be a good strategic investment for TIP, and we hope that everyone interested in the technical approach we propose will visit LIVE-env.org to learn more about our collaboration’s accomplishments and plans.
We have spent the four months since Dr. Gianchandani’s initial suggestion refining a detailed plan for how our assembled team of leaders can most effectively work with their colleagues in Astronomy, Biology, and GIS to efficiently create a data exploration and visualization system flexible and customizable enough to work across all of those disciplines. The collaboration’s many discussions have led to this strategy:
1. Establish the new website LIVE-env.org to showcase what is currently possible, and what realistically could be possible within a couple of years with significant (~$6M) support.
2. Plan for the LIVE-env.org site to eventually become the online starting point for people who want to use LIVE tools.
3. Think of LIVE-env.org as infrastructure enabling three closely interrelated efforts: LIVE-Astro.org, LIVE-Bio.org, and LIVE-GIS.org, whose specific foci are tools for Astronomy, Biology, and GIS, respectively.
4. For each of LIVE-Astro, LIVE-Bio, and LIVE-GIS, define focused “science demonstration” efforts, funded separately from the planned TIP request. These development-driving demonstrations, pieces of which are already funded and in progress, include MilkyWay3D.org for Astronomy, SpecPath for Biology, and a suite of primarily climate-focused projects for GIS.
5. Establish a realistic collaborative structure where a small number of Core Collaborators’ institutions are responsible for establishing and coordinating LIVE, supporting it deeply enough to ensure sustainability. Those institutions are now Harvard, UC Berkeley, and The Jackson Laboratory. The leaders of the two key software ecosystems enabling LIVE, Jupyter (Perez, Berkeley) and glue (Goodman, Harvard), are Core Collaborators who have worked together successfully before.
¹ Listed at live-env.org/who-are-we.
LIVE, Page 1
6. Identify the best tools scientists would like to see available for integration into a LIVE Environment, and recruit the leaders of those tool-development efforts to the collaboration. This step is well underway, as shown by the impressive list of “Affiliated Collaborators” at live-env.org/who-are-we.
7. Build in collaborations with commercial partners from the start of LIVE. Once fully built out, LIVE will become the go-to solution for linked, interactive visualization and exploration environments across a wide swath of data-driven science, so companies expert at helping to deploy, customize, and maintain it can expect plenty of work in the future.
8. Keep all LIVE code open-source, while still permitting commercial reuse (as is the case with both Jupyter’s and glue’s current licensing), to drive innovation while realistically aligning economic and technological incentives.
Why doesn’t LIVE fit within existing NSF programs?
The “Pathways to Open Source Ecosystems” (POSE) program within TIP has a stated mission that sounds perfect for LIVE: “the facilitation, creation and growth of open-source ecosystems for the creation of new technology solutions.” But even POSE’s Phase II awards are not large enough to support the community-scale effort LIVE represents. The constituent elements of LIVE have all been funded, from a variety of sources, at levels at or beyond POSE’s Phase II (~$1.5M). What is needed to realize LIVE’s potential is funding to integrate these elements. Each LIVE component tool’s creator could continue to ask for a few hundred thousand dollars at a time for “their” tool, and many would likely succeed, but integration requires coordination, which a joint award of dedicated funding would enable. Otherwise, the members of the LIVE collaboration will be forced to keep “competing” with each other for a total amount of NSF funding far exceeding what a much more effective and efficient LIVE approach requires. A competition-only model cannot incentivize coordination and collaboration; it will ultimately hurt science and offer far less return on investment (ROI) for NSF and the US taxpayer.
Once it’s built, how will LIVE be used?
The screenshot at right shows a snippet of LIVE-env.org summarizing how a person or group can create the LIVE Environment they need. Clicking on the blue ALL-CAPS words on the site offers much more information, but in brief:
1. FRAMEWORKS are browser-based options, such as JupyterLab and Solara, within which a LIVE Environment can be constructed. The idea of “TEMPLATES” allows reuse of commonly useful Environments, much the way templates work in word-processing applications today.
2. Visualization TOOLS within LIVE include a long list of the most popular extant and frontier packages that can be seamlessly combined, as needed. Many tools already have APIs, but a significant share of LIVE funding would go toward making it easy to include any “LIVE-compatible” tool in a FRAMEWORK. A listing of visualization tools from LIVE’s affiliated collaborators is at live-env.org/tech/visualization-tools, and lists of additional field-specific tools are in the LIVE-Astro, LIVE-Bio, and LIVE-GIS portions of the LIVE site. LIVE’s mechanisms for linking tools build on the “glupyter” code base already built and tested by a collaboration of the glue and Jupyter teams.
3. Mechanisms for accessing DATA within LIVE will be primarily cloud-based, and significant funding will be used to ensure easy and efficient data access from within any LIVE Environment. Options to include and/or upload local data sources will also be provided, and privacy concerns will be considered.
4. A key feature of glue that partly inspires LIVE is glue’s ability to LINK data sets on shared attributes, allowing data to be used in visualizations and exploration without the creation of “merged” data sets. This paradigm will be expanded in LIVE to include the interconnection of cloud-based data sets, which will require significant development effort and collaboration with commercial data storage providers (e.g., AWS).
5. Using either an established LIVE Environment (e.g., a TEMPLATE) or a newly created one, a researcher will then use the system for EXPLORATORY Data Analysis, allowing rapid, real-time interrogation of trends seen in multiple open graphs and calculations, using a brushing-and-linking paradigm (selecting points in one view highlights the corresponding points in every linked view). (Tableau is a good analogy here, except that it does not offer persistent links between data sets, cannot handle scientific data formats or volumes, is closed source, and is expensive.)
6. Critically, since LIVE is web-based, all or part of a user’s exploratory environment can be excerpted and SHARED in any online publication, including research journals but also more public-facing and educational sites. Presently, interactive figures are usually generated specifically for publication, so being able to generate them as part of a user’s everyday environment offers gains in both efficiency and research reproducibility.
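The LINK and brushing-and-linking ideas in items 4 and 5 can be illustrated with a short sketch. What follows is a minimal, hypothetical Python illustration under stated assumptions, not glue’s or LIVE’s actual API: the names `Dataset`, `LinkRegistry`, `link`, and `propagate` are invented here, and the data values are made up. It stores only which attribute of one data set corresponds to which attribute of another, never merges the data, and translates a brushed selection of rows in one data set into the matching rows of the other.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a data set: named columns of equal length."""
    name: str
    columns: dict[str, list]


@dataclass
class LinkRegistry:
    """Hypothetical registry of attribute links between data sets.

    Only the correspondence between attributes is stored; the data sets
    themselves are never merged or copied.
    """
    links: list[tuple[str, str, str, str]] = field(default_factory=list)

    def link(self, a: Dataset, attr_a: str, b: Dataset, attr_b: str) -> None:
        # Declare that attr_a in data set a means the same thing as attr_b in b.
        self.links.append((a.name, attr_a, b.name, attr_b))

    def propagate(self, source: Dataset, selected: list[int],
                  target: Dataset) -> list[int]:
        """Translate a brushed row selection in `source` into rows of `target`."""
        matches: set[int] = set()
        for name_a, attr_a, name_b, attr_b in self.links:
            # Links are symmetric, so check both directions.
            if (name_a, name_b) == (source.name, target.name):
                src_attr, tgt_attr = attr_a, attr_b
            elif (name_b, name_a) == (source.name, target.name):
                src_attr, tgt_attr = attr_b, attr_a
            else:
                continue
            # Values of the shared attribute in the brushed rows.
            brushed = {source.columns[src_attr][i] for i in selected}
            # Rows in the target whose linked attribute takes one of those values.
            matches.update(i for i, v in enumerate(target.columns[tgt_attr])
                           if v in brushed)
        return sorted(matches)


# Toy values, made up purely for illustration.
stars = Dataset("stars", {"cloud_id": [1, 1, 2, 3],
                          "distance_pc": [120, 135, 300, 410]})
clouds = Dataset("clouds", {"id": [1, 2, 3], "mass": [1e4, 5e3, 2e4]})

links = LinkRegistry()
links.link(stars, "cloud_id", clouds, "id")

# Brushing the first two stars highlights the cloud they belong to...
print(links.propagate(stars, [0, 1], clouds))   # [0]
# ...and brushing two clouds highlights their member stars.
print(links.propagate(clouds, [1, 2], stars))   # [2, 3]
```

Because only the correspondence is stored, each data set in this sketch keeps its own format and size, and a new data set joins the environment by declaring a link rather than by being merged into a combined table.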
Who is LIVE for?
The strategies and goals outlined above aim to create data-exploration environments that can do anything from showcasing a single data set in a single window on a web page, to designing a complex dashboard for visualization, to creating a “blank canvas” for data exploration that understands a wide variety of field-specific data formats, to creating a guided web-based educational or training experience. It is already possible (see examples at LIVE-env.org) to build such an array of end-user tools using the components of what will become LIVE, but it takes hundreds of developer hours to create systems of even modest complexity, and the most bespoke systems are often not reusable. Creating LIVE will open up the creation of expert-grade, easy-to-use visualization and exploration environments to all NSF-funded STEM and STEM-education organizations, and ultimately to the wider world.
Commercial Implications
LIVE is not a “business intelligence” or “big data analytics” platform, even though it does offer data integration and interactive graphics, including dashboards. Great commercial products, like Google’s Looker Studio, Domo, Microsoft Power BI, and Tableau, all offer pieces of the functionality LIVE seeks to provide, but essentially only for tabular data and geographic maps. Amazingly powerful data-integration platforms like Palantir Foundry can assemble vast data resources in seconds, but those resources cannot be analyzed or visualized within a flexible, format-agnostic, open-source suite of tools.
In the future, the LIVE team would love to connect our system to several of these commercial tools, especially through partnerships, as those tools become more useful ways for scientists to interact with cloud-based resources. At present, several commercial software development companies (see live-env.org/who-are-we) have made, and are expected to continue to make, vital contributions to the creation of key components of what we envision as an ever-improving LIVE ecosystem.
Why this team?
The expertise of the LIVE team covers both technology and science. This avoids both the classic “if you build it, they will come” pitfall of purely infrastructure-driven projects and the “DIY,” duct-tape-style software fragility created by scientists working on their own. What is more, the participants in LIVE include
world leaders in their respective fields. We are happy for NSF to send this letter to outside experts to corroborate this bold claim.
The Science Demonstration projects for LIVE were chosen because each is led by active participants in the broader LIVE effort itself.
LIVE-Astro’s demonstrator is MilkyWay3D.org. Its leader, the Smithsonian Astrophysical Observatory’s Catherine Zucker, has used glue and Jupyter to revolutionize our understanding of the Milky Way near the Sun over the past several years. Zucker and her team, who will propose to NSF AST in 2024, plan to use LIVE to explore and chart the Milky Way far beyond the Sun’s neighborhood, and to share their 3D findings with researchers and learners around the world.
LIVE-Bio’s demonstrator, the Digital Spectral HistoPathology platform called “SpecPath,” is led by The Jackson Laboratory’s Ed Liu and Princeton’s Olga Troyanskaya. During his tenure as President of The Jackson Laboratory, Liu had the vision to imagine and then fund “glue genes,” the genomics-focused version of glue that underlies the infrastructure for LIVE-Bio, and Troyanskaya is eager to integrate her own and her colleagues’ world-leading bioinformatics algorithms with LIVE’s data exploration and visualization functionality. The SpecPath platform being proposed to ARPA-H integrates the very best new hardware tools, namely infrared (IR) spectroscopic laser scanning confocal microscopy, with spatial genomic approaches that are informed and empowered by advanced computational, data-integration, and visualization technologies from LIVE.
LIVE-GIS has several demonstration projects already in progress rather than a single marquee project. These include several Cosmic Data Stories focused on Earth satellite data; remote-sensing projects on snow cover and agricultural yield underway at the Schmidt Center for Data Science & Environment at Berkeley (where Co-PI Perez is Co-Director); and a number of studies relying on CryoCloud, a cloud-based service built on Jupyter notebooks that is accelerating studies of Earth’s cryosphere.
What now?
Presuming you agree that LIVE needs to happen, and soon, our collaboration is happy to answer any and all questions about how to work with NSF TIP, and other divisions as needed, to secure funding for it through an efficient, collaboration-centric approach. We are also, of course, happy to write a proper NSF proposal as soon as that becomes the relevant next step.
Transmittal letter
Dear Erwin,
Thanks for your presentation to the MPSAC on 8 November, which I attended in my role as COV Chair for AST. Thanks also for suggesting, when we chatted after Nigel Sharp introduced us, that I send you this letter explaining why our US-wide collaboration on LIVE would be a good fit for NSF TIP funding. As you requested, I’ve kept the letter short, with links to (much) more information online at the “Linkable Interactive Visualization and Exploration (LIVE) Environments” website (LIVE-env.org), created over the past few months in support of this letter.
My colleagues and I hope that, once you’ve had a chance to look at what we’d like to do, you might tell us how to submit a formal proposal. We believe that the LIVE approach will dramatically accelerate discovery across scientific research while also fostering commercial collaboration, innovation and development, so we hope you agree that TIP support would be a perfect fit.
The “request to propose” document attached to this email includes a summary of this note, in case
you’d like to circulate the request to colleagues.
Thanks much,
Alyssa