HUB Ocean on the Mapscaping Podcast

Our Product and Data Science Lead Tara Zeynep Baris was a guest on Daniel O'Donohue’s “Mapscaping Podcast”, one of the best podcasts in the geospatial genre. Tara talked through what we at HUB Ocean are doing and what challenges our Ocean Data Platform will solve. Here is the transcription of this podcast. Enjoy!

Daniel: Welcome to another episode of the Mapscaping Podcast. My name is Daniel and this is a podcast for the geospatial community. Today on the podcast we're talking about something called HUB Ocean, which as the name suggests is a hub for ocean data.

Now we've talked about these kinds of data hubs before on the podcast. We've had episodes around Sentinel Hub, which is an earth observation data hub. We've talked about the Microsoft Planetary Computer, Google Earth Engine, something called Open Topography, which again, as the name suggests, is a data hub for topography data.

The concept of this data hub is nothing new, but it's not easy to implement. And if these types of data hubs work, they have a certain gravity that becomes stronger over time. One of the guiding concepts behind these data hubs seems to be this idea of FAIR data. F-A-I-R data, findable, accessible, interoperable, and reusable data. All of this stuff makes a lot of sense. 

I think though it's probably worth considering how we can take this idea of FAIR and also apply it to the results of the data. It's great that data is FAIR, but I think that the results of our research should also be FAIR, should also be findable, accessible, interoperable, and reusable.

In just a minute you're going to hear from Tara Baris, a senior data scientist from HUB Ocean, and she's going to give us a bunch of insight into what it takes to build one of these data hubs, the kinds of technologies that are involved, and the tension around how we make data FAIR and what the advantages of this might be.

Hi Tara, welcome to the podcast. You are a senior data scientist at something called HUB Ocean, and I think we should just start here. Would you mind adding a little bit more depth to that introduction, please, and perhaps move off and tell us what HUB Ocean is?

Tara: Sure. I'm currently a senior data scientist at HUB Ocean, but I would like to add that my background is actually in biology. I have a PhD in marine biology and ecology, specifically on genomics and evolution of marine species. And I realized that I really enjoyed the analysis portion of my PhD the most.

I decided to pursue a career in data science, and I worked at a few different companies, but realized I really missed the ocean space. When I had the opportunity to join HUB Ocean as a senior data scientist, it seemed like the perfect fit.

What HUB Ocean is, and what I do there, is we have this very lofty mission of unlocking and uniting ocean data from a bunch of relevant sources to produce this global, comprehensive digital ecosystem of the ocean. And by doing this, we're hoping to change the fate of the ocean, help heal the ocean, and rewire the way that industry uses the oceans. And my job specifically there is to help build tools for our users to interact with the data that we have on our platform.

Daniel: There's a lot of platforms around these days. We're going to dive into that a little bit more later on. But let's start clarifying this lofty goal. I think you said heal the ocean, rewire the way industry works with the ocean. What's wrong with the ocean? Why do we need to heal it?

Tara: The ocean is very important. It makes up about 70% of the Earth's surface. We get 50% of our oxygen from the ocean. It is a really important carbon sink. It absorbs around 30% of all carbon dioxide emissions, and it captures 90% of the excess heat generated by these emissions. It's a really important buffer against climate change.

Of course, it's also very important for a lot of economies, either through tourism or ocean industry. And of course, we get a lot of our food from the oceans. But I think as we all know, we're reaching a tipping point with climate change, and there's only so much the ocean can absorb. 

It's facing very extensive coral bleaching, habitat loss, ocean acidification. There's food insecurity. We really want to make sure that we have the data to help mitigate some of these effects of climate change.

Daniel: I realized that was maybe the stupidest question that's ever been asked on a podcast. But I just wanted to have a starting point. I wanted to get into some of these things. A couple of other things I can think about, I'm not sure if you mentioned, was waste disposal. You talked about the ocean's ability to absorb a lot of these outputs from our human systems, if you will. And I think that's really important. And power generation as well. I think it's becoming more and more important.

And of course, food. The ocean is important because that's where we get a lot of our food from. Possibly the world's dumbest question, but it's important to put some names on some of these things here.

I've had guests on the podcast previously, and we've talked about mapping the ocean floor and how we're doing this and the importance of this. And one of the things that comes up again and again is this really interesting statement. I want to read it out to you, and then I want to hear your reaction to it as a data scientist working in the space. The statement is, we know more about the surface of Mars than we do about the ocean floor.

Tara: Yes. I think about 80% of the ocean is unexplored or unmapped. And of course, that's because there's a lot of various challenges with mapping the ocean.

Daniel: What are some of those challenges as you see it, as someone working with this data? What do you think are the biggest challenges? 

Tara: They’re in a bunch of different categories. The first one is just gathering the data, right? The ocean is not a very accessible place. It's huge. It's very deep.

Certain areas are just difficult to get to for data collection. Even though we have a lot of remote sensing now and satellite imagery, that really is only for the surface of the ocean. It's really hard to understand what's going on deeper down. And deeper down, it's difficult to explore. There's crushing pressure, really low temperatures.

It costs a lot of money to go and collect data from some of these places. But then of course, there's other things that contribute to this. One of them is, there's still a lot of technological limitations to try to make sense of all this data and catalog the data and make sure it's available to people. A lot of the times the data is locked away or not publicly available. There's a few factors that contribute to this.

Daniel: Wow, you're a real pro. This is a great segue off into the data hub that you're building. And I think what I'm going to do here is, I'm going to put a few links in the show notes of this episode to previous episodes all about why it's so hard to map the ocean, why it's so hard to remotely sense data from the ocean. Tell me about this other side of it.

Let's imagine for a second, we could do all this. We could map the ocean floor, we could map everything in the ocean. The next challenge you were talking about is the availability of data or perhaps the discoverability of data and access to it. Tell me about this as a problem. 

Tara: Sure. First, data management infrastructure has not really been able to keep pace with the exponential increase in data. And this is what a lot of different people are struggling with. It's very expensive to manage data and to have the technology and the computational power to analyze it.

The other thing is, it can be just very difficult to discover. Right now, a lot of data lives in silos, in different data sets, you have to go to different websites to even know that it exists. And then once you know it exists, sometimes you have to download it to your local machine.

And that's the only way you can work with it. A lot of people don't have that type of space or don't have the computing power to work with it.

We’re trying to bring all of that data to one place, and then also provide the computing power that you would need to analyze all of that data.

Daniel: Okay, a couple of things here. Won't cloud-optimized file formats solve a lot of this problem? We can stream data into a browser, essentially, or into a client that we have on our local machine and work with it there. We don't have to have access to all of the data, just the bit that I'm working on.

Tara: Yeah, definitely. We're using a lot of these cloud-optimized geospatial formats, like GeoParquet and Zarr files. But the current issue is a lot of the people who are in charge of these data sets don't really have the resources or the capacity to put time and effort into using these formats.

A lot of them are using old technology, they're using old ways of storing the data. And it's not as accessible as it should be. We have a lot of partners, for example, that we're helping in making their data accessible in these types of formats. And then they can come and just get it through our API and just get the bits that they need in a really easy way.

And also serving them in consistent formats is helpful to the user because they don't have to figure out: “What package do I need for this data format?”

Daniel: Okay, part of the problem that you're seeing here is that the people that are creating this data or hosting it don't have the resources to convert it into these cloud-optimized formats. For example, maybe they don't even have the resources to make it discoverable through a STAC interface, just as an example.

Tara: Right. And of course, I mean, there's a lot of new initiatives, but I think bringing them all together is the difficult part. We know that we're not going to be the only platform out there. And we are trying really hard to be interoperable with all the different people in this space. But we're hoping to bring together all the different data sets. And even if someone isn't getting the data through us, or we're not ingesting the data, that it's really easy for them to use the exact same tools to get data from a different platform.

Daniel: Let's talk a little bit about the platform side of things, because we hear this word again and again and again in the geospatial world, I'm sure in data science in general.

And there's a lot of platforms out there, let me name three of them. When I think about geospatial anyway, I think about Google Earth Engine, I think about Sentinel Hub, I think about Microsoft's Planetary Computer. Why do we need another platform for marine data?

Tara: Sure. I think that it's important to have a platform that focuses exclusively on ocean data. A lot of the other platforms are more focused on land. And I will say that HUB Ocean and the Ocean Data Platform is driven by mandates under the UN Decade of Ocean Science and the High Level Panel for a Sustainable Ocean Economy. Our focus areas are quite different than some of the other platforms. For example, mapping and managing the ocean, getting sustainable food, helping renewable energy industries, green transportation, zero emission shipping.

And then the other thing is, yes, we are very similar to the Microsoft Planetary Computer, but we are working really closely with them. And what we found is actually they're a really good source for satellite data. And as you mentioned, they have a STAC API, but we also provide a lot more tabular data in these cloud-optimized geospatial data formats, GeoParquet and Zarr.

And what we're actually hoping is that our data pipelines and data flow will be more of self-service so people can pipe in data as they wish. And then we have the infrastructure for people to build tools on top of our platform. In this way, maybe it's a bit more relevant to compare it to Apple. We can have a data exchange or an app marketplace in the future and people can build tools on top of our platform.

Daniel: And I think this side of the conversation is really, really interesting, but go back to that comparison to Microsoft's Planetary Computer just for a second. Can you give us a brief overview of how you're similar? And then we can move off and talk about what you're building on top of that.

Tara: Sure. We're similar in the sense that we both have this JupyterHub environment that uses Dask for parallel computing. Basically you can go into their platform and use the data straight in this JupyterHub environment and spin up a computing cluster. And we offer the same thing.

You can explore the data on a map in a similar way. We have this application space, but I think where we differ the most is that we focus more on the ocean and they focus mostly on satellite data and STAC. And currently the way that STAC works, it's a bit difficult to incorporate tabular data. We have a lot of data sources that we serve in table formats and that's not always that easy with STAC.

Daniel: What could some of those data sources be, just out of curiosity? And then I guess my next question is, how are you joining them to things like satellite imagery or maybe you're not? 

Tara: Sure. An example of tabular data is the World Ocean Database. This is a NOAA product and they basically had data from Captain Cook's voyage in 1772. They have a lot of information on temperature and salinity and a lot of other important ocean variables. This is not satellite data because they take profiles through the water column. Basically they drop an instrument and then it takes measurements all down the water column at different depths. And this can be served either as an xarray or it can be served as a table.

There's a lot of biodiversity information out there that's only available as tabular data. A lot of information about human assets or where oil rigs are or pipelines are, that's all tabular. Currently we're not necessarily joining it with satellite data. We don't have an example of that, but we're figuring out a way to potentially do that in the future.
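To make the table-versus-array distinction in this answer concrete, here is a minimal sketch in plain Python. It is not HUB Ocean's or NOAA's code: the cast IDs, depths, and temperatures are invented for illustration, and the helper name table_to_grid is hypothetical. A real workflow would more likely use a table library and xarray.

```python
# Hypothetical profile measurements in "long table" form:
# one row per (cast, depth) observation. All values are invented.
rows = [
    {"cast_id": "A", "depth_m": 0, "temperature_c": 18.2},
    {"cast_id": "A", "depth_m": 50, "temperature_c": 15.1},
    {"cast_id": "A", "depth_m": 100, "temperature_c": 11.4},
    {"cast_id": "B", "depth_m": 0, "temperature_c": 17.9},
    {"cast_id": "B", "depth_m": 100, "temperature_c": 11.0},
]


def table_to_grid(rows):
    """Pivot long-form rows into a cast-by-depth grid.

    Cells with no measurement are filled with None.
    """
    casts = sorted({r["cast_id"] for r in rows})
    depths = sorted({r["depth_m"] for r in rows})
    lookup = {(r["cast_id"], r["depth_m"]): r["temperature_c"] for r in rows}
    grid = [[lookup.get((c, d)) for d in depths] for c in casts]
    return casts, depths, grid


casts, depths, grid = table_to_grid(rows)
```

The table form stores only the observations that exist, while the grid form makes every cast share the same depth axis and has to mark the gaps, which is one reason profile data is often easier to serve as a table.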

Daniel: And could you talk a little bit about the app environment that you're planning to build on top of this and what function that might serve?

Tara: Sure. We have a few examples of that already with apps that we've built, but in the future, we want other people to build apps on top. We've built an app actually in collaboration with the Norwegian Seafood Authority, where people can go and explore sea lice data at different fish farms. It's a nice visual tool for people to go and explore. And then the different fish farms can see what's going on around, because of course that affects them. If there's a lice outbreak in one farm, it'll affect another farm.

And then we had a collaboration with the Lloyd's Register, who built an app on top of the platform about the ocean safety economy, I believe. You can go and look at different countries and see what their ocean safety index is. Essentially trying to get the data to be more digestible by people who want to use it, people in government or policymakers.

Daniel: But when you say an app, just to clarify, are we talking about a web-based map? Are we talking about a web-based something something dashboard? Is that an app?

Tara: Yes, that's basically what we're talking about.

Daniel: If I'm a data scientist, let's say I am a data scientist, I'm really interested in the marine environment. I go to your platform, I find the data that I'm looking for, because you've done all this great work to aggregate it, make it easy. There is a compute environment there that I can work on. I can create my seven different layers I want to make available in the app. I can write some functionality and I can save this as an app, as a link that I can share with people. I realize that's a broad generalization, but am I on the right track?

Tara: Yes, yes, you are. I think we're still debating whether we are going to help host the apps or we want other people to host them. But that's the general idea.

Daniel: What's your hope? Obviously, your hope is that a lot of data scientists will do this, but what's in it for them? Are they going to make money out of this? Is it going to be, obviously, hopefully it's going to be helpful for people. I guess what I'm struggling to understand here is what is the motivation of people to come and do this work?

Tara: Yeah, that's a good question. I think we've discussed this a lot internally. I think a lot of the times people will be building these apps because of just the field that they're in or the work that they want to accomplish. I'm not sure if anyone will be making money off of it.

I mean, they could be, they could build a tool that's really useful to a lot of businesses and they could potentially sell it. But I think mainly we want this to be a communication tool, right? A lot of data scientists can go play with the data and get really important insights out of it. But there are a lot of people who are not data scientists, right? And we have to figure out a way for them to understand the data without going in themselves. I think this could potentially be a really useful tool for a lot of businesses or for governments to show what they're working on and why what they're working on is important.

Daniel: There's a lot of people that, and I totally understand what you're saying here. There are a lot of people that are not data scientists. It's difficult for them to go in and work with this data and find the answers, if you will. There are a lot of data scientists that are not real great at communicating what they've found. Is this also part of the motivation behind building this?

Tara: Yeah. And currently we have a team with a designer, and she has expressed many times that if people need help communicating something, or if they need help creating something visual that will allow them to spread their message, she could help. We also have this community aspect to it where we all are trying to help each other figure out the best way to heal the oceans. We're hoping that having a lot of examples of different things that people can build will also be an inspiration for how you can communicate certain subjects.

Daniel: When we think about ocean data sources, what are some of the big sources that are available today?

Tara: A lot of the times the governments are in charge of these really large ocean databases. I mentioned the World Ocean Database, which is operated by NOAA in the United States. There's the Ocean Biodiversity Information System under the IOC, the Intergovernmental Oceanographic Commission. And this is a huge database about observations of different organisms around the world.

There's a lot of European initiatives. There's a lot of big public databases that are very widely used by scientists. But as I mentioned before, there's also a lot of data that is locked away either by industry or by scientists. And these are not as readily available. There's a few challenges to access this data, even if it's out there. And having to go to different websites to access them, even knowing that they exist is a challenge. Trying to get them all in one place and accessed in the same way, I think, would make a big difference.

Daniel: Can you give us an idea of what format that these databases are in? Are we talking about an FTP site where you're clicking through folders to find data? Is it, is there some cataloging happening? Are they links to a bunch of different files? What are we talking about when we talk about the database that NOAA has, for example?

Tara: The World Ocean Database is currently available in a few different ways. There is this THREDDS server and you can go and you can click on the different variables that you're interested in or the different dates and then just download the data directly. They also have their own tool where you can do some filtering and then they'll email you the results once they've pulled the files that you need. And they're also working on getting their data in the cloud.

As I mentioned, a lot of this is an initiative for a lot of groups, but they can be a bit resource strained sometimes. But OBIS, the Ocean Biodiversity Information System, they have an API that makes it really easy to pull the data. There's other data sources where you just go and you click and you download a CSV. It can vary quite a bit.

Daniel: I think we mentioned either in this conversation, we talked about this idea that we're missing a lot of data as well. And I'm wondering if you had to choose between cataloging the data that we've got. We talked about these different sources and when I say cataloging, I mean arranging it in a way that all of the different sources were discoverable, searchable, and perhaps even in a common format so we could just download them easier. Or if you had to collect more data, which one of those two things would you choose? Which one of those two things do you think would have the biggest impact?

Tara: I think that's a really difficult choice. I don't know if I could actually choose, but I will say that having a good catalog of the data we have will make it much easier to target our resources and understand where we're lacking data and where we have to go collect more data. And of course, by cataloging it, as you mentioned, it makes it a lot more reusable. If they all have consistent metadata and consistent ways of interacting with the data, of course, that makes a huge difference. But as I mentioned earlier, 80% of the ocean is still unmapped. There is still quite a lot of data to collect, but it would be nice to know exactly where that needs to be.

Daniel: It sounds like there's a bit of room for improvement in both those areas.

Tara: Exactly.

Daniel: A lot of people that are listening to this right now, they'll be thinking “Oh, we need some data. Why do we need data?”

Because a lot of us involved in earth science in some way, shape or form, we understand data is a good thing to have. There'll be a lot of data scientists listening to this and thinking “It'd be great if the compute was right there. It'd be great if I can easily make these apps you're talking about and sharing it.”

But when I go to your website, there seems to be this focus on industry as well. And I'm curious how you're going to motivate them. How are you going to get industry to participate in this? And by participation, my guess is you're asking them to give you data.

Tara: Yes, that's a great question. We've spoken to a lot of industry and the response we usually get is, “Yeah, we're happy to share the data.” But then that's where it stops.

What we've done actually is our Science Lead at our company, Anna Silyakova, she is running these workshops where she brings together industry and scientists. And basically what happens is scientists say, “OK, these are the models that we're running. This is the data that we're lacking. Industry, do you have this data?”

And industry will say, “Actually, yes, we do.” And if they have a very concrete example of what scientists need, they're much more willing to share it than just say “OK, we can share data, but what do you actually need?”

And then, of course, industry uses a lot of the models that the scientists put out there. It's helping them, benefiting them as well if these models are more accurate. By bringing them together, we can have more concrete examples and industry is much more willing to actually share the data.

And then, of course, by having an infrastructure where they can easily do that, such as the platform, that makes their lives a lot easier. They don't have to figure out, “What's the best way to do this? How can we share it?”

Daniel: Have you run into the situation where they're like, “Well, I'd rather you didn't know”?

Tara: Yeah, of course. I mean, that's one of the reasons why industry is not openly sharing data all the time. But I think there are things that they can share without having that fear. And so it's important to even get that if we can.

A lot of the other reasons come down to sensitivity as to where they're operating or where they're looking for their next project. And then in that case, for example, we can potentially get the data in an aggregated form that doesn't give away ship locations, for example. There are always solutions.

And I think most of them are actually willing to share and a lot of them are committed to this shift, especially because the industry relies on a healthy ocean. But it's just a way of figuring out how they're comfortable with it and making sure that they understand the benefits to them as well.
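As an illustration of what an "aggregated form that doesn't give away ship locations" could look like, here is a minimal Python sketch. It is an assumption about one possible approach, not a description of how HUB Ocean or its partners actually handle data: the function name aggregate_positions, the one-degree cell size, the min_count threshold, and the sample coordinates are all hypothetical.

```python
import math
from collections import Counter


def aggregate_positions(positions, cell_deg=1.0, min_count=3):
    """Count position reports per grid cell, hiding sparse cells.

    Each (lat, lon) pair is snapped to the south-west corner of its
    cell_deg-sized grid cell. Cells with fewer than min_count reports
    are dropped so that a single vessel cannot be singled out.
    """
    counts = Counter()
    for lat, lon in positions:
        cell = (
            math.floor(lat / cell_deg) * cell_deg,
            math.floor(lon / cell_deg) * cell_deg,
        )
        counts[cell] += 1
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Invented example coordinates.
positions = [
    (59.2, 10.4), (59.7, 10.9), (59.5, 10.1),  # three reports in one cell
    (62.3, 5.8),                                # a lone report, suppressed
]
summary = aggregate_positions(positions)
```

Only per-cell counts leave the function, so a scientist can still see where activity is concentrated without seeing any individual track. The cell size and threshold are the knobs a data owner would tune to whatever level of detail they are comfortable sharing.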

Daniel: In my mind, anyway, we've been talking about these two separate but obviously interlinked groups. We've got the data scientist and then we've got the industry.

Which one of those groups has been asking for this platform? If so, if they have been asking for a platform to do this, where have they been asking for it? How do you know that people want that? I guess is my roundabout way of asking this question. 

Tara: I think both of them want it for different reasons. We've done a lot of user research before actually starting to build and we kept hearing the same things over and over again, especially from researchers and data scientists, that it's just really hard to get the data that they need or to even know that it exists.

And then a lot of the times they spend so much time trying to figure out how to use the data. Just as an example, for me, I'm a biologist, I'm not an oceanographer. And the first time that I had to use oceanography data, I was very confused. It took me a really long time to figure out what are these file formats? How do I access this data?

And then from the industry side, a lot of them, like I mentioned, they want to change and they want to contribute to this green shift, but they don't really know where to start.

Our offerings are a bit different to the different groups. I think with industry, we're mostly helping guide them and let them know how they can be part of this green shift. And then with the data scientists and researchers, we're providing them with tools that make them more efficient. And then to add to that, we had a few private releases of the platform and we've gotten a lot of feedback.

One of the things that we kept seeing is that people really like to play with the data in this JupyterHub environment. They don't have to set up their own environment, and the same line of code will pull oceanography data, it will pull biodiversity data. They don't need to figure out where all the data lives and how to interact with it.

Daniel: When I hear you talk about this, I'm wondering is this the story of a startup? Is this the story of a scientific breakthrough? Is this the story of earth scientists trying to do the right thing? Where do you land on that? What do you think of when you think about the HUB Ocean platform? How do you think of it? 

Tara: It's a good question. I don't know if I've really thought about it. I guess I would say we are a startup trying to create this paradigm shift with how people use data, but also how people view data sharing, right?

We're also trying to create this cultural shift of everybody working together, sharing their data, including industry, but scientists as well.

Daniel: How do you navigate that as a data scientist? Everyone is involved in some industry in some respects, but it doesn't sound like you're running your own business. It doesn't sound like you have your own shipping company or fish farm or whatever else. It sounds like you're a data scientist who's working with those people. But when you talk about it and talk to them, you need to talk in another way to those people that need to hear a different set of messages. And trying to span that gap between people over here, “Just get out of my way and give me the data, please, so I can do some science.” And then industry is like, “What's the ROI on this? Why should I do this? What's the benefit for me?” And then you're in the middle, you're the data scientist in the middle building this platform. How do you navigate all that?

Tara: Yeah, I struggle with this a lot, actually. The thing is, so there are, of course, data scientists working in industry as well. And like you said, they're just like, “Okay, I just want the data. I want to work with it.” But in order to get to them, you have to go through the higher level industry people, right? You have to convince them first.

And so that's a very different way of communicating. And I think a lot of the ways that we actually do that is, as I mentioned, these apps that we're building, right? For those types of people, it's really important for them to have something visual to look at, to understand the implications of the data. And then the more technical tools are for the data scientists.

But for the data scientists in industry to get to the point where they're using them, we have to convince their leaders that it's worthwhile, right? 

Daniel: I recently heard someone talking about their time in a satellite startup. And one of the biggest mistakes they made was that they were talking about satellites and space, and that got them a seat at the table. They could sit down and talk with the people, but then they forgot to stop talking about satellites and space when they moved on further down the chain and started talking to the people that were going to implement these products, because they didn't care about satellites and space. They cared about what is the answer to the problem, they needed a different set of messaging. And I guess in a lot of ways, this is the same situation that you're in now.

Tara: Yes, that's true. I mean, we have people at our company who are much better at talking to those people than I am, right? I get pulled in mostly if they have technical questions and things like that. But we are also trying to provide actual solutions.

For example, we have this, our own data set, which estimates emissions from shipping, right? And this has been a really useful tool for a lot of the shipping industry, and for companies that are finance companies that want to invest more in sustainable practices and sustainable industries. We're trying to do a lot.

But of course, to try to get industry to change the way that it thinks and rewire industry, as I mentioned, that requires a layer above the technical aspects of the platform.

Daniel: Yeah, I think this is a really important message for people in a similar position to hear, it's that there's different messages for different people. What will success look like?

Tara: This might sound a bit cliché, and I'm sure you've heard this with data before, but above everything, success would look like all ocean data falling in this FAIR category, which is findable, accessible, interoperable, and reusable. That is our main goal.

And we want all data to be FAIR for scientists, for industry, for government, in a way that allows them to work together to solve some of these really important problems. And on top of that, as I mentioned, success would also be this paradigm shift and a change in how people view their data, not something that they have to keep for themselves and have it for a competitive advantage, but more a shift towards making data accessible for the common good.

Daniel: I think there will always be some data, right, that they keep as a competitive advantage, like we can't just give this away to everyone else. But I wonder if the big shift would be not just assuming that every data set is like that, that the company has that, oh, we could all benefit from these different data sets. And we're not giving away all our secrets if we just share them.

Tara: Yeah, exactly. And it's not really just industry, too, that keeps their data private for competitive advantages. This also happens in the science world and in academia. I mean, there's a lot of scientists out there who don't share their data. It's really this cultural shift that we're looking for. People don't share their data either for a competitive advantage or because they just don't know how. It's a lot of work to figure out the best way to do it. Trying to also make it easier for people, we're hoping, will contribute to this.

Daniel: This is a thought that's just popped into my head. We've been talking about industry sharing the data and how great that would make it for everything, for everyone.

If we had access to these different data sources, we could do a lot of stuff. I think everyone listening to this gets that. Where do you land on reproducible research? Not just having access to the data, but also what was it that you're doing with the data? Yeah, you've got these amazing results, but I can't really see that. Personally, when I read a scientific article, you could write anything you want in there, right? And sure, it makes sense. Awesome. Great. Other people have looked at it. But it's not reproducible in any way, shape or form. Not really. I mean, I can't just put it into my system and do the same thing because I didn't have access to the data, but also that magic code, right? Where do you land on that? Should scientists be more involved in this movement towards reproducible research?

Tara: Definitely yes. And I think that is changing. Now a lot of journals require you to put your data in a repository before they'll accept your paper. A lot of the times the data can be available. In some articles, as you mentioned, at the end, there's a little note that says, “Contact the author if you want the data.”

But of course, that author might have moved on to a different institution. They might not have access to the external drive that that data was on anymore. It can be actually quite difficult to get a hold of it sometimes. Of course, the other problem is, as you said, you don't have access to that magic code. And this is something that we've thought about a lot.

We're building this data catalog to try to catalog all the ocean data that's out there, whether it's on our platform or on another platform. And one of the things we really want to include is actually DOIs for codes and for GitHub repos that go along with the data so that it can be reproducible.

Of course, this will require a bit more work from the scientist side, right, to make the data really easily understandable by everyone to have a nice clean repo for people to use. But we are hoping that we're going in that direction.

Daniel: I recently saw this Twitter poll, and it feels weird to be quoting a Twitter poll. But it was interesting. The question was something like, “Why don't you contribute your code to open source?” You know, three, four options there. But most people chose the option “I don't think it's good enough, or I am afraid of critique.” It was something along those lines.

As a scientist, have you ever had that? Because no one writes the best code in the world. There's always something that can be improved. There's always something that can be tweaked. I wonder if that is a reason why people shy away from making everything fully transparent.

Tara: I think, yeah, for sure. When I was a PhD student, I started coding during my PhD, right? I was not writing great code. I was writing what I needed to get what I needed out of the data, right? But I can definitely empathize with that.

As someone who might not be super experienced with it, maybe you don't want to put it out there. But I'm not really sure what the solution to that is.

Daniel: Neither do I. I'm just interested to hear your thoughts on it. If you had one message that you want to get out about this platform that you're building, this thing that you're working on, what is that? What's that one thing you want people to understand? 

Tara: I want them to understand that ocean data is a really, really complex space. And there's a lot of really great people working on it, not just us. And the way forward is really for all of us to work together. And I think that we collaborate a lot with different people and a lot of different governmental institutions.

And we all want the same thing. And hopefully we'll get there one day. But it will take time and a lot of iteration. But the more people are involved, the faster we'll get there.

Daniel: This has been really, really enjoyable talking with you. I really appreciate you taking the time to walk us through this. We've mentioned the name a few times. Why don't you do it again, just so people understand where they can go and check out the platform?

Tara: Sure. We are HUB Ocean. And our platform is called the Ocean Data Platform. And I believe if you just Google us, you will be able to easily find us.
