Information extraction and the missing Mark2Cure module

In our previous post, we asked readers, 'What is your preferred moniker?'. Here is the response:

Mark2Curator: 36%
Citizen Scientist: 36%
Contributor: 18%
"Anything BUT volunpeer": 10%

Although it may seem a little strange that researchers have been struggling to find an answer to the "What's in a name?" issue for discussing citizen science, this struggle is deeply representative of some of the important work biocurators do. "What's in a name? A citizen scientist by any other name still makes important contributions"

Researchers need a common vocabulary to be able to coherently exchange information, but settling on that vocabulary, and on how it is structured, is difficult. Without a common vocabulary, it is easy for scientists to miss research that is valuable to their field of study. Although it has yet to be seen how the citizen science research community will settle this issue, in biomedical research, biocurators help with that sort of determination. Biocurators help standardize terms and define the rules governing how terms are classified and organized. In doing so, they facilitate information quality control and exchange. Biocurators do all this and more.

Given that biocurators do very important, very tedious, and often very difficult work, one question we get quite a bit is:

"How is it possible to train citizen scientists to replace such important, skilled researchers?"

But this question is built on a fundamentally incorrect assumption about the goals of Mark2Cure. We KNOW biocurators do very important work, and that one of the most tedious, and time-consuming things that they do is information extraction.

Information extraction can generally be broken down into three tasks:
1. Named Entity Recognition (identifying and classifying words/phrases in text)
2. Normalization (linking that text to an ontology)
3. Relationship Extraction (identifying the relationship between different entities).
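To make the three tasks concrete, here is a minimal toy sketch in Python. It is not how Mark2Cure works internally: the lexicon, the `DEMO:` concept IDs, the example sentence, and the function names are all made up for illustration, and real systems use far more sophisticated methods than dictionary lookup and simple pairing.

```python
import re
from itertools import combinations

# Hypothetical toy lexicon: surface form -> (entity type, made-up concept ID).
# The "DEMO:" identifiers are placeholders, not IDs from any real ontology.
LEXICON = {
    "n-glycanase 1": ("gene", "DEMO:0001"),
    "ngly1": ("gene", "DEMO:0001"),
    "ngly1-deficiency": ("disease", "DEMO:0002"),
}

def recognize(text):
    """Task 1 (NER): find lexicon terms in the text, longest terms first."""
    lowered = text.lower()
    spans = []
    for surface in sorted(LEXICON, key=len, reverse=True):
        for match in re.finditer(re.escape(surface), lowered):
            overlaps = any(match.start() < end and start < match.end()
                           for start, end, _ in spans)
            if not overlaps:
                spans.append((match.start(), match.end(),
                              text[match.start():match.end()]))
    return sorted(spans)

def normalize(mention):
    """Task 2 (Normalization): link a mention to its (type, concept ID)."""
    return LEXICON[mention.lower()]

def extract_relations(concepts):
    """Task 3 (Relationship Extraction), reduced here to proposing
    candidate pairs of distinct concepts found in the same sentence."""
    unique_ids = sorted({concept_id for _, concept_id in concepts})
    return list(combinations(unique_ids, 2))

TEXT = "Mutations in N-glycanase 1 cause NGLY1-deficiency."
mentions = [mention for _, _, mention in recognize(TEXT)]
concepts = [normalize(mention) for mention in mentions]
pairs = extract_relations(concepts)
```

In this sketch, two different spellings of the same gene would map to the same concept ID, which is exactly the kind of work the Normalization step does.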

We want to train citizen scientists to help with this task, so that biocurators can apply their unique training towards solving problems in biomedical research analogous to the ones we're seeing in the citizen science field.

Since Mark2Cure is a citizen science project, the "What's in a name?" issue applies to us as well. Although our informal poll was only for fun, I was personally very happy with the results for two reasons:

1. I am a fan of wordplay, and I love that many users liked the term Mark2Curator, which blends Mark2Cure and biocurator. I love science puns.

2. Even if I'm reading too much into it, I like to think that our users picked 'citizen scientist' or 'contributor' because they feel that the help they provide to Mark2Cure is important--because it is.

If you've gotten this far, you are probably one of our many astute readers and may have noticed that information extraction was divided into THREE tasks, when Mark2Cure only has TWO. Where is the third task? Why is the missing task the step in between the first and last tasks?

The missing task, 'Normalization', is the task in between NER and Relationship Extraction. We started with NER because NER has been well-investigated so there was a solid foundation for us to build upon. We followed with the relationship extraction task because this would allow us to unlock some of the most difficult to access and valuable information in the text.

As for the Normalization task, it's currently being built by volunteers. Mark2Curators have been helping us investigate NER mappings to different ontologies, and a very talented programmer and machine learning expert has been busy building the Normalization module. But we could use more help. We need feedback on potential interfaces for how parts of the module might work. If you'd like to help with that, answer the poll in our newsletter.

Of note for our U.S.-based Mark2Curators over 65 years of age.

Did you know? The U.S. National Park Service has a lifetime pass for seniors that will allow you to enter or park at US national parks for free or at a discounted rate. These passes only cost $10 now through August 27th. Starting August 28th, the price will go up to $80.

If you enjoy hiking, nature, or plan to visit any of our beautiful national parks, you may want to get your pass while it's still $10. In San Diego, the closest national park where you can purchase one in person is Cabrillo. To find the national park closest to you, visit the NPS's site. If you don't live near a park, but plan on visiting some in the future, you can purchase a pass by mail or online.

Join Mark2Cure and Dazzle4Rare

From August 13th to August 20th, Mark2Cure will be participating in the #Dazzle4Rare campaign to raise awareness for rare diseases. Did you know? About 10% of the population lives with a rare disease, and roughly 50% of rare diseases don’t have any sort of disease-specific foundation to support or research those diseases. See more interesting statistics about rare disease at Global Genes.
If you have a rare disease story you would like us to highlight for the campaign, please get in touch!

What's new in Mark2Cure?
The EDEM1 Entity Recognition mission is over 95% complete. Please help us finish it so we can launch the next one. If it seems like we’ve been quiet lately, it’s because we’ve been preparing for some major updates. If you’re curious about what’s in the pipeline or would like to preview/provide feedback for potential future interface designs, we’d LOVE to hear from you! Your feedback is how we improve! If not for our many marvelous Mark2Curators providing constructive criticism, Mark2Cure would be a clunkier and more difficult-to-use platform.

Speaking of our volunteers, citizen scientists, participants, contributors, volunpeers, and Mark2Curators…there was an interesting discussion earlier today within the citizen science community on the best way to address the amazing people who help make science happen. In fact, a bunch of researchers even wrote an interesting paper about the pros and cons of different terminology.

Which takes us to our current poll.

Lastly, there is an ongoing effort to increase discussion, collaboration, and cooperation within the citizen science (or whatever you wish to call it) community. This has led our friend Alice to introduce #CitSciStories. You may think that your contributions to science in your spare time are no big deal, but from the perspective of the researchers who rely on them, they are amazing! Inspiring! Awesome beyond words! We love what you do and we love learning from you and getting to know you. If you'd like to share your story and inspire others to help science, please get in touch with Alice (@PenguinGalaxy). You can learn more about the #CitSciStories effort here.

Upcoming #CitSciChat on Biomedical Citizen Science

New Mark2Cure Video added to our youtube playlist!

The Citizen Science Conference in May was very productive, and the last of Mark2Cure's recorded talks is now available on our youtube channel. As previously mentioned, Max delivered the project slam for Mark2Cure and was selected as one of the top three to deliver an abbreviated version during the 'Night in the Cloud' event.

View the two-minute talk here:

Biomedical CitSciChat on Wed. July 19th, at 11:00am PT

Speaking of the conference, we were able to connect in person with a lot of lovely people in the citizen science arena, especially the amazing people from @EyesOnAlz, @CitSciBio, and @CochraneCrowd. Because we're all passionate about bringing citizen science to biomedical research, we organized a panel for a biomedical #citscichat. Caren Cooper (@CoopSciScoop) kindly agreed to moderate the chat as usual, and Pietro (@pmichelu, @EyesOnAlz) was able to convince @foldit's Seth Cooper to join the panel.

What: Hour long chat on biomedical citizen science (#CitSciChat)

Where: online via twitter

When: Wed July 19 2:00pm ET (11:00am PT)

Why: Because citizen science is used in biomedical research too

Who: Everyone interested in citizen science is welcome to join this chat which will be moderated by citizen science expert and author, Caren Cooper. The panel so far includes:

  • Mark2Cure of course! Mark2Cure is a citizen science project for addressing the big data issue of biomedical literature. Citizen scientists help look for clues about NGLY1-deficiency in curated literature. (@Mark2Cure/@gtsueng, @x0xmaximus, @AndrewSu)
  • Cochrane Crowd is a citizen science project from the Cochrane Collaboration, and also looks to make biomedical literature more useful. Citizen Scientists help identify randomized controlled trials so that Cochrane Reviewers can use them to answer important medical questions. (@Cochrane_Crowd, @annanoelstorr)
  • EyesOnAlz/Stall Catchers is a citizen science project from the Human Computation Institute to identify blood blockages in short videos of the brain. Their game is super fun, helps with Alzheimer's research AND they have a major event (Catchathon) coming up. If you would like to host a local catchathon, check out this post. (@EyesOnAlz, @seplute, @Clair_csg, @pmichelu)
  • CitSciBio is NIH's new biomedical citizen science hub. It is sponsored by the Division of Cancer Biology at the National Cancer Institute. There are tools for collaborating and creating projects, and now you can log in via your scistarter account. (@citscibio)
  • Foldit is a long-standing and very successful citizen science game which empowers gamers and volunteers to help determine the structure of proteins important to biomedical research. Seth Cooper from Northeastern University has agreed to join the panel to share about this wildly successful project. (@UWGameScience)
Beat the heat and help science!

    Need an excuse to stay indoors, avoid chores, and avoid the summer heat? Look no further! One of our current missions is over 80% complete. Help us finish it!

    Happy Father's Day!

    A HUGE thanks to all the dads (and EVERYONE) who have been contributing to make a difference for the NGLY1 families.

    Shipping delays

    Apologies to international prize and drawing winners who are still waiting for their prizes. Most of the international packages that we shipped out in May/June have been returned to us due to customs issues. Fortunately, this happened at some point prior to shipping, so the postage on these is still good. Unfortunately, it took a long time for the packages to get back to us so that we could address the issue. We’ll be trying again to get these out ASAP.

    Max’s original project slam now online

    As mentioned in our previous newsletter, Max delivered the project slam for Mark2Cure at the Citizen Science Conference in Minnesota. The project slam talks were supposed to have been recorded and still may be released by the Citizen Science Association someday, but we couldn’t wait. Here’s our recording of Max’s project slam. He finished within his allotted four minutes, and was engaging enough to win one of three invitations to deliver an even shorter version of the slam at an event the following day.
    You can check it out here:

    You be the scientist! One thing we’ve heard (and quite agree with) at the Citizen Science Conference is that trained volunteers are capable of doing more than simple tasks. Mark2Curators have very much fed into the tutorial process, and played an important role in testing and improving the design of the interface. The entities our users have identified from the text have already yielded interesting clues which we’ve used to expand the set of documents to investigate, and by now, there are users who have read a lot of abstracts—A LOT! If you’ve read something that sticks out in your mind as being potentially related to NGLY1-deficiency, share it with us! We’d love to hear YOUR hypothesis on what might be an interesting term to explore and why.

    Happy Memorial Day weekend!

    The last few weeks have been a bit hectic, so we've got plenty of news and info to share with you.

    First of all, if you haven't seen it yet, Cochrane Crowd has posted about our joint webinar and the #MedLitBlitz. If you missed the webinar or had technical difficulties/time zone issues with it, it's available on youtube. The prize packages for the top three participants of #MedLitBlitz are packed and will be shipped either today or early next week (depending on whether today's shipments have already been picked up).

    Secondly, Mark2Cure was at the Citizen Science Association conference from 2017.05.15-2017.05.20, and was fortunate enough to share about YOUR work to an audience of scientists who LOVE citizen science! More than a few researchers stopped to introduce themselves to me and spoke highly of our community! Although it's always weird to hear a recording of your own voice, I recorded my presentation because it wouldn't be fair to talk about the amazing work you've done without sharing it with you! You can find my presentation for the biomedical session in our youtube channel. On a side note, I know the audio quality isn't the best which is why I've transcribed it using youtube's captioning software. If you have trouble hearing the presentation (because of the poor audio quality), please turn on the closed captions.

    Max also delivered two lightning talks for the event, which I hope to upload soon.
    Not available yet, but will be soon

    In addition to the talks, we had a poster for Mark2Cure and a table at two public events.
    Max spreading the love for Mark2Cure

    We were especially pleased to be so close to our buddy at Cochrane Crowd for this event.
    Cochrane Crowd looking good

    Lastly, it looks like one of the missions was completed just as I was settling back in after the conference. A HUGE thanks to everyone that helped complete the carpingly mission. A new mission has been launched in its place, so check it out if you have some free time.

    MedLit Blitz, Mark2Curathon Results and More

    Mark2Curathon Results


    Sorry for the delay. The Mark2Curathon results are finally in! During the Mark2Cure portion of MedLit Blitz, we had 34 participants contribute over 16,000 annotations. Because both the entity recognition and the relationship extraction tasks are very different from Cochrane's screening task, we had to take some additional considerations into account when tallying the results.

    For the Relationship Extraction module, multiple annotations per abstract were possible, as each abstract could have any number of concept pairings. Hence, for the relationship extraction module, each annotation submitted counted as one task unit.

    For the Entity Recognition module, only one submission was possible per abstract, but users needed to identify three different types of entities. Hence, each abstract completed counted as three task units (one for each concept type--genes, treatments, diseases). Additionally, a tiered bonus multiplier (of an additional 2% to 15%) was applied for users who submitted high quality annotations.

    The RE and ER task units were then added together for each user, and sorted from highest to lowest in order to determine user ranking for the event. Without further ado, these were the top 15 participants in the Mark2Curathon:
    1. ckrypton
    2. Dr-SR
    3. TAdams
    4. hwiseman
    5. Kien Pong Yap
    6. skye
    7. ScreenerDB
    8. priyakorni
    9. Judy E
    10. pennnursinglib
    11. Calico
    12. AJ_Eckhart
    13. uellis
    14. sueandarmani
    15. nclairoux
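For anyone curious how a tally like the one described above could be computed, here is a rough sketch in Python. It is an illustration under assumptions, not our actual scoring code: the user names and counts are invented, the bonus is assumed to apply only to the Entity Recognition units, and the specific bonus percentages passed in are placeholders within the 2% to 15% range (the real tiers and quality thresholds are not spelled out in this post).

```python
ER_UNITS_PER_ABSTRACT = 3  # one unit each for genes, treatments, diseases

def task_units(re_annotations, er_abstracts, bonus_pct=0):
    """Total task units for one user.

    re_annotations: Relationship Extraction annotations (1 unit each).
    er_abstracts:   Entity Recognition abstracts completed (3 units each).
    bonus_pct:      assumed quality bonus in whole percent (0, or 2 to 15),
                    applied here to the ER units only (an assumption).
    """
    er_units = er_abstracts * ER_UNITS_PER_ABSTRACT
    return re_annotations + er_units * (100 + bonus_pct) / 100

def rank(users):
    """Sort (name, units) pairs from highest to lowest task units."""
    scored = [(name, task_units(*counts)) for name, counts in users.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented example data: name -> (RE annotations, ER abstracts, bonus %).
example_users = {
    "user_a": (5, 10, 15),   # 5 + 30 * 1.15 = 39.5
    "user_b": (40, 0, 0),    # 40 + 0        = 40.0
    "user_c": (0, 12, 2),    # 0 + 36 * 1.02 = 36.72
}
ranking = rank(example_users)
```

Keeping the bonus as a whole-number percentage avoids small rounding surprises when the units are added up.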

    A huge thanks to you all, and everyone who participated for making our first adventure with Cochrane Crowd so successful!

    To qualify for the MedLit Blitz prize, Mark2Curators had to have contributed to the Cochrane Screening Challenge as well.

    MedLit Blitz Results

    We are in the process of contacting the winners and hope to have an update about this soon.

    Mark2Cure at Citizen Science Association Conference 2017

    Max and I have arrived in Twin Cities, Minnesota for the Citizen Science conference. Mark2Cure was accepted as part of the symposium on biomedical citizen science. Mark2Cure was also accepted for a poster presentation and for the project slam. If that doesn't sound busy enough, Mark2Cure was accepted for a table at the 'Night in the Cloud' event (open to the public). If you are in town, please stop by our table!

    About the prizes

    Winners will receive a Mark2Cure mug, marker, and novelty item, in addition to any prizes that Cochrane has prepared for this event.

    The Mark2Curathon starts now!


    Our anniversary celebration with Cochrane Crowd is well under way. #MedLitBlitz started with a webinar on Monday, and was followed by the Cochrane screening challenge from Tuesday to Wednesday. During that challenge, over 100 MedLit Blitzers screened 29,494 citations--over nine THOUSAND more than the initial goal of 20,000!

    But the celebrations aren't over yet. It's now time for the Mark2Curathon portion of #MedLitBlitz!

    For this part, we've launched 3 new missions in the Entity Recognition module. To be clear, all annotations (regardless of whether they were submitted via the Entity Recognition or Relationship Extraction module) will count towards #MedLitBlitz as long as they fall within the time frame of the event. If you don't see the new ER missions, log out, clear your cache and log back in.

    As with Cochrane Crowd, we will be active on twitter; however, we know that many of our most ardent Mark2Curators do not use twitter. For this reason, we will also be sharing updates via our chat channel. As with our previous Mark2Curathons, no sign up is required to chat on this channel, and we encourage you to join us there.

    For ease of tracking, here's the countdown till the end of the event:

    If you participated in the Cochrane screening challenge as part of #MedLitBlitz, we'd love to hear about it! It's been really fun working with Anna and Emily over at Cochrane Crowd, and we definitely look forward to working with them in the future. If you've enjoyed our collaborative effort, feel free to ping some praise to @AnnaNoelStorr and @cochrane_crowd.

    Webinar, Mark2Curathon, and more


    It’s citizen science season and we’re in the thick of it!

    First off, welcome new users! If you found us from the latest SciStarter campaign, feel free to ping us on twitter to let us know so we can pass our thanks to the @SciStarter team! We’re very excited to be featured as part of SciStarter’s recent focus/feature on biomedical citizen science! Note, if you complete your SciStarter profile this month, the SciStarter team will send you a free digital copy of The Rightful Place of Science: Citizen Science. See their post for more details.

    Citizen science has enormous potential, and we’re glad that Mark2Curators are helping us explore its application towards biomedical discovery.

    As mentioned last week, we’re not the only ones who need your help for dealing with the biomedical literature. Cochrane Crowd is reaching its first anniversary in joining this domain of citizen science, and we’re celebrating together! We will be jointly hosting a webinar on May 8th and there will be two 24hr screening challenges. There will be prizes for the top three contributors who take part in both the Cochrane Crowd and Mark2Cure screening challenges. Here are the details:

    Mark2Cure/Cochrane Crowd Webinar:

    Date/Time: May 08, 2017, 9:00am – 10:00am PDT

    Tentative agenda:

    1. Intro (5 minutes)
    2. Mark2Cure presentation (15 mins)
    3. Cochrane Crowd presentation (15 mins)
    4. MedLit Blitz (5 minutes)
    5. Audience Q&A (15-20 mins)

    Interested in participating in the webinar? You’ll need to register first! Hurry, space is limited (due to limitations/licensing restrictions of the webinar software). Register here

    Medlit Blitz (2 x 24 hr screening challenges):

    Cochrane Challenge: Help Cochrane Crowd identify studies that provide the best possible evidence of the effectiveness of a health treatment. Once identified by the Crowd the studies go into a central register where health researchers and practitioners can access them. The more studies identified by the Crowd, the more high-quality evidence is available to help health practitioners treat their clients.

    Challenge Start: May 9th, 2017 10am GMT + 1 (UK time zone) / 2am (PDT)

    Challenge Finish: May 10th, 2017 10am GMT + 1 (UK time zone) / 2am (PDT)

    Mark2Curathon: Join the search for clues on a rare disease by identifying genes, diseases, drugs, and the relationships between these based on literature surrounding NGLY1.

    Challenge Start: May 11th, 2017 7pm GMT + 1 (UK time zone) / 11am (PDT)

    Challenge Finish: May 12th, 2017 7pm GMT + 1 (UK time zone) / 11am (PDT)
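If you'd like to double-check the paired times listed above, or adapt them to your own time zone, a small Python sketch like the following may help. It assumes fixed offsets of UTC+1 for the UK time listed and UTC-7 for PDT, which matches the dates in May; the variable and function names are just for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets assumed for May 2017: UK summer time (GMT+1) and PDT (UTC-7).
UK = timezone(timedelta(hours=1), "GMT+1")
PDT = timezone(timedelta(hours=-7), "PDT")

def uk_to_pdt(year, month, day, hour):
    """Convert a UK (GMT+1) wall-clock time to PDT."""
    return datetime(year, month, day, hour, tzinfo=UK).astimezone(PDT)

cochrane_start = uk_to_pdt(2017, 5, 9, 10)     # 10am UK
cochrane_finish = uk_to_pdt(2017, 5, 10, 10)
curathon_start = uk_to_pdt(2017, 5, 11, 19)    # 7pm UK
curathon_finish = uk_to_pdt(2017, 5, 12, 19)

for label, moment in [("Cochrane Challenge start", cochrane_start),
                      ("Mark2Curathon start", curathon_start)]:
    print(label, moment.strftime("%b %d, %I:%M %p %Z"))
```

To see the times in another zone, swap in a different fixed offset for `PDT`.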

    Get ready to use your reading skills to make a difference in biomedical science and health!!!