# Artificial Intelligence Services

In partnership with the Horniman Museum and Gardens and the Science Museum Group, an AI framework has been developed for the CIIM product, integrating various third-party AI services such as Google Vision and Amazon Comprehend, alongside the GATE toolkit from the University of Sheffield. This integration enhances the CIIM platform, enabling:

  • Better image and artifact descriptions for improved search
  • Creation of new connections between artifacts
  • Identification of patterns in data
  • Utilization of natural language processing
  • Task automation to save staff time

The platform allows multiple AI services to work independently or collaboratively, enriching metadata in cycles. Enhancements can be automatically added to the core data or reviewed first through a workflow.

# Putting you in control of the costs

Many AI service providers charge for use of their APIs but tend to offer a “free tier” allowing a maximum number of transactions over a fixed period. To help manage costs, the CIIM AI framework can be configured to monitor usage and ensure subscriptions stay within the free tier allowance.
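
As a simple illustration of the idea (the sketch below is hypothetical and does not describe CIIM's actual configuration), a free-tier guard only needs to count transactions per billing period and refuse calls once the allowance is spent:

```python
# Illustrative sketch only: all names here are hypothetical and do not
# describe CIIM's actual configuration.
class FreeTierGuard:
    """Blocks calls to a paid API once a free-tier allowance is exhausted."""

    def __init__(self, allowance: int):
        self.allowance = allowance  # e.g. max transactions per billing period
        self.used = 0

    def allow(self, n: int = 1) -> bool:
        """Record usage and return True only if n more calls fit the allowance."""
        if self.used + n > self.allowance:
            return False
        self.used += n
        return True


guard = FreeTierGuard(allowance=5000)  # hypothetical free-tier cap
if guard.allow():
    pass  # safe to call the provider's API
else:
    pass  # defer the work until the allowance resets
```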

# Integrations

Our AI framework has been extended to include the following services:

AI for Text

  • Amazon Comprehend
  • GATE
  • Google Natural Language
  • Google Places
  • IBM Watson Natural Language Understanding
  • Microsoft AI
  • SpaCy

AI for Images

  • Alttext.ai
  • Amazon Rekognition
  • Azure AI
  • Clarifai
  • Google Vision

AI for Audio

  • OpenAI Whisper

# AI for Text

# Amazon Comprehend

Amazon Comprehend is a Natural Language Processing (NLP) service that uses machine learning to find insights and relationships in text. It helps in tasks like sentiment analysis, entity recognition, language detection, and topic modeling. It is a fully managed service that can be used to process large datasets and understand text in multiple languages.

# Features and Capabilities

  1. Entities: Specific pieces of information found within text, such as people, organizations, locations, dates, or other important terms. For example, in the sentence "Barack Obama visited Paris in 2015," the entities are Barack Obama (person), Paris (location), and 2015 (date). Amazon Comprehend automatically identifies and categorizes these entities in text.

  2. Key Phrases: The important words or phrases that represent the main ideas or topics in the text, often nouns or noun phrases that summarize what the text is about. For example, in the sentence "The Eiffel Tower is a famous landmark in Paris," the key phrases might be Eiffel Tower and famous landmark. Amazon Comprehend extracts these key phrases to help identify the core topics of the text.

  3. Sentiment: Sentiment analysis identifies the overall emotional tone or attitude expressed in a piece of text. Amazon Comprehend can determine whether the sentiment is positive, negative, neutral, or mixed. For example, the sentence "I love the new movie!" would be classified as positive, while "I hate waiting in lines" would be negative. This helps in understanding customer feedback, reviews, or social media mentions.

  4. Syntax: Syntax analysis breaks down the structure of the text, including parts of speech (nouns, verbs, adjectives) and how words relate to each other within sentences. For instance, it can identify which word is the subject, object, or action in a sentence. Amazon Comprehend provides insights into sentence structure, helping with tasks like sentence parsing and understanding how words connect and function together.

Amazon Comprehend helps computers understand and analyze text, much as humans read and interpret information. Imagine you have a big pile of customer reviews, news articles, or social media posts. Instead of reading each one yourself, you can use Amazon Comprehend to automatically figure out things like:

  • What’s the overall feeling (positive or negative)?
  • What topics or keywords are mentioned a lot?
  • Who or what is being talked about (names, places, etc.)?

It’s like a robot assistant that helps you quickly understand and organize large amounts of text without having to read it all yourself!
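
As a concrete illustration, here is a minimal sketch of the entity, key-phrase, and sentiment calls using the AWS SDK for Python (boto3). It assumes AWS credentials and a region are configured in your environment; it is not CIIM's own integration code.

```python
import boto3

# Assumes AWS credentials are configured (e.g. via environment variables).
comprehend = boto3.client("comprehend", region_name="eu-west-1")
text = "Barack Obama visited Paris in 2015."

# Entities: people, places, dates, organizations, and so on.
for ent in comprehend.detect_entities(Text=text, LanguageCode="en")["Entities"]:
    print(ent["Type"], ent["Text"], round(ent["Score"], 2))

# Key phrases: the nouns and noun phrases that summarize the text.
phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
print([p["Text"] for p in phrases["KeyPhrases"]])

# Sentiment: POSITIVE, NEGATIVE, NEUTRAL, or MIXED.
print(comprehend.detect_sentiment(Text=text, LanguageCode="en")["Sentiment"])
```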


# IBM Watson Natural Language Understanding

IBM Watson Natural Language Understanding is a comprehensive AI service for analyzing text. It offers features like sentiment analysis, emotion analysis, entity extraction, keyword extraction, and language classification. It also has a feature for relationship extraction, helping users understand how concepts in the text relate to each other. Watson NLU can be used for applications like content analysis and customer feedback evaluation.

# Features and Capabilities

  1. Entities: Specific items or concepts in the text that are meaningful, such as people, locations, organizations, dates, and other things that stand out. For example, in the sentence "Tesla launched a new electric car in Berlin," Tesla is an organization, electric car is a product, and Berlin is a location. Watson NLU identifies these entities and categorizes them to help you understand the key elements of the text.

  2. Keywords: The most important words or phrases in a piece of text that give you an idea of what it is about, often terms that summarize the main topics or concepts. For example, in a news article about climate change, keywords could include global warming, carbon emissions, or renewable energy. Watson NLU extracts these keywords so you can quickly grasp the main points of a document.

  3. Concepts: Higher-level ideas or themes that the text discusses, which may not be mentioned directly but are implied. For example, in a sentence like "The forest is being destroyed due to deforestation," the concept might be environmental impact or sustainability. Watson NLU identifies these broader ideas and links them to the text, helping you understand the underlying themes.

  4. Categories: Groupings of the text into specific subjects or topics, like sports, business, technology, or politics. For example, an article about a new smartphone might be categorized under technology, while one about a recent political debate would fall under politics. Watson NLU automatically tags text with appropriate categories so you can sort and analyze large amounts of information by subject.

IBM Watson Natural Language Understanding is like a smart robot that reads and understands text, helping people make sense of written information automatically. Imagine you have a bunch of customer reviews, articles, or social media posts. This tool can:

  • Figure out the emotions in the text, like whether it's happy, sad, or angry.
  • Find important things like names, places, or products mentioned.
  • Understand connections between different concepts in the text, like how one thing relates to another.
  • Pick out key words and phrases that summarize the main points.

It’s super helpful for businesses to analyze lots of text quickly, find trends, and make smarter decisions based on what people are saying.
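
For illustration, a minimal sketch using IBM's ibm-watson Python SDK; the API key and service URL are placeholders you would replace with the credentials of your own IBM Cloud service instance.

```python
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_watson.natural_language_understanding_v1 import (
    Features, EntitiesOptions, KeywordsOptions, ConceptsOptions, CategoriesOptions,
)
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders: use the credentials from your IBM Cloud service instance.
nlu = NaturalLanguageUnderstandingV1(
    version="2022-04-07",
    authenticator=IAMAuthenticator("YOUR_API_KEY"),
)
nlu.set_service_url("YOUR_SERVICE_URL")

result = nlu.analyze(
    text="Tesla launched a new electric car in Berlin.",
    features=Features(
        entities=EntitiesOptions(limit=5),      # people, organizations, places
        keywords=KeywordsOptions(limit=5),      # summarizing terms
        concepts=ConceptsOptions(limit=3),      # implied higher-level themes
        categories=CategoriesOptions(limit=3),  # subject taxonomy
    ),
).get_result()

for entity in result["entities"]:
    print(entity["type"], entity["text"])
```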


# Google Natural Language

Google Cloud Natural Language API provides text analysis capabilities, such as sentiment analysis, entity recognition, and syntactic analysis. It supports multiple languages and can process text from various sources, including documents and web content. The service allows users to analyze content, extract insights, and understand relationships within the text, making it valuable for business intelligence, content moderation, and more.

Google Natural Language is like a smart tool that helps computers understand and analyze human language, such as written text. Imagine you have a big pile of books, emails, or social media posts. This tool can automatically:

  • Figure out the overall feeling (Is it happy, sad, or neutral?).
  • Find important things like people, places, or products mentioned.
  • Spot the main topics of what the text is talking about.
  • Understand the structure of the sentences, like who is doing what in a sentence.

It makes it easier to analyze and understand lots of text quickly, without having to read everything yourself! This helps businesses and developers build smarter apps that can "read" and "understand" text just like people do.
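
A minimal sketch of sentiment and entity analysis with the google-cloud-language Python library, assuming a service-account key is configured via GOOGLE_APPLICATION_CREDENTIALS:

```python
from google.cloud import language_v1

# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account key.
client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="The Eiffel Tower is a famous landmark in Paris.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# Overall feeling: score runs from -1.0 (negative) to 1.0 (positive).
sentiment = client.analyze_sentiment(document=document).document_sentiment
print(sentiment.score, sentiment.magnitude)

# Entities: the people, places, and things the text mentions.
for entity in client.analyze_entities(document=document).entities:
    print(entity.name, language_v1.Entity.Type(entity.type_).name)
```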


# Google Places

The Google Places API allows developers to access information about points of interest, such as businesses, landmarks, and addresses. It can provide data like place names, addresses, ratings, and geographical locations. The API is commonly used in location-based apps and services, helping users search for places nearby or get detailed information about a location.

# Features and Capabilities

  1. Geo-location: The specific geographic coordinates (latitude and longitude) of a place, which pinpoint exactly where something is on the map. For example, if you're looking for a restaurant using an app, the geo-location data allows the app to show you the restaurant's exact location on the map so you can easily find it.

  2. Organization Details/Opening Hours: Google Places can also provide detailed information about organizations or businesses, such as:

    • Name and type of the business (e.g., "Starbucks" or "Gym").
    • Address (so you know where it's located).
    • Phone number and website link (if available).
    • Opening hours, which tell you when the business is open or closed (for example, "Open from 9 AM to 5 PM").

The Google Places API is like a map that helps apps find and give information about places around you, such as restaurants, stores, landmarks, or any other location. If you're using an app and want to know the nearest coffee shop or check the address of a museum, this tool helps the app look up that information.

It can tell you things like:

  • What a place is called (like "Joe's Coffee Shop")
  • Where it is (the address or coordinates on a map)
  • What people think of it (ratings or reviews)
  • Details about the place (like hours of operation or type of business)

It’s super useful for apps that help you explore new places or find things nearby!
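
For illustration, a minimal sketch that queries the Places API Text Search endpoint over HTTP; the query text and API key are placeholders:

```python
import requests

# Text Search endpoint of the Places API; key and query are placeholders.
resp = requests.get(
    "https://maps.googleapis.com/maps/api/place/textsearch/json",
    params={"query": "coffee shop near the Horniman Museum", "key": "YOUR_API_KEY"},
    timeout=10,
)
for place in resp.json().get("results", []):
    location = place["geometry"]["location"]  # geo-location: lat/lng pair
    print(
        place["name"],                # what the place is called
        place.get("formatted_address"),
        place.get("rating"),          # what people think of it
        location["lat"], location["lng"],
    )
```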


# GATE

GATE (General Architecture for Text Engineering) is an open-source software toolkit for text processing and analysis. It is widely used for Natural Language Processing tasks such as information extraction, sentiment analysis, and corpus annotation. GATE is designed for researchers and developers working on language processing tasks, providing tools for creating, testing, and deploying text mining applications.

# Features and Capabilities

  1. Supply Training Data: In GATE, training data teaches machine learning models to recognize patterns or perform tasks such as named entity recognition (NER), part-of-speech tagging, or sentiment analysis. Supplying training data means providing examples of labeled text for the model to learn from. For example, if you're training a model to recognize people's names, you would provide examples where names are already tagged, so the model can learn to identify them in new, unlabeled text.

  2. Supply Thesaurus Data: Thesaurus data in GATE refers to lists or databases of synonyms, related terms, and phrases that help improve text understanding. Supplying thesaurus data provides additional contextual information, helping GATE recognize different words that mean the same thing or are conceptually related. For example, recognizing that "car" and "automobile" are the same concept improves accuracy in text analysis.

  3. Thesaurus Alignment: The process of matching or linking words in one thesaurus with words in another, or with concepts in a broader knowledge base. This could involve mapping synonyms or related terms between different languages or different sets of data. For example, aligning a list of technical terms in a specialized thesaurus with more general language helps GATE understand domain-specific content, improving its ability to recognize and relate terms across different contexts.

  4. Custom Annotation Parsing: The ability to define and parse your own annotations. Annotations in GATE are labels applied to parts of text (like tagging a person's name or an event). Custom annotation parsing means you can define annotation types relevant to your own project or domain. For example, if you're analyzing medical texts, you might want to annotate drug names or symptoms, so custom parsing lets you define and extract exactly what matters for your analysis.

GATE (General Architecture for Text Engineering) is a tool that helps computers understand and process written text. Imagine you have a giant book or lots of articles and you want to find useful information from them, like names of people, places, or key ideas. GATE helps computers:

  • Read the text and break it down into useful pieces.
  • Tag important things, like identifying people’s names or finding dates.
  • Analyze the text to understand what it's talking about, such as identifying topics or emotions.

It’s like a robot helper for reading and understanding lots of text quickly, which is great for tasks like organizing information, searching for trends, or summarizing large amounts of text.
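
GATE itself is a Java toolkit, but its document-and-annotation model can be sketched with gatenlp, the GATE team's companion Python package. The example below and its spans are illustrative only:

```python
from gatenlp import Document

# gatenlp mirrors GATE's model: a document plus typed annotations over
# character spans of its text.
doc = Document("Barack Obama visited Paris in 2015.")
anns = doc.annset()  # the default annotation set

# Annotations are labels applied to parts of the text.
anns.add(0, 12, "Person")     # "Barack Obama"
anns.add(21, 26, "Location")  # "Paris"
anns.add(30, 34, "Date")      # "2015"

for ann in anns:
    print(ann.type, doc.text[ann.start:ann.end])
```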


# SpaCy

SpaCy is an open-source Python library for advanced Natural Language Processing. It is designed specifically for production use, with a focus on performance and efficiency. SpaCy provides pre-trained models for tasks like tokenization, part-of-speech tagging, named entity recognition, and dependency parsing. It's widely used by developers and researchers for building custom NLP pipelines and applications. SpaCy also integrates well with other machine learning libraries like TensorFlow and PyTorch.

SpaCy is a tool that helps computers understand and work with language, kind of like teaching a robot how to read and make sense of text. It can:

  • Break down sentences into parts (like subjects, actions, and objects) so the computer knows what’s going on.
  • Spot important words like people’s names, locations, and dates.
  • Understand relationships between words, like figuring out who is doing what in a sentence.

It’s fast and efficient, and it's often used by developers to build smart systems that can automatically analyze and understand text—like sorting through emails, understanding customer feedback, or even translating languages.
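
A minimal sketch of a SpaCy pipeline, assuming the small English model has been downloaded (python -m spacy download en_core_web_sm):

```python
import spacy

# Assumes: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The Eiffel Tower is a famous landmark in Paris.")

# Dependency parse: who is doing what in the sentence.
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

# Named entities: people, locations, dates, and so on.
for ent in doc.ents:
    print(ent.text, ent.label_)
```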


# AI for Images

# Amazon Rekognition

Amazon Rekognition is a powerful image and video analysis service provided by AWS. It uses deep learning to detect objects, scenes, activities, and people in images and videos. It can also recognize text in images, identify facial features, and compare faces for verification.

# Features and Capabilities

  1. Faces: Amazon Rekognition can detect faces in images and videos. It can locate the position of faces and analyze them for specific features, such as:

    • Facial attributes like age range, gender, and emotion (happy, sad, surprised).
    • Facial recognition to compare faces and determine whether they belong to the same person.

  2. Labels: Categories or tags that Amazon Rekognition assigns to objects, scenes, and activities it identifies in images or videos. For example, it can detect and label things like "dog," "car," or "tree," or activities like "playing soccer" or "sitting at a table." Labels help in automatically categorizing and organizing visual content, which is useful for searching or filtering large collections of images and videos.

  3. Text in Images: Amazon Rekognition can also detect and read text that appears in images, including:

    • Printed text (e.g., text on signs, labels, or documents).
    • Handwritten text (depending on clarity).

  It can extract this text for use in applications such as automated document processing, sign recognition in photos, or translating text within images. This is great for accessibility, content extraction, or enhancing search capabilities within images (see the sketch after this list).
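
A minimal sketch of the label, text, and face calls using the AWS SDK for Python (boto3); the image file name is hypothetical and AWS credentials are assumed to be configured:

```python
import boto3

# Assumes AWS credentials are configured; "artifact.jpg" is a hypothetical file.
rekognition = boto3.client("rekognition", region_name="eu-west-1")
with open("artifact.jpg", "rb") as f:
    image = {"Bytes": f.read()}

# Labels: objects, scenes, and activities, each with a confidence score.
for label in rekognition.detect_labels(Image=image, MaxLabels=10)["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))

# Text in images: both whole lines and individual words are returned.
for det in rekognition.detect_text(Image=image)["TextDetections"]:
    print(det["Type"], det["DetectedText"])

# Faces: bounding boxes plus attributes such as age range and emotions.
faces = rekognition.detect_faces(Image=image, Attributes=["ALL"])
for face in faces["FaceDetails"]:
    print(face["AgeRange"], face["Emotions"][0]["Type"])
```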


# Google Vision

Google Vision AI is an image analysis tool that allows developers to integrate image recognition capabilities into their applications. It can detect objects, landmarks, logos, text, and more in images. It also provides features like image labeling, face detection, and analysis of facial expressions. Google Vision is widely used for applications in retail, security, and accessibility, allowing businesses to gain insights from images quickly and efficiently.

# Features and Capabilities

  1. Labels: Tags or categories assigned to objects, scenes, or activities detected in an image. These labels represent what the model identifies, such as "dog," "sunset," "beach," or "person with umbrella." They help in automatically categorizing and organizing visual content, making it easier to search or analyze large datasets of images.

  2. Faces: Google Vision can detect faces in images and identify key facial features like the position of the eyes, nose, and mouth. It can also detect facial expressions (happy, sad, surprised, etc.). It does not perform facial recognition (identifying individuals), but it is useful for applications like emotion analysis and face detection in photos.

  3. Text in Images: The ability of Google Vision to detect and extract any text that appears within an image. It can read printed text (e.g., signs, documents, labels) and handwritten text (depending on clarity). This feature is useful for automating document processing, scanning receipts, and enabling search within images by recognizing text.

  4. Web Entities: Web Entities identify and link the contents of an image to related online entities, such as websites, articles, or other resources. For instance, if Google Vision detects a famous landmark or brand in an image, it might associate that image with relevant online resources or knowledge about that entity. This is useful for connecting images to external data, offering more context and insight about what's in the image.

  5. Landmarks: Google Vision can recognize landmarks in images, such as famous buildings, monuments, or locations (e.g., the Eiffel Tower, the Great Wall of China). It can label and provide information about these landmarks, which helps in recognizing and categorizing travel or historical imagery.

  6. Localization: Localization helps Google Vision understand the geographical context of an image. This can include recognizing places based on their coordinates or regional features, which is especially useful for identifying locations that are not globally famous landmarks but still hold significance within a region (e.g., local parks, restaurants, or attractions).

  7. Image Properties (e.g., Dominant Colors): Analysis of the visual characteristics of an image, such as its dominant colors. This is useful for visual design, content analysis, or categorizing images by color scheme. Google Vision identifies the primary colors in an image, which can assist in organizing or filtering images by their visual aesthetic (see the sketch after this list).
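
A minimal sketch of several of these features using the google-cloud-vision Python library; the image file name is hypothetical and a service-account key is assumed to be configured:

```python
from google.cloud import vision

# Assumes GOOGLE_APPLICATION_CREDENTIALS is set; "artifact.jpg" is hypothetical.
client = vision.ImageAnnotatorClient()
with open("artifact.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Labels: what the model sees in the image, with confidence scores.
for label in client.label_detection(image=image).label_annotations:
    print(label.description, round(label.score, 2))

# Text in images: the first annotation holds the full extracted text.
texts = client.text_detection(image=image).text_annotations
if texts:
    print(texts[0].description)

# Landmarks: famous buildings, monuments, and locations.
for landmark in client.landmark_detection(image=image).landmark_annotations:
    print(landmark.description)

# Image properties: the dominant colors of the image.
props = client.image_properties(image=image).image_properties_annotation
for color in props.dominant_colors.colors[:3]:
    print(color.color.red, color.color.green, color.color.blue, color.score)
```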


# Alttext.ai

Alttext.ai is an AI-powered service focused on automatically generating alt text for images, which is important for accessibility. Alt text describes an image for visually impaired users; screen readers read the description aloud. Alttext.ai uses AI to analyze an image and generate descriptive text that explains what’s in the image, helping websites and apps become more inclusive for users with disabilities.


# Azure Image Cognitive Services

Azure Image Cognitive Services, part of Microsoft's Azure platform, offers various tools for image processing and analysis. It can perform tasks like image tagging, object detection, facial recognition, and analyzing content for explicit or inappropriate imagery. The service can also provide image descriptions for accessibility. Azure's image services are used to build intelligent applications that can understand and process visual content efficiently.
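
For illustration, a minimal sketch of image tagging and description using the azure-cognitiveservices-vision-computervision Python package; the endpoint, key, and image URL are placeholders:

```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

# Placeholders: use the endpoint and key from your own Azure resource.
client = ComputerVisionClient(
    "https://YOUR_RESOURCE.cognitiveservices.azure.com/",
    CognitiveServicesCredentials("YOUR_KEY"),
)

# Tags plus a natural-language description (usable as an image caption).
analysis = client.analyze_image(
    "https://example.com/artifact.jpg",  # hypothetical image URL
    visual_features=[VisualFeatureTypes.tags, VisualFeatureTypes.description],
)
for tag in analysis.tags:
    print(tag.name, round(tag.confidence, 2))
if analysis.description.captions:
    print(analysis.description.captions[0].text)
```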


# Clarifai

Clarifai is an AI platform that provides advanced computer vision and machine learning tools for image and video recognition. It enables businesses and developers to integrate visual recognition capabilities into their applications. Clarifai uses deep learning to analyze images, videos, and text.

# Features and Capabilities

  1. Object detection: Identifying and labeling objects within images (e.g., people, cars, animals).

  2. Face recognition: Detecting and recognizing faces in images and videos.

  3. Text recognition: Extracting text from images, such as signs or documents.

  4. Content moderation: Automatically detecting and filtering inappropriate content in images and videos.

  5. Custom model training: Allowing users to train models specifically for their own use cases.

Clarifai is used across various industries, including retail, security, healthcare, and entertainment, to automate visual tasks, improve user experiences, and streamline workflows. It offers a user-friendly interface and powerful API integration for easy deployment.
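
For illustration, a minimal sketch using Clarifai's Python SDK (the clarifai package) and its publicly hosted general image recognition model; the access token and image URL are placeholders, and method names can differ between SDK versions:

```python
from clarifai.client.model import Model

# Placeholder PAT (personal access token); the model URL is Clarifai's
# publicly hosted general image recognition model.
model = Model(
    url="https://clarifai.com/clarifai/main/models/general-image-recognition",
    pat="YOUR_PAT",
)
prediction = model.predict_by_url(
    "https://example.com/artifact.jpg",  # hypothetical image URL
    input_type="image",
)

# Each concept is a label with a confidence value between 0 and 1.
for concept in prediction.outputs[0].data.concepts:
    print(concept.name, round(concept.value, 2))
```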


# AI for Audio / Video

# OpenAI Whisper

OpenAI Whisper is an advanced speech-to-text model developed by OpenAI that can transcribe and translate audio into text. It is designed to recognize and understand speech in multiple languages, offering high accuracy even in noisy environments or with varying accents.

# Features and Capabilities

  1. Speech-to-text transcription: Converting spoken words into written text, which is useful for applications like transcription services, voice assistants, or captions for videos. Whisper performs well in challenging audio conditions, such as background noise or overlapping voices.

  2. Multilingual support: Whisper can understand and transcribe speech in many languages, and can also translate non-English audio into English text.

Whisper is used for improving accessibility, enhancing voice recognition systems, and automating the process of transcribing or translating audio content. It's part of OpenAI's efforts to make advanced AI models more accessible and effective for real-world applications.
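
A minimal sketch using the open-source openai-whisper Python package; the audio file names are hypothetical:

```python
import whisper

# Smaller models ("tiny", "base") trade accuracy for speed.
model = whisper.load_model("base")

# Transcription: spoken words to text, with the language auto-detected.
result = model.transcribe("interview.mp3")  # hypothetical audio file
print(result["language"], result["text"])

# Translation: non-English speech is rendered as English text.
result = model.transcribe("interview_fr.mp3", task="translate")
print(result["text"])
```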
