Argument Mining API documentation


The Argument Mining API serves as a direct interface to the ArgumenText project. It can be used to bypass the user interface and access the search engine directly via HTTP POST. Three APIs are provided: the classify API, the cluster API, and the search API.

Request API Access

If you want to have access to the API, please register on this site.

With an API login, permission is granted (free of charge) to use the services provided on this website. We make no warranties at all.

Classify API

Classifies given texts based on a search topic.

The classify API works similarly to the search on the main page, but instead of searching for documents in an index it takes a list of sentences, a text, or a URL with text as input. The classify API can be accessed via https://api.argumentsearch.com/en/classify. Both input and output are in JSON format.

The classify API also accepts documents in PDF format as a byte stream. The POST request has to be sent with Content-Type multipart/form-data, and all parameters are passed as form fields. Example 4 below shows such a query via curl.

Parameters

Each key of the input JSON is described below -

topic : str
Search query
sortBy : str

Sort by argumentConfidence (Default), argumentConfidenceLex, or none.

argumentConfidence: Sort by average of argument and stance (if applicable) confidence.

argumentConfidenceLex: Sort using NumPy's lexsort.

none: Keep original input ordering.

userID : str
Pass your personal userID.
apiKey : str
Pass your personal apiKey.
model : str
Model to be used. Available options are default (Default) and default_topic_relevance.
sentences : list of str
Sentences to be classified. Only one of sentences, text, or targetUrl can be used at a time.
text : str
Text to be classified. Only one of sentences, text, or targetUrl can be used at a time.
targetUrl : str
URL of the text to be classified. Only one of sentences, text, or targetUrl can be used at a time.
predictStance : bool, optional
Predict stances of arguments if true (Default: true).
computeAttention : bool, optional
Computes attention weights if true (Default: true). Does not work for BERT-based models.
showOnlyArguments : bool, optional
Shows only argumentative sentences if true, otherwise shows all sentences (Default: true).
removeDuplicates : bool, optional
Removes duplicate sentences if true (Default: true).
filterNonsensicalEntries : bool, optional
Only keeps sentences that have between 3 and 30 tokens and fewer than 4 sequentially repeating words (Default: true).
topicRelevance : str, optional

Filter the sentences based on the given strategy. Available options are match_string, n_gram_overlap, and word2vec (Default: None).

match_string: Selects sentences if the provided topic is in the sentence.

n_gram_overlap: Selects a sentence if any of the nouns in the topic appears in the sentence. If the topic contains no nouns, stopwords are removed from the topic and the remaining tokens are checked against the sentence. If no tokens remain after removing stopwords (i.e. all words in the topic are stopwords), all sentences are returned.

word2vec: Selects sentences based on cosine similarity if a certain threshold is exceeded. The default model is used for calculating cosine similarity, irrespective of the model used for prediction.

topicRelevanceThreshold : float, optional
Threshold against which the calculated cosine similarity is compared. Should be between 0 and 1 (Default: 0).
normalizeTopicRelevanceThreshold : bool, optional
If true, normalize the cosine similarities of all sentences before applying topicRelevanceThreshold (Default: false).
userMetadata : str, optional
Custom metadata in the form of a string that will be returned (unmodified) with the result of the query (Default: "").
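For intuition, the filterNonsensicalEntries heuristic and the match_string relevance strategy can be sketched locally in Python. This is an illustrative approximation of the documented behaviour, not the server's actual implementation (tokenization in particular is assumed to be whitespace-based):

```python
def filter_nonsensical(sentences, min_tokens=3, max_tokens=30, max_repeat=4):
    """Keep sentences with 3-30 tokens and fewer than 4 sequentially repeating words."""
    kept = []
    for s in sentences:
        tokens = s.split()
        if not (min_tokens <= len(tokens) <= max_tokens):
            continue
        # find the longest run of identical consecutive tokens
        longest = run = 1
        for prev, cur in zip(tokens, tokens[1:]):
            run = run + 1 if cur.lower() == prev.lower() else 1
            longest = max(longest, run)
        if longest < max_repeat:
            kept.append(s)
    return kept

def match_string(topic, sentences):
    """topicRelevance=match_string: keep sentences containing the topic verbatim."""
    return [s for s in sentences if topic.lower() in s.lower()]
```

For example, `filter_nonsensical(["Nuclear power is awesome.", "bad bad bad bad sentence here", "Too short."])` drops the second sentence (a run of 4 repeated words) and the third (fewer than 3 tokens).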

Returns

JSON

The output JSON contains the two keys metadata and sentences, which are explained below. sentences is a list of sentences; the parameters listed below are returned for each sentence. The input parameters are also returned and are not explained again.

metadata

  • modelVersion: Version of the current running model (e.g. 0.1).

  • timeArgumentPrediction: Time needed to predict the arguments in seconds.

  • timeAttentionComputation: Time needed to compute the attention weights for all words in seconds, -1 if computeAttention=false.

  • timePreprocessing: Time needed to preprocess the documents/sentences in seconds.

  • timeStancePrediction: Time needed to predict the stances of all arguments in seconds, -1 if predictStance=false.

  • timeLogging: Time needed to store the query to the database.

  • timeTotal: Time needed to process all data in seconds.

  • totalArguments: Total number of sentences that are arguments.

  • totalContraArguments: Total number of contra arguments that were found in the data.

  • totalProArguments: Total number of pro arguments that were found in the data.

  • totalNonArguments: Total number of sentences that are not arguments.

  • totalClassifiedSentences: Total number of sentences classified.

sentences

  • argumentConfidence: Confidence that sentence is an argument.

  • argumentLabel: Argument label for the sentence (argument or no argument).

  • sentenceOriginal: Original sentence before preprocessing (e.g. Nuclear power is awesome).

  • sentencePreprocessed: Original sentence after preprocessing (e.g. nuclear power is awesome).

  • sortConfidence: A combined score of argument and stance confidence. If stance is not predicted, it's the same score as argumentConfidence.

  • stanceConfidence: Confidence that argument is pro/contra in regard to the topic (only if sentence is an argument and predictStance=true).

  • stanceLabel: Stance label for the argument, pro or contra (only if the sentence is an argument and predictStance=true).

  • weights: Weights that signal the importance of each word of the sentence (only if computeAttention=true, e.g. [0.2, 0.3, 0.4, 0.1]).
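The sortConfidence field above can be read as follows. A minimal sketch, assuming "combined score" means the plain average mentioned under sortBy:

```python
def sort_confidence(argument_confidence, stance_confidence=None):
    """Combined score used for sortBy=argumentConfidence.

    If stance was not predicted, it equals argumentConfidence;
    otherwise it is the average of the two confidences (an assumption
    based on the sortBy description above)."""
    if stance_confidence is None:
        return argument_confidence
    return (argument_confidence + stance_confidence) / 2

# sentences are then sorted by this score, highest first
results = [
    {"argumentConfidence": 0.9, "stanceConfidence": 0.6},
    {"argumentConfidence": 0.8, "stanceConfidence": 0.95},
]
results.sort(key=lambda r: sort_confidence(r["argumentConfidence"],
                                           r.get("stanceConfidence")),
             reverse=True)
```

Here the second sentence sorts first, since (0.8 + 0.95) / 2 = 0.875 beats (0.9 + 0.6) / 2 = 0.75.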

Example 1: Classify API with sentences as input

{
    "topic": "Nuclear power",
    "sentences": [
        "Nuclear power is awesome.",
        "Nuclear power is awesome, because of its nearly zero carbon emissions.",
        "Nuclear power is dangerous, because it produces radioactive waste."
    ],
    "predictStance": true,
    "computeAttention": true,
    "showOnlyArguments": false,
    "userID": "yourPersonalUserID",
    "apiKey": "yourPersonalApiKey"
}
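A query like Example 1 can be sent from Python using only the standard library. A minimal sketch; the placeholder credentials must be replaced with your own before calling classify, and the call requires network access:

```python
import json
import urllib.request

payload = {
    "topic": "Nuclear power",
    "sentences": [
        "Nuclear power is awesome.",
        "Nuclear power is dangerous, because it produces radioactive waste.",
    ],
    "predictStance": True,
    "userID": "yourPersonalUserID",
    "apiKey": "yourPersonalApiKey",
}

def classify(payload):
    """POST the JSON payload to the classify endpoint and return the parsed response."""
    req = urllib.request.Request(
        "https://api.argumentsearch.com/en/classify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# result = classify(payload)  # requires valid credentials and network access
```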

Example 2: Classify API with text as input

{
    "topic": "Nuclear power",
    "text": "Nuclear energy outputs nearly zero carbon emissions. But it is also dangerous, because of the nuclear waste it produces.",
    "predictStance": true,
    "computeAttention": true,
    "showOnlyArguments": false,
    "userID": "yourPersonalUserID",
    "apiKey": "yourPersonalApiKey"
}

Example 3: Classify API with URL as input

{
    "topic": "Brexit",
    "targetUrl": "https://www.washingtonpost.com/world/2018/12/14/is-theresa-may-bad-negotiator-or-is-brexit-just-an-impossible-proposition-answer-yes",
    "predictStance": true,
    "computeAttention": true,
    "showOnlyArguments": false,
    "userID": "yourPersonalUserID",
    "apiKey": "yourPersonalApiKey"
}

Example 4: Classify API with PDF document as input

curl -X POST \
    -H 'Content-Type: multipart/form-data' \
    -F "pdf=@/path/to/file/nuclear_energy.pdf" \
    -F "userID=yourPersonalUserID" \
    -F "apiKey=yourPersonalApiKey" \
    -F "topic=nuclear energy" \
    -F "predictStance=True" \
    'https://api.argumentsearch.com/en/classify'

Raises

KeyError
If an unknown model is used.

Cluster API

Clusters given arguments by their similarity.

The cluster API can be accessed via https://api.argumentsearch.com/en/cluster_arguments. Both input and output are in JSON format.

Parameters

Each key of the input JSON is described below -

arguments : list of str
All arguments to be clustered, as a list of strings.
threshold : float
Similarity threshold two arguments must exceed to be placed in the same cluster.
min_cluster_size : int
All clusters must hold at least this many arguments.
model : str
The model to use for clustering. Available options are SBERT (Default).
userID : str
Pass your personal userID.
apiKey : str
Pass your personal apiKey.

Returns

JSON

The output JSON contains the two keys metadata and clusters, which are explained below. clusters is a list of dictionaries (one per cluster); the parameters listed below are returned for each cluster. The input parameters are also returned through the metadata key.

clusters

  • id: The id of the respective cluster.

  • sentences: A list of sentences contained in the respective cluster.

  • size: The number of sentences in the respective cluster.

metadata

  • arguments_count: Total number of arguments over all clusters.

  • clusters_count: Total number of clusters.

  • execution_time_seconds: Time to execute the clustering.

  • min_cluster_size: Input parameter.

  • threshold: Input parameter.

Example: Cluster API

{
    "arguments": [
        "That could benefit the operator of a fleet of electric vehicles .",
        "In theory , these reactors are at greatly reduced risk of a Fukushima-style accident .",
        "There is always the possibility of a breakthrough that would make nuclear safe .",
        "This means that physicists will be able to investigate rare phenomena and make more accurate measurements .",
        "These various designs are meant to be cheaper to build and operate-and much safer-than conventional reactors .",
        "It would provide a stable and greenhouse gas-emission-free energy source , says the IAEA."
    ],
    "userID": "yourPersonalUserID",
    "apiKey": "yourPersonalApiKey",
    "threshold": 0.2,
    "min_cluster_size": 2,
    "model": "SBERT"
}
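To illustrate how threshold and min_cluster_size interact, the sketch below clusters arguments greedily over a stand-in token-overlap similarity. The actual service computes similarity with SBERT sentence embeddings, so real scores and clusters will differ:

```python
def similarity(a, b):
    """Stand-in similarity: Jaccard overlap of lowercased token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def cluster(arguments, threshold, min_cluster_size):
    """Greedy single-pass clustering: an argument joins the first cluster
    whose seed it matches at or above threshold, else it starts a new
    cluster. Clusters smaller than min_cluster_size are dropped."""
    clusters = []  # each cluster is a list of sentences; element 0 is the seed
    for arg in arguments:
        for c in clusters:
            if similarity(arg, c[0]) >= threshold:
                c.append(arg)
                break
        else:
            clusters.append([arg])
    clusters = [c for c in clusters if len(c) >= min_cluster_size]
    # mirror the shape of the API's clusters output
    return [{"id": i, "sentences": c, "size": len(c)} for i, c in enumerate(clusters)]
```

With threshold 0.5 and min_cluster_size 2, "nuclear power is safe" and "nuclear power is cheap" land in one cluster, while a lone sentence about wind energy is dropped.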
Search API

The search API works similarly to the search on the main page. It can be accessed via https://api.argumentsearch.com/en/search. Both input and output are in JSON format.

Parameters

Each key of the input JSON is described below -

topic : str
Search query
index : str
Index server to use (Default: cc).
sortBy : str

Sort by argumentConfidence (Default), argumentConfidenceLex, or none.

argumentConfidence: Sort by average of argument and stance (if applicable) confidence.

argumentConfidenceLex: Sort using NumPy's lexsort.

none: No ordering (keeps the ordering from the index results).

numDocs : int
Number of documents scanned for arguments. The higher the number, the longer the process may take (Default: 20).
userID : str
Pass your personal userID.
apiKey : str
Pass your personal apiKey.
beginDate : str
Start date from which the documents are searched in the index (in format: yyyy-MM-dd'T'HH:mm:ss).
endDate : str
End date up to which the documents are searched in the index (in format: yyyy-MM-dd'T'HH:mm:ss).
strictTopicSearch : bool
If true, returns only sentences that contain exact matches of the topic.
model : str
Model to be used. Available options are default (Default) and default_topic_relevance.
predictStance : bool, optional
Predict stances of arguments if true (Default: true).
computeAttention : bool, optional
Computes attention weights if true (Default: true). Does not work for BERT-based models.
showOnlyArguments : bool, optional
Shows only argumentative sentences if true, otherwise shows all sentences (Default: true).
removeDuplicates : bool, optional
Removes duplicate sentences if true (Default: true).
filterNonsensicalEntries : bool, optional
Only keeps sentences that have between 3 and 30 tokens and fewer than 4 sequentially repeating words (Default: true).
topicRelevance : str, optional

Filter the sentences based on the given strategy. Available options are match_string, n_gram_overlap, and word2vec (Default: None).

match_string: Selects sentences if the provided topic is in the sentence.

n_gram_overlap: Selects a sentence if any of the nouns in the topic appears in the sentence. If the topic contains no nouns, stopwords are removed from the topic and the remaining tokens are checked against the sentence. If no tokens remain after removing stopwords (i.e. all words in the topic are stopwords), all sentences are returned.

word2vec: Selects sentences based on cosine similarity if a certain threshold is exceeded. The default model is used for calculating cosine similarity, irrespective of the model used for prediction.

topicRelevanceThreshold : float, optional
Threshold against which the calculated cosine similarity is compared. Should be between 0 and 1 (Default: 0).
normalizeTopicRelevanceThreshold : bool, optional
If true, normalize the cosine similarities of all sentences before applying topicRelevanceThreshold (Default: false).
userMetadata : str, optional
Custom metadata in the form of a string that will be returned (unmodified) with the result of the query (Default: "").
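The beginDate/endDate pattern yyyy-MM-dd'T'HH:mm:ss is Java SimpleDateFormat notation; in Python the equivalent strftime pattern would be %Y-%m-%dT%H:%M:%S:

```python
from datetime import datetime

def to_api_date(dt):
    """Format a datetime in the yyyy-MM-dd'T'HH:mm:ss shape the index expects."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S")

begin_date = to_api_date(datetime(2018, 1, 1))              # "2018-01-01T00:00:00"
end_date = to_api_date(datetime(2018, 12, 31, 23, 59, 59))  # "2018-12-31T23:59:59"
```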

Returns

JSON

The output JSON contains the two keys metadata and sentences, which are explained below. sentences is a list of sentences; the parameters listed below are returned for each sentence. The input parameters are also returned and are not explained again.

metadata

  • language: Language of the model (en or de).

  • modelVersion: Version of the current running model (e.g. 0.1).

  • timeArgumentPrediction: Time needed to predict the arguments in seconds.

  • timeAttentionComputation: Time needed to compute the attention weights for all words in seconds, -1 if computeAttention=false.

  • timePreprocessing: Time needed to preprocess the documents/sentences in seconds.

  • timeIndexing: Time needed to find and return documents from the index in seconds.

  • timeStancePrediction: Time needed to predict the stances of all arguments in seconds, -1 if predictStance=false.

  • timeLogging: Time needed to store the query to the database.

  • timeTotal: Time needed to process all data in seconds.

  • totalArguments: Total number of sentences that are arguments.

  • totalContraArguments: Total number of contra arguments that were found in the data.

  • totalProArguments: Total number of pro arguments that were found in the data.

  • totalNonArguments: Total number of sentences that are not arguments.

  • totalClassifiedSentences: Total number of sentences classified.

sentences

  • argumentConfidence: Confidence that sentence is an argument.

  • argumentLabel: Argument label for the sentence (argument or no argument).

  • date: Creation date of the document the sentence originates from.

  • sentenceOriginal: Original sentence before preprocessing (e.g. Nuclear power is awesome).

  • sentencePreprocessed: Original sentence after preprocessing (e.g. nuclear power is awesome).

  • sortConfidence: A combined score of argument and stance confidence. If stance is not predicted, it's the same score as argumentConfidence.

  • source: Source of the document the sentence originates from.

  • stanceConfidence: Confidence that argument is pro/contra in regard to the topic (only if sentence is an argument and predictStance=true).

  • stanceLabel: Stance label for the argument, pro or contra (only if the sentence is an argument and predictStance=true).

  • url: URL of the document the sentence originates from.

Example

{
    "topic": "Nuclear power",
    "predictStance": true,
    "computeAttention": false,
    "numDocs": 20,
    "sortBy": "argumentConfidence",
    "userID": "yourPersonalUserID",
    "apiKey": "yourPersonalApiKey"
}

Raises

KeyError
If an unknown model is used.
Generated by pdoc 0.6.3.