Argument Mining API documentation

The Argument Mining API serves as a direct interface to the ArgumenText project. It can be used to bypass the user interface and directly access the search engine via HTTP POST. Four different APIs are provided: the search API, the classify API, the cluster API, and the aspects API.

Request API Access

api.api_registration()

If you want to have access to the API, please register on this site.

With an API login, permission is granted (free of charge) to use the services provided on this website. We make no warranties at all.

Classify API

api.classify_api()

Classifies given texts based on a search topic.

The classify API works similarly to the search on the main page, but takes a list of sentences, a text, or a URL with text as input instead of searching for documents in an index. The classify API can be accessed via https://api.argumentsearch.com/en/classify. Both input and output are in JSON format.

The classify API also supports documents in PDF format as a byte stream. The POST request has to be sent with Content-Type multipart/form-data, and all parameters are passed as form fields. Example 4 below shows such a query via curl.
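For JSON input, a request can be sent with any HTTP client. The following Python sketch (standard library only) builds and serializes a classify request body mirroring Example 1 below; the credentials are placeholders:

```python
import json

# Build a classify request body; userID/apiKey are placeholders.
payload = {
    "topic": "Nuclear power",
    "sentences": ["Nuclear power is awesome."],
    "predictStance": True,
    "userID": "yourPersonalUserID",
    "apiKey": "yourPersonalApiKey",
}
body = json.dumps(payload).encode("utf-8")

# Actually sending the request requires network access and valid credentials:
# import urllib.request
# req = urllib.request.Request(
#     "https://api.argumentsearch.com/en/classify",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# result = json.load(urllib.request.urlopen(req))
```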

Parameters
Each key of the input JSON is described below:
topic: str

Search query

sortBy: str

Sort by argumentConfidence (Default), argumentConfidenceLex, argumentQuality, or none.

none: No ordering (keeps the ordering from the index results).

argumentConfidence: Sort by average of argument and stance (if applicable) confidence.

argumentConfidenceLex: Sort using NumPy’s lexsort.

argumentQuality: Sort by the computed quality of the arguments. To predict the quality scores, we use the following work:

Shai Gretz, Roni Friedman, Edo Cohen-Karlik, Assaf Toledo, Dan Lahav, Ranit Aharonov, and Noam Slonim. 2020. A large-scale dataset for argument quality ranking: Construction and analysis. In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020).

userID: str

Pass your personal userID.

apiKey: str

Pass your personal apiKey.

model: str

Model to be used. Available options are default (Default) and default_topic_relevance.

sentences: list of str

Sentences to be classified. Only one of sentences, text, or targetUrl can be used at a time.

text: str

Text to be classified. Only one of sentences, text, or targetUrl can be used at a time.

targetUrl: str

URL of the texts to be classified. Only one of sentences, text, or targetUrl can be used at a time.

predictStance: bool, optional

Predict stances of arguments if true (Default: true).

computeAttention: bool, optional

Computes attention weights if true (Default: true). Does not work for BERT-based models.

showOnlyArguments: bool, optional

Shows only argumentative sentences if true, otherwise shows all sentences (Default: true).

removeDuplicates: bool, optional

Removes duplicate sentences if true (Default: true).

filterNonsensicalEntries: bool, optional

Only keeps sentences that have between 3 and 30 tokens and fewer than 4 sequentially repeating words (Default: true).

topicRelevance: str, optional

Filters the sentences based on the given strategy. Available options are match_string, n_gram_overlap, and word2vec (Default: None).

match_string: Selects sentences if the provided topic is in the sentence.

n_gram_overlap: Selects sentences if any of the nouns in the topic is in the sentence. If there are no nouns in the topic, stopwords are removed from the topic and it is checked whether the remaining tokens are in the sentences. If there are no tokens left after removing stopwords (i.e. all words in the topic are stopwords), all sentences are returned.

word2vec: Selects sentences based on cosine similarity if a certain threshold is exceeded. The default model is used for calculating cosine similarity, irrespective of the model used for prediction.

topicRelevanceThreshold: float, optional

Threshold against which the calculated cosine similarity is compared. Should be between 0 and 1 (Default: 0).

normalizeTopicRelevanceThreshold: bool, optional

If true, normalize the cosine similarities of all sentences before applying topicRelevanceThreshold (Default: false).

userMetadata: str, optional

Custom metadata in the form of a string that will be returned (unmodified) with the result of the query (Default: "").
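The word2vec relevance strategy described above can be illustrated with a small client-side sketch. This is not the service’s implementation (the actual embedding model runs server-side); it assumes sentence vectors are already available and shows only the cosine-similarity thresholding that topicRelevanceThreshold controls:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def filter_by_relevance(topic_vec, sentence_vecs, threshold):
    # Keep indices of sentences whose similarity to the topic
    # exceeds the threshold, mirroring topicRelevanceThreshold.
    return [i for i, v in enumerate(sentence_vecs)
            if cosine_similarity(topic_vec, v) > threshold]

# Toy vectors (hypothetical embeddings, not real model output):
topic = [1.0, 0.0]
sents = [[0.9, 0.1], [0.0, 1.0]]
print(filter_by_relevance(topic, sents, 0.5))  # only the first sentence passes
```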

Returns
JSON

The output JSON contains the two keys metadata and sentences, which are explained below. sentences is a list of sentences; the parameters listed below are returned for each sentence. The input parameters are also returned but are not explained again.


metadata

  • modelVersion: Version of the current running model (e.g. 0.1).

  • timeArgumentPrediction: Time needed to predict the arguments in seconds.

  • timeAttentionComputation: Time needed to compute the attention weights for all words in seconds, -1 if computeAttention=false.

  • timePreprocessing: Time needed to preprocess the documents/sentences in seconds.

  • timeStancePrediction: Time needed to predict the stances of all arguments in seconds, -1 if predictStance=false.

  • timeLogging: Time needed to store the query to the database.

  • timeTotal: Time needed to process all data in seconds.

  • totalArguments: Total number of sentences that are arguments.

  • totalContraArguments: Total number of contra arguments that were found in the data.

  • totalProArguments: Total number of pro arguments that were found in the data.

  • totalNonArguments: Total number of sentences that are not arguments.

  • totalClassifiedSentences: Total number of sentences classified.


sentences

  • argumentConfidence: Confidence that sentence is an argument.

  • argumentLabel: Argument label for the sentence (argument or no argument).

  • sentenceOriginal: Original sentence before preprocessing (e.g. Nuclear power is awesome).

  • sentencePreprocessed: Sentence after preprocessing (e.g. nuclear power is awesome).

  • sortConfidence: A combined score of argument and stance confidence. If stance is not predicted, it’s the same score as argumentConfidence.

  • stanceConfidence: Confidence that argument is pro/contra in regard to the topic (only if sentence is an argument and predictStance=true).

  • stanceLabel: Stance label for the argument (pro or contra; only if the sentence is an argument and predictStance=true).

  • weights: Weights that signal the importance of each word of the sentence (only if computeAttention=true, e.g. [0.2, 0.3, 0.4, 0.1]).
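The per-sentence fields above can be post-processed client-side once the response is parsed. A minimal sketch (the response dict below is illustrative, not actual API output) that collects pro arguments above a confidence threshold:

```python
# Illustrative response fragment (not actual API output):
response = {
    "metadata": {"totalArguments": 2},
    "sentences": [
        {"sentenceOriginal": "Nuclear power is awesome, because of its nearly zero carbon emissions.",
         "argumentLabel": "argument", "stanceLabel": "pro", "sortConfidence": 0.91},
        {"sentenceOriginal": "Nuclear power is dangerous, because it produces radioactive waste.",
         "argumentLabel": "argument", "stanceLabel": "contra", "sortConfidence": 0.88},
    ],
}

def pro_arguments(resp, min_confidence=0.5):
    # Select pro arguments and sort them by descending sortConfidence.
    hits = [s for s in resp["sentences"]
            if s["argumentLabel"] == "argument"
            and s.get("stanceLabel") == "pro"
            and s["sortConfidence"] >= min_confidence]
    return sorted(hits, key=lambda s: s["sortConfidence"], reverse=True)

for s in pro_arguments(response):
    print(s["sentenceOriginal"])
```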

Examples

Example 1: Classify API with sentences as input

{
  "topic": "Nuclear power",
  "sentences": [
    "Nuclear power is awesome.",
    "Nuclear power is awesome, because of its nearly zero carbon emissions.",
    "Nuclear power is dangerous, because it produces radioactive waste."
  ],
  "predictStance": true,
  "computeAttention": true,
  "showOnlyArguments": false,
  "userID": "yourPersonalUserID",
  "apiKey": "yourPersonalApiKey"
}

Example 2: Classify API with text as input

{
  "topic": "Nuclear power",
  "text": "Nuclear energy outputs nearly zero carbon emissions. But it is also dangerous, because of the nuclear waste it produces.",
  "predictStance": true,
  "computeAttention": true,
  "showOnlyArguments": false,
  "userID": "yourPersonalUserID",
  "apiKey": "yourPersonalApiKey"
}

Example 3: Classify API with URL as input

{
  "topic": "Brexit",
  "targetUrl": "https://www.washingtonpost.com/world/2018/12/14/is-theresa-may-bad-negotiator-or-is-brexit-just-an-impossible-proposition-answer-yes",
  "predictStance": true,
  "computeAttention": true,
  "showOnlyArguments": false,
  "userID": "yourPersonalUserID",
  "apiKey": "yourPersonalApiKey"
}

Example 4: Classify API with PDF document as input

curl -X POST \
  -H 'Content-Type: multipart/form-data' \
  -F 'pdf=@/path/to/file/nuclear_energy.pdf' \
  -F 'userID=yourPersonalUserID' \
  -F 'apiKey=yourPersonalApiKey' \
  -F 'topic=nuclear energy' \
  -F 'predictStance=True' \
  'https://api.argumentsearch.com/en/classify'

Raises
KeyError

If an unknown model is used.

Cluster API

api.cluster_arguments()

Clusters given arguments by their similarity.

The cluster API can be accessed via https://api.argumentsearch.com/en/cluster_arguments. Both input and output are in JSON format.

Parameters
Each key of the input JSON is described below:
arguments: list of str

All arguments to be clustered, as a list of strings.

threshold: int

Similarity threshold required between two arguments to place them into the same cluster.

min_cluster_size: int

Each cluster must hold at least this many arguments.

model: str

The model to use for clustering. The only available option is SBERT (Default).

compute_labels: bool

If true, computes and returns labels for each cluster (Default: false).

topic: str, optional

The general topic of all given arguments. Only necessary if compute_labels is true.

userID: str

Pass your personal userID.

apiKey: str

Pass your personal apiKey.

Returns
JSON

The output JSON contains the two keys metadata and clusters, which are explained below. clusters is a list of dictionaries (one per cluster); the parameters listed below are returned for each cluster. The input parameters are also returned through the metadata parameter.


clusters

  • id: The id of the respective cluster.

  • label: The label computed for the cluster if compute_labels is true.

  • sentences: A list of sentences contained in the respective cluster.

  • size: The number of sentences in the respective cluster.


metadata

  • arguments_count: Total number of arguments over all clusters.

  • clusters_count: Total number of clusters.

  • execution_time_seconds: Time to execute the clustering.

  • min_cluster_size: Input parameter.

  • threshold: Input parameter.
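Examples

An exemplary cluster API request body, analogous to the classify and search examples elsewhere in this documentation. All field values, including the similarity threshold, are purely illustrative:

```json
{
  "arguments": [
    "Nuclear power is awesome, because of its nearly zero carbon emissions.",
    "Nuclear energy outputs nearly zero carbon emissions.",
    "Nuclear power is dangerous, because it produces radioactive waste."
  ],
  "threshold": 0.8,
  "min_cluster_size": 1,
  "model": "SBERT",
  "compute_labels": false,
  "userID": "yourPersonalUserID",
  "apiKey": "yourPersonalApiKey"
}
```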

Search API

api.search_api()

The search API works similarly to the search on the main page. It can be accessed via https://api.argumentsearch.com/en/search. Both input and output are in JSON format.

Parameters
Each key of the input JSON is described below:
topic: str

Search query

index: str

Index server to use (Default: cc).

sortBy: str

Sort by argumentConfidence (Default), argumentConfidenceLex, argumentQuality, or none.

none: No ordering (keeps the ordering from the index results).

argumentConfidence: Sort by average of argument and stance (if applicable) confidence.

argumentConfidenceLex: Sort using NumPy’s lexsort.

argumentQuality: Sort by the computed quality of the arguments. To predict the quality scores, we use the following work:

Shai Gretz, Roni Friedman, Edo Cohen-Karlik, Assaf Toledo, Dan Lahav, Ranit Aharonov, and Noam Slonim. 2020. A large-scale dataset for argument quality ranking: Construction and analysis. In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020).

numDocs: int

Number of documents scanned for arguments. The higher the number, the longer the process may take (Default: 20).

userID: str

Pass your personal userID.

apiKey: str

Pass your personal apiKey.

beginDate: str

Start date from which the documents are searched in the index (in the format yyyy-MM-dd'T'HH:mm:ss).

endDate: str

End date up to which the documents are searched in the index (in the format yyyy-MM-dd'T'HH:mm:ss).

strictTopicSearch: bool

If true, returns only sentences that contain exact matches of the topic.

model: str

Model to be used. Available options are default (Default) and default_topic_relevance.

predictStance: bool, optional

Predict stances of arguments if true (Default: true).

computeAttention: bool, optional

Computes attention weights if true (Default: true). Does not work for BERT-based models.

showOnlyArguments: bool, optional

Shows only argumentative sentences if true, otherwise shows all sentences (Default: true).

removeDuplicates: bool, optional

Removes duplicate sentences if true (Default: true).

filterNonsensicalEntries: bool, optional

Only keeps sentences that have between 3 and 30 tokens and fewer than 4 sequentially repeating words (Default: true).

topicRelevance: str, optional

Filters the sentences based on the given strategy. Available options are match_string, n_gram_overlap, and word2vec (Default: None).

match_string: Selects sentences if the provided topic is in the sentence.

n_gram_overlap: Selects sentences if any of the nouns in the topic is in the sentence. If there are no nouns in the topic, stopwords are removed from the topic and it is checked whether the remaining tokens are in the sentences. If there are no tokens left after removing stopwords (i.e. all words in the topic are stopwords), all sentences are returned.

word2vec: Selects sentences based on cosine similarity if a certain threshold is exceeded. The default model is used for calculating cosine similarity, irrespective of the model used for prediction.

topicRelevanceThreshold: float, optional

Threshold against which the calculated cosine similarity is compared. Should be between 0 and 1 (Default: 0).

normalizeTopicRelevanceThreshold: bool, optional

If true, normalize the cosine similarities of all sentences before applying topicRelevanceThreshold (Default: false).

userMetadata: str, optional

Custom metadata in the form of a string that will be returned (unmodified) with the result of the query (Default: "").

Returns
JSON

The output JSON contains the two keys metadata and sentences, which are explained below. sentences is a list of sentences; the parameters listed below are returned for each sentence. The input parameters are also returned but are not explained again.


metadata

  • language: Language of the model (en or de).

  • modelVersion: Version of the current running model (e.g. 0.1).

  • timeArgumentPrediction: Time needed to predict the arguments in seconds.

  • timeAttentionComputation: Time needed to compute the attention weights for all words in seconds, -1 if computeAttention=false.

  • timePreprocessing: Time needed to preprocess the documents/sentences in seconds.

  • timeIndexing: Time needed to find and return documents from the index in seconds.

  • timeStancePrediction: Time needed to predict the stances of all arguments in seconds, -1 if predictStance=false.

  • timeLogging: Time needed to store the query to the database.

  • timeTotal: Time needed to process all data in seconds.

  • totalArguments: Total number of sentences that are arguments.

  • totalContraArguments: Total number of contra arguments that were found in the data.

  • totalProArguments: Total number of pro arguments that were found in the data.

  • totalNonArguments: Total number of sentences that are not arguments.

  • totalClassifiedSentences: Total number of sentences classified.


sentences

  • argumentConfidence: Confidence that sentence is an argument.

  • argumentLabel: Argument label for the sentence (argument or no argument).

  • date: Creation date of the document the sentence originates from.

  • sentenceOriginal: Original sentence before preprocessing (e.g. Nuclear power is awesome).

  • sentencePreprocessed: Sentence after preprocessing (e.g. nuclear power is awesome).

  • sortConfidence: A combined score of argument and stance confidence. If stance is not predicted, it’s the same score as argumentConfidence.

  • source: Source of the document the sentence originates from.

  • stanceConfidence: Confidence that argument is pro/contra in regard to the topic (only if sentence is an argument and predictStance=true).

  • stanceLabel: Stance label for the argument (pro or contra; only if the sentence is an argument and predictStance=true).

  • url: URL of the document the sentence originates from.

Raises
KeyError

If an unknown model is used.

Examples

{
  "topic": "Nuclear power",
  "predictStance": true,
  "computeAttention": false,
  "numDocs": 20,
  "sortBy": "argumentConfidence",
  "userID": "yourPersonalUserID",
  "apiKey": "yourPersonalApiKey"
}

Aspects API

api.get_aspects()

Takes a topic and arguments as input and returns argument aspects and their positions within the arguments.

Parameters
Each key of the input JSON is described below:
userID: str

Pass your personal userID.

apiKey: str

Pass your personal apiKey.

topic: str

Search query.

arguments: dictionary of arguments

A dictionary with id (str) as key and argument (str) as value, e.g. {"1": "Nuclear energy is bad for the environment."}

Returns
JSON

The output JSON contains the two keys arguments and aspects, which are explained below.


arguments Holds a dictionary with the following information for each string id and argument given in the input key arguments:

  • aspects: A list of argument aspects found in the given argument.

  • aspect_pos: The aspect position within the sentence (a list of integers with begin and end token positions for each aspect).

    Note: Sentences are split at whitespaces.

  • sent: The preprocessed argument given as input.

  • topic: The given topic.

aspects Holds the list of aspects found in all given arguments.
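Given the aspect_pos token positions (sentences are split at whitespace), the aspect strings can be recovered client-side. A sketch under the assumption that each position pair is an inclusive [begin, end] token range; the documentation does not state whether the end index is inclusive, so verify this against real API output:

```python
def aspect_texts(sent, aspect_pos, inclusive_end=True):
    # Recover aspect strings from whitespace-split tokens; assumes
    # [begin, end] pairs with an inclusive end index (unverified assumption).
    tokens = sent.split()
    return [" ".join(tokens[b:e + 1 if inclusive_end else e])
            for b, e in aspect_pos]

# Hypothetical values mirroring the documented output shape:
sent = "nuclear energy is bad for the environment"
print(aspect_texts(sent, [[6, 6]]))  # ['environment']
```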

Examples

{
  "userID": "yourPersonalUserID",
  "apiKey": "yourPersonalApiKey",
  "topic": "face masks",
  "arguments": {
    "1": "Featuring different stylish prints, these fabric face masks that are made from 100 percent cotton are breathable, machine washable, and reusable.",
    "2": "Keeping in tune with its effort to minimise the environmental impacts face masks pose to society, it will be fully machine-washable and reusable.",
    "3": "Additionally, the masks are also made out of 100 percent cotton, guaranteeing that they're comfortable to use all day, everyday.",
    "4": "Made from 100 percent cotton, these reusable and reversible face masks made by Levi should come very handy for those times when you need to go out of your house."
  }
}