Argument Mining API documentation
The Argument Mining API serves as a direct interface to the ArgumenText project. It can be used to bypass the user interface and access the search engine directly via HTTP POST. Four different APIs are provided: the search API, the classify API, the cluster API, and the aspects API.
Request API Access
Classify API
api.classify_api()
Classifies given texts based on a search topic.
The classify API works similarly to the search on the main page, but takes a list of sentences, a text, or a URL with text as input instead of searching for documents in an index. The classify API can be accessed via https://api.argumentsearch.com/en/classify. Both input and output are in JSON format.
The classify API also supports documents in PDF format as a byte stream. The POST request has to be sent with Content-Type multipart/form-data, and all parameters are passed as form fields. Example 4 below shows such a query via curl.
- Parameters
- The description of each key in the input JSON is given below.
- topic: str
Search query
- sortBy: str
Sort by argumentConfidence (default), argumentConfidenceLex, argumentQuality, or none.
none: No ordering (keeps the ordering from the index results).
argumentConfidence: Sort by the average of argument and stance (if applicable) confidence.
argumentConfidenceLex: Sort by NumPy's lexsort.
argumentQuality: Sort by the computed quality of the arguments. To predict the quality scores, we use the following work:
Shai Gretz, Roni Friedman, Edo Cohen-Karlik, Assaf Toledo, Dan Lahav, Ranit Aharonov, and Noam Slonim. 2020. A large-scale dataset for argument quality ranking: Construction and analysis. In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020).
- userID: str
Pass your personal userID.
- apiKey: str
Pass your personal apiKey.
- model: str
Model to be used. Available options are default (the default) and default_topic_relevance.
- sentences: list of str
Sentences to be classified. Only one of sentences, text, or targetUrl can be used at a time.
- text: str
Text to be classified. Only one of sentences, text, or targetUrl can be used at a time.
- targetUrl: str
URL of the text to be classified. Only one of sentences, text, or targetUrl can be used at a time.
- predictStance: bool, optional
Predict stances of arguments if true (Default: true).
- computeAttention: bool, optional
Computes attention weights if true (Default: true). Does not work for BERT-based models.
- showOnlyArguments: bool, optional
Shows only argumentative sentences if true, else shows all sentences (Default: true).
- removeDuplicates: bool, optional
Removes duplicate sentences if true (Default: true).
- filterNonsensicalEntries: bool, optional
Only keep sentences that have between 3 and 30 tokens and fewer than 4 sequentially repeating words (Default: true).
- topicRelevance: str, optional
Filter the sentences based on the given strategy. Available options are match_string, n_gram_overlap, and word2vec (Default: None).
match_string: Selects a sentence if the provided topic appears in the sentence.
n_gram_overlap: Selects a sentence if any of the nouns in the topic appears in the sentence. If there are no nouns in the topic, stopwords are removed from the topic and the remaining tokens are checked against the sentences. If no tokens remain after removing stopwords (i.e. all words in the topic are stopwords), all sentences are returned.
word2vec: Selects sentences based on cosine similarity if a certain threshold is exceeded. The default model is used for calculating cosine similarity, irrespective of the model used for prediction.
- topicRelevanceThreshold: float, optional
Threshold against which the calculated cosine similarity is compared. Should be between 0 and 1 (Default: 0).
- normalizeTopicRelevanceThreshold: bool, optional
If true, normalize the cosine similarities of all sentences before applying topicRelevanceThreshold (Default: false).
- userMetadata: str, optional
Custom metadata in the form of a string that will be returned (unmodified) with the result of the query (Default: "").
- Returns
- JSON
The output JSON contains the two keys metadata and sentences, which are explained below. sentences is a list of sentences; the parameters listed below are returned for each sentence. The input parameters are also returned and are not explained again.
metadata
modelVersion: Version of the current running model (e.g. 0.1).
timeArgumentPrediction: Time needed to predict the arguments in seconds.
timeAttentionComputation: Time needed to compute the attention weights for all words in seconds, -1 if computeAttention=false.
timePreprocessing: Time needed to preprocess the documents/sentences in seconds.
timeStancePrediction: Time needed to predict the stances of all arguments in seconds, -1 if predictStance=false.
timeLogging: Time needed to store the query to the database.
timeTotal: Time needed to process all data in seconds.
totalArguments: Total number of sentences that are arguments.
totalContraArguments: Total number of contra arguments that were found in the data.
totalProArguments: Total number of pro arguments that were found in the data.
totalNonArguments: Total number of sentences that are not arguments.
totalClassifiedSentences: Total number of sentences classified.
sentences
argumentConfidence: Confidence that the sentence is an argument.
argumentLabel: Argument label for the sentence (argument or no argument).
sentenceOriginal: Original sentence before preprocessing (e.g. Nuclear power is awesome).
sentencePreprocessed: Original sentence after preprocessing (e.g. nuclear power is awesome).
sortConfidence: A combined score of argument and stance confidence. If stance is not predicted, it is the same score as argumentConfidence.
stanceConfidence: Confidence that the argument is pro/contra with regard to the topic (only if the sentence is an argument and predictStance=true).
stanceLabel: Stance label for the argument, pro or contra (only if the sentence is an argument and predictStance=true).
weights: Weights that signal the importance of each word of the sentence (only if computeAttention=true, e.g. [0.2, 0.3, 0.4, 0.1]).
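Given the response fields above, a client will typically want to group the classified sentences by stance. A minimal sketch (sample_response is a hand-made stand-in for a real response, reduced to the fields actually used):

```python
def split_by_stance(response):
    """Group classified sentences by their stanceLabel field."""
    pro, contra, other = [], [], []
    for sentence in response["sentences"]:
        label = sentence.get("stanceLabel")  # absent if not an argument
        if label == "pro":
            pro.append(sentence)
        elif label == "contra":
            contra.append(sentence)
        else:
            other.append(sentence)  # non-arguments or predictStance=false
    return pro, contra, other

# Stand-in for a parsed classify response (illustrative values only).
sample_response = {"sentences": [
    {"sentenceOriginal": "Nuclear power is awesome.", "stanceLabel": "pro"},
    {"sentenceOriginal": "It produces radioactive waste.", "stanceLabel": "contra"},
]}
pro, contra, other = split_by_stance(sample_response)
```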
Examples
Example 1: Classify API with sentences as input
- {
"topic": "Nuclear power",
"sentences": ["Nuclear power is awesome.",
"Nuclear power is awesome, because of its nearly zero carbon emissions.",
"Nuclear power is dangerous, because it produces radioactive waste."],
"predictStance": true,
"computeAttention": true,
"showOnlyArguments": false,
"userID": "yourPersonalUserID",
"apiKey": "yourPersonalApiKey"}
Example 2: Classify API with text as input
- {
"topic": "Nuclear power",
"text": "Nuclear energy outputs nearly zero carbon emissions. But it is also dangerous, because of the nuclear waste it produces.",
"predictStance": true,
"computeAttention": true,
"showOnlyArguments": false,
"userID": "yourPersonalUserID",
"apiKey": "yourPersonalApiKey"}
Example 3: Classify API with URL as input
- {
"topic": "Brexit",
"targetUrl": "https://www.washingtonpost.com/world/2018/12/14/is-theresa-may-bad-negotiator-or-is-brexit-just-an-impossible-proposition-answer-yes",
"predictStance": true,
"computeAttention": true,
"showOnlyArguments": false,
"userID": "yourPersonalUserID",
"apiKey": "yourPersonalApiKey"}
Example 4: Classify API with PDF document as input
- curl -X POST \
-H 'Content-Type: multipart/form-data' \
-F "pdf=@/path/to/file/nuclear_energy.pdf" \
-F "userID=yourPersonalUserID" \
-F "apiKey=yourPersonalApiKey" \
-F "topic=nuclear energy" \
-F "predictStance=True" \
'https://api.argumentsearch.com/en/classify'
- Raises
- KeyError
If an unknown model is used.
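The JSON examples above can be sent with any HTTP client. A minimal sketch using Python's standard library; the userID/apiKey values are placeholders, and the helper names are our own:

```python
import json
import urllib.request

CLASSIFY_URL = "https://api.argumentsearch.com/en/classify"

def build_classify_payload(topic, sentences, user_id, api_key, **options):
    """Assemble a request body in the shape of Example 1."""
    payload = {
        "topic": topic,
        "sentences": sentences,
        "userID": user_id,
        "apiKey": api_key,
    }
    payload.update(options)  # e.g. predictStance, showOnlyArguments
    return payload

def classify(payload):
    """POST the payload as JSON and return the parsed JSON response."""
    request = urllib.request.Request(
        CLASSIFY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

payload = build_classify_payload(
    "Nuclear power",
    ["Nuclear power is awesome."],
    "yourPersonalUserID",
    "yourPersonalApiKey",
    predictStance=True,
    showOnlyArguments=False,
)
# result = classify(payload)  # requires valid credentials
```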
Cluster API
api.cluster_arguments()
Clusters given arguments by their similarity.
The cluster API can be accessed via https://api.argumentsearch.com/en/cluster_arguments. Both input and output are in JSON format.
- Parameters
- The description of each key in the input JSON is given below.
- arguments: list of str
All arguments to be clustered as a list of strings
- threshold: float
Similarity threshold that two arguments must exceed to be placed into the same cluster.
- min_cluster_size: int
All clusters must hold at least that many arguments.
- model: str
The model to use for clustering. The only available option is SBERT (the default).
- compute_labels: bool
If true, computes and returns labels for each cluster (Default: false).
- topic: str, optional
The general topic of all given arguments. Only necessary if compute_labels is true.
- userID: str
Pass your personal userID.
- apiKey: str
Pass your personal apiKey.
- Returns
- JSON
The output JSON contains the two keys metadata and clusters, which are explained below. clusters is a list of dictionaries (= clusters); the parameters listed below are returned for each cluster. The input parameters are also returned through the metadata parameter.
clusters
id: The id of the respective cluster.
label: The label computed for the cluster if compute_labels is true.
sentences: A list of sentences contained in the respective cluster.
size: The number of sentences in the respective cluster.
metadata
arguments_count: Total number of arguments over all clusters.
clusters_count: Total number of clusters.
execution_time_seconds: Time to execute the clustering.
min_cluster_size: Input parameter.
threshold: Input parameter.
Examples
- {
"arguments": [
"That could benefit the operator of a fleet of electric vehicles .",
"In theory , these reactors are at greatly reduced risk of a Fukushima-style accident .",
"There is always the possibility of a breakthrough that would make nuclear safe .",
"This means that physicists will be able to investigate rare phenomena and make more accurate measurements .",
"These various designs are meant to be cheaper to build and operate-and much safer-than conventional reactors .",
"It would provide a stable and greenhouse gas-emission-free energy source , says the IAEA."],
"userID": "yourPersonalUserID",
"apiKey": "yourPersonalApiKey",
"threshold": 0.2,
"min_cluster_size": 2,
"model": "SBERT"}
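For reference, a cluster request like the example above can be issued with Python's standard library. A minimal sketch; credentials are placeholders and the helper names are our own:

```python
import json
import urllib.request

CLUSTER_URL = "https://api.argumentsearch.com/en/cluster_arguments"

def build_cluster_payload(arguments, user_id, api_key,
                          threshold=0.2, min_cluster_size=2, model="SBERT"):
    """Assemble a cluster request body as in the example above."""
    return {
        "arguments": arguments,
        "userID": user_id,
        "apiKey": api_key,
        "threshold": threshold,
        "min_cluster_size": min_cluster_size,
        "model": model,
    }

def cluster_arguments(payload):
    """POST the payload as JSON and return the parsed JSON response."""
    request = urllib.request.Request(
        CLUSTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

payload = build_cluster_payload(
    ["Nuclear power is cheap.", "Nuclear energy is inexpensive."],
    "yourPersonalUserID",
    "yourPersonalApiKey",
)
# result = cluster_arguments(payload)  # requires valid credentials
```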
Search API
api.search_api()
The search API works similarly to the search on the main page. It can be accessed via https://api.argumentsearch.com/en/search. Both input and output are in JSON format.
- Parameters
- The description of each key in the input JSON is given below.
- topic: str
Search query
- index: str
Index server to use (Default: cc).
- sortBy: str
Sort by argumentConfidence (default), argumentConfidenceLex, argumentQuality, or none.
none: No ordering (keeps the ordering from the index results).
argumentConfidence: Sort by the average of argument and stance (if applicable) confidence.
argumentConfidenceLex: Sort by NumPy's lexsort.
argumentQuality: Sort by the computed quality of the arguments. To predict the quality scores, we use the following work:
Shai Gretz, Roni Friedman, Edo Cohen-Karlik, Assaf Toledo, Dan Lahav, Ranit Aharonov, and Noam Slonim. 2020. A large-scale dataset for argument quality ranking: Construction and analysis. In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020).
- numDocs: int
Number of documents scanned for arguments. The higher the number, the longer the process may take (Default: 20).
- userID: str
Pass your personal userID.
- apiKey: str
Pass your personal apiKey.
- beginDate: str
Start date from which the documents are searched in the index (format: yyyy-MM-dd'T'HH:mm:ss).
- endDate: str
End date up to which the documents are searched in the index (format: yyyy-MM-dd'T'HH:mm:ss).
- strictTopicSearch: bool
If true, returns only sentences that contain exact matches of the topic.
- model: str
Model to be used. Available options are default (the default) and default_topic_relevance.
- predictStance: bool, optional
Predict stances of arguments if true (Default: true).
- computeAttention: bool, optional
Computes attention weights if true (Default: true). Does not work for BERT-based models.
- showOnlyArguments: bool, optional
Shows only argumentative sentences if true, else shows all sentences (Default: true).
- removeDuplicates: bool, optional
Removes duplicate sentences if true (Default: true).
- filterNonsensicalEntries: bool, optional
Only keep sentences that have between 3 and 30 tokens and fewer than 4 sequentially repeating words (Default: true).
- topicRelevance: str, optional
Filter the sentences based on the given strategy. Available options are match_string, n_gram_overlap, and word2vec (Default: None).
match_string: Selects a sentence if the provided topic appears in the sentence.
n_gram_overlap: Selects a sentence if any of the nouns in the topic appears in the sentence. If there are no nouns in the topic, stopwords are removed from the topic and the remaining tokens are checked against the sentences. If no tokens remain after removing stopwords (i.e. all words in the topic are stopwords), all sentences are returned.
word2vec: Selects sentences based on cosine similarity if a certain threshold is exceeded. The default model is used for calculating cosine similarity, irrespective of the model used for prediction.
- topicRelevanceThreshold: float, optional
Threshold against which the calculated cosine similarity is compared. Should be between 0 and 1 (Default: 0).
- normalizeTopicRelevanceThreshold: bool, optional
If true, normalize the cosine similarities of all sentences before applying topicRelevanceThreshold (Default: false).
- userMetadata: str, optional
Custom metadata in the form of a string that will be returned (unmodified) with the result of the query (Default: "").
- Returns
- JSON
The output JSON contains the two keys metadata and sentences, which are explained below. sentences is a list of sentences; the parameters listed below are returned for each sentence. The input parameters are also returned and are not explained again.
metadata
language: Language of the model (en or de).
modelVersion: Version of the current running model (e.g. 0.1).
timeArgumentPrediction: Time needed to predict the arguments in seconds.
timeAttentionComputation: Time needed to compute the attention weights for all words in seconds, -1 if computeAttention=false.
timePreprocessing: Time needed to preprocess the documents/sentences in seconds.
timeIndexing: Time needed to find and return documents from the index in seconds.
timeStancePrediction: Time needed to predict the stances of all arguments in seconds, -1 if predictStance=false.
timeLogging: Time needed to store the query to the database.
timeTotal: Time needed to process all data in seconds.
totalArguments: Total number of sentences that are arguments.
totalContraArguments: Total number of contra arguments that were found in the data.
totalProArguments: Total number of pro arguments that were found in the data.
totalNonArguments: Total number of sentences that are not arguments.
totalClassifiedSentences: Total number of sentences classified.
sentences
argumentConfidence: Confidence that the sentence is an argument.
argumentLabel: Argument label for the sentence (argument or no argument).
date: Creation date of the document the sentence originates from.
sentenceOriginal: Original sentence before preprocessing (e.g. Nuclear power is awesome).
sentencePreprocessed: Original sentence after preprocessing (e.g. nuclear power is awesome).
sortConfidence: A combined score of argument and stance confidence. If stance is not predicted, it is the same score as argumentConfidence.
source: Source of the document the sentence originates from.
stanceConfidence: Confidence that the argument is pro/contra with regard to the topic (only if the sentence is an argument and predictStance=true).
stanceLabel: Stance label for the argument, pro or contra (only if the sentence is an argument and predictStance=true).
url: URL of the document the sentence originates from.
- Raises
- KeyError
If unknown model is used.
Examples
- {
"topic": "Nuclear power",
"predictStance": true,
"computeAttention": false,
"numDocs": 20,
"sortBy": "argumentConfidence",
"userID": "yourPersonalUserID",
"apiKey": "yourPersonalApiKey"}
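A search request like the one above can be sent the same way as a classify request. A minimal sketch using Python's standard library; credentials are placeholders and the helper names are our own:

```python
import json
import urllib.request

SEARCH_URL = "https://api.argumentsearch.com/en/search"

def build_search_payload(topic, user_id, api_key, num_docs=20,
                         sort_by="argumentConfidence", **options):
    """Assemble a search request body as in the example above."""
    payload = {
        "topic": topic,
        "userID": user_id,
        "apiKey": api_key,
        "numDocs": num_docs,
        "sortBy": sort_by,
    }
    payload.update(options)  # e.g. predictStance, computeAttention, index
    return payload

def search(payload):
    """POST the payload as JSON and return the parsed JSON response."""
    request = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

payload = build_search_payload(
    "Nuclear power",
    "yourPersonalUserID",
    "yourPersonalApiKey",
    predictStance=True,
    computeAttention=False,
)
# result = search(payload)  # requires valid credentials
```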
Aspects API
api.get_aspects()
Takes a topic and arguments as input and returns argument aspects and their positions within the arguments.
- Parameters
- The description of each key in the input JSON is given below.
- userID: str
Pass your personal userID.
- apiKey: str
Pass your personal apiKey.
- topic: str
Search query.
- arguments: dictionary of arguments
A dictionary with an id (str) as key and an argument (str) as value, e.g. {"1": "Nuclear energy is bad for the environment."}
- Returns
- JSON
The output JSON contains the two keys arguments and aspects, which are explained below.
arguments: Holds a dictionary with the following information for each string id and argument given in the input key arguments:
aspects: A list of argument aspects found in the given argument.
aspect_pos: The aspect positions within the sentence (a list of integers with begin and end token positions for each aspect). Note: Sentences are split at whitespace.
sent: The preprocessed argument given as input.
topic: The given topic.
aspects: Holds the list of aspects found in all given arguments.
Examples
- {
"userID": "yourPersonalUserID",
"apiKey": "yourPersonalApiKey",
"query": "face masks",
"arguments": {"1": "Featuring different stylish prints, these fabric face masks that are made from 100 percent cotton are breathable, machine washable, and reusable.",
"2": "Keeping in tune with its effort to minimise the environmental impacts face masks pose to society, it will be fully machine-washable and reusable.",
"3": "Additionally, the masks are also made out of 100 percent cotton, guaranteeing that they're comfortable to use all day, every day.",
"4": "Made from 100 percent cotton, these reusable and reversible face masks made by Levi should come in very handy for those times when you need to go out of your house."}
}
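An aspects request body can be assembled the same way as for the other APIs. Since no endpoint URL is given above, this sketch only builds the body, mirroring the example (which uses the key query); credentials and the helper name are our own placeholders:

```python
def build_aspects_payload(query, arguments, user_id, api_key):
    """Assemble an aspects request body mirroring the example above.

    `arguments` maps string ids to argument strings,
    e.g. {"1": "Nuclear energy is bad for the environment."}.
    """
    return {
        "userID": user_id,
        "apiKey": api_key,
        "query": query,
        "arguments": arguments,
    }

payload = build_aspects_payload(
    "face masks",
    {"1": "These fabric face masks are machine washable and reusable."},
    "yourPersonalUserID",
    "yourPersonalApiKey",
)
```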