L2S2 Documentation
Getting Started
L2S2 is a web application that allows users to search for matching gene sets created from the LINCS L1000 data. The application provides several features, including single gene set search, up and down gene set search, term search, and an API for programmatic access. To get started, try submitting one of the examples included on the home page:


After submitting either example, a results table is returned, sorted by the most significantly overlapping gene sets or signatures. From here, you can filter for FDA-approved drugs or by the directionality of the gene sets or signatures, and identify consensus mechanisms of action (MoAs) and consensus compounds seen across cell lines, time points, and concentrations:


Please explore the features in more depth below:
1.1 Single Gene Set Search
The Gene Set Search page enables users to search the L2S2 database for gene sets that match their query gene set. Similarity between the query gene set and each gene set in the L2S2 database is measured with Fisher's exact test. Significantly overlapping gene sets are returned to the user along with their accompanying metadata. Query gene sets can be pasted or typed into the input form with one gene per line, or uploaded as a file in which the genes are separated by newlines, tabs, or commas:
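As a rough illustration of the overlap test described above, the following minimal sketch computes a one-sided Fisher's exact test p-value from the hypergeometric distribution using only the Python standard library. The function names, the toy gene sets, and the background size of 10 are assumptions made for illustration; the implementation used by the L2S2 server may differ.

```python
from math import comb


def fisher_right_tail(overlap: int, query_size: int, set_size: int, background: int) -> float:
    """P(X >= overlap) when query_size genes are drawn from a background
    that contains set_size genes belonging to the library gene set."""
    max_overlap = min(query_size, set_size)
    tail = sum(
        comb(set_size, k) * comb(background - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / comb(background, query_size)


def overlap_pvalue(query, library_set, background_size):
    """Return the shared genes and the one-sided p-value of the overlap."""
    query, library_set = set(query), set(library_set)
    shared = query & library_set
    pvalue = fisher_right_tail(len(shared), len(query), len(library_set), background_size)
    return sorted(shared), pvalue


# Toy example (hypothetical sets, background of 10 genes)
shared, pvalue = overlap_pvalue(["A", "B", "C", "D"], ["C", "D", "E"], 10)
print(shared, round(pvalue, 4))
```

In this sketch a smaller p-value means the overlap is less likely to arise by chance for gene sets of those sizes.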

Paginated results are returned along with the total number of gene sets the query was compared against and the number of those gene sets that were significantly enriched. Enrichment statistics are provided on the right side of the table. The user may explore the metadata associated with the signatures on each results page by clicking the icon next to the compound or gene KO, or by clicking the underlined cell line:

Other information such as the directionality of the gene set, the number of overlapping genes, and the total number of genes in the enriched gene set are also displayed. Clicking the overlap or gene set size will open a modal box with the corresponding genes as well as buttons to copy the gene set to the clipboard, or to view the enrichment results on RummaGEO, Rummagene, or Enrichr:

To further filter and refine the results, the user may use the search bar above the table to search for gene sets containing certain keywords. For instance, this makes it possible to view only the enriched results from the MCF7 cell line for the same input gene set. The total number of enriched gene sets is updated accordingly:

Results can also be downloaded in tab-delimited format using the button to the far right of the search bar (for downloads of more than 10,000 results, please refer to the downloads page):

Users may additionally filter results by the direction of the signature, FDA-approved drugs, or CRISPR KOs using the buttons above the table:

To generate consensus results, select the consensus button at the top of the table. This computes significant compounds and CRISPR KOs using Fisher's exact test, taking into account the number of significant and insignificant signatures corresponding to each perturbation. P-values for the direction of the signature are then generated for each significant consensus compound and CRISPR KO using the numbers of significant and insignificant up and down terms for that perturbation; the direction is denoted with a red or blue arrow:
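One way to picture the consensus test is sketched below. It assumes a 2x2 contingency table of this perturbation versus all other perturbations against significant versus insignificant signatures, evaluated with a one-sided test. That layout, the function names, and the toy records are assumptions for illustration only; the exact table used by L2S2 is not specified here.

```python
from collections import Counter
from math import comb


def hypergeom_right_tail(k_obs: int, n_draws: int, n_success: int, n_total: int) -> float:
    """P(X >= k_obs) for n_draws items taken from n_total, n_success of which are 'successes'."""
    upper = min(n_draws, n_success)
    tail = sum(
        comb(n_success, k) * comb(n_total - n_success, n_draws - k)
        for k in range(k_obs, upper + 1)
    )
    return tail / comb(n_total, n_draws)


def consensus_pvalues(records):
    """records: (perturbation, is_significant) pairs, one per signature.
    Returns a p-value per perturbation for being over-represented
    among the significant signatures (assumed table layout)."""
    records = list(records)
    n_total = len(records)
    n_sig = sum(1 for _, sig in records if sig)
    per_pert = Counter(p for p, _ in records)
    sig_per_pert = Counter(p for p, sig in records if sig)
    return {
        p: hypergeom_right_tail(sig_per_pert[p], per_pert[p], n_sig, n_total)
        for p in per_pert
    }


# Hypothetical data: drugA has 3 of 4 signatures significant, drugB only 2 of 16
records = (
    [("drugA", True)] * 3 + [("drugA", False)]
    + [("drugB", True)] * 2 + [("drugB", False)] * 14
)
pvals = consensus_pvalues(records)
print(pvals)
```

The same tail computation could be repeated on the up and down signatures separately to obtain direction-specific p-values, again under the assumed table layout.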

To identify the most common mechanisms of action and their directionality, select the consensus mechanism of action button. As with the consensus compound feature, this computes the number of significant and insignificant signatures corresponding to each mechanism of action and generates p-values for the direction of the signature using the numbers of significant and insignificant up and down terms for that mechanism of action:

1.2 Up & Down Gene Set Search
The up and down gene set search functionality enables users to search the L2S2 database for gene set signature pairs that most significantly mimic or reverse the expression of the submitted up- and down-gene set signature. Fisher's exact test is again used to assess the significance of these results, specifically measuring a mimicker overlap (up L2S2 gene set & up user gene set + down L2S2 gene set & down user gene set) and a reverser overlap (up L2S2 gene set & down user gene set + down L2S2 gene set & up user gene set). Any significantly overlapping mimicker or reverser signature is returned to the user along with its accompanying metadata. As with the single gene set search, query gene sets can be pasted or typed into the two input boxes with one gene per line, or uploaded as files in which the genes are separated by newlines, tabs, or commas, after selecting the up & down gene set option on the input form:
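The mimicker and reverser overlaps defined above can be sketched with plain set operations. The function name and the example gene sets below are hypothetical and serve only to make the two definitions concrete.

```python
def mimicker_reverser_overlap(user_up, user_down, sig_up, sig_down):
    """Return (mimicker, reverser) overlap sets for one L2S2 signature pair."""
    user_up, user_down = set(user_up), set(user_down)
    sig_up, sig_down = set(sig_up), set(sig_down)
    # Mimicker: same direction in the signature and in the user query
    mimicker = (sig_up & user_up) | (sig_down & user_down)
    # Reverser: opposite direction in the signature and in the user query
    reverser = (sig_up & user_down) | (sig_down & user_up)
    return mimicker, reverser


# Hypothetical query and signature gene sets
mimicker, reverser = mimicker_reverser_overlap(
    user_up=["TP53", "MYC", "EGFR"],
    user_down=["BRCA1", "PTEN"],
    sig_up=["MYC", "PTEN", "JUN"],
    sig_down=["BRCA1", "EGFR"],
)
print(sorted(mimicker), sorted(reverser))
```

When the up and down sets do not share genes, the size of each union equals the sum of the two intersections it is built from.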


The results page for the up & down gene set search is very similar to that of the single gene set search, displaying the enrichment statistics, button filters, and mimicker and reverser overlaps, which can be opened as modals and explored further:

1.4 Term Search
L2S2 also provides direct metadata search of the L1000 signatures. Paginated results are returned with accompanying metadata of the returned signatures:


1.5 API
L2S2 provides programmatic access through a GraphQL endpoint. Users can learn more about GraphQL queries from the official GraphQL documentation. The L2S2 GraphQL endpoint and associated Postgres database provide users with a wide range of available queries and with a user interface to test and develop these queries:

For example, single gene set enrichment analysis queries can be performed in Python against all L2S2 signatures using the requests library as follows:
import json

import pandas as pd
import requests

url = "http://l2s2.maayanlab.cloud/graphql"


def enrich_l2s2_single_set(geneset: list, first=1000):
    """Enrich a single gene set against L2S2; returns (df_enrichment, df_consensus)."""
    query = {
        "operationName": "EnrichmentQuery",
        "variables": {
            "filterTerm": " ",
            "offset": 0,
            "first": first,
            "filterFda": False,
            "sortBy": "pvalue_up",
            "filterKo": False,
            "genes": geneset,
        },
        "query": """query EnrichmentQuery(
          $genes: [String]!
          $filterTerm: String = ""
          $offset: Int = 0
          $first: Int = 10
          $filterFda: Boolean = false
          $sortBy: String = ""
          $filterKo: Boolean = false
        ) {
          currentBackground {
            enrich(
              genes: $genes
              filterTerm: $filterTerm
              offset: $offset
              first: $first
              filterFda: $filterFda
              sortby: $sortBy
              filterKo: $filterKo
            ) {
              nodes {
                geneSetHash
                pvalue
                adjPvalue
                oddsRatio
                nOverlap
                geneSets {
                  nodes {
                    term
                    id
                    nGeneIds
                    geneSetFdaCountsById {
                      nodes {
                        approved
                        count
                      }
                    }
                  }
                  totalCount
                }
              }
              totalCount
              consensusCount
              consensus {
                drug
                oddsRatio
                pvalue
                adjPvalue
                approved
                countSignificant
                countInsignificant
                countUpSignificant
                pvalueUp
                adjPvalueUp
                oddsRatioUp
                pvalueDown
                adjPvalueDown
                oddsRatioDown
              }
            }
          }
        }
        """,
    }
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json"
    }
    response = requests.post(url, data=json.dumps(query), headers=headers)
    response.raise_for_status()
    res = response.json()
    consensus = res['data']['currentBackground']['enrich']['consensus']
    enrichment = res['data']['currentBackground']['enrich']['nodes']
    df_consensus = pd.DataFrame(consensus).rename(columns={'drug': 'perturbation'})
    # One row per enriched gene set, carrying the enrichment statistics as metadata
    df_enrichment = pd.json_normalize(
        enrichment,
        record_path=['geneSets', 'nodes'],
        meta=['geneSetHash', 'pvalue', 'adjPvalue', 'oddsRatio', 'nOverlap']
    )
    if df_enrichment.empty:
        return pd.DataFrame(), pd.DataFrame()
    # FDA approval information may be missing for a gene set
    df_enrichment["approved"] = df_enrichment["geneSetFdaCountsById.nodes"].map(lambda x: x[0]['approved'] if len(x) > 0 else False)
    df_enrichment["count"] = df_enrichment["geneSetFdaCountsById.nodes"].map(lambda x: x[0]['count'] if len(x) > 0 else 0)
    df_enrichment.drop(columns=['geneSetFdaCountsById.nodes'], inplace=True)
    # Split the underscore-delimited term into its metadata fields
    df_enrichment['batch'] = df_enrichment["term"].map(lambda t: t.split('_')[0])
    df_enrichment["timepoint"] = df_enrichment["term"].map(lambda t: t.split('_')[1])
    df_enrichment["cellLine"] = df_enrichment["term"].map(lambda t: t.split('_')[2])
    df_enrichment["batch2"] = df_enrichment["term"].map(lambda t: t.split('_')[3])
    df_enrichment["perturbation"] = df_enrichment["term"].map(lambda t: t.split('_')[4].split(' ')[0] + " KO" if len(t.split('_')[4].split(' ')) == 2 else t.split('_')[4])
    df_enrichment['concentration'] = df_enrichment["term"].map(lambda t: t.split('_')[5].split(' ')[0] if len(t.split('_')) > 5 else "N/A")
    df_enrichment['direction'] = df_enrichment["term"].map(lambda t: t.split(' ')[1])
    return df_enrichment, df_consensus
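The term parsing performed by the lambdas above can be easier to follow as a single helper. The sketch below mirrors that logic for one term string. The helper name and the two example terms are hypothetical; they follow the field layout implied by the code above, and real L2S2 terms may contain other values.

```python
def parse_l2s2_term(term: str) -> dict:
    """Split an L2S2 term into metadata fields, mirroring the DataFrame code above."""
    parts = term.split("_")
    pert_tokens = parts[4].split(" ")
    # When the direction is attached to the fifth field there is no
    # concentration field, and the code above labels the entry as a KO.
    is_knockout = len(pert_tokens) == 2
    return {
        "batch": parts[0],
        "timepoint": parts[1],
        "cellLine": parts[2],
        "batch2": parts[3],
        "perturbation": (pert_tokens[0] + " KO") if is_knockout else parts[4],
        "concentration": parts[5].split(" ")[0] if len(parts) > 5 else "N/A",
        "direction": term.split(" ")[1],
    }


# Hypothetical terms, one compound signature and one CRISPR KO signature
compound = parse_l2s2_term("CPC001_24H_MCF7_X1_vorinostat_10uM up")
knockout = parse_l2s2_term("XPR001_96H_A549_X2_TP53 down")
print(compound)
print(knockout)
```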
Additionally, up- and down-gene set enrichment analysis queries can be performed in Python against all L2S2 signatures using the requests library as follows:
def enrich_l2s2_up_down(genes_up: list[str], genes_down: list[str], first=100):
    """Enrich an up/down gene set pair against L2S2; returns (df_enrichment_pair, df_consensus_pair)."""
    query = {
        "operationName": "PairEnrichmentQuery",
        "variables": {
            "filterTerm": " ",
            "offset": 0,
            "first": first,
            "filterFda": False,
            "sortBy": "pvalue_mimic",
            "filterKo": False,
            "topN": 1000,
            "pvalueLe": 0.05,
            "genesUp": genes_up,
            "genesDown": genes_down
        },
        "query": """query PairEnrichmentQuery(
          $genesUp: [String]!
          $genesDown: [String]!
          $filterTerm: String = ""
          $offset: Int = 0
          $first: Int = 10
          $filterFda: Boolean = false
          $sortBy: String = ""
          $filterKo: Boolean = false
          $topN: Int = 10000
          $pvalueLe: Float = 0.05
        ) {
          currentBackground {
            pairedEnrich(
              filterTerm: $filterTerm
              offset: $offset
              first: $first
              filterFda: $filterFda
              sortby: $sortBy
              filterKo: $filterKo
              topN: $topN
              pvalueLe: $pvalueLe
              genesDown: $genesDown
              genesUp: $genesUp
            ) {
              totalCount
              consensusCount
              consensus {
                drug
                oddsRatio
                pvalue
                adjPvalue
                approved
                countSignificant
                countInsignificant
                countUpSignificant
                pvalueUp
                adjPvalueUp
                oddsRatioUp
                pvalueDown
                adjPvalueDown
                oddsRatioDown
              }
              nodes {
                adjPvalueMimic
                adjPvalueReverse
                mimickerOverlap
                oddsRatioMimic
                oddsRatioReverse
                pvalueMimic
                pvalueReverse
                reverserOverlap
                geneSet {
                  nodes {
                    id
                    nGeneIds
                    term
                    geneSetFdaCountsById {
                      nodes {
                        count
                        approved
                      }
                    }
                  }
                }
              }
            }
          }
        }
        """
    }
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json"
    }
    response = requests.post(url, data=json.dumps(query), headers=headers)
    response.raise_for_status()
    res = response.json()
    consensus = res['data']['currentBackground']['pairedEnrich']['consensus']
    enrichment = res['data']['currentBackground']['pairedEnrich']['nodes']
    # In the paired search, the "Up"/"Down" consensus fields hold mimicker/reverser statistics
    df_consensus_pair = pd.DataFrame(consensus).rename(columns={
        'drug': 'perturbation',
        'pvalueUp': 'pvalueMimic',
        'pvalueDown': 'pvalueReverse',
        'adjPvalueUp': 'adjPvalueMimic',
        'adjPvalueDown': 'adjPvalueReverse',
        'oddsRatioUp': 'oddsRatioMimic',
        'oddsRatioDown': 'oddsRatioReverse'
    })
    df_enrichment_pair = pd.DataFrame(enrichment)
    if df_enrichment_pair.empty:
        return pd.DataFrame(), pd.DataFrame()

    def first_fda(t, key, default):
        # FDA approval information may be missing for a gene set
        fda = t['nodes'][0]['geneSetFdaCountsById']['nodes']
        return fda[0][key] if fda else default

    def pick(t, direction, key):
        # Select a field from the up or down gene set of the signature pair
        return next((node[key] for node in t['nodes'] if ' ' + direction in node['term']), None)

    df_enrichment_pair['term'] = df_enrichment_pair['geneSet'].map(lambda t: t['nodes'][0]['term'].split(' ')[0])
    df_enrichment_pair['approved'] = df_enrichment_pair['geneSet'].map(lambda t: first_fda(t, 'approved', False))
    df_enrichment_pair['count'] = df_enrichment_pair['geneSet'].map(lambda t: first_fda(t, 'count', 0))
    df_enrichment_pair['nGeneIdsUp'] = df_enrichment_pair['geneSet'].map(lambda t: pick(t, 'up', 'nGeneIds'))
    df_enrichment_pair['nGeneIdsDown'] = df_enrichment_pair['geneSet'].map(lambda t: pick(t, 'down', 'nGeneIds'))
    df_enrichment_pair["perturbation_id"] = df_enrichment_pair["term"].map(lambda t: t.split('_')[0])
    df_enrichment_pair["timepoint"] = df_enrichment_pair["term"].map(lambda t: t.split('_')[1])
    df_enrichment_pair["cellLine"] = df_enrichment_pair["term"].map(lambda t: t.split('_')[2])
    df_enrichment_pair["batch"] = df_enrichment_pair["term"].map(lambda t: t.split('_')[3])
    df_enrichment_pair["geneSetIdUp"] = df_enrichment_pair["geneSet"].map(lambda t: pick(t, 'up', 'id'))
    df_enrichment_pair["geneSetIdDown"] = df_enrichment_pair["geneSet"].map(lambda t: pick(t, 'down', 'id'))
    # Drop the raw nested column and move 'term' to the first column
    df_enrichment_pair = df_enrichment_pair.set_index('term')
    df_enrichment_pair = df_enrichment_pair.drop(columns=['geneSet']).reset_index(drop=False)
    return df_enrichment_pair, df_consensus_pair
Overlapping genes can be retrieved from either the single or up- and down-gene set search results using the L2S2 gene set ids provided in the returned enrichment tables:
# Use this function to get the overlap between a user gene set and a given L2S2 gene set (id).
# Gene set ids are returned as part of the enrichment queries shown above.
def get_l2s2_overlap(genes, gene_set_id):
    query = {
        "operationName": "OverlapQuery",
        "variables": {
            "id": gene_set_id,
            "genes": genes
        },
        "query": """query OverlapQuery($id: UUID!, $genes: [String]!) {
          geneSet(id: $id) {
            overlap(genes: $genes) {
              nodes {
                symbol
                ncbiGeneId
                description
                summary
              }
            }
          }
        }"""
    }
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json"
    }
    response = requests.post(url, data=json.dumps(query), headers=headers)
    response.raise_for_status()
    res = response.json()
    return [item['symbol'] for item in res['data']['geneSet']['overlap']['nodes']]


def get_l2s2_up_dn_overlap(genes_up: list[str], genes_down: list[str], id_up: str, id_down: str, overlap_type: str):
    if overlap_type == 'mimicker':
        # Same direction: user up vs. L2S2 up, user down vs. L2S2 down
        up_up_overlap = get_l2s2_overlap(genes_up, id_up)
        dn_dn_overlap = get_l2s2_overlap(genes_down, id_down)
        return list(set(up_up_overlap) | set(dn_dn_overlap))
    elif overlap_type == 'reverser':
        # Opposite direction: user up vs. L2S2 down, user down vs. L2S2 up
        up_dn_overlap = get_l2s2_overlap(genes_up, id_down)
        dn_up_overlap = get_l2s2_overlap(genes_down, id_up)
        return list(set(up_dn_overlap) | set(dn_up_overlap))
    raise ValueError("overlap_type must be 'mimicker' or 'reverser'")
Due to the nature of the L1000 assay, the L2S2 background includes 11,335 protein-coding genes. To find the overlap of a user gene set and the L2S2 background, the following query can be used to retrieve the overlap and converted symbols:
def get_l2s2_valid_genes(genes: list[str]):
    """Return the symbols of the submitted genes that are present in the L2S2 background."""
    query = {
        "query": """query GenesQuery($genes: [String]!) {
          geneMap2(genes: $genes) {
            nodes {
              gene
              geneInfo {
                symbol
              }
            }
          }
        }""",
        "variables": {"genes": genes},
        "operationName": "GenesQuery"
    }
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json"
    }
    response = requests.post(url, data=json.dumps(query), headers=headers)
    response.raise_for_status()
    res = response.json()
    # Genes without a match in the background have no geneInfo entry
    return [g['geneInfo']['symbol'] for g in res['data']['geneMap2']['nodes'] if g['geneInfo'] is not None]
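The functions above index directly into res['data'], which raises an unhelpful KeyError or TypeError if the server reports a problem in the response body. GraphQL servers commonly return an "errors" list in the JSON response, sometimes with an HTTP 200 status, so a small guard can make failures easier to diagnose. The helper and exception names below are hypothetical, and how the L2S2 endpoint reports errors is an assumption of this sketch.

```python
class GraphQLError(RuntimeError):
    """Raised when a GraphQL response reports errors or lacks the expected data."""


def extract_data(response_json: dict, *path: str):
    """Walk response_json['data'] along path, raising GraphQLError on problems."""
    errors = response_json.get("errors")
    if errors:
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise GraphQLError(messages)
    node = response_json.get("data")
    for key in path:
        # A null or missing field (e.g. an unknown gene set id) ends the walk early
        if not isinstance(node, dict) or node.get(key) is None:
            raise GraphQLError(f"response has no value at '{key}'")
        node = node[key]
    return node


# Example with a hand-written response shaped like the overlap query result
sample = {"data": {"geneSet": {"overlap": {"nodes": [{"symbol": "TP53"}]}}}}
nodes = extract_data(sample, "geneSet", "overlap", "nodes")
print([n["symbol"] for n in nodes])
```

With such a helper, a line like res['data']['geneSet']['overlap']['nodes'] could be written as extract_data(res, 'geneSet', 'overlap', 'nodes').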
L2S2 is actively being developed by the Ma'ayan Lab.