Marker data guidelines
Introduction
The following guidelines describe the form fields found on the marker upload and marker editing pages. Related fields are grouped together and appear in the same order as on the upload form. Each section gives a description of the input field, an example of the kind of information we want for that field, and the standard conventions to follow for that piece of information.
Following these guidelines is required when uploading marker data to our database. This helps keep our data consistent and accurate.
Marker information
The marker information section describes basic information regarding the marker data being added to the database.
Marker name
Required - The identifier or name best used to identify the marker being added to the database.
Example: KIT
Conventions:
- Should use the most accepted identifier used in the literature. Most often this is the protein or gene symbol.
- Should use the protein symbol, unless the gene symbol is more frequently used to identify the protein marker. e.g. prefer "IL-2RA" over "IL2RA".
- Should use human gene/protein nomenclature over mouse gene/protein nomenclature. e.g. the name identifier should be "KRT1" not "Krt1". The identifier for the mouse gene/protein can be placed in the Alternative name field if it is quite different from the human gene/protein identifier. Otherwise, the mouse identifier can go in an Alias field if the text differs from the human symbol. See here and here for information on human and mouse nomenclature, respectively.
- Should use mouse nomenclature if the marker is only found in mice. Specify in the description that this gene is mouse or human only.
- Should use the appropriate gene/protein symbol, abbreviation, or acronym and not the full protein name. e.g. "KIT" is used and not "KIT proto-oncogene, receptor tyrosine kinase."
- Can use the cluster of differentiation (CD) designation if it is the most frequently used identifier for the marker. Be sure to list it both here and in the CD designation field. e.g. "CD34". If the protein/gene symbol is well known, it should also be placed in the Alternative name field; otherwise it should be submitted as an Alias. See the CD designation section below for information on styling CD designations.
- If the name is hotly debated or is contested, provide the name referenced earliest in literature and provide the other accepted name in the Alternative name field. Then list all other protein/gene symbols and names as aliases.
- Can contain most alphanumeric and Greek characters.
- Should be less than 40 characters.
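The length and character conventions above can be checked automatically before upload. The following is a minimal sketch and not the database's actual validation. The function name `check_marker_name` is hypothetical, and the set of permitted punctuation is an assumption, since the guidelines only state that "most" alphanumeric and Greek characters are allowed.

```python
import unicodedata

MAX_NAME_LENGTH = 40  # names should be less than 40 characters

# Assumption: punctuation seen in common symbols (e.g. "IL-2RA") is allowed.
ALLOWED_PUNCTUATION = set("-/. ")


def is_allowed_char(ch: str) -> bool:
    """Return True for ASCII alphanumerics, Greek letters, and assumed punctuation."""
    if ch.isascii():
        return ch.isalnum() or ch in ALLOWED_PUNCTUATION
    # Non-ASCII characters are accepted only if they are Greek letters.
    return ch.isalpha() and "GREEK" in unicodedata.name(ch, "")


def check_marker_name(name: str) -> list[str]:
    """Return a list of problems found; an empty list means the name passes."""
    problems = []
    if not name.strip():
        problems.append("name is required")
    if len(name) >= MAX_NAME_LENGTH:
        problems.append("name should be less than 40 characters")
    bad = sorted({ch for ch in name if not is_allowed_char(ch)})
    if bad:
        problems.append(f"unsupported characters: {''.join(bad)}")
    return problems
```

Under these assumptions, `check_marker_name("KIT")` and `check_marker_name("IL-2RA")` both return an empty list. The nomenclature conventions (human over mouse, symbol over full name) still need a human reviewer.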
CD designation
Required if it exists - Cluster of differentiation (CD) designation for the marker being added.
Note: All current CD markers have been added to the database by default, so you will rarely need to add a marker that has a CD designation. If you are trying to add one, please search the database first to ensure that the marker does not already exist.
Example: CD117
Conventions:
- The CD designation must start with "CD" and can contain numbers and letters.
- Variants can be listed with a lowercase or uppercase letter, depending on which is canonical for that CD marker in the literature. e.g. both "CD16a" and "CD62L" are valid.
- If the primary marker name is a cluster of differentiation (CD) designation, it should be listed both here and in the Name field. e.g. "CD34" is both the primary identifier and the CD designation and gets listed in both fields with the same styling.
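The styling rules above can be expressed as a regular expression. This is a sketch under stated assumptions: the helper name `is_cd_designation` is hypothetical, and the pattern is slightly stricter than the wording above, because it requires at least one digit after "CD" (optionally preceded by a lowercase "w", as in historical workshop designations). That extra requirement keeps gene symbols such as "CDK4" from being mistaken for CD designations.

```python
import re

# "CD", an optional "w", one or more digits, then optional letters or digits
# (e.g. the "a" in "CD16a" or the "L" in "CD62L").
# Assumption: the digit requirement goes beyond the written convention.
CD_PATTERN = re.compile(r"CDw?[0-9]+[A-Za-z0-9]*")


def is_cd_designation(text: str) -> bool:
    """Return True if the whole string has the shape of a CD designation."""
    return CD_PATTERN.fullmatch(text) is not None
```

The check covers shape only. It cannot tell whether "CD16a" or "CD16A" is the canonical styling in the literature, so that still has to be confirmed by the submitter.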
Alternative name
Optional - Alternative name, abbreviation, or acronym used to describe the marker in addition to the primary name.
Example: c-Kit
Conventions:
- Should follow the same nomenclature rules as the name field.
- Should list the protein/gene symbol here if a CD designation is the primary marker name and the protein/gene symbol is frequently used to identify that marker. Otherwise, provide the protein/gene symbol as an alias instead.
- Should not list CD designations here, in almost all cases. In the rare circumstance that a marker has a more widely recognized name than its CD designation, the CD designation can be listed both here and in the CD designation field. e.g. "B220" is more commonly used than "CD45R", so "CD45R" is placed in the Alternative name field.
- Can contain the mouse gene/protein symbol if it differs substantially from the human symbol and is widely used to identify the marker.
- Can use this field for other designations that correspond to the primary marker, as long as the marker is commonly identified by that name. Any other frequently used designations should be provided as aliases.
- Should prefer gene symbols, abbreviations, and acronyms over long full-text names.
- Should list another widely accepted name if the primary name is hotly debated or contested.
- Should not contain parentheses, as these are added automatically.
- Can contain most alphanumeric and Greek characters.
- Should be less than 40 characters.
Other aliases
Optional - Other names, abbreviations, or acronyms that describe the marker.
Repeatable - This field is repeatable and numerous entries can be added.
Example: SCFR
Conventions:
- Should contain other widely accepted or often used protein/gene symbols, names, abbreviations, or acronyms.
- Should follow the same nomenclature rules as the Name field.
- Should not list CD designations here. In almost all cases these should be placed in the CD designation field.
- Should prefer gene symbols, abbreviations, and acronyms over long full-text names.
- Should not contain parentheses.
- Can contain most alphanumeric and Greek characters.
- Should be less than 40 characters.
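Because the alias field is repeatable, it can help to screen the whole list of entries at once. The sketch below is illustrative only: `check_aliases` is a hypothetical helper, and the CD pattern is an assumed heuristic (digits required after "CD") rather than a rule stated in this guide.

```python
import re

MAX_ALIAS_LENGTH = 40  # aliases should be less than 40 characters

# Assumed heuristic for spotting CD designations such as "CD117" or "CD16a".
CD_PATTERN = re.compile(r"CDw?[0-9]+[A-Za-z0-9]*")


def check_aliases(aliases: list[str]) -> dict[str, list[str]]:
    """Map each problematic alias to the conventions it appears to break."""
    report = {}
    for alias in aliases:
        problems = []
        if "(" in alias or ")" in alias:
            problems.append("contains parentheses")
        if CD_PATTERN.fullmatch(alias):
            problems.append("looks like a CD designation")
        if len(alias) >= MAX_ALIAS_LENGTH:
            problems.append("should be less than 40 characters")
        if problems:
            report[alias] = problems
    return report
```

For example, `check_aliases(["SCFR", "CD117"])` reports only "CD117", which belongs in the CD designation field instead.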
Marker description
The marker description section provides relevant descriptive information regarding the marker being added to the database.
Description
Optional - Long-form text field that provides descriptive information regarding the marker.
Example: KIT is a cytokine receptor found on hematopoietic stem cells and other hematopoietic progenitors and binds to stem cell factor (SCF). KIT is a receptor tyrosine kinase type III, and upon SCF binding forms a dimer that is capable of phosphorylating downstream signal transduction molecules.
Conventions:
- Can provide useful information regarding the marker.
- If a marker is not found in both species, that should be mentioned at the start of the description. e.g. "Mouse only." or "Human only."
- Can provide a general explanation of broad cell and tissue expression of the marker. e.g. "CD19 is a frequently used marker to identify most B cell populations."
- Can utilize acronyms in this section, as long as they are used consistently.
- Can utilize protein/gene symbols, as long as they follow the nomenclature and rules outlined in the name sections above.
Other notes
Optional - Long-form text field for informational notes regarding the marker that provide further context.
Example: ATP-binding cassette (ABC) transporters are a large superfamily of integral membrane proteins that use ATP to translocate numerous types of substrates across membranes.
Conventions:
- Should provide potentially useful information or facts, such as a description of a large protein superfamily.
- Should be used to describe why certain marker names may be hotly debated or contested. This field can also mention controversies regarding cell or tissue expression, although these are better placed on cell data pages.
External resources
The external resources section provides a mechanism to link a marker to other relevant databases or resources.
Optional - Fields in this section are optional, but recommended.
GeneCards ID
Optional - ID of the protein in the GeneCards database.
Example: GC04P054657
Conventions:
- Should begin with "GC".
- Should be provided whenever possible.
Uniprot ID
Optional - ID of the protein in the Uniprot database.
Example: P10721
Conventions:
- Should begin with a letter, which is followed by numbers and, in some accessions, further letters.
- Should be provided whenever possible.
- Should provide the ID for the human Uniprot entry, unless the marker is only found in mice. Murine Uniprot entries can be added as an external database link described below.
NCBI ID
Optional - ID of the protein in the NCBI database.
Example: 3815
Conventions:
- Should be completely numeric.
- Should be provided whenever possible.
- Should provide the ID for the human NCBI entry, unless the marker is only found in mice. Murine NCBI entries can be added as an external database link described below.
MGI ID
Optional - ID of the protein in the MGI database.
Example: 96677
Conventions:
- Should be completely numeric.
- Should be provided whenever possible, even for human markers.
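The format conventions for the four ID fields above (GeneCards, Uniprot, NCBI, and MGI) can be checked with simple patterns. This is a minimal sketch: the dictionary keys and the function name `check_external_ids` are hypothetical, and the patterns test shape only, not whether the ID exists in the external database. The Uniprot pattern accepts uppercase letters after the first character, since some accessions contain them.

```python
import re

# Shape checks derived from the conventions for each field.
ID_PATTERNS = {
    "genecards": re.compile(r"GC[0-9A-Za-z]+"),  # begins with "GC"
    "uniprot": re.compile(r"[A-Z][A-Z0-9]+"),    # begins with a letter
    "ncbi": re.compile(r"[0-9]+"),               # completely numeric
    "mgi": re.compile(r"[0-9]+"),                # completely numeric
}


def check_external_ids(ids: dict[str, str]) -> list[str]:
    """Return the names of fields whose values do not match the expected shape."""
    return [
        field
        for field, value in ids.items()
        if field in ID_PATTERNS and not ID_PATTERNS[field].fullmatch(value)
    ]
```

With the KIT examples from this guide, `check_external_ids({"genecards": "GC04P054657", "uniprot": "P10721", "ncbi": "3815", "mgi": "96677"})` returns an empty list.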
Wikipedia link
Optional - Used to link the marker to a Wikipedia page.
Example: https://en.wikipedia.org/wiki/KIT_(gene)
Conventions:
- Should only include a valid URL for a relevant Wikipedia page.
- Should use https links whenever possible.
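A Wikipedia link can be screened with Python's standard `urllib.parse` module. The sketch below assumes that any host ending in "wikipedia.org" counts as a relevant Wikipedia page. The function name `check_wikipedia_link` is hypothetical, and whether the page is actually about the marker still needs to be confirmed by hand.

```python
from urllib.parse import urlsplit


def check_wikipedia_link(url: str) -> list[str]:
    """Return a list of problems with the link; an empty list means it passes."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return ["not a valid web URL"]
    host = parts.hostname or ""
    problems = []
    if parts.scheme not in ("http", "https") or not host:
        problems.append("not a valid web URL")
    elif host != "wikipedia.org" and not host.endswith(".wikipedia.org"):
        problems.append("not a Wikipedia page")
    elif parts.scheme == "http":
        problems.append("prefer https")
    return problems
```

The example link from this section, `https://en.wikipedia.org/wiki/KIT_(gene)`, passes all three checks.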
External databases
Optional - Used to link the marker to other relevant external databases. Details described below.
Repeatable - This field is repeatable and numerous entries can be added.
Conventions:
- Should include separate entries for human and mouse pages on external databases, if that database has separate pages for them. Label them as described in the Database ID section below.
- Should include links to murine Uniprot and NCBI entries if the human pages were provided in the fields above. Label them as described in the Database ID section below.
Database name
Required if adding databases - Name of external database being linked.
Example: Database Name
Conventions:
- Should prefer using an abbreviation or acronym of the database resource, if possible.
- Can contain most alphanumeric and Greek characters.
- Should be 25 characters or less.
Database ID
Optional - External ID used to identify marker. This is only provided here for convenience and is not required.
Example: A12345
Conventions:
- Should match the format provided on the external database. e.g. if the marker can be found at http://some-database.com/A12345, then "A12345" should be provided, unless the data page provides a different unique entry id.
- If distinguishing between human and mouse database links, append either "(human)" or "(mouse)" to the end of the ID. Be sure to separate the text from the ID with a space.
- Can contain both letters and numbers.
- Should be 40 characters or less.
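The species labelling convention can be sketched as a small helper. The name `label_database_id` is hypothetical, and the sketch assumes the 40-character limit applies to the full text, including the appended label; the guidelines do not say which.

```python
from typing import Optional

MAX_ID_LENGTH = 40  # database IDs should be 40 characters or less
SPECIES = ("human", "mouse")


def label_database_id(database_id: str, species: Optional[str] = None) -> str:
    """Append " (human)" or " (mouse)" to an ID, separated by a single space."""
    labelled = database_id.strip()
    if species is not None:
        if species not in SPECIES:
            raise ValueError(f"species must be one of {SPECIES}")
        labelled = f"{labelled} ({species})"
    # Assumption: the length limit includes the species label.
    if len(labelled) > MAX_ID_LENGTH:
        raise ValueError("database ID should be 40 characters or less")
    return labelled
```

For example, `label_database_id("A12345", "human")` returns "A12345 (human)", while calling it without a species leaves the ID unchanged.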
Database link
Required if adding databases - URL pointing to the external database entry being linked.
Example: http://some-database.com/A12345
Conventions:
- Should include the full URL used to browse to the relevant database entry.
- Should be a valid URL.
- Should use https links whenever possible.
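Taken together, the Database name, Database ID, and Database link conventions describe one repeatable entry. The following sketch checks such an entry in a single pass. The function `check_database_entry` and its messages are hypothetical, and the URL test only confirms that a scheme and host are present, not that the page exists.

```python
from urllib.parse import urlsplit

MAX_DB_NAME_LENGTH = 25  # database names should be 25 characters or less
MAX_DB_ID_LENGTH = 40    # database IDs should be 40 characters or less


def check_database_entry(name: str, link: str, database_id: str = "") -> list[str]:
    """Return a list of problems for one external database entry."""
    problems = []
    if not name:
        problems.append("database name is required")
    elif len(name) > MAX_DB_NAME_LENGTH:
        problems.append("database name should be 25 characters or less")
    # The database ID is optional, so only its length is checked.
    if len(database_id) > MAX_DB_ID_LENGTH:
        problems.append("database ID should be 40 characters or less")
    try:
        parts = urlsplit(link)
        valid = parts.scheme in ("http", "https") and bool(parts.netloc)
    except ValueError:
        valid = False
    if not valid:
        problems.append("database link should be a full, valid URL")
    return problems
```

Using the placeholder values from this guide, `check_database_entry("SomeDB", "http://some-database.com/A12345", "A12345 (human)")` returns an empty list.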
External links
Optional - Used to link the marker to other relevant external websites. Details described below.
Repeatable - This field is repeatable and numerous entries can be added.
Link name
Required if adding links - Name of the external website being linked.
Example: Some Website
Conventions:
- Should prefer using an abbreviation or acronym of the resource, if possible.
- Can contain most alphanumeric and Greek characters.
- Should be 25 characters or less.
Link URL
Required if adding links - URL pointing to the external website being linked.
Example: http://some-link.com/12345
Conventions:
- Should include the full URL used to browse to the relevant resource.
- Should be a valid URL.
- Should use https links whenever possible.