Schema for TransMap ESTs - TransMap EST Mappings Version 5

Database: hg38
Primary Table: transMapEstV5
Data last updated: 2019-06-10
Big Bed File Download: /gbdb/hg38/transMap/V5/hg38.est.transMapV5.bigPsl
Item Count: 4,722,982
The data is stored in the binary BigBed format.

Format description: bigPsl derived pairwise alignment with additional information

field	example	description
`chrom`	chr1	Reference sequence chromosome or scaffold
`chromStart`	165937016	Start position in chromosome
`chromEnd`	166062379	End position in chromosome
`name`	rheMac8:CB230316.1-1.1	alignment Id
`score`	928	Score (0-1000), fraction identity * 1000
`strand`	+	+ or - indicates whether the query aligns to the + or - strand on the reference
`thickStart`	166062379	Start of where display should be thick (start codon)
`thickEnd`	166062379	End of where display should be thick (stop codon)
`reserved`	0	RGB value (use R,G,B string in input file)
`blockCount`	13	Number of blocks
`blockSizes`	374,99,11,64,28,18,5,10,5,17,5,12,10,	Comma separated list of block sizes
`chromStarts`	0,5668,5768,125177,125242,125270,125288,125294,125309,125314,125331,125341,125353,	Start positions relative to chromStart
`oChromStart`	3	Start position in other chromosome
`oChromEnd`	681	End position in other chromosome
`oStrand`	+	+ or -, - means that psl was reversed into BED-compatible coordinates
`oChromSize`	789	Size of other chromosome.
`oChromStarts`	3,377,476,487,551,580,599,606,623,629,647,658,671,	Start positions relative to oChromStart or from oChromStart+oChromSize depending on strand
`oSequence`	gggactagatgccagtagatctgccttccagttgtaacaaccaaaactgtctacagacattgcctaatgtccgcctgtgggtcaaaatcgtccctgttttgagaaccaaggatccagacccgtgcttctcaaaatacagtccgtggtccagtaagcgtctgcctcatctagaactttgttagaagtgaagactcttgaggcccactccaaatctactgactcagaaactgcattttaacaaggtcctcaggtgattcatatgcagttagaattgacaagcgctgacctaaaccaacttcttcagtttacaagtgaggagatggctttccagagggcagatcacctactcaaggtcatgtctcctgaatcctgcacatgaaagtttcaacacagtggaagagacagaccgatcaaaactgcaatacaaagcgaaatgtaaaacggagtaagcgaggcacccccaaagtgctacgggattgagaagaagacggaaaagcgaagatccaagagatggcctgcatcaaacccccatcaagccaaaatcaggcctcggggtctcaagtctcctgggcccggacacccaccctgtgccaatgaaacactggggtttgatacacaaaaaccgggctttggacagatctaattcctactggaaaaaaaaaagacaggaatcagggcaaggctggcaaaaaccttcgggaaatccaactggagttttaaaagcctggcttaattccttttccacaaaaaaatggcgccccaatttttcccctttctggacaaaaaaaa	Sequence on other chrom (or edit list, or empty)
`oCDS`		CDS in NCBI format
`chromSize`	248956422	Size of target chromosome
`match`	256	Number of bases matched.
`misMatch`	47	Number of bases that don't match
`repMatch`	355	Number of bases that match but are part of repeats
`nCount`	0	Number of 'N' bases
`seqType`	1	0=empty, 1=nucleotide, 2=amino_acid
`srcDb`	rheMac8	source database
`srcTransId`	CB230316.1	source transcript id
`srcChrom`	chr1	source chromosome
`srcChromStart`	201533114	start position in source chromosome
`srcChromEnd`	201667101	end position in source chromosome
`srcIdent`	986	source score (fraction identity * 1000)
`srcAligned`	833	fraction of source transcript aligned (fraction aligned * 1000)
`geneName`		gene name
`geneId`		gene id
`geneType`		gene type
`transcriptType`		transcript type
`chainType`	syn	type of chains used for mapping
`commonName`	Rhesus	common name
`scientificName`	Macaca mulatta	scientific name
`orgAbbrev`	Macaca	organism abbreviation
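The following is a minimal Python sketch for reading these rows. It assumes each feature arrives as one tab-separated text line with the columns in the order of the schema table above, as in `bigBedToBed` output. The helper names (`parse_row`, `genomic_blocks`) are hypothetical. The sketch converts the block lists into absolute reference intervals: each block starts at `chromStart` plus the matching `chromStarts` offset and spans the matching `blockSizes` entry.

```python
from __future__ import annotations

# Column order copied from the schema table (40 fields).
BIGPSL_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes",
    "chromStarts", "oChromStart", "oChromEnd", "oStrand", "oChromSize",
    "oChromStarts", "oSequence", "oCDS", "chromSize", "match", "misMatch",
    "repMatch", "nCount", "seqType", "srcDb", "srcTransId", "srcChrom",
    "srcChromStart", "srcChromEnd", "srcIdent", "srcAligned", "geneName",
    "geneId", "geneType", "transcriptType", "chainType", "commonName",
    "scientificName", "orgAbbrev",
]

# Fields holding a single integer; "reserved" (an RGB value) is left as text.
INT_FIELDS = {
    "chromStart", "chromEnd", "score", "thickStart", "thickEnd", "blockCount",
    "oChromStart", "oChromEnd", "oChromSize", "chromSize", "match", "misMatch",
    "repMatch", "nCount", "seqType", "srcChromStart", "srcChromEnd",
    "srcIdent", "srcAligned",
}

# Fields holding comma-separated integer lists with a trailing comma.
LIST_FIELDS = ("blockSizes", "chromStarts", "oChromStarts")


def parse_int_list(text: str) -> list[int]:
    """Parse e.g. '374,99,11,' into [374, 99, 11]."""
    return [int(x) for x in text.split(",") if x]


def parse_row(line: str) -> dict:
    """Turn one tab-separated bigPsl text line into a dict keyed by field name."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(BIGPSL_FIELDS):
        raise ValueError(f"expected {len(BIGPSL_FIELDS)} columns, got {len(cols)}")
    row: dict = dict(zip(BIGPSL_FIELDS, cols))
    for key in INT_FIELDS:
        row[key] = int(row[key])
    for key in LIST_FIELDS:
        row[key] = parse_int_list(row[key])
    return row


def genomic_blocks(row: dict) -> list[tuple[int, int]]:
    """Absolute half-open [start, end) reference intervals, one per aligned block."""
    base = row["chromStart"]
    return [(base + rel, base + rel + size)
            for rel, size in zip(row["chromStarts"], row["blockSizes"])]


# First sample row, with oSequence truncated by hand to keep the example short.
EXAMPLE_LINE = "\t".join([
    "chr1", "165937016", "166062379", "rheMac8:CB230316.1-1.1", "928", "+",
    "166062379", "166062379", "0", "13",
    "374,99,11,64,28,18,5,10,5,17,5,12,10,",
    "0,5668,5768,125177,125242,125270,125288,125294,125309,125314,125331,125341,125353,",
    "3", "681", "+", "789",
    "3,377,476,487,551,580,599,606,623,629,647,658,671,",
    "gggactagatgcc",  # oSequence, truncated by hand
    "",  # oCDS (empty in this row)
    "248956422", "256", "47", "355", "0", "1", "rheMac8", "CB230316.1",
    "chr1", "201533114", "201667101", "986", "833",
    "", "", "", "",  # geneName, geneId, geneType, transcriptType (empty)
    "syn", "Rhesus", "Macaca mulatta", "Macaca",
])

example = parse_row(EXAMPLE_LINE)
blocks = genomic_blocks(example)
print(blocks[0], blocks[-1])  # (165937016, 165937390) (166062369, 166062379)
```

For the first sample row, the last block ends exactly at `chromEnd`. The block sizes also sum to `match + misMatch + repMatch + nCount` (658 bases), which makes a quick sanity check on a parsed row.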

Sample Rows

chrom	chromStart	chromEnd	name	score	strand	thickStart	thickEnd	blockCount	blockSizes	chromStarts	oChromStart	oChromEnd	oStrand	oChromSize	oChromStarts	oSequence	chromSize	match	misMatch	repMatch	nCount	seqType	srcDb	srcTransId	srcChrom	srcChromStart	srcChromEnd	srcIdent	srcAligned	chainType	commonName	scientificName	orgAbbrev
chr1	165937016	166062379	rheMac8:CB230316.1-1.1	928	+	166062379	166062379	13	374,99,11,64,28,18,5,10,5,17,5,12,10,	0,5668,5768,125177,125242,125270,125288,125294,125309,125314,125331,125341,125353,	3	681	+	789	3,377,476,487,551,580,599,606,623,629,647,658,671,	gggactagatgccagtagatctgccttccagttgtaacaaccaaaactgtctacagacattgcctaatgtccgcctgtgggtcaaaatcgtccctgttttgagaaccaaggatccagacccgtgcttc ...	248956422	256	47	355	0	1	rheMac8	CB230316.1	chr1	201533114	201667101	986	833	syn	Rhesus	Macaca mulatta	Macaca
chr1	166060009	166069884	mm10:AV381729.2-1.1	760	-	166069884	166069884	20	57,31,14,21,24,24,30,22,32,30,45,39,32,11,65,56,18,20,27,52,	0,57,93,126,147,171,201,232,254,288,320,366,405,455,468,533,589,607,627,9823,	1	678	+	678	0,58,89,103,137,164,188,218,242,274,304,349,389,421,432,498,555,574,598,625,	cgatgttgcttcccaaccaattagaagcgagagtgaaaacatccacaccaggtgtctcctagtaggacaaggcaagcccacccagcaagcagaaccacatggcaagacgctgagaaggagggatatga ...	248956422	455	156	39	0	1	mm10	AV381729.2	chr1	167079415	167088602	997	998	syn	Mouse	Mus musculus	Mus
chr1	166060319	166070238	mm10:CA327390.1-1.1	844	-	166070238	166070238	14	8,45,39,32,11,65,56,18,20,27,304,83,9,7,	0,10,56,95,145,158,223,279,297,317,9515,9819,9902,9912,	0	735	+	735	0,8,53,93,125,136,202,259,278,302,329,634,719,728,	gcagactgcccaactgnntgtgtggaacacatgccactgagacccgcagtggggtgaccaggggtggcaggagccaaactgagtttctgagccaaagcagaccctcttggtgtgccagcctttgcagc ...	248956422	570	112	39	3	1	mm10	CA327390.1	chr1	167079061	167088306	998	993	syn	Mouse	Mus musculus	Mus
chr1	166070081	166166146	rn6:CX569547.1-1.1	945	-	166166146	166166146	13	15,17,5,40,41,7,532,86,5,5,1,3,58,	0,36,53,59,99,142,150,95904,95991,95996,96001,96002,96007,	0	848	+	922	74,115,133,140,181,222,229,761,847,853,859,861,864,	gaggagacctcgcccatcgtcctgcgctacaagaccccttacttcaaagcctcggcccgtgtggtcatgccccccatcccccgccacgagacctgggtggtgggctggatccagtcttgcaaccagat ...	248956422	761	44	0	10	1	rn6	CX569547.1	chr13	85111121	85231314	998	874	syn	Rat	Rattus norvegicus	Rattus
chr1	166070218	166166182	mm10:CB518481.1-1.1	955	-	166166182	166166182	3	12,532,197,	0,13,95767,	0	741	+	744	3,15,547,	gtgtgcgccaccatcgaccagtgccccacgcgcatcgaggagacctcgcccatcgtcctgcgctacaagaccccttacttcaaagcctcggcccgtgtggtcatgccccccatcccccgccacgagac ...	248956422	705	33	0	3	1	mm10	CB518481.1	chr1	167001631	167079080	997	991	syn	Mouse	Mus musculus	Mus
chr1	166070238	166070763	xenTro9:EL832716.1-1.1	809	-	166070763	166070763	1	525,	0,	58	583	+	700	117,	aaaatggtggctcaagatgaaagaatgcaatatctacttatatatatactccagtttggtcaagctgggagcttcctgatttgagggaaggaagagtaaaagcaatcagcgattctgatggtgtaagt ...	248956422	425	100	0	0	1	xenTro9	EL832716.1	chr4	108968771	109004859	1000	1000	rbest	X. tropicalis	Xenopus tropicalis	Xenopu
chr1	166070361	166166234	mm10:BB647672.1-1.1	943	-	166166234	166166234	6	14,7,6,11,348,249,	0,22,37,43,54,95624,	1	660	+	705	45,59,88,95,107,455,	taagcatcacctgcaaggcgcggatccggcgcgagaacatcgtggtgtacgatgtgtgcgccaccatcgaccagtgccccacgcgcatcgaggagacctcgcccatcgtcctgcgctacaagacccct ...	248956422	599	36	0	0	1	mm10	BB647672.1	chr1	167001579	167078938	981	900	syn	Mouse	Mus musculus	Mus
chr1	166070690	166166366	rn6:FM038873.1-1.1	975	-	166166366	166166366	16	73,86,5,5,1,3,82,53,24,31,18,10,1,47,3,6,	0,95295,95382,95387,95392,95393,95398,95481,95534,95558,95589,95607,95618,95619,95666,95670,	0	455	+	455	0,73,159,165,171,173,176,259,312,336,367,386,396,398,446,449,	gctgagctggaccgggagcgagcggcgcgtcgccatgccggcgtgacgggcgcccgcggctgcccgcgccgggcccccgcgctgccccacgccgcgccgcgccggcgccgggcggcaggatgggctgt ...	248956422	437	11	0	0	1	rn6	FM038873.1	chr13	85109976	85230708	995	993	syn	Rat	Rattus norvegicus	Rattus
chr1	166165901	166185318	mm10:BY091203.1-1.1	764	+	166185318	166185318	6	5,68,79,19,29,93,	0,6,74,19262,19281,19324,	1	348	+	348	1,6,79,205,226,255,	ggaggggttggggagggggttcagaatggagagtggagaaaggggccaggaccccgctcccgcgccgcgtggcgtggccggtcgcttacatgcccaggtcgctgtaggtgttgaagaactccatctgg ...	248956422	224	69	0	0	1	mm10	BY091203.1	chr1	166974590	167001916	1000	997	syn	Mouse	Mus musculus	Mus
chr1	166165985	166166313	xenTro9:CX796837.1-1.1	802	-	166166313	166166313	4	263,9,10,39,	0,269,278,289,	453	775	+	864	89,352,362,372,	atgagagagagagagcggctgaatccgagcgcagggaaggattcaccatccgacaaggacatcagcccctggccacctgggacatgacaagccctggcagccccggcagcagggcacaggagcagcgg ...	248956422	256	63	0	2	1	xenTro9	CX796837.1	chr4	108969323	109020500	995	997	rbest	X. tropicalis	Xenopus tropicalis	Xenopu
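The following Python sketch runs two quick checks on the sample rows above.

- **Score formula.** Every one of these rows satisfies `score == floor(1000 * (match + repMatch) / (match + repMatch + misMatch))`, with `nCount` left out. This formula is inferred from the ten rows shown here, not taken from a specification, so treat it as an assumption.
- **Identifier scheme.** Each `name` follows the `accession.version-srcUniq.destUniq` scheme described under Methods, prefixed with the source assembly and a colon.

The helper names are hypothetical.

```python
from __future__ import annotations

import re

# (name, score, match, misMatch, repMatch, nCount) copied from the Sample Rows.
SAMPLE_ROWS = [
    ("rheMac8:CB230316.1-1.1", 928, 256, 47, 355, 0),
    ("mm10:AV381729.2-1.1", 760, 455, 156, 39, 0),
    ("mm10:CA327390.1-1.1", 844, 570, 112, 39, 3),
    ("rn6:CX569547.1-1.1", 945, 761, 44, 0, 10),
    ("mm10:CB518481.1-1.1", 955, 705, 33, 0, 3),
    ("xenTro9:EL832716.1-1.1", 809, 425, 100, 0, 0),
    ("mm10:BB647672.1-1.1", 943, 599, 36, 0, 0),
    ("rn6:FM038873.1-1.1", 975, 437, 11, 0, 0),
    ("mm10:BY091203.1-1.1", 764, 224, 69, 0, 0),
    ("xenTro9:CX796837.1-1.1", 802, 256, 63, 0, 2),
]


def identity_score(match: int, mis_match: int, rep_match: int) -> int:
    """Inferred from the sample rows: truncated 1000 * identity, ignoring N bases."""
    aligned = match + rep_match + mis_match
    return 1000 * (match + rep_match) // aligned if aligned else 0


# Optional "srcDb:" prefix, then accession.version-srcUniq.destUniq.
NAME_RE = re.compile(
    r"^(?:(?P<src_db>[^:]+):)?(?P<acc>.+)-(?P<src_uniq>\d+)\.(?P<dest_uniq>\d+)$")


def parse_name(name: str) -> tuple[str | None, str, int, int]:
    """Split an alignment id into (srcDb or None, accession.version, srcUniq, destUniq)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized alignment id: {name!r}")
    return m["src_db"], m["acc"], int(m["src_uniq"]), int(m["dest_uniq"])


mismatches = [row[0] for row in SAMPLE_ROWS
              if identity_score(row[2], row[3], row[4]) != row[1]]
print(mismatches)                            # []
print(parse_name("rheMac8:CB230316.1-1.1"))  # ('rheMac8', 'CB230316.1', 1, 1)
```

The same parser accepts the cow example from the Methods text. Because `BC149621.1-2.2` has no assembly prefix, it yields `(None, 'BC149621.1', 2, 2)`.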

TransMap ESTs (transMapEstV5) Track Description


Description

This track contains GenBank spliced EST alignments produced by the TransMap cross-species alignment algorithm from other vertebrate species in the UCSC Genome Browser. For closer evolutionary distances, the alignments are created using syntenically filtered BLASTZ alignment chains, resulting in a prediction of the orthologous genes in human.

Display Conventions and Configuration

This track follows the display conventions for PSL alignment tracks. It may also be configured to display codon coloring, a feature that allows the user to quickly compare cDNAs against the genomic sequence. Several types of alignment gap may also be colored.

Methods

Source transcript alignments were obtained from vertebrate organisms in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank mRNAs, and GenBank spliced ESTs to the cognate genome, along with UCSC Genes, were used where available.

For all vertebrate assemblies that had BLASTZ alignment chains and nets to the human (hg38) genome, a subset of the alignment chains was selected as follows.

For organisms whose branch distance was no more than 0.5 (as computed by `phyloFit`; see the Conservation track description for details), syntenic filtering was used. Reciprocal best nets were used if available; otherwise, nets were selected with the `netFilter -syn` command. The chains corresponding to the selected nets were used for mapping.

For more distant species, where synteny is difficult to determine, the full set of chains was used for mapping. This allows more genes to map at the expense of some mapping to paralogous regions; the post-alignment filtering step removes some of these duplications.

The `pslMap` program was used to do a base-level projection of the source transcript alignments through the selected chains to the human genome, resulting in pairwise alignments of the source transcripts to the genome.
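Base-level projection can be illustrated with a toy model: the Python sketch below carries source coordinates through the ungapped blocks of a chain. It is not `pslMap`. It assumes a single plus-strand chain held as hypothetical `ChainBlock` tuples, and it ignores strand handling and the step of combining the result with the transcript's own alignment to the source genome.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import NamedTuple, Optional


class ChainBlock(NamedTuple):
    """One ungapped aligned block of a (hypothetical, plus-strand) chain."""
    src_start: int   # 0-based start in the source genome
    dest_start: int  # 0-based start in the destination (human) genome
    size: int        # number of aligned bases


def project_position(chain: list[ChainBlock], src_pos: int) -> Optional[int]:
    """Map one source base to the destination, or None if no chain block covers it."""
    starts = [b.src_start for b in chain]  # chain must be sorted by src_start
    i = bisect_right(starts, src_pos) - 1
    if i < 0:
        return None
    block = chain[i]
    offset = src_pos - block.src_start
    return block.dest_start + offset if offset < block.size else None


def project_interval(chain: list[ChainBlock], start: int, end: int) -> list[tuple[int, int]]:
    """Project a half-open source interval, splitting it at every chain gap."""
    pieces = []
    for block in chain:
        lo = max(start, block.src_start)
        hi = min(end, block.src_start + block.size)
        if lo < hi:  # this block overlaps the interval
            shift = block.dest_start - block.src_start
            pieces.append((lo + shift, hi + shift))
    return pieces


# Toy chain: a gap in both genomes at source 150-160, then a destination-only gap.
CHAIN = [ChainBlock(100, 5000, 50), ChainBlock(160, 5060, 40), ChainBlock(200, 5200, 100)]
print(project_interval(CHAIN, 120, 220))  # [(5020, 5050), (5060, 5100), (5200, 5220)]
```

An exon spanning source bases 120-220 comes out as three pieces, one per chain block it overlaps. This is one way a mapped alignment can end up with more blocks than its source alignment.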
The resulting alignments were filtered with `pslCDnaFilter` using a global near-best criterion of 0.5% in finished genomes (human and mouse) and 1.0% in other genomes. Alignments in which less than 20% of the transcript mapped were discarded.

To ensure unique identifiers for each alignment, cDNA and gene accessions were made unique by appending a suffix for each location in the source genome and again for each mapped location in the destination genome. The format is:

accession.version-srcUniq.destUniq

where `srcUniq` is a number added to make each source alignment unique, and `destUniq` is added to give the subsequent TransMap alignments unique identifiers. For example, the cow genome contains two alignments of mRNA `BC149621.1`, which are assigned the identifiers `BC149621.1-1` and `BC149621.1-2`. When these are mapped to the human genome, `BC149621.1-1` maps to a single location and is given the identifier `BC149621.1-1.1`. However, `BC149621.1-2` maps to two locations, resulting in `BC149621.1-2.1` and `BC149621.1-2.2`. Multiple TransMap mappings are usually the result of tandem duplications, where both chains are identified as syntenic. In this table, the `name` field additionally carries the source assembly as a prefix, for example `rheMac8:CB230316.1-1.1`.

Data Access

The raw data for these tracks can be accessed interactively through the Table Browser or the Data Integrator. For automated analysis, the annotations are stored in bigPsl files (containing a number of extra columns) and can be downloaded from our download server or queried using our API. For more information on accessing track data, see our Track Data Access FAQ.

The files are associated with these tracks in the following way:

TransMap Ensembl - `hg38.ensembl.transMapV4.bigPsl`
TransMap RefGene - `hg38.refseq.transMapV4.bigPsl`
TransMap RNA - `hg38.rna.transMapV4.bigPsl`
TransMap ESTs - `hg38.est.transMapV4.bigPsl`

Individual regions or the whole genome annotation can be obtained using our tool `bigBedToBed`, which can be compiled from the source code or downloaded as a precompiled binary for your system.
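For scripted access, `bigBedToBed` can be driven from Python. This sketch makes the following assumptions:

- The `bigBedToBed` binary is installed and on your `PATH`.
- The helper names are hypothetical.
- `EST_V5_URL` is an assumed URL. It combines the download host from the example command with the `/gbdb` path listed at the top of this page, so verify it before use.

```python
from __future__ import annotations

import subprocess

# Assumed URL: download host from the example command plus the /gbdb path
# listed for this table at the top of the page. Verify before relying on it.
EST_V5_URL = ("http://hgdownload.soe.ucsc.edu/gbdb/hg38/transMap/V5/"
              "hg38.est.transMapV5.bigPsl")


def region_command(url: str, chrom: str, start: int, end: int) -> list[str]:
    """Build a bigBedToBed call restricted to chrom:start-end, writing to stdout."""
    return ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}",
            f"-end={end}", "stdout"]


def split_rows(text: str) -> list[list[str]]:
    """Split bigBedToBed text output into one list of column strings per feature."""
    return [line.split("\t") for line in text.splitlines() if line]


def fetch_region(url: str, chrom: str, start: int, end: int) -> list[list[str]]:
    """Run bigBedToBed (must be installed and on PATH) and return its rows."""
    result = subprocess.run(region_command(url, chrom, start, end),
                            capture_output=True, text=True, check=True)
    return split_rows(result.stdout)
```

For example, `fetch_region(EST_V5_URL, "chr1", 166060000, 166070000)` should return one list of column strings per feature in that range. Passing an argument list rather than a shell string avoids any quoting issues with the URL.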
`bigBedToBed` can also be used to obtain only features within a given range, for example:

`bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/transMap/V4/hg38.refseq.transMapV4.bigPsl -chrom=chr6 -start=0 -end=1000000 stdout`

Credits

This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data submitted to the international public sequence databases by scientists worldwide, and from annotations produced by the RefSeq, Ensembl, and GENCODE annotation projects.

References

Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S, Lau C, et al. Targeted discovery of novel human exons by comparative genomics. Genome Res. 2007 Dec;17(12):1763-73. PMID: 17989246; PMC: PMC2099585

Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656

Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol. 2007 Dec;3(12):e247. PMID: 18085818; PMC: PMC2134963
