Representing Numbers and Letters with Binary

Representing Numbers and Letters with Binary: Crash Course Computer Science #4


Hi I’m Carrie Anne, this is Crash Course Computer Science and today we’re going to talk about how computers store and represent numerical data.

Which means we’ve got to talk about Math!

But don’t worry.

Every single one of you already knows exactly what you need to know to follow along.

So, last episode we talked about how transistors can be used to build logic gates, which can evaluate boolean statements.

And in boolean algebra, there are only two, binary values: true and false.

But if we only have two values, how in the world do we represent information beyond these two values?

That’s where the Math comes in.

So, as we mentioned last episode, a single binary value can be used to represent a number.

Instead of true and false, we can call these two states 1 and 0 which is actually incredibly useful.

And if we want to represent larger things we just need to add more binary digits.

This works exactly the same way as the decimal numbers that we’re all familiar with.

With decimal numbers there are "only" 10 possible values a single digit can be; 0 through 9, and to get numbers larger than 9 we just start adding more digits to the front.

We can do the same with binary.

For example, let’s take the number two hundred and sixty-three.

What does this number actually represent?

Well, it means we’ve got 2 one-hundreds, 6 tens, and 3 ones.

If you add those all together, we’ve got 263.

Notice how each column has a different multiplier.

In this case, it’s 100, 10, and 1. Each multiplier is ten times larger than the one to the right.

That's because each column has ten possible digits to work with, 0 through 9, after which you have to carry one to the next column.

For this reason, it’s called base-ten notation, also called decimal since deci means ten.
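
Here’s a quick sketch of that place-value idea in Python (the code isn’t part of the video; the names are just for illustration):

```python
# Decompose the decimal number 263 into its place values:
# 2 one-hundreds, 6 tens, and 3 ones.
digits = [2, 6, 3]
multipliers = [100, 10, 1]  # each column is ten times the one to its right

total = sum(d * m for d, m in zip(digits, multipliers))
print(total)  # 263
```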

AND Binary works exactly the same way, it’s just base-two.

That’s because there are only two possible digits in binary – 1 and 0.

This means that each multiplier has to be two times larger than the column to its right.

Instead of hundreds, tens, and ones, we now have fours, twos and ones.

Take for example the binary number: 101.

This means we have 1 four, 0 twos, and 1 one.

Add those all together and we’ve got the number 5 in base ten.

But to represent larger numbers, binary needs a lot more digits.

Take this number in binary: 10110111.

We can convert it to decimal in the same way.

We have 1 x 128, 0 x 64, 1 x 32, 1 x 16, 0 x 8, 1 x 4, 1 x 2, and 1 x 1.

Which all adds up to 183.
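
If you want to check that conversion yourself, here’s a small Python sketch (not part of the video) that sums each bit times its column’s multiplier:

```python
# Convert the binary number 10110111 to decimal: each bit is multiplied
# by its column value (128, 64, 32, 16, 8, 4, 2, 1) and the results are summed.
bits = "10110111"

total = 0
for i, bit in enumerate(reversed(bits)):
    total += int(bit) * (2 ** i)

print(total)         # 183
print(int(bits, 2))  # 183 -- Python's built-in base-2 conversion agrees
```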

Math with binary numbers isn’t hard either.

Take for example decimal addition of 183 plus 19.

First we add 3 + 9, that’s 12, so we put 2 as the sum and carry 1 to the tens column.

Now we add 8 plus 1 plus the 1 we carried, that’s 10, so the sum is 0, carry 1.

Finally we add 1 plus the 1 we carried, which equals 2. So the total sum is 202.

Here’s the same sum but in binary.

Just as before, we start with the ones column.

Adding 1+1 results in 2, even in binary.

But there is no symbol "2", so we use 10 and put 0 as our sum and carry the 1, just like in our decimal example.

1 plus 1, plus the 1 carried, equals 3 or 11 in binary, so we put the sum as 1 and we carry 1 again, and so on.

We end up with 11001010, which is the same as the number 202 in base ten.
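
Here’s a hedged sketch of that column-by-column addition in Python (the helper name add_binary is just for illustration):

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings column by column, carrying 1s as needed."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    result, carry = [], 0
    for x, y in zip(reversed(a), reversed(b)):
        column = int(x) + int(y) + carry
        result.append(str(column % 2))  # digit written in this column
        carry = column // 2             # 1 carried to the next column
    if carry:
        result.append("1")
    return "".join(reversed(result))

print(add_binary("10110111", "10011"))  # 11001010, i.e. 183 + 19
print(int("11001010", 2))               # 202
```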

Each of these binary digits, 1 or 0, is called a “bit”.

So in these last few examples, we were using 8-bit numbers, with a lowest value of zero and a highest value of 255, which requires all 8 bits to be set to 1.

That’s 256 different values, or 2 to the 8th power.

You might have heard of 8-bit computers, or 8-bit graphics or audio.

These were computers that did most of their operations in chunks of 8 bits.

But 256 different values isn’t a lot to work with, so it meant things like 8-bit games were limited to 256 different colors for their graphics.

And 8-bits is such a common size in computing, it has a special word: a byte.

A byte is 8 bits.

If you’ve got 10 bytes, it means you’ve really got 80 bits.

You’ve heard of kilobytes, megabytes, gigabytes and so on.

These prefixes denote different scales of data.

Just like one kilogram is a thousand grams, 1 kilobyte is a thousand bytes, or really 8,000 bits.

Mega is a million bytes (MB), and giga is a billion bytes (GB).

Today you might even have a hard drive that has 1 terabyte (TB) of storage.

That's 8 trillion ones and zeros.

But hold on!

That’s not always true.

In binary, a kilobyte is 2 to the power of 10 bytes, or 1024. 1000 is also a correct definition of a kilobyte, but it isn’t the only one.
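
If you want to see the two definitions side by side, here’s a minimal Python sketch (not from the video):

```python
# "Kilobyte" can mean the metric prefix (1000 bytes) or the
# binary interpretation (2**10 = 1024 bytes).
decimal_kb = 1000
binary_kb = 2 ** 10
print(decimal_kb * 8)  # 8000 bits
print(binary_kb * 8)   # 8192 bits

# A 1 terabyte drive, using the metric prefix, holds 8 trillion bits.
terabyte = 10 ** 12
print(terabyte * 8)    # 8000000000000
```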

You’ve probably also heard the term 32-bit or 64-bit computers – you’re almost certainly using one right now.

What this means is that they operate in chunks of 32 or 64 bits.

That’s a lot of bits!

The largest number you can represent with 32 bits is just under 4.3 billion, which is thirty-two 1s in binary.

This is why our Instagram photos are so smooth and pretty – they are composed of millions of colors, because computers today use 32-bit color graphics.

Of course, not everything is a positive number - like my bank account in college.

So we need a way to represent positive and negative numbers.

Most computers use the first bit for the sign: 1 for negative, 0 for positive numbers, and then use the remaining 31 bits for the number itself.

That gives us a range of roughly plus or minus two billion.

While this is a pretty big range of numbers, it’s not enough for many tasks.

There are 7 billion people on the earth, and the US national debt is almost 20 trillion dollars after all.

This is why 64-bit numbers are useful.

The largest value a 64-bit number can represent is around 9.2 quintillion!

That’s a lot of possible numbers and will hopefully stay above the US national debt for a while!
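
As a rough check of those ranges (a Python sketch, not part of the video), reserving one bit for the sign leaves 31 or 63 bits for the value itself:

```python
# Largest magnitudes representable when one bit is reserved for the sign.
max_signed_32 = 2 ** 31 - 1  # roughly plus or minus 2 billion
max_signed_64 = 2 ** 63 - 1  # around 9.2 quintillion
print(max_signed_32)         # 2147483647
print(max_signed_64)         # 9223372036854775807

# For comparison, the largest unsigned 32-bit value ("just under 4.3 billion"):
print(2 ** 32 - 1)           # 4294967295
```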

Most importantly, as we’ll discuss in a later episode, computers must label locations in their memory, known as addresses, in order to store and retrieve values.

As computer memory has grown to gigabytes and terabytes – that’s trillions of bytes – it was necessary to have 64-bit memory addresses as well.

In addition to negative and positive numbers, computers must deal with numbers that are not whole numbers, like 12.7 and 3.14, or maybe even stardate: 43989.1.

These are called “floating point” numbers, because the decimal point can float around in the middle of the number.

Several methods have been developed to represent floating point numbers, the most common of which is the IEEE 754 standard.

And you thought historians were the only people bad at naming things!

In essence, this standard stores decimal values sort of like scientific notation.

For example, 625.9 can be written as 0.6259 x 10^3.

There are two important numbers here: the .6259 is called the significand, and the 3 is the exponent.

In a 32-bit floating point number, the first bit is used for the sign of the number -- positive or negative.

The next 8 bits are used to store the exponent and the remaining 23 bits are used to store the significand.
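
Note that IEEE 754 actually works in base 2 rather than the base-10 scientific notation used in the example above. As an illustration only (a Python sketch, not part of the video), here’s how the three fields of 625.9 look when it’s stored as a 32-bit float:

```python
import struct

# Pack 625.9 as a 32-bit IEEE 754 float, then slice out the three fields:
# 1 sign bit, 8 exponent bits, 23 significand (fraction) bits.
raw = struct.unpack(">I", struct.pack(">f", 625.9))[0]
bits = f"{raw:032b}"

sign, exponent, significand = bits[0], bits[1:9], bits[9:]
print(sign)         # 0 -> positive
print(exponent)     # 10001000 -> 136, i.e. the value is scaled by 2**(136 - 127)
print(significand)  # the 23 fraction bits
```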

Ok, we’ve talked a lot about numbers, but your name is probably composed of letters, so it’s really useful for computers to also have a way to represent text.

However, rather than have a special form of storage for letters, computers simply use numbers to represent letters.

The most straightforward approach might be to simply number the letters of the alphabet: A being 1, B being 2, C 3, and so on.

In fact, Francis Bacon, the famous English writer, used five-bit sequences to encode all 26 letters of the English alphabet to send secret messages back in the 1600s.

And five bits can store 32 possible values – so that’s enough for the 26 letters, but not enough for punctuation, digits, and upper and lower case letters.

Enter ASCII, the American Standard Code for Information Interchange.

Invented in 1963, ASCII was a 7-bit code, enough to store 128 different values.

With this expanded range, it could encode capital letters, lowercase letters, digits 0 through 9, and symbols like the @ sign and punctuation marks.

For example, a lowercase ‘a’ is represented by the number 97, while a capital ‘A’ is 65.

A colon is 58 and a closing parenthesis is 41.
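
You can check those code points yourself; here’s a small Python sketch (not part of the video):

```python
# ASCII code points for the characters mentioned above.
for ch in ["a", "A", ":", ")"]:
    print(repr(ch), ord(ch))
# 'a' 97
# 'A' 65
# ':' 58
# ')' 41

# 7 bits are enough because ASCII only defines 128 values:
print(2 ** 7)  # 128
```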

ASCII even had a selection of special command codes, such as a newline character to tell the computer where to wrap a line to the next row.

In older computer systems, the line of text would literally continue off the edge of the screen if you didn’t include a newline character!

Because ASCII was such an early standard, it became widely used, and critically, allowed different computers built by different companies to exchange data.

This ability to universally exchange information is called “interoperability”.

However, it did have a major limitation: it was really only designed for English.

Fortunately, there are 8 bits in a byte, not 7, and it soon became popular to use codes 128 through 255, previously unused, for "national" characters.

In the US, those extra numbers were largely used to encode additional symbols, like mathematical notation, graphical elements, and common accented characters.

On the other hand, while the Latin characters were used universally, Russian computers used the extra codes to encode Cyrillic characters, and Greek computers, Greek letters, and so on.

And national character codes worked pretty well for most countries.

The problem was, if you opened an email written in Latvian on a Turkish computer, the result was completely incomprehensible.

And things totally broke with the rise of computing in Asia, as languages like Chinese and Japanese have thousands of characters.

There was no way to encode all those characters in 8-bits!

In response, each country invented multi-byte encoding schemes, all of which were mutually incompatible.

The Japanese were so familiar with this encoding problem that they had a special name for it: "mojibake", which means "scrambled text".

And so it was born – Unicode – one format to rule them all.

Devised in 1992 to finally do away with all of the different international schemes, it replaced them with one universal encoding scheme.

The most common version of Unicode uses 16-bit units, but the Unicode standard as a whole has space for more than a million code points – enough for every single character from every language ever used – more than 120,000 of them across over 100 scripts – plus room for mathematical symbols and even graphical characters like Emoji.
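
Here’s a brief Python sketch (not part of the video) showing how a few characters map to Unicode code points and how two common encodings turn those code points into bytes:

```python
# Every character gets a Unicode code point; encodings such as UTF-8
# and UTF-16 turn that code point into a sequence of bytes.
for ch in ["A", "é", "中", "😀"]:
    print(ch, hex(ord(ch)), ch.encode("utf-8"), ch.encode("utf-16-be"))
# 'A'  -> 0x41,    1 byte in UTF-8
# '😀' -> 0x1f600, 4 bytes in UTF-8, a surrogate pair in UTF-16
```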

And in the same way that ASCII defines a scheme for encoding letters as binary numbers, other file formats – like MP3s or GIFs – use binary numbers to encode sounds or colors of a pixel in our photos, movies, and music.

Most importantly, under the hood it all comes down to long sequences of bits.

Text messages, this YouTube video, every webpage on the internet, and even your computer’s operating system, are nothing but long sequences of 1s and 0s.

So next week, we’ll start talking about how your computer starts manipulating those binary sequences, for our first true taste of computation.

Thanks for watching. See you next week.

